On 01/04/2025 14:25, Steve Vestal wrote:
Below is what I had been using, which I assumed uses a byte array at
some point. What confuses me is how to convert the Model internal data
structure to text that can go into a file.
final ByteArrayOutputStream outStream = new ByteArrayOutputStream();
final PrintStream out = new PrintStream(outStream);
model.write(out, outputFormat); // tried RDF/XML and TTL as outputFormat
out.close();
That buffers all the bytes in memory, and ByteArrayOutputStream breaks
once the output goes over 2G bytes.
final InputStream inStream = new ByteArrayInputStream(outStream.toByteArray());
outFile.create(inStream, IResource.FORCE, null); // an Eclipse thing
Why not write to the file directly? It's more efficient and doesn't
stress out the heap.
Or write to a temporary file and then make it an Eclipse RCP file.
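A minimal sketch of the temporary-file route, using only the JDK. The StreamWriter callback and the class and method names are made up for illustration; the callback stands in for whatever writes the model, for example the RDFWriter.source(model).format(RDFFormat.TURTLE_BLOCKS).output(out) call shown elsewhere in this thread. Nothing is held in a byte array, so the 2G limit does not come into play.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DirectFileWrite {

    /** Hypothetical callback standing in for the RDF writer call. */
    @FunctionalInterface
    public interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Streams the output to a temporary file beside the target, then moves it into place. */
    public static void writeViaTempFile(Path target, StreamWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "model-", ".tmp");
        try {
            // Bytes go straight to disk through a small fixed-size buffer.
            try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
                writer.writeTo(out);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the write failed.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("demo").resolve("out.nt");
        // A one-triple payload stands in for the model output.
        writeViaTempFile(target, out ->
            out.write("<urn:ex:s> <urn:ex:p> <urn:ex:o> .\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

If the target sits inside an Eclipse workspace, the resource would presumably need a refresh afterwards (something like outFile.refreshLocal(IResource.DEPTH_ZERO, null)) so that the workspace notices the new content; that part is an assumption about the surrounding plugin code, not something tested here.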
Otherwise, if there are no blank nodes, you can write chunks of graph in
Turtle or N-Triples to separate files - a chunk being part of the
iterator for a single call of graph.find.
Or see WriterStreamRDFBatched/WriterStreamRDFBlocks and modify to split
on batch boundaries.
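The chunking idea can be sketched without any Jena calls, assuming the triples have already been serialized as N-Triples lines. The Iterator<String> below is a stand-in for lines produced from a graph.find iterator, and the class name, method name, and file naming scheme are invented for the example. Because N-Triples is one triple per line, merging the chunks back together is plain concatenation, with the same caveat as above about blank nodes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkedNTriples {

    /** Writes at most chunkSize lines per file and returns the files in order. */
    public static List<Path> writeChunks(Iterator<String> lines, Path dir, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Path> files = new ArrayList<>();
        while (lines.hasNext()) {
            // Hypothetical naming scheme: chunk-00000.nt, chunk-00001.nt, ...
            Path file = dir.resolve(String.format("chunk-%05d.nt", files.size()));
            try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (int i = 0; i < chunkSize && lines.hasNext(); i++) {
                    w.write(lines.next());
                    w.write('\n');
                }
            }
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Three made-up triples stand in for the serialized graph.
        List<String> triples = List.of(
            "<urn:ex:a> <urn:ex:p> <urn:ex:b> .",
            "<urn:ex:b> <urn:ex:p> <urn:ex:c> .",
            "<urn:ex:c> <urn:ex:p> <urn:ex:a> .");
        Path dir = Files.createTempDirectory("chunks");
        for (Path p : writeChunks(triples.iterator(), dir, 2)) {
            System.out.println(p + ": " + Files.size(p) + " bytes");
        }
    }
}
```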
Andy
FWIW PrintStream does nothing here. The writers use general encoding to
UTF-8 with new OutputStreamWriter(out, StandardCharsets.UTF_8);
On 3/31/2025 3:52 PM, Andy Seaborne wrote:
Hi Steve,
How are you writing the RDF model?
StringWriter should not be involved unless you asked for an in-memory
string, e.g. RDFWriter.asString/toString.
Can you not write output direct to the file?
(For PipedOutputStream, is there a concurrent PipedInputStream?)
Andy
Some formats require some processing to get the "pretty" features.
This uses memory but not the OOME you show below.
There are variants that do not print using these features and stream
to the output stream.
RDFFormat.TURTLE_BLOCKS
RDFFormat.TURTLE_FLAT
RDFFormat.RDFXML_PLAIN
called as:
OutputStream out = ... ;
RDFWriter.source(model).format(RDFFormat.TURTLE_BLOCKS).output(out);
On 31/03/2025 16:27, Martynas Jusevičius wrote:
I would try a streamable syntax such as N-Triples.
On Mon, 31 Mar 2025 at 17.20, Steve Vestal
<steve.ves...@galois.com.invalid>
wrote:
I'm having trouble writing a large Model (36293418 statements) into a
file with a given syntax (e.g., TTL, RDF/XML). I'd appreciate ideas.
Model, RDFDataMgr, and RDFWriter use OutputStream and StringWriter.
With the exception (I assume) of PipedOutputStream, they all seem to
buffer the data in a byte array or String when converting Model content
to a text file format. The maximum length of a Java String or array is
bounded by the 32-bit int type: 2,147,483,647 elements. I get the exception
"java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too
large". Is there a way to feed chunks of data from a Model into a
PipedOutputStream, where the PipedInputStream provides the model content
in a selected text syntax? Is there a simple way to split a Model into
smaller Models, write those as text files, and merge those files back
together?
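For the piped-stream question, here is a rough JDK-only sketch. The writing side has to run on its own thread while another thread reads, otherwise the pipe's fixed buffer fills up and the writer blocks. ByteProducer, ByteConsumer, and pipe are hypothetical names: the producer would wrap the model-writing call, and the consumer would be whatever needs an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicReference;

public class PipedWrite {

    /** Hypothetical stand-in for the code that serializes the model. */
    public interface ByteProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical stand-in for the code that needs an InputStream. */
    public interface ByteConsumer {
        void readFrom(InputStream in) throws IOException;
    }

    /** Runs the producer on a second thread while the caller's thread consumes. */
    public static void pipe(ByteProducer producer, ByteConsumer consumer)
            throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream(64 * 1024); // fixed-size buffer
        PipedOutputStream out = new PipedOutputStream(in);
        AtomicReference<IOException> failure = new AtomicReference<>();

        Thread writerThread = new Thread(() -> {
            // Closing the output end signals end-of-stream to the reader.
            try (out) {
                producer.writeTo(out);
            } catch (IOException e) {
                failure.set(e);
            }
        }, "model-writer");
        writerThread.start();

        try (in) {
            consumer.readFrom(in);
        }
        writerThread.join();
        if (failure.get() != null) {
            throw failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] count = new long[1];
        pipe(
            out -> {
                byte[] block = new byte[8192];
                for (int i = 0; i < 1000; i++) {
                    out.write(block); // far more than the pipe buffer holds at once
                }
            },
            in -> count[0] = in.transferTo(OutputStream.nullOutputStream()));
        System.out.println("bytes read through the pipe: " + count[0]);
    }
}
```

At no point does more than the 64 KB pipe buffer sit in memory, whatever the total output size. Writing directly to a file is simpler when that is possible; the pipe is mainly of interest when the consuming API only accepts an InputStream.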