On 01/04/2025 14:25, Steve Vestal wrote:
Below is what I had been using, which I assumed uses a byte array at some point.  What confuses me is how to convert the Model internal data structure to text that can go into a file.


final ByteArrayOutputStream outStream = new ByteArrayOutputStream();
final PrintStream out = new PrintStream(outStream);
model.write(out, outputFormat); // tried RDF/XML and TTL as outputFormat
out.close();

That buffers all the bytes in memory, and ByteArrayOutputStream breaks once the output goes over 2G bytes.
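The 2G ceiling can be checked against the figures in the reported error ("Required array length 2147483639 + 9 is too large"). The snippet below is only an arithmetic illustration; the class and method names are made up for this note and are not part of Jena or the JDK.

```java
final class ArrayLimitCheck {
    /** True if an array of currentLength + extra elements would exceed what an int can index. */
    static boolean exceedsIntRange(long currentLength, long extra) {
        // Use long arithmetic so the sum itself cannot overflow.
        return currentLength + extra > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long required = 2_147_483_639L + 9L; // figures taken from the reported OutOfMemoryError
        System.out.println(required + " > " + Integer.MAX_VALUE + " : "
                + exceedsIntRange(2_147_483_639L, 9L));
    }
}
```

The requested length is one more than Integer.MAX_VALUE, so no amount of extra heap helps; the output has to go somewhere other than a single byte array.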

final InputStream inStream = new ByteArrayInputStream(outStream.toByteArray());
outFile.create(inStream, IResource.FORCE, null); // an Eclipse thing

Why not write to the file directly? It's more efficient and doesn't stress out the heap.

Or write to a temporary file and then make that an Eclipse RCP file.
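A minimal sketch of the temporary-file route, using only the JDK so it runs without Jena or Eclipse on the classpath. The Serializer interface and the writeViaTempFile helper are hypothetical names invented for this sketch; in real code the lambda body would be the existing call, e.g. model.write(out, outputFormat).

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class DirectFileWrite {
    /** Hypothetical stand-in for the serializer call, e.g. out -> model.write(out, outputFormat). */
    interface Serializer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Streams the serializer's output into a new temporary file and returns its path. */
    static Path writeViaTempFile(Serializer serializer) throws IOException {
        Path tmp = Files.createTempFile("model-", ".ttl");
        // Bytes go to disk as they are produced; nothing accumulates in one byte[].
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            serializer.writeTo(out);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // A single hand-written triple stands in for the real model here.
        Path tmp = writeViaTempFile(out ->
                out.write("<urn:ex:s> <urn:ex:p> <urn:ex:o> .\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(Files.size(tmp) + " bytes written to " + tmp);
        Files.delete(tmp);
    }
}
```

Since outFile.create in the snippet above takes an InputStream, it could presumably be fed from Files.newInputStream(tmp) rather than from a ByteArrayInputStream; that part is an assumption and is not exercised here.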

Otherwise, if there are no blank nodes, you can write chunks of graph in Turtle or N-Triples to separate files - a chunk being part of the iterator for a single call of graph.find.
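The chunking idea can be sketched with plain JDK classes. This is an illustration under assumptions, not Jena code: ChunkedNTriples and writeChunks are hypothetical names, and each iterator element is assumed to be one complete N-Triples line that has already been rendered (for example from the results of graph.find). How the triples are rendered is left to the Jena writers.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ChunkedNTriples {
    /**
     * Writes lines from the iterator into numbered files of at most chunkSize lines each.
     * Only one line is held in memory at a time.
     */
    static List<Path> writeChunks(Iterator<String> lines, int chunkSize, Path dir)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Path> files = new ArrayList<>();
        while (lines.hasNext()) {
            Path file = dir.resolve("chunk-" + files.size() + ".nt");
            try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (int i = 0; i < chunkSize && lines.hasNext(); i++) {
                    w.write(lines.next());
                    w.write('\n');
                }
            }
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        List<String> triples = List.of(
                "<urn:ex:a> <urn:ex:p> <urn:ex:b> .",
                "<urn:ex:b> <urn:ex:p> <urn:ex:c> .",
                "<urn:ex:c> <urn:ex:p> <urn:ex:a> .");
        Path dir = Files.createTempDirectory("chunks-");
        for (Path f : writeChunks(triples.iterator(), 2, dir)) {
            System.out.println(f.getFileName() + ": " + Files.readAllLines(f).size() + " line(s)");
        }
    }
}
```

Because N-Triples is line-based, the chunk files can be concatenated afterwards to get one valid file, which covers the "merge those files back together" part of the question.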

Or see WriterStreamRDFBatched/WriterStreamRDFBlocks and modify to split on batch boundaries.

    Andy


FWIW PrintStream does nothing here. The writers use general encoding to UTF-8 with new OutputStreamWriter(out, StandardCharsets.UTF_8);
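To illustrate the point about PrintStream, here is a small JDK-only sketch that encodes text to UTF-8 directly on an OutputStream. Utf8Direct and writeUtf8 are hypothetical names for this note; the in-memory stream is used only because the demo text is tiny.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8Direct {
    /** Encodes text as UTF-8 straight onto the stream, with no PrintStream in between. */
    static void writeUtf8(OutputStream out, String text) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(text);
        w.flush(); // push the encoded bytes through; closing the stream is left to the caller
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(); // fine for a few bytes only
        writeUtf8(buf, "caf\u00e9");
        System.out.println(buf.size() + " bytes");
    }
}
```

Any OutputStream works as the target, so a file stream can be passed in without an intermediate wrapper.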




On 3/31/2025 3:52 PM, Andy Seaborne wrote:
Hi Steve,

How are you writing the RDF model?

StringWriter should not be involved unless you asked for an in-memory string, e.g. RDFWriter.asString/toString.

Can you not write output direct to the file?

(For PipedOutputStream, is there a concurrent PipedInputStream?)

    Andy

Some formats require some processing to get the "pretty" features.
This uses memory but not the OOME you show below.

There are variants that do not print using these features and stream to the output stream.

RDFFormat.TURTLE_BLOCKS
RDFFormat.TURTLE_FLAT
RDFFormat.RDFXML_PLAIN

called as:

OutputStream out = ... ;
RDFWriter.source(model).format(RDFFormat.TURTLE_BLOCKS).output(out);


On 31/03/2025 16:27, Martynas Jusevičius wrote:
I would try a streamable syntax such as N-Triples.

On Mon, 31 Mar 2025 at 17.20, Steve Vestal <steve.ves...@galois.com.invalid>
wrote:

I'm having trouble writing a large Model (36293418 statements) into a
file with a given syntax (e.g., TTL, RDF/XML).  I'd appreciate ideas.

Model, RDFDataMgr, and RDFWriter use OutputStream and StringWriter.
With the exception (I assume) of PipedOutputStream, they all seem to
buffer the data in a byte array or String when converting Model content
to a text file format.  The maximum length of a Java String or array is
limited by the 32-bit int type to 2,147,483,647 elements. I get the exception
"java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too
large".  Is there a way to feed chunks of data from a Model into a
PipedOutputStream, where the PipedInputStream provides the model content
in a selected text syntax?  Is there a simple way to split a Model into
smaller Models, write those as text files, and merge those files back
together?




