On 06.11.2019 at 08:25, Ralf Baumert wrote:
Hello list,


I'm trying to render rather large PDF files with PDFBox (current
version) and I'm running into memory issues.

I created the PDDocument with MemoryUsageSetting.setupTempFileOnly() and
I can see that it is creating a scratch file.

However, it still consumes loads of memory and in the end it crashes with
an OutOfMemoryError.

The heap dump shows loads of COSStream objects.


My question: is this a known bug or limitation? Is there a workaround?


Some details:

- Of course I increased -Xmx, but sooner or later it still runs out of
memory.

- I'm opening a new PDPageContentStream for each element (like a table
or a paragraph). Is this the correct way to do things, or am I supposed
to have only one stream? (Note: I'm using boxable, which creates a
stream for each table.)

- I noticed the saveIncremental() method, but its documentation states
that it can only be used when the PDF has been read from a file. I could
try to create the first page, save the file, load it again to add some
pages, and then call this method. Is this feasible?

- The resulting PDF will be about 5 GB in size; this is a hard
requirement.



saveIncremental() is mainly meant for signing, and it still loads data into memory. You could still try saving your file, loading it again, adding pages and then saving normally to see whether that improves things, but I doubt it will.

I suspect that memory usage gets worse if you have many small page content streams instead of one large one, because of how the page buffers are managed in memory. So what you could try is this: once you are finished with a page, copy its content streams back into a single stream at once.


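import java.io.InputStream;
import java.io.OutputStream;
import java.util.Iterator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Assumes doc (PDDocument) and page (PDPage) are in scope.
// Read the page's combined content (all content streams concatenated).
byte[] ba;
try (InputStream is = page.getContents())
{
    ba = IOUtils.toByteArray(is);
}
// Write it back into one new Flate-compressed stream.
PDStream newPDStream = new PDStream(doc);
try (OutputStream os = newPDStream.createOutputStream(COSName.FLATE_DECODE))
{
    os.write(ba);
}
// Close the old content streams so that their buffers can be released.
Iterator<PDStream> it = page.getContentStreams();
while (it.hasNext())
{
    PDStream pds = it.next();
    pds.getCOSObject().close();
}

// Replace the page's contents with the single merged stream.
page.setContents(newPDStream);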


Please let me know whether this improves things, i.e. whether you can create more pages before the OOM. (The code could be optimized further by copying directly, without the intermediate byte array.)
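As a rough sketch of what "copying directly" could look like: the helper below moves bytes from an input stream to an output stream through a small fixed buffer, so the whole page content never has to sit in one byte array. The class name StreamCopy is made up for illustration; it uses only java.io. In the snippet above it would replace the byte[] step, with page.getContents() as the source and newPDStream.createOutputStream(COSName.FLATE_DECODE) as the target, both opened in one try-with-resources. PDFBox's own IOUtils.copy(InputStream, OutputStream) should serve the same purpose.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper, not part of PDFBox. */
public final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies all bytes from in to out through an 8 KB buffer.
     * Returns the number of bytes copied. Does not close either stream.
     */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Whether this makes a measurable difference depends on how large the individual pages are; for small pages the byte array is probably not the main cost.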


Other ways to optimize big PDF files: create each font only once, i.e. don't create a new font object for each page. The same goes for images, e.g. a company logo: create the image object only once and reuse it on every page.
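One possible way to enforce "create once, reuse" is a small per-document cache, sketched below. ResourceCache is a hypothetical helper, not a PDFBox class; the loader function is whatever creates the font or image object for a given key (for example a file path).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: builds each resource once and hands out the same instance afterwards. */
public final class ResourceCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;

    public ResourceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for key, invoking the loader only on the first request. */
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Number of distinct resources created so far. */
    public int size() {
        return cache.size();
    }
}
```

With PDFBox, the loader could wrap a call such as PDImageXObject.createFromFile(path, doc) or PDType0Font.load(doc, file). Both throw IOException, so the lambda would have to rethrow it as an UncheckedIOException. Keep one cache per PDDocument, since these objects belong to the document they were created for.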

Tilman

