On 01.08.2017 at 22:09, Christopher Schultz wrote:

Tilman,

On 8/1/17 3:22 PM, Tilman Hausherr wrote:
The only thing that comes close to what you want is to create your
PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.
So that we can buffer to disk if the in-memory representation gets too
big? That sounds like a good approach, and probably the most useful to me.

It also appears that I can set a maximum in-memory limit like this:

MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
PDDocument doc = new PDDocument(mus);

Yes. Although this would mean you'd get an exception if you use more. That's why I recommend the mixed one. You could use the memory limit for stress tests, i.e. create the "worst" possible file and see what you need.

Note that only streams are cached. Ordinary Java structures (e.g. maps, numbers, strings) are not.
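
For illustration, the mixed setting recommended above might be used roughly as follows. This is a minimal sketch assuming the PDFBox 2.0.x API, where PDDocument accepts a MemoryUsageSetting in its constructor; the class name MixedMemoryExample and the 1 MiB threshold are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical example class; assumes PDFBox 2.0.x on the classpath.
class MixedMemoryExample {

    static byte[] buildDocument() throws IOException {
        // Keep up to 1 MiB of stream data in main memory. Anything beyond
        // that is buffered in a temporary file instead of raising an exception.
        MemoryUsageSetting mus = MemoryUsageSetting.setupMixed(1024L * 1024L);

        try (PDDocument doc = new PDDocument(mus)) {
            doc.addPage(new PDPage());

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        }
    }
}
```

As noted above, only streams go through this cache, so the setting bounds stream data and not the ordinary Java objects that make up the rest of the document.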



... and then this should enforce a 1MiB size limit, no? I think that's
all I want... there shouldn't be any reason for me to have to touch
the disk: my files are really quite small. I just don't want something
to go wrong with my client code and inadvertently go into an infinite
loop adding "Hello World" to the document over and over until I have
50k pages in the PDF and an OOME on my hands.

What you should do is take care not to duplicate anything. So if
you have a company logo on every page, create that object
only once. The same goes for fonts.
We have something like:

private Font _theFont;

...
contentStream.setFont(_theFont);
contentStream.newLineAtOffset(x,y);
contentStream.showText("Hello, world");
...


Many many times. The Font object reference stays the same, so I'm
guessing that's okay and the font is used once and referenced many
times, right?

Yes!

To create small PDF files, use PDType0Font.load() instead of PDTrueTypeFont.load(); this subsets the fonts when the document is saved.
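
The advice in this part of the thread (load the font once, reuse the same object, one content stream per page) could look something like the sketch below. It assumes the PDFBox 2.0.x API; the class name SharedFontExample, the method writePages, the font size and the text position are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical example class; assumes PDFBox 2.0.x on the classpath.
class SharedFontExample {

    static void writePages(PDDocument doc, InputStream ttf, int pageCount) throws IOException {
        // Load the font once per document. Every page below references
        // this same object, so the font is embedded only once.
        PDFont font = PDType0Font.load(doc, ttf);

        for (int i = 0; i < pageCount; i++) {
            PDPage page = new PDPage();
            doc.addPage(page);

            // Exactly one content stream per page, closed when the page is done.
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(font, 12f);          // assumed font size
                cs.newLineAtOffset(72f, 700f);  // assumed position
                cs.showText("Hello, world");
                cs.endText();
            }
        }
    }
}
```

The caller would create the PDDocument, call writePages with a stream for a TrueType font file, and then save and close the document.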



And try to have only one content stream per page. (We recently had
a guy who had a huge number of content streams and wondered why his
PDF was so big).
Check: we have only one PDPageContentStream per page.

We have a single logo on the first page and nothing repeated.

Our PDFs are almost 100% plain-text with lots of whitespace (which
doesn't count, I know). When base64 encoded, they are typically only a
few kb in size.

I'm mostly operating from a position of borderline unhealthy paranoia,
but I'd rather have a bit of code added to ensure that I don't have to
get paged in the middle of the night to restart a service that has
suffered an OOME.

This all sounds harmless. All the memory problems I can think of were related to rendering, not PDF creation.

We've had at least one speed complaint, but that one is solved in the current version.

Tilman



Thanks for the pointers.

-chris

On 01.08.2017 at 20:04, Christopher Schultz wrote:
All,

We use PDFBox on a server that must handle many transactions with
(somewhat) limited memory. I'd like to limit the amount of memory
used to generate our PDFs, which we then serialize to byte-array,
base64-encode, etc. for ultimate delivery to some endpoint.

I can obviously limit the number of bytes produced by using a
size-limited OutputStream passed into
PDDocument.save(OutputStream), but I'm wondering if PDFBox has any
facilities within it to limit the size of the object-tree in memory
(or estimate its size, and we can stop operations when it reaches a
certain size) so that we don't end up with a multi-GB object-tree
that then fails to serialize to byte[] because it is too big.
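
A size-limited OutputStream of the kind mentioned above might be sketched like this. It uses only the Java standard library; the class name LimitedOutputStream and its behaviour (throwing an IOException before the limit is exceeded) are choices made for this example, not anything PDFBox provides.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: refuses any write that would exceed maxBytes in total.
class LimitedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Check before writing so nothing past the limit reaches the target.
        ensureCapacity(len);
        out.write(buf, off, len);
        written += len;
    }

    private void ensureCapacity(int len) throws IOException {
        if (written + len > maxBytes) {
            throw new IOException("Output exceeds limit of " + maxBytes + " bytes");
        }
    }

    long getWritten() {
        return written;
    }
}
```

It would be wrapped around the real target and handed to PDDocument.save(OutputStream). This only caps the serialized output; it does nothing about the size of the object tree held in memory before the save.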

We are building our PDF documents from scratch, starting with the
page definitions, fonts, etc. then adding titles, paragraphs of
text, etc. It's all fairly straightforward, and we have full
control over the whole process up to and including the call to
PDDocument.save(OutputStream).

We are manually constructing our pages as well, so I suppose we
could simply limit the number of pages, but I'm more concerned
about the size of the memory used and not the number of pages.

Is there anything in PDFBox that can help us with this? We can
always count e.g. the number of bytes/characters we have written to
the PDF, but that seems less important than what is going on inside
of the PDF structure itself.
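
The counting approach described in the last two paragraphs (limit the pages, count the characters written) could be sketched as a small guard object in client code. Everything here is hypothetical: the class name PdfBudget, its methods and the limits are invented for the example, and it measures only what the client adds, which is a rough proxy for memory use and not a measurement of the PDF structure itself.

```java
// Hypothetical guard: the client calls it whenever it adds a page or text,
// and it fails fast if a runaway loop pushes the document past a budget.
class PdfBudget {
    private final int maxPages;
    private final long maxChars;
    private int pages;
    private long chars;

    PdfBudget(int maxPages, long maxChars) {
        this.maxPages = maxPages;
        this.maxChars = maxChars;
    }

    // Call before adding a page to the document.
    void pageAdded() {
        if (pages + 1 > maxPages) {
            throw new IllegalStateException("Page budget of " + maxPages + " exceeded");
        }
        pages++;
    }

    // Call before showing text on a page.
    void textAdded(String text) {
        if (chars + text.length() > maxChars) {
            throw new IllegalStateException("Character budget of " + maxChars + " exceeded");
        }
        chars += text.length();
    }

    int getPages() {
        return pages;
    }

    long getChars() {
        return chars;
    }
}
```

A loop that keeps adding "Hello World" would then stop with an IllegalStateException long before the heap is exhausted.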

-chris
---------------------------------------------------------------------


To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


