On 02.08.2017 at 01:17, Christopher Schultz wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Tilman,
On 8/1/17 4:42 PM, Tilman Hausherr wrote:
On 01.08.2017 at 22:09, Christopher Schultz wrote:
Tilman,
On 8/1/17 3:22 PM, Tilman Hausherr wrote:
The only thing that comes close to what you want is to create
your PDDocument with MemoryUsageSetting.setupMixed(...) as
parameter.
So that we can buffer to disk if the in-memory representation gets
too big? That sounds like a good approach, and probably the most
useful to me.
It also appears that I can set a maximum in-memory limit like
this:
MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
PDDocument doc = new PDDocument(mus);
Yes. Although this would mean you'd get an exception if you use
more. That's why I recommend the mixed one. You could use the
memory limit for stress tests, i.e. create the "worst" possible
file and see what you need.
I think I'm okay with an exception in these cases. As I said, our PDFs
only end up being a few kiB in size, so I've put a 1MiB cap on the
memory-only memory usage strategy for the time being.
I'm curious about what's being constrained here. Does PDFBox
estimate the current memory usage of the various PD* objects in memory
and push to disk when that's exceeded, or does it just limit the amount
of memory that gets used when serializing out to a stream?
There is no estimate. It writes into the dedicated space, and if that
is full, you either get an exception (memory-only) or it writes to the
disk cache. Note that only streams are cached. Ordinary Java structures
(e.g. maps, numbers, strings) are not.
Can you tell me a little more about that? When you say "streams are
cached", what does that mean exactly?
Or have I essentially already asked that question above?
Yes... it's mostly images, fonts and page content streams.
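To make the behaviour described above concrete, here is a minimal Java sketch of a bounded scratch buffer. It is not PDFBox code: the class `BoundedScratch` and all of its methods are hypothetical names invented for illustration. It only models the rule stated above: write into a fixed memory budget, and when that is full, either throw (memory-only) or spill to a temporary file (mixed).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; this is NOT how PDFBox is implemented. */
final class BoundedScratch implements AutoCloseable {
    private final long maxMemoryBytes;   // fixed budget for the in-memory part
    private final boolean spillToDisk;   // false = "memory only", true = "mixed"
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;               // created lazily on the first spill
    private OutputStream disk;           // stays null until something spills
    private long diskBytes;

    BoundedScratch(long maxMemoryBytes, boolean spillToDisk) {
        if (maxMemoryBytes < 0) {
            throw new IllegalArgumentException("maxMemoryBytes must be >= 0");
        }
        this.maxMemoryBytes = maxMemoryBytes;
        this.spillToDisk = spillToDisk;
    }

    void write(byte[] data) throws IOException {
        long free = maxMemoryBytes - memory.size();
        if (data.length > free && !spillToDisk) {
            // Memory-only mode: refuse the write instead of growing further.
            throw new IOException("memory limit of " + maxMemoryBytes + " bytes exceeded");
        }
        int toMemory = (int) Math.min(free, data.length);
        memory.write(data, 0, toMemory);
        int rest = data.length - toMemory;
        if (rest > 0) {
            // Mixed mode: whatever does not fit goes to a temporary file.
            if (disk == null) {
                tempFile = Files.createTempFile("scratch", ".tmp");
                disk = Files.newOutputStream(tempFile);
            }
            disk.write(data, toMemory, rest);
            diskBytes += rest;
        }
    }

    long memoryBytes() { return memory.size(); }

    long diskBytes() { return diskBytes; }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
            Files.deleteIfExists(tempFile);
        }
    }
}
```

With `spillToDisk` set to `false` the sketch behaves like the memory-only setting discussed above (an exception once the budget is used up); with `true` it behaves like the mixed setting. Note that nothing is estimated: the buffer only counts the bytes actually written to it.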
... and then this should enforce a 1MiB size limit, no? I think
that's all I want... there shouldn't be any reason for me to have
to touch the disk: my files are really quite small. I just don't
want something to go wrong with my client code and inadvertently go
into an infinite loop adding "Hello World" to the document over and
over until I have 50k pages in the PDF and an OOME on my hands.
What you should do is take care not to duplicate anything.
So if you have a company logo on every page, create the
image object only once. Same for fonts.
We have something like:
private Font _theFont;
...
contentStream.setFont(_theFont);
contentStream.newLineAtOffset(x,y);
contentStream.showText("Hello, world");
...
Many many times. The Font object reference stays the same, so I'm
guessing that's okay and the font is used once and referenced many
times, right?
Yes!
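As a rough illustration of why reusing one object matters, here is a toy Java model. It is not PDFBox code: `SharedResourceModel` and `Resource` are hypothetical names, and the model simply assumes that a resource is stored once per object instance, which is what the exchange above implies for fonts and images.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical toy model, NOT PDFBox: each distinct resource object is stored once. */
final class SharedResourceModel {
    /** Stand-in for a font or an image; sizeBytes is the cost of storing it once. */
    static final class Resource {
        final String name;
        final int sizeBytes;

        Resource(String name, int sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    // Keyed by object identity: the same reference counts once,
    // two separately created objects count twice even if they look alike.
    private final Map<Resource, Integer> useCounts = new IdentityHashMap<>();

    /** Record that a page uses the given resource. */
    void use(Resource resource) {
        useCounts.merge(resource, 1, Integer::sum);
    }

    int distinctResources() {
        return useCounts.size();
    }

    /** Total bytes needed if every distinct resource object is stored exactly once. */
    long storedBytes() {
        long total = 0;
        for (Resource resource : useCounts.keySet()) {
            total += resource.sizeBytes;
        }
        return total;
    }
}
```

In this model, using one 10,000-byte logo object on 100 pages costs 10,000 bytes, while creating a fresh logo object for each of the 100 pages costs 1,000,000 bytes. Holding a single field such as `_theFont` and passing it to every content stream corresponds to the first case.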
To create small PDF files, use PDType0Font.load() instead of
PDTrueTypeFont.load(); this will subset the fonts when the document is saved.
We are using PDType1Font.FONTNAME for everything, so we aren't calling
.load for anything at all.
That is even better, because it doesn't use any additional space (and is
faster too). Your application is a very simple one :-)
You really should worry about other things... choose one or many:
climate change, Russian hackers, terrorism, rising interest rates,
traffic jams, heavy rain flooding your basement, people who don't wash
their hands, whatever :-)
Tilman
And try to have only one content stream per page. (We
recently had a guy who had a huge number of content streams
and wondered why his PDF was so big).
Check: we have only one PDPageContentStream per page.
We have a single logo on the first page and nothing repeated.
Our PDFs are almost 100% plain-text with lots of whitespace (which
doesn't count, I know). When Base64-encoded, they are typically
only a few kB in size.
I'm mostly operating from a position of borderline unhealthy
paranoia, but I'd rather have a bit of code added to ensure that I
don't have to get paged in the middle of the night to restart a
service that has suffered an OOME.
This all sounds harmless. All the memory problems I can think of
were related to rendering, not PDF creation.
Sounds good.
We've had at least one speed complaint, but that one is solved in
the current version.
I'll make sure we are up-to-date.
Thanks,
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBCAAGBQJZgQwMAAoJEBzwKT+lPKRYws4P/RvvC0+6xp5fMINPAey98Pj6
cxTSeAkm0RsLl9lZrCxBjVRHNGsKBd1G70fgFEp6uB+5tU14Na0m1nZZ2WNGtiko
dwTseWL/m/FiggHDrzsT+RQVlbBoUzhBpyHYmEkRnbfQnS98eE0ZTSlN59IAStzn
yD7jFEds/nJucJZk9O6so9lOa9waGMf+s2MEp1YfMizytuIRK4ch3JG5/cBVQa8S
2W3J/Y/fIQWXOAx433XuVG9rC00RKtaMJahjOwyhmUIznNlR/yGH+0iiqwziUyXX
UtqsPTyFrGHQcHr4gaiewug6V//P5HC+XYhqyU0AR1EJolYSGXPY0UtRuTgCtAQ0
FXFjaYPppumKCjV9QMIfRcps7XclwoV/kiip5H3DIZwIL81PRE3rjthuE75uAjps
OEtGWjte9DDfDkkV6gudp0DmCBWq6oMyw7m4vm7rLACPXt0ziZtEKU698N7m88T6
vFxLtZloUbGVj0UAe4Sr6e31fw+5+dp2gpFNgKSP8FBGWAGLA+6srSA9sucpsqev
yG4QgReFNclDgO7i/6H5W1DcNZeTOwLJ+vT5BJafSvgHBGhLGy3F1uM3IyeFMgf7
XBHr4Em8p41aGS0BCvtGQ+xFMPCPKIHEvZxLZ+1JxboS0g5+KT8LHnCWvXjc6gSa
w9Dyle4TNPUoJHp24k/p
=YM5j
-----END PGP SIGNATURE-----
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]