Re: Loading documents with a large amount of annotations

Tilman Hausherr Mon, 12 Nov 2018 23:28:50 -0800

Hi Nick,

PDFBox doesn't support parse on demand, so the only solution is to increase the JVM heap size (-Xmx).
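
As a rough illustration of the -Xmx advice, the sketch below checks how much heap the JVM was actually given before attempting a large load. It uses only the standard Runtime API; the class name HeapCheck and the 4096 MiB threshold are assumptions chosen for the example, not anything PDFBox defines.

```java
public class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (driven by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** True if the configured maximum heap is at least requiredMiB. */
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        // Hypothetical threshold; pick one that suits the documents you load.
        long requiredMiB = 4096;
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (!hasAtLeast(requiredMiB)) {
            // Some collectors report slightly less than the -Xmx value,
            // so leave some headroom when choosing the setting.
            System.err.println("Heap is below " + requiredMiB
                    + " MiB; consider restarting with a larger -Xmx.");
        }
    }
}
```

Running it with, for example, `java -Xmx6g HeapCheck` should confirm that the larger heap setting took effect before the same flag is applied to the real application.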

Yes, you could hack COSParser to ignore /Annots objects in a dictionary, but I don't know what mayhem would result.


Tilman

On 13.11.2018 at 07:17, Nick Westerly wrote:

Hi -

I am trying to load a document that has a lot of annotations (50k+), i.e.
comments, highlights, etc. However, just calling 'load' on the document is
extremely slow and uses a lot of memory (2G+).

I actually don't need to use or access annotations at all (I'm using PDFBox
through a separate library that doesn't need them), but I do need access to
the PDDocument. Is there a way to load a document but ignore all
annotations when parsing? Ideally this would also skip items such as fonts
associated with those annotation objects.

I was browsing through PDFParser#initialParse and COSParser, but I am a little
out of my depth.
Even something as simple as ignoring objects of some 'type' that I could
check would do.

Any suggestions, even partial, would be helpful.

Thanks.

Nick



