Try passing a MemoryUsageSetting as the second parameter of PDDocument.load
so that it buffers to a temp file instead of main memory. I believe that can
help when dealing with very large or complex files.

On Tue, Nov 13, 2018 at 8:23 AM Nick Westerly <[email protected]> wrote:

> Hi -
>
> I am trying to load a document that has a lot of annotations (50k+), e.g.
> comments, highlights, etc. However, just calling 'load' on the document is
> extremely slow and uses a lot of memory (2G+).
>
> I actually don't need to use or access annotations at all (I'm using PDFBox
> through a separate library that doesn't need them), but I do need access to
> the PDDocument. Is there a way to load a document but ignore all
> annotations when parsing? Similarly, is there a way to ignore items such as
> fonts associated with those annotation objects?
>
> I was browsing through PDFParser#initialParse and COSParser, but I am a
> little out of my depth.
> Even something as simple as ignoring objects if they are of some 'type' I
> could check would help.
>
> Any suggestions, even partial, would be helpful.
>
> Thanks.
>
> Nick
>
