Try passing a MemoryUsageSetting as the second argument to PDDocument.load (for example MemoryUsageSetting.setupTempFileOnly()) so that PDFBox buffers to a temp file instead of main memory. That can help when dealing with very large or complex files, I believe.
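Something along these lines is what I have in mind. It is only a sketch and assumes PDFBox 2.0.x, where PDDocument.load(File, MemoryUsageSetting) is available. The class name, the helper method and the command-line argument are made up for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

public class LoadWithTempFile {

    // Hypothetical helper: loads a PDF and asks PDFBox to buffer its
    // scratch data in a temp file rather than in main memory.
    static PDDocument loadWithTempFile(File pdf) throws IOException {
        return PDDocument.load(pdf, MemoryUsageSetting.setupTempFileOnly());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path to the PDF is passed as the first argument.
        File pdf = new File(args[0]);
        try (PDDocument doc = loadWithTempFile(pdf)) {
            System.out.println("Pages: " + doc.getNumberOfPages());
        }
    }
}
```

As far as I know this only changes where the data is buffered. The annotations are still parsed, so it may not help with load time.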
On Tue, Nov 13, 2018 at 8:23 AM Nick Westerly <[email protected]> wrote:

> Hi -
>
> I am trying to load a document that has a lot of annotations (50k+)
> (i.e. comments, highlights, etc.). However, just calling 'load' on the
> document is extremely slow, and uses a lot of memory (2G+).
>
> I actually don't need to use or access annotations at all (I'm using
> PDFBOX through a separate library that doesn't need them), but do need
> access to the PDDocument. Is there a way to load a document, but ignore
> all annotations when parsing? Similarly, ignoring all items such as
> fonts associated with those annotation objects.
>
> I was browsing through PDFParser#initialParse and COSParser, but I'm a
> little out of my depth.
> Even something as simple as ignoring objects if they are of some 'type'
> I could check.
>
> Any suggestions, even partial, would be helpful.
>
> Thanks.
>
> Nick

