Hi there,

I'm trying to validate arbitrary PDFs (potentially huge, 100s of MBs)
against the following rule set:
- Dimensions of all pages should be A4 (297 mm * 210 mm)
- There should be no content within a certain rectangular area of a page
(left margin where the print shop inserts a bar code)
- Number of pages should be less than N
- PDF version used
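
To make the two geometry rules concrete, here is a rough sketch of the checks in plain Java. The class name, the method names and the tolerance parameter are made up for illustration and are not PDFBox API. It assumes the page size and the content bounding boxes are already available in PDF points (1/72 inch), for example from PDPage.getMediaBox(), and that landscape A4 is also acceptable.

```java
// Hypothetical helper: names and the tolerance parameter are assumptions,
// not part of PDFBox. All lengths are in PDF points (1/72 inch) unless noted.
final class PageRules {
    static final double POINTS_PER_MM = 72.0 / 25.4;

    static double mmToPoints(double mm) {
        return mm * POINTS_PER_MM;
    }

    // True if the page is 210 mm x 297 mm in either orientation,
    // allowing a deviation of toleranceMm on each side.
    static boolean isA4(double widthPt, double heightPt, double toleranceMm) {
        double tol = mmToPoints(toleranceMm);
        double shortSide = Math.min(widthPt, heightPt);
        double longSide = Math.max(widthPt, heightPt);
        return Math.abs(shortSide - mmToPoints(210)) <= tol
            && Math.abs(longSide - mmToPoints(297)) <= tol;
    }

    // Rectangles are {x, y, width, height}. True if the two rectangles
    // share an area greater than zero (touching edges do not count).
    static boolean overlaps(double[] a, double[] b) {
        return a[0] < b[0] + b[2] && b[0] < a[0] + a[2]
            && a[1] < b[1] + b[3] && b[1] < a[1] + a[3];
    }
}
```

A page would fail the margin rule if overlaps(reservedArea, contentBox) is true for any piece of content on it, where reservedArea is the bar code rectangle.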

So far we've been using PDDocument.load with a scratch file, but with huge
documents (e.g. product catalogues), things explode.
Is there a way to stream parse a PDF similar to stream parsing an XML
document (e.g. using StAX) and validate one page at a time?
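
The version rule, at least, looks checkable without loading the document at all, since a PDF file starts with a header of the form %PDF-n.m. Below is a minimal sketch using only the JDK (Java 11+ for readNBytes); the class and method names are hypothetical. One caveat: a /Version entry in the document catalog can override the header value, so this is only a first approximation, and it does nothing for the per-page checks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not PDFBox API: reads the version from the file header only.
final class PdfHeader {

    // Looks for "%PDF-" in the given leading bytes and returns the digits and
    // dots that follow it (e.g. "1.7"), or null if no header is found.
    static String versionFromHeader(byte[] head) {
        String s = new String(head, StandardCharsets.ISO_8859_1);
        int marker = s.indexOf("%PDF-");
        if (marker < 0) {
            return null;
        }
        int start = marker + 5;
        int end = start;
        while (end < s.length()
                && (Character.isDigit(s.charAt(end)) || s.charAt(end) == '.')) {
            end++;
        }
        return end > start ? s.substring(start, end) : null;
    }

    // Reads at most the first 1024 bytes of the file, however large it is.
    static String versionOf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return versionFromHeader(in.readNBytes(1024));
        }
    }
}
```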

Cheers

Stefan
