My question is: was it intentional for this to fail silently? We realize that, given the wide variety of content we receive, there are going to be "bad" PDFs, which is fine. But currently we rely on PDFBox to tell us *when* a file is something we shouldn't continue post-processing. If it fails silently, we assume that, since nothing blew up, we have received all of the pages. If we were to go to alpha3, this assumption would no longer hold.
For years we have allowed all sorts of broken PDFs to pass, because that is what the majority of users wanted, as expressed by the often-repeated emotional text "But it renders with Adobe Reader!".
Using PDFBox to check whether a PDF is valid isn't a good idea. Try a tool like JHOVE.
Tilman
Effectively we loop through a PDF to extract pages like so:

    Splitter splitter = new Splitter();
    for (PDDocument page : splitter.split(document)) {
        // save each page for consumption later
    }

Thanks in advance for any information that you can provide regarding our expectations of this behavior.

- Levi
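Since a lenient parser may return fewer pages without throwing, one defensive option is to compare the number of pages actually produced by the split against a page count obtained from an independent source (for example, metadata from the content supplier or the output of a validator such as JHOVE). The following is only a minimal sketch of that idea: `PageCountGuard` and `requireAllPages` are hypothetical names, not part of PDFBox, and the expected count is assumed to be available from somewhere other than the parsed document itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of PDFBox: fails loudly when a split
// produced a different number of pages than the caller expected.
final class PageCountGuard {
    private PageCountGuard() {}

    // expectedPages is assumed to come from an independent source,
    // not from the same lenient parse that produced splitPages.
    static <T> List<T> requireAllPages(int expectedPages, List<T> splitPages) {
        if (expectedPages < 0) {
            throw new IllegalArgumentException("expectedPages must not be negative");
        }
        if (splitPages.size() != expectedPages) {
            throw new IllegalStateException(
                "Expected " + expectedPages + " pages but got " + splitPages.size());
        }
        return splitPages;
    }

    public static void main(String[] args) {
        // Strings stand in for the per-page documents returned by a split.
        List<String> pages = Arrays.asList("page-1", "page-2");

        // Counts match: the list is returned and processing can continue.
        requireAllPages(2, pages);

        // Counts differ: stop before any further post-processing.
        try {
            requireAllPages(3, pages);
        } catch (IllegalStateException e) {
            System.out.println("Stopping post-processing: " + e.getMessage());
        }
    }
}
```

In the loop from this thread, the list returned by `splitter.split(document)` would be passed as `splitPages`. This only detects missing pages when a trustworthy expected count exists; it does not tell you whether the PDF itself is valid.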