Re: WARNING: Did not found XRef object at specified startxref position

Thomas Chojecki Sat, 02 Nov 2013 01:51:50 -0700


Quoting Rodrigo Caniçali <[email protected]>:

Hi,

Hi Rodrigo,

I found on the mailing list, in a thread from 2012-jun-14, that this problem has already been discussed, but my case is rather different.

I think I found the discussion.

I also get the warning "Did not found XRef object at specified startxref position xxx" when executing the main function of the org.apache.pdfbox.ExtractText class. However, some PDF texts are ignored and are not printed to the output TXT file. These same texts are displayed by Acrobat Reader and can be copied as text from that program.

Your document is broken. It works in Acrobat Reader because Acrobat Reader is not strict about enforcing the specification.

Many developers who write a PDF writer test it only against Acrobat Reader and do not always follow the specification. The goal then becomes producing documents that Acrobat Reader accepts rather than documents that conform to the specification. As a result, third-party libraries like pdfbox sometimes cannot parse such documents.

In your case the xref table is not where the parser expects it to be. If you can provide us with such a document, we can try to find the cause of the problem and maybe fix it.
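
To illustrate what "not where the parser expects it" means, here is a minimal sketch in plain Python (it does not use pdfbox, and it is not how pdfbox implements the check). It reads the byte offset that follows the last startxref keyword and reports what is actually found at that offset. The function name check_startxref and the returned labels are made up for this example.

```python
import re


def check_startxref(data: bytes) -> str:
    """Classify what the last startxref offset in a PDF points at.

    Hypothetical helper for illustration only; the labels are not
    pdfbox messages.
    """
    # A PDF normally ends with: startxref, the byte offset, then %%EOF.
    idx = data.rfind(b"startxref")
    if idx == -1:
        return "no startxref keyword"

    m = re.match(rb"startxref\s+(\d+)", data[idx:])
    if m is None:
        return "startxref has no offset"

    offset = int(m.group(1))
    if offset >= len(data):
        return "offset beyond end of file"

    # Look at the bytes the offset actually points to.
    target = data[offset:offset + 32]
    if target.startswith(b"xref"):
        return "xref table"
    if re.match(rb"\d+\s+\d+\s+obj", target):
        # Could be a cross-reference stream; this sketch does not verify it.
        return "indirect object"
    return "mismatch"
```

Calling it as check_startxref(open("broken.pdf", "rb").read()), where broken.pdf stands for your own file, returning "mismatch" would correspond to the situation the warning describes: the offset stored in the file does not point at a cross-reference section.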

If the option "-nonSeq" is selected, a "java.io.IOException: Error: Expected a long type, actual=..." appears, which stops the text extraction.

Maybe you can post the first three lines of the stack trace; this will help with debugging the problem.

Please, is there any way to make it work?

It is nearly impossible to reconstruct such cases. If you can provide us with more information, or maybe the document itself, it will help us improve the parser, if possible.

We do our best to support as many documents as we can, but in some cases we need to be strict so that documents which already parse fine keep working. This problem is also one item on the agenda for pdfbox version 2.0.0.


Thanks,

Rodrigo


Best regards
Thomas
