Re: Error on PDDocument.load

John Hewson Wed, 11 Feb 2015 09:43:05 -0800

Can we get a JIRA issue open for this, preferably with the file attached?

-- John


> On 11 Feb 2015, at 00:29, Tilman Hausherr <thaush...@t-online.de> wrote:
> 
> Yes, they made hacks. So did we, for many types of malformed files. Please 
> also send the file to Andreas, unless you already did; he has written many 
> workarounds for malformed files.
> 
> Tilman
> 
>> On 11.02.2015 at 09:05, Kevin Morin wrote:
>> Ok. Why is other software (like xpf) able to open it? I guess they made a 
>> hack to fix this? Are you going to do something too?
>> 
>> Thanks
>> BR
>> 
>> Kevin
>> 
>>> On 11/02/2015 08:53, Tilman Hausherr wrote:
>>> Hi,
>>> 
>>> I can reproduce the error. Your file is malformed. Please open it with
>>> NOTEPAD++ and go to the end:
>>> 
>>> xref
>>> 1 7
>>> 0000000000 65535 f
>>> 0000000009 00000 n
>>> 0000358745 00000 n
>>> 0000358842 00000 n
>>> 0000359029 00000 n
>>> 0000359087 00000 n
>>> 0000359138 00000 n
>>> trailer
>>> 
>>> The first number (1) is the object number of the first entry in this
>>> subsection, so the entries are numbered starting from 1. The second
>>> number (7) is the number of entries. The 1 is incorrect: it should be 0,
>>> because "0000000000 65535 f" is the dummy object 0. As a result, every
>>> offset is assigned to the wrong object number. Press CTRL-G and enter
>>> the offsets (e.g. 9, 45, 358745, ...) and you will see what I mean.
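
The manual check described above (jump to each offset and see which object really starts there) can also be sketched in code. This is only an illustration under stated assumptions: XrefOffsetCheck and mismatches are hypothetical names, not part of PDFBox, and the sample bytes in main are a tiny stand-in for a real file. It assumes the xref entries have already been read as text lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports xref entries whose offset
// does not point at the expected "<num> <gen> obj" header.
class XrefOffsetCheck {

    /**
     * @param pdf         raw bytes of the file
     * @param firstObject first number of the xref subsection header
     * @param entries     entries without line endings, e.g. "0000000009 00000 n"
     * @return object numbers whose in-use entry points somewhere else
     */
    static List<Integer> mismatches(byte[] pdf, int firstObject, String[] entries) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < entries.length; i++) {
            String[] parts = entries[i].trim().split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed xref entry: " + entries[i]);
            }
            if (!parts[2].equals("n")) {
                continue; // free entries have no object to look up
            }
            long offset = Long.parseLong(parts[0]);
            int generation = Integer.parseInt(parts[1]);
            int objectNumber = firstObject + i;
            byte[] expected = (objectNumber + " " + generation + " obj")
                    .getBytes(StandardCharsets.ISO_8859_1);
            if (!startsWithAt(pdf, offset, expected)) {
                bad.add(objectNumber);
            }
        }
        return bad;
    }

    // True if the bytes at the given offset begin with the given prefix.
    private static boolean startsWithAt(byte[] data, long offset, byte[] prefix) {
        if (offset < 0 || offset + prefix.length > data.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[(int) offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a PDF body: object 1 starts at byte 9, object 2 at byte 29.
        byte[] pdf = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        String[] entries = {"0000000000 65535 f", "0000000009 00000 n", "0000000029 00000 n"};
        System.out.println("header '0 3': " + mismatches(pdf, 0, entries));
        System.out.println("header '1 3': " + mismatches(pdf, 1, entries));
    }
}
```

With the correct header (first object 0) the sample produces no mismatches. With the shifted header (first object 1) objects 2 and 3 are reported, which is the same kind of failure as "Can't find the object 7 0 (origin offset 359138)" in this thread.
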
>>> 
>>> From the pdf spec:
>>> 
>>> The free entries in the cross-reference table form a linked list, with
>>> each free entry containing the object number of the next. The first
>>> entry in the table (object number 0) is always free and has a generation
>>> number of 65,535; it is the head of the linked list of free objects
>>> 
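
The rule quoted from the spec suggests a simple heuristic for spotting this kind of malformed header. The following is a sketch only, not what PDFBox or any other reader actually does: XrefHeaderCheck and correctedFirstObject are hypothetical names, and the heuristic is an assumption. A free entry with generation 65535 could in principle belong to another object, and subsections in incrementally updated files may legitimately start above 0.

```java
// Hypothetical helper, not part of PDFBox.
class XrefHeaderCheck {

    /**
     * If the first entry of a subsection looks like the head of the free list
     * (generation 65535, type "f"), assume the subsection should start at
     * object 0. Otherwise keep the declared start number.
     */
    static int correctedFirstObject(int declaredFirst, String firstEntry) {
        String[] parts = firstEntry.trim().split("\\s+");
        boolean looksLikeFreeListHead = parts.length == 3
                && parts[1].equals("65535")
                && parts[2].equals("f");
        return looksLikeFreeListHead ? 0 : declaredFirst;
    }

    public static void main(String[] args) {
        // Header "1 7" followed by the dummy entry, as in the file discussed here.
        System.out.println(correctedFirstObject(1, "0000000000 65535 f"));
    }
}
```

For the table shown in this message, the declared start of 1 would be corrected to 0, so the offsets 9, 358745, ... line up with objects 1, 2, ... again.
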
>>> Tilman
>>> 
>>> 
>>>> On 11.02.2015 at 08:21, Kevin Morin wrote:
>>>> Hi,
>>>> 
>>>> I am sorry, it seems that I did not send you the right file...
>>>> Actually, I was also testing the wrong file on Linux from the beginning.
>>>> The file also displays blank on Linux, with Java 7 or 8...
>>>> Here is the right file.
>>>> 
>>>> I am sorry to make you work for nothing...
>>>> 
>>>> BR
>>>> 
>>>> Kevin
>>>> 
>>>> 
>>>>> On 10/02/2015 21:32, Tilman Hausherr wrote:
>>>>> So we e-mailed and the result is
>>>>> - you're really working on W2008 with the file that you sent me
>>>>> - you get the same error on W2008 with the app (and I don't)
>>>>> 
>>>>> I have analysed that file and did some debug traces. If loading that on
>>>>> W2008 is a no-no, you'd have to build from source and I'll tell you the
>>>>> changes.
>>>>> 
>>>>> http://home.snafu.de/tilman/tmp/pdfbox-app-2.0.0-TILMAN.jar
>>>>> 
>>>>> Don't use that version for production. It contains lots of stuff for my
>>>>> own tests. Only use it for this problem. Here's the output that you
>>>>> should get:
>>>>> 
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>> INFORMATION: parseXrefStream: objByteOffset = 116
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 7 0 obj at offset: 16
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 8 0 obj at offset: 573
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 9 0 obj at offset: 633
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 10 0 obj at offset: 817
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 11 0 obj at offset: 914
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 12 0 obj at offset: 116
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 13 0 obj at offset: 436
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>> INFORMATION: parseXrefStream: objByteOffset = 363505
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 1 0 obj at offset: 359638
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 2 0 obj at offset: 363167
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 3 0 obj at offset: 363307
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 4 0 obj at offset: 363505
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 5 stmnr: 2
>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>> INFORMATION: PDFXrefStreamParser: 6 stmnr: 3
>>>>> 
>>>>> What I wonder is if the offsets will be the same.
>>>>> 
>>>>> Tilman
>>>>> 
>>>>> PS: Sorry I usually can't help during EU business hours. Day job :-)
>>>>> 
>>>>> 
>>>>>> On 09.02.2015 at 11:26, Kevin Morin wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I will probably have to migrate to java 8 because of a bug in java 7
>>>>>> which throws an error when rendering a certain type of PDF (cf thread
>>>>>> Error on PDFRenderer.renderImage (PDFBox 2.0)). Could someone please
>>>>>> check why it is not working on Windows Server 2008 R2 Standard? If you
>>>>>> do not have this OS, tell me what I can do to help you.
>>>>>> 
>>>>>> Thanks
>>>>>> BR
>>>>>> 
>>>>>> Kevin
>>>>>> 
>>>>>>> On 21/01/2015 12:26, Andreas Lehmkühler wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>>> On 21 January 2015 at 12:14, Kevin Morin <mo...@codelutin.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I thought I was running java 7 but it's java 8... I tried with java 7
>>>>>>>> and it works. I do not need it to work with java 8, java 7 is ok for
>>>>>>>> me.
>>>>>>> It works for me using java 8 on win7 and linux as well. I guess the
>>>>>>> issue has to be something else....
>>>>>>> 
>>>>>>> 
>>>>>>> BR
>>>>>>> Andreas Lehmkühler
>>>>>>> 
>>>>>>>> Thanks for your help and for all your work.
>>>>>>>> 
>>>>>>>> Kevin
>>>>>>>> 
>>>>>>>>> On 21/01/2015 11:54, Maruan Sahyoun wrote:
>>>>>>>>> Hi Kevin
>>>>>>>>> 
>>>>>>>>> works for me - what's your Java Version?
>>>>>>>>> 
>>>>>>>>> BR
>>>>>>>>> Maruan
>>>>>>>>> 
>>>>>>>>>> On 21.01.2015 at 11:24, Kevin Morin <mo...@codelutin.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> it does not work with PDFToImage either, I still get a blank image.
>>>>>>>>>> Also, I did not set the nonSeq option, but it seems to be using the
>>>>>>>>>> non-sequential parser. I get the following traces:
>>>>>>>>>> janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
>>>>>>>>>> GRAVE: Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>> janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
>>>>>>>>>> GRAVE: Missing XObject: Im1
>>>>>>>>>> 
>>>>>>>>>> BR
>>>>>>>>>> 
>>>>>>>>>> Kevin
>>>>>>>>>> 
>>>>>>>>>>> On 21/01/2015 11:11, Maruan Sahyoun wrote:
>>>>>>>>>>> Hi Kevin,
>>>>>>>>>>> 
>>>>>>>>>>> you can test with the PDFToImage command [1], available from the
>>>>>>>>>>> pdfbox-app [2], to see if the issue happens there. The source for
>>>>>>>>>>> PDFToImage is available in the tools section of the SVN repo or
>>>>>>>>>>> viewable online [3].
>>>>>>>>>>> 
>>>>>>>>>>> BR
>>>>>>>>>>> Maruan
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
>>>>>>>>>>> [2] https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>>>>>>>>>>> [3] http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
>>>>>>>>>>> 
>>>>>>>>>>>> On 21.01.2015 at 11:00, Kevin Morin <mo...@codelutin.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Andreas,
>>>>>>>>>>>> 
>>>>>>>>>>>> I am using the latest snapshot available in the Maven repository.
>>>>>>>>>>>> I am running my app on Windows Server 2008 R2 Standard and it does
>>>>>>>>>>>> not work (white page). Could you send me the code or a jar to test
>>>>>>>>>>>> on this server, to check whether the problem comes from my code?
>>>>>>>>>>>> 
>>>>>>>>>>>> BR
>>>>>>>>>>>> 
>>>>>>>>>>>> Kevin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 19.01.2015 at 12:45, Kevin Morin wrote:
>>>>>>>>>>>>>> Actually, the issue is not only these traces. The real issue is
>>>>>>>>>>>>>> that I get a blank image when I try to render the document.
>>>>>>>>>>>>> I've checked your PDF and everything renders fine. I've tried
>>>>>>>>>>>>> SNAPSHOT-891 on linux (running java 1.8, 1.7 and 1.6) and the
>>>>>>>>>>>>> latest
>>>>>>>>>>>>> SNAPSHOT-947 on win7 running java 1.7
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe your SNAPSHOT is outdated?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BR
>>>>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 19/01/2015 12:39, Kevin Morin wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am using the 2.0 snapshot version to create images of PDFs, but
>>>>>>>>>>>>>>> on some documents I get the following errors when I call
>>>>>>>>>>>>>>> PDDocument.load(file):
>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject: Im1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I first had it a few days ago (I did not report it, shame on me), but
>>>>>>>>>>>>>>> the error did not occur when I called the loadLegacy method on
>>>>>>>>>>>>>>> PDDocument. But the loadLegacy method is not available anymore...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The issue happens on Windows (works fine on Debian).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for your help
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kevin
>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>>>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org