Follow up:

On Tue, Jul 26, 2011 at 12:15:13PM +0200, Michal Hocko wrote:
> I have looked at the document and at first glance the update hasn't
> screwed anything obvious.
> At first I thought that we haven't updated the number of objects (stored
> in the Xref stream in the original revision and Trailer in the new
> revision) because those numbers are the same for both while we have
> obviously added new objects. This turned out to be OK because object
> numbers are sparse and we are reusing those numbers which are not
> used.
>
> Then I have looked at the Root object which is reported to be missing
> and this started to look interesting.
> Original revision reports:
> 825 0 obj <<
> /Type /XRef
> /Index [0 826]
> /Size 826
> /W [1 3 1]
> /Root 823 0 R
> /Info 824 0 R
> /ID [<9B0D6E3CC66605F7CE12FB9EAAB1356F>
> <9B0D6E3CC66605F7CE12FB9EAAB1356F>]
> /Length 2230
> /Filter /FlateDecode
> >>
>
> and the new one:
> trailer
> <<
> /Size 826
> /Root 823 0 R
> /Info 824 0 R
> /ID [ <9b0d6e3cc66605f7ce12fb9eaab1356f>
> <9b0d6e3cc66605f7ce12fb9eaab1356f> ]
> /Prev 773827
> >>
>
> It is an object with reference number [823 0]. The problem is that I
> cannot see that object in the file:
> $ grep --binary-files=text "823 0 obj" eflow2.pdf
> $
>
> I guess that it is just embedded somewhere because I can see it with our
> tools:
> ./tools/pdf_object_printer --ref "823 0" --file ~/tmp/eflow2.pdf
> Document: "/home/miso/tmp/eflow2.pdf"
> [823 0]:
> <<
> /Type /Catalog
> /Pages 800 0 R
> /Outlines 801 0 R
> /Names 822 0 R
> /PageMode /UseOutlines
> /PageLabels <<
> /Nums [ 0 <<
> /S /D
> >> 1 <<
> /S /D
> >> ]
> >>
> /OpenAction 30 0 R
> >>
>
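For context on the /W [1 3 1] entry in the quoted cross-reference stream
dictionary: it gives the byte widths of the three fields of each entry. An
entry of type 2 does not hold a file offset at all; it names an object
stream and an index inside it, and objects stored that way carry no
"obj"/"endobj" keywords, so a grep for "823 0 obj" has nothing to match.
Here is a minimal Python sketch of that decoding. The function names and
the sample bytes (including offset 1234 and index 8) are made up for
illustration; they are not taken from eflow2.pdf or from PDFedit.

```python
def decode_xref_stream(data: bytes, widths=(1, 3, 1)):
    """Split already-decoded xref stream bytes into (type, f2, f3) tuples.

    `widths` plays the role of the /W array; fields are big-endian integers.
    """
    entry_size = sum(widths)
    if len(data) % entry_size:
        raise ValueError("data length is not a multiple of the entry size")
    entries = []
    for start in range(0, len(data), entry_size):
        fields = []
        pos = start
        for width in widths:
            fields.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        entries.append(tuple(fields))
    return entries


def describe(entry):
    """Render one entry according to its type field."""
    kind, second, third = entry
    if kind == 0:
        return f"free (next free object {second}, generation {third})"
    if kind == 1:
        return f"in use at byte offset {second}, generation {third}"
    if kind == 2:
        return f"compressed in object stream {second}, index {third}"
    return "unknown type"


# Hypothetical sample: one ordinary entry and one compressed entry.
sample = (
    (1).to_bytes(1, "big") + (1234).to_bytes(3, "big") + (0).to_bytes(1, "big")
    + (2).to_bytes(1, "big") + (815).to_bytes(3, "big") + (8).to_bytes(1, "big")
)

for entry in decode_xref_stream(sample):
    print(describe(entry))
```

The real stream would first have to be inflated (/Filter /FlateDecode)
before its bytes could be split like this.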
So the Catalog object [823 0] is really compressed in ObjStm (object
stream) [815 0], which looks as follows (I have skipped objects that are
of no interest at the moment):

$ ./tools/pdf_object_printer --ref "815 0" --decode 1 --file ~/tmp/eflow2.pdf
Document: "/home/miso/tmp/eflow2.pdf"
[815 0]:
814 0 816 153 817 314 818 398 819 492 820 584 821 651 822 721 823 742
[...]
<< /Type /Catalog /Pages 800 0 R /Outlines 801 0 R /Names 822 0 R
/PageMode/UseOutlines/PageLabels<</Nums[0<</S/D>>1<</S/D>>]>>
/OpenAction 30 0 R >>

The xref table which defines your change looks like:
xref
44 1
0000776307 00000 n
71 1
0000776400 00000 n
88 1
0000779952 00000 n
97 1
0000782990 00000 n
798 1
0000786170 00000 n
800 1
0000786290 00000 n

No section refers to object 823.

So what could be wrong? My gut feeling tells me that Acrobat is "buggy"
here. Everything above shows that all new objects have been added
correctly and that the document structure is accessible. The problem
seems to be that the original revision uses a cross-reference stream
while the incremental update uses an xref table. This is perfectly legal
according to the PDF specification, AFAIU.

PDFedit, like other code based on the original xpdf code (the same goes
for poppler), parses all cross-reference tables/streams first so we know
where all objects are stored. We do not care much about xref tables vs.
streams because that is handled when an indirect object is referenced.

I guess that Acrobat is complaining because the Root [823 0] object is
part of an object stream that is not directly visible from the xref
table. Whether this complies with the specification is not 100% clear to
me. The specification says (3.4.6 Object Streams):
"
Indirect references to objects inside object streams use the normal
syntax: for example, 14 0 R.
Access to these objects requires a different way of storing
cross-reference information; see Section 3.4.7, “Cross-Reference
Streams.” Although an application must support PDF 1.5 to use compressed
objects, the objects can be stored in a manner that is compatible with
PDF 1.4. Applications that do not support PDF 1.5 can ignore the objects;
see “Compatibility with PDF 1.4” on page 85.
"

As you can see, there _is_ a cross-reference stream for this object.

The section about incremental updates says (3.4.5 Incremental Updates):
"
In an incremental update, any new or changed objects are appended to the
file, a cross-reference section is added, and a new trailer is inserted.
The resulting file has the structure shown in Figure 3.3. A complete
example of an updated file is shown in Section G.6, “Updating Example.”

The cross-reference section added when a file is updated contains entries
only for objects that have been changed, replaced, or deleted. Deleted
objects are left unchanged in the file, but are marked as deleted by
means of their cross-reference entries. The added trailer contains all
the entries (perhaps modified) from the previous trailer, as well as a
Prev entry giving the location of the previous cross-reference section
(see Table 3.13 on page 73). As shown in Figure 3.3, a file that has been
updated several times contains several trailers; each trailer is
terminated by its own end-of-file (%%EOF) marker.
"

No restriction on combining an xref stream with an xref table is
mentioned here.

OK, enough lawyering. I would try a newer Acroread (mine is 9.2 and it is
affected as well), or report this to the Acrobat developers, or use
PDFedit to flatten the file (this creates a new document containing all
reachable objects and a plain xref table, which you can then update
without issues).

Hope this helps.
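To illustrate the lookup order described above (newest cross-reference
section first, then whatever /Prev points to), here is a rough Python
sketch. It is only an illustration of the idea, not PDFedit or xpdf code:
the data structures and function names are invented, the update table is
copied from the xref table shown earlier, and the single entry for the
original revision simply encodes what pdf_object_printer reported, namely
that [823 0] lives in object stream [815 0]. Free ("f") entries are not
modelled.

```python
# Cross-reference sections, newest first (the order you get by following /Prev).
UPDATE_TABLE = {
    44: ("offset", 776307),
    71: ("offset", 776400),
    88: ("offset", 779952),
    97: ("offset", 782990),
    798: ("offset", 786170),
    800: ("offset", 786290),
}
# Only the entry relevant here; the real xref stream covers objects 0..825.
ORIGINAL_STREAM = {823: ("objstm", 815)}
SECTIONS = [UPDATE_TABLE, ORIGINAL_STREAM]


def locate(obj_num, sections=SECTIONS):
    """Return the entry from the newest section that mentions obj_num."""
    for section in sections:
        if obj_num in section:
            return section[obj_num]
    return None


def parse_objstm_header(header: str):
    """Map object number -> offset from a 'num offset num offset ...' header.

    In a real object stream these offsets are relative to the /First entry
    of the stream dictionary.
    """
    numbers = [int(token) for token in header.split()]
    if len(numbers) % 2:
        raise ValueError("header must contain number/offset pairs")
    return dict(zip(numbers[0::2], numbers[1::2]))


# Header of [815 0] as printed by pdf_object_printer above.
HEADER_815 = (
    "814 0 816 153 817 314 818 398 819 492 "
    "820 584 821 651 822 721 823 742"
)

print(locate(800))   # found in the incremental update's xref table
print(locate(823))   # not in the update, so the older xref stream answers
print(parse_objstm_header(HEADER_815)[823])
```

A reader that stops after the newest xref table would get nothing for 823,
which matches the "missing Root" symptom.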
-- 
Michal Hocko

_______________________________________________
Pdfedit-support mailing list
Pdfedit-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdfedit-support