Maruan and Duff,

This is my first experience using a help forum like this, and the response was great. I appreciate the help.
I will look into the documentation and hopefully be able to figure out what I am doing wrong.

Colette

-----Original Message-----
From: Duff Johnson [mailto:[email protected]]
Sent: June-13-14 1:59 PM
To: [email protected]
Subject: Re: Unable to mark document as tagged

Colette,

It might be a good idea to take a look at 14.8 of ISO 32000-1, which defines tagged PDF. You can download it for free:

http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Duff.

On Jun 13, 2014, at 1:52 PM, Maruan Sahyoun <[email protected]> wrote:

> Colette,
>
> you are not corrupting the PDF document, but the structure information needed
> for tagged PDF is missing.
>
> Maruan Sahyoun
>
>> On 13.06.2014 at 19:41, Colette Joubarne
>> <[email protected]> wrote:
>>
>> Maruan,
>>
>> I use the parser to tokenize, and then loop thru the tokens. If a token is a
>> TJ or Tj operator, I grab the text, in certain cases I replace some of the
>> text (letter by letter, maintaining the existing structure), and add these
>> tokens to a new token list. If it is not a TJ or Tj operator I just copy the
>> token to the new token list. I then write the token list to the doc and save.
>>
>> If I am corrupting the structure, how is it that the document displays
>> correctly?
>>
>> Colette
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: June-13-14 12:54 PM
>> To: [email protected]
>> Subject: Re: Unable to mark document as tagged
>>
>> Hi Colette,
>>
>> the modified version does not contain the structure information needed for
>> tagged PDFs. How do you create the modified version from the first one?
>>
>> BR
>> Maruan
>>
>>> On 13.06.2014 at 17:48, Colette Joubarne
>>> <[email protected]> wrote:
>>>
>>> Maruan,
>>>
>>> I am copying the entire structure from a tagged document and just replacing
>>> some of the text, so I would think that the structure is unchanged. Then
>>> again who knows what I might have messed up.
>>>
>>> James.pdf is the original file:
>>> https://dl.dropboxusercontent.com/u/7689859/James.pdf
>>>
>>> James-mod.pdf is the modified file:
>>> https://dl.dropboxusercontent.com/u/7689859/James-mod.pdf
>>>
>>> Colette
>>>
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: June-13-14 10:45 AM
>>> To: [email protected]
>>> Subject: Re: Unable to mark document as tagged
>>>
>>> Hi Colette,
>>>
>>> this information alone doesn't make a document a tagged PDF! You might not
>>> have the structure information needed within your PDF. Would you have a
>>> works / doesn't work sample which you could upload to a public location, as
>>> attachments are not allowed on the mailing list?
>>>
>>> BR
>>> Maruan
>>>
>>>> On 13.06.2014 at 15:44, Colette Joubarne
>>>> <[email protected]> wrote:
>>>>
>>>> Maruan,
>>>>
>>>> Yes you are right, however why is it that when I look at the properties in
>>>> Adobe Reader it indicates that the document is not tagged?
>>>>
>>>> 3 0 obj
>>>> <<
>>>> /Marked true
>>>>
>>>> Colette
>>>>
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>> Sent: June-13-14 9:19 AM
>>>> To: [email protected]
>>>> Subject: Re: Unable to mark document as tagged
>>>>
>>>> Dear Colette,
>>>>
>>>> /MarkInfo 3 0 R indicates that the information you are looking for is
>>>> referenced and should be available in 3 0 obj. Could you verify that?
>>>>
>>>> With kind regards
>>>>
>>>> Maruan
>>>>
>>>>> On 13.06.2014 at 14:21, Colette Joubarne
>>>>> <[email protected]> wrote:
>>>>>
>>>>> I have a tagged pdf doc with the following header:
>>>>>
>>>>> /Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0
>>>>> R/MarkInfo<</Marked true
>>>>>
>>>>> I read in the contents, replace some of the text and create a new doc. I
>>>>> copy the document information from the original doc and set marked to
>>>>> true.
>>>>>
>>>>> newDoc = new PDDocument();
>>>>>
>>>>> newDoc.setDocumentInformation(PTConstants.pdfDoc.getDocumentInformation());
>>>>>
>>>>> PDMarkInfo markinfo = new PDMarkInfo();
>>>>> markinfo.setMarked(true);
>>>>> newDoc.getDocumentCatalog().setMarkInfo(markinfo);
>>>>>
>>>>> and when I check that it was set, it returns true:
>>>>>
>>>>> PDMarkInfo markInfo =
>>>>> PTConstants.pdfDoc.getDocumentCatalog().getMarkInfo();
>>>>> if ((markInfo != null) && (markInfo.isMarked()))
>>>>> System.out.println("true");
>>>>>
>>>>> But, while the resulting document displays correctly, the header
>>>>> indicates that it is not tagged:
>>>>>
>>>>> /Type /Catalog
>>>>> /Version /1.4
>>>>> /Pages 2 0 R
>>>>> /MarkInfo 3 0 R
>>>>>
>>>>> Any idea what is going on?
>>>>>
>>>>> Colette
>>
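
For anyone reading this thread later: the two catalog headers quoted above differ in one key. The original has /StructTreeRoot and the modified file does not, which fits Maruan's point that /Marked true alone does not make a PDF tagged. Below is a rough sketch in plain Java (no PDFBox) that compares the two headers as text. The class name TaggedCatalogCheck and the method missingKeys are made up for illustration, and a plain substring search is an assumption that only suits a quick look. It does not parse the PDF or follow indirect references such as 3 0 R.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: a text-only look at a catalog dictionary.
class TaggedCatalogCheck {
    // The two catalog keys discussed in the thread.
    private static final String[] REQUIRED_KEYS = {"/MarkInfo", "/StructTreeRoot"};

    // Returns the required keys that do not appear in the catalog text.
    // Assumption: a substring match is good enough for eyeballing a header.
    static List<String> missingKeys(String catalogText) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (!catalogText.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Headers copied from the first message in the thread.
        String original = "/Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true";
        String modified = "/Type /Catalog /Version /1.4 /Pages 2 0 R /MarkInfo 3 0 R";

        System.out.println("original missing: " + missingKeys(original));
        System.out.println("modified missing: " + missingKeys(modified));
    }
}
```

With the two headers from the thread, this reports nothing missing for the original and /StructTreeRoot missing for the modified file. Whether the structure tree can simply be carried over to the new document is a separate question that this sketch does not address.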

