Maruan,

I use the parser to tokenize, and then loop through the tokens. If a token is a TJ or Tj operator, I grab the text, in certain cases replace some of it (letter by letter, maintaining the existing structure), and add these tokens to a new token list. If it is not a TJ or Tj operator, I just copy the token to the new token list. I then write the token list to the doc and save.
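The loop described above can be sketched roughly as follows. This is a simplified model in plain Java, not PDFBox code: `Op`, `TokenRewriteSketch`, `rewrite` and `rewriteOperand` are hypothetical names, text operands are modelled as `String`, TJ arrays as `List<Object>`, and the sketch assumes each operand sits immediately before its operator in the flat token list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an operator token; NOT a PDFBox class.
final class Op {
    final String name;
    Op(String name) { this.name = name; }
}

final class TokenRewriteSketch {

    // Apply the replacer to a string operand (Tj), or to every string
    // inside an array operand (TJ); numbers and other items pass through.
    static Object rewriteOperand(Object operand, UnaryOperator<String> replacer) {
        if (operand instanceof String) {
            return replacer.apply((String) operand);
        }
        if (operand instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) operand) {
                out.add(item instanceof String ? replacer.apply((String) item) : item);
            }
            return out;
        }
        return operand;
    }

    // Copy every token to a new list; when a Tj or TJ operator is reached,
    // rewrite the operand that was copied just before it.
    static List<Object> rewrite(List<Object> tokens, UnaryOperator<String> replacer) {
        List<Object> result = new ArrayList<>();
        for (Object token : tokens) {
            boolean showsText = token instanceof Op
                    && (((Op) token).name.equals("Tj") || ((Op) token).name.equals("TJ"));
            if (showsText && !result.isEmpty()) {
                int last = result.size() - 1;
                result.set(last, rewriteOperand(result.get(last), replacer));
            }
            result.add(token);
        }
        return result;
    }
}
```

A loop of this shape only touches the text-showing operands in the page contents; everything else in the token list is copied through unchanged.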
If I am corrupting the structure, how is it that the document displays correctly?

Colette

-----Original Message-----
From: Maruan Sahyoun [mailto:[email protected]]
Sent: June-13-14 12:54 PM
To: [email protected]
Subject: Re: Unable to mark document as tagged

Hi Colette,

the modified version does not contain the structure information needed for tagged PDFs. How do you create the modified version from the first one?

BR
Maruan

On 13.06.2014 at 17:48, Colette Joubarne <[email protected]> wrote:

> Maruan,
>
> I am copying the entire structure from a tagged document and just replacing
> some of the text, so I would think that the structure is unchanged. Then
> again who knows what I might have messed up.
>
> James.pdf is the original file:
> https://dl.dropboxusercontent.com/u/7689859/James.pdf
>
> James-mod.pdf is the modified file:
> https://dl.dropboxusercontent.com/u/7689859/James-mod.pdf
>
> Colette
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: June-13-14 10:45 AM
> To: [email protected]
> Subject: Re: Unable to mark document as tagged
>
> Hi Colette,
>
> this information alone doesn't make a document a tagged PDF! You might not
> have the structure information needed within your PDF. Would you have a works
> / doesn't work sample which you could upload to a public location as
> attachments are not allowed on the mailing list?
>
> BR
> Maruan
>
> On 13.06.2014 at 15:44, Colette Joubarne <[email protected]> wrote:
>
>> Maruan,
>>
>> Yes you are right, however why is it that when I look at the properties in
>> Adobe Reader it indicates that the document is not tagged?
>>
>> 3 0 obj
>> <<
>> /Marked true
>> >>
>>
>> Colette
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: June-13-14 9:19 AM
>> To: [email protected]
>> Subject: Re: Unable to mark document as tagged
>>
>> Dear Colette,
>>
>> /MarkInfo 3 0 R indicates that the information you are looking for is
>> referenced and should be available in 3 0 obj. Could you verify that?
>>
>> With kind regards
>>
>> Maruan
>>
>> On 13.06.2014 at 14:21, Colette Joubarne <[email protected]> wrote:
>>
>>> I have a tagged pdf doc with the following header:
>>>
>>> /Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0
>>> R/MarkInfo<</Marked true
>>>
>>> I read in the contents, replace some of the text and create a new doc. I
>>> copy the document information from the original doc and set marked to true.
>>>
>>> newDoc = new PDDocument();
>>>
>>> newDoc.setDocumentInformation(PTConstants.pdfDoc.getDocumentInformation());
>>>
>>> PDMarkInfo markinfo = new PDMarkInfo();
>>> markinfo.setMarked(true);
>>> newDoc.getDocumentCatalog().setMarkInfo(markinfo);
>>>
>>> and when I check that it was set, it returns true:
>>>
>>> PDMarkInfo markInfo =
>>> PTConstants.pdfDoc.getDocumentCatalog().getMarkInfo();
>>> if ((markInfo != null) && (markInfo.isMarked()))
>>> System.out.println("true");
>>>
>>> But, while the resulting document displays correctly, the header indicates
>>> that it is not tagged:
>>>
>>> /Type /Catalog
>>> /Version /1.4
>>> /Pages 2 0 R
>>> /MarkInfo 3 0 R
>>>
>>> Any idea what is going on?
>>>
>>> Colette
>>
>

