Thanks for your answer. But I really need to process the streams one by one (a special requirement in my project).
Anyway, your answer gave me an idea for detecting the issue: I can compare the tokens from the individual streams with the tokens from pdPage.getContents(). That means double processing, but it is still useful.
Any other ideas are welcome.

Esteban

________________________________
From: Maruan Sahyoun <[email protected]>
Sent: Monday, February 5, 2018 3:25 PM
To: [email protected]
Subject: Re: Stream parsing issue in multi-stream page

Hi,

> On 05.02.2018 at 15:43, Esteban R <[email protected]> wrote:
>
> Hello. I need to rewrite a PDPage with many streams, one by one (making some
> transformations, and there is a special need to do it one stream at a time).
> Parsing (and pdfdebug) returns "wrong" tokens if one command begins at the
> end of the first stream and ends at the beginning of the next one. I'm using
> pdfbox-2.0.8.
>
> Rewriting the stream with those tokens produces a corrupted page.
> How could we rewrite the page without getting a corrupted page?
> Or, at least, how can we detect this kind of failure (or this one)?
>
> Please find a simplified example here:
> http://www.filedropper.com/out3unc
>
> The first stream is:
> /F1 10 Tf
> BT
> 40 764.138 Td
> 0 -12.138 Td
> [
>
> and the second one is:
> (CD) ] TJ
> ET
>
> In this case, running the following code:
> Iterator<PDStream> itStreams = pdPage.getContentStreams();
> while (itStreams.hasNext()) {
>     PDStream pdstream = itStreams.next();
>     PDFStreamParser parser = new PDFStreamParser(pdstream.toByteArray());
>     parser.parse();
>     List<Object> tokens = parser.getTokens();
>     for (Object token : tokens) {
>         System.out.println("Token: " + token);
>     }
> }
>

Instead of using pdPage.getContentStreams() and parsing each stream individually, use pdPage.getContents() and read all of the content into a byte[]. You can then pass that to PDFStreamParser.
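A minimal sketch of that suggestion in plain Java, so that it runs without PDFBox on the classpath. `readAll` drains an `InputStream` (in real code, the one returned by `pdPage.getContents()`), and `concatenate` does the same job by hand from the per-stream bytes that the quoted loop already obtains with `pdstream.toByteArray()`. `PageContentBytes`, `readAll` and `concatenate` are hypothetical names, and the newline written between streams is an assumption made so the last token of one stream cannot fuse with the first token of the next.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper, not part of PDFBox: builds the single byte[] that
// would be handed to new PDFStreamParser(bytes).
final class PageContentBytes {

    // Reads an InputStream to the end (Java 8 compatible; Java 9+ also has
    // InputStream.readAllBytes()). Intended input: pdPage.getContents().
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Joins the decoded bytes of each content stream in page order, writing a
    // newline after each one (assumption) so tokens at the seams stay separate.
    static byte[] concatenate(List<byte[]> streams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] s : streams) {
            out.write(s, 0, s.length);
            out.write('\n');
        }
        return out.toByteArray();
    }
}
```

Either result is the `byte[]` to pass to `new PDFStreamParser(bytes)`; parsing then continues as in the quoted loop, but over the whole page instead of one stream at a time.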
That will give you this output:

Token: COSName{F1}
Token: COSInt{10}
Token: PDFOperator{Tf}
Token: PDFOperator{BT}
Token: COSInt{40}
Token: COSFloat{764.138}
Token: PDFOperator{Td}
Token: COSInt{0}
Token: COSFloat{-12.138}
Token: PDFOperator{Td}
Token: COSArray{[COSString{CD}]}
Token: PDFOperator{TJ}
Token: PDFOperator{ET}

BR
Maruan

> shows:
> Token: COSName{F1}
> Token: COSInt{10}
> Token: PDFOperator{Tf}
> Token: PDFOperator{BT}
> Token: COSInt{40}
> Token: COSFloat{764.138}
> Token: PDFOperator{Td}
> Token: COSInt{0}
> Token: COSFloat{-12.138}
> Token: PDFOperator{Td}
> Token: COSArray{[]}      !!!!! empty array detected, end of first stream
> Token: COSString{CD}     !!!!! beginning of second stream
> Token: COSNull{}         !!!!! closing "]"
> Token: PDFOperator{TJ}
> Token: PDFOperator{ET}
>
> Esteban
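On the detection question: as far as I understand the PDF specification, a page's content may be divided into several streams at any boundary between lexical tokens, so ending one stream with `[` and starting the next with `(CD) ] TJ` is legal, and such streams simply cannot be tokenized one at a time. A cheaper alternative to comparing two token lists could be a byte-level check per stream. This is a heuristic sketch of my own, not a PDFBox API; `StreamBoundaryCheck` and `isSelfContained` are hypothetical names. It assumes decoded stream bytes (for example from `pdstream.toByteArray()`) and no inline images (`BI ... ID ... EI`), whose binary data it would misread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not a PDFBox API: checks whether one decoded content
// stream can be tokenized on its own. Assumes no inline images (BI ... ID ... EI).
final class StreamBoundaryCheck {

    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    // Names, numbers and true/false/null are operands; any other bare keyword is an operator.
    static boolean isOperator(String token) {
        char first = token.charAt(0);
        if (first == '/' || first == '+' || first == '-' || first == '.' || Character.isDigit(first)) {
            return false;
        }
        return !(token.equals("true") || token.equals("false") || token.equals("null"));
    }

    // False if the stream ends inside an array, dictionary or string, ends with operands
    // that have no operator yet, or closes an array/dictionary opened in an earlier stream.
    static boolean isSelfContained(byte[] content) {
        int arrayDepth = 0;
        int dictDepth = 0;
        boolean pendingOperands = false; // operands seen since the last operator
        int n = content.length;
        int i = 0;
        while (i < n) {
            int c = content[i] & 0xFF;
            if (isWhitespace(c)) {
                i++;
            } else if (c == '%') {                      // comment: skip to end of line
                while (i < n && content[i] != '\n' && content[i] != '\r') {
                    i++;
                }
            } else if (c == '(') {                      // literal string: nested parens, '\' escapes
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    int d = content[i] & 0xFF;
                    if (d == '\\') {
                        i++;                            // skip the escaped byte
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                }
                if (depth > 0) {
                    return false;                       // string continues in the next stream
                }
                pendingOperands = true;
            } else if (c == '<' && i + 1 < n && content[i + 1] == '<') {
                dictDepth++;
                pendingOperands = true;
                i += 2;
            } else if (c == '>' && i + 1 < n && content[i + 1] == '>') {
                if (--dictDepth < 0) {
                    return false;                       // dictionary was opened earlier
                }
                i += 2;
            } else if (c == '<') {                      // hex string runs to the next '>'
                int end = i + 1;
                while (end < n && content[end] != '>') {
                    end++;
                }
                if (end == n) {
                    return false;
                }
                pendingOperands = true;
                i = end + 1;
            } else if (c == '[') {
                arrayDepth++;
                pendingOperands = true;
                i++;
            } else if (c == ']') {
                if (--arrayDepth < 0) {
                    return false;                       // array was opened in an earlier stream
                }
                i++;
            } else {                                    // name, number or keyword
                int start = i++;
                while (i < n && !isWhitespace(content[i] & 0xFF) && !isDelimiter(content[i] & 0xFF)) {
                    i++;
                }
                String token = new String(content, start, i - start, StandardCharsets.ISO_8859_1);
                pendingOperands = !isOperator(token);
            }
        }
        return arrayDepth == 0 && dictDepth == 0 && !pendingOperands;
    }
}
```

With the two streams from the example, both would be flagged: the first ends inside an open array, and the second closes an array it never opened, while their concatenation passes. A flagged stream could then be merged with its neighbour before the one-stream-at-a-time processing. A false result only means "possibly split"; the check does not say which command is affected, and it deliberately ignores `BT`/`ET` and `q`/`Q` pairing, since splitting those across streams does not break tokenization.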

