Stream parsing issue in multi-stream page

Esteban R Mon, 05 Feb 2018 06:44:00 -0800

Hello. I need to rewrite a PDPage that has several content streams, stream by
stream (applying some transformations; for this use case it really has to be
done one stream at a time). Parsing (and pdfdebug) returns "wrong" tokens when
an operation begins at the end of the first stream and ends at the beginning of
the next one. I'm using pdfbox-2.0.8.


Rewriting the stream with those tokens produces a corrupted page.
How can we rewrite the page without corrupting it?
Or, at least, how can we detect this kind of failure (or this specific one)?

Please find a simplified example here:
http://www.filedropper.com/out3unc

The first stream is:
/F1 10 Tf
BT
40 764.138 Td
0 -12.138 Td
[

and the second one is:
(CD) ] TJ
ET
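A minimal sketch of one way to detect this case, assuming the problem is limited to arrays, dictionaries and strings that are opened in one stream and closed in the next: scan each stream's raw bytes and compute the net nesting depth at its end. A non-zero result suggests the stream cannot be parsed or rewritten on its own. The class and method names (StreamBoundaryCheck, netDepth) are hypothetical, the code is plain Java rather than PDFBox API, and it is only a heuristic: it does not skip inline image data (BI ... ID ... EI), and it does not catch operands separated from their operator (e.g. "0 -12.138" at the end of one stream and "Td" at the start of the next).

```java
import java.nio.charset.StandardCharsets;

public class StreamBoundaryCheck {

    /**
     * Net nesting depth at the end of a content stream: +1 for every '[' or '<<'
     * still open, -1 for every ']' or '>>' whose opener is not in this stream.
     * A stream that ends inside a literal or hex string counts one extra open level.
     * Heuristic only (hypothetical helper): inline image data is not skipped.
     */
    public static int netDepth(byte[] data) {
        int depth = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '%') {
                // comment: skip to end of line (the newline is consumed below)
                while (i < data.length && data[i] != '\n' && data[i] != '\r') {
                    i++;
                }
            } else if (b == '(') {
                // literal string: balanced parentheses, backslash escapes
                int parens = 1;
                i++;
                while (i < data.length && parens > 0) {
                    if (data[i] == '\\') {
                        i++; // skip the escaped byte
                    } else if (data[i] == '(') {
                        parens++;
                    } else if (data[i] == ')') {
                        parens--;
                    }
                    i++;
                }
                if (parens > 0) {
                    return depth + 1; // stream ends inside a literal string
                }
            } else if (b == '<' && i + 1 < data.length && data[i + 1] == '<') {
                depth++; // dictionary opened
                i += 2;
            } else if (b == '>' && i + 1 < data.length && data[i + 1] == '>') {
                depth--; // dictionary closed
                i += 2;
            } else if (b == '<') {
                // hex string: runs to the next '>'
                while (i < data.length && data[i] != '>') {
                    i++;
                }
                if (i >= data.length) {
                    return depth + 1; // stream ends inside a hex string
                }
                i++;
            } else {
                if (b == '[') {
                    depth++;
                } else if (b == ']') {
                    depth--;
                }
                i++;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        // the two streams from this message
        String[] streams = {
            "/F1 10 Tf\nBT\n40 764.138 Td\n0 -12.138 Td\n[\n",
            "(CD) ] TJ\nET\n"
        };
        for (int k = 0; k < streams.length; k++) {
            int d = netDepth(streams[k].getBytes(StandardCharsets.ISO_8859_1));
            System.out.println("stream " + k + ": net depth " + d);
        }
        // prints:
        // stream 0: net depth 1
        // stream 1: net depth -1
    }
}
```

Applied to the bytes returned by pdstream.toByteArray(), a non-zero value would flag exactly the pair of streams shown above.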

In this case, running the following code:
        Iterator<PDStream> itStreams = pdPage.getContentStreams();
        while (itStreams.hasNext()) {
            PDStream pdstream = itStreams.next();
            PDFStreamParser parser = new PDFStreamParser(pdstream.toByteArray());
            parser.parse();
            List<Object> tokens = parser.getTokens();
            for (Object token: tokens){
                System.out.println("Token: "+token);
            }
        }

shows:
Token: COSName{F1}
Token: COSInt{10}
Token: PDFOperator{Tf}
Token: PDFOperator{BT}
Token: COSInt{40}
Token: COSFloat{764.138}
Token: PDFOperator{Td}
Token: COSInt{0}
Token: COSFloat{-12.138}
Token: PDFOperator{Td}
Token: COSArray{[]}      !!!!! empty array detected, end of first stream
Token: COSString{CD}     !!!!! beginning of second stream
Token: COSNull{}         !!!!! closing "]"
Token: PDFOperator{TJ}
Token: PDFOperator{ET}
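The dump shows the symptom at the token level: the first stream's last token is an (empty) COSArray rather than a PDFOperator. A complementary heuristic, sketched below in plain Java with hypothetical names (TrailingOperatorCheck, endsWithOperator), checks whether the last token of each stream's raw bytes looks like an operator; unlike a bracket count, it also flags a stream that ends with bare operands such as "0 -12.138". It assumes a stream does not end with a comment or inline image data. With PDFBox 2.0 the same idea could presumably be applied to parser.getTokens() by testing whether the last element is an org.apache.pdfbox.contentstream.operator.Operator.

```java
import java.nio.charset.StandardCharsets;

public class TrailingOperatorCheck {

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP
    private static boolean isWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // PDF delimiter characters
    private static boolean isDelimiter(byte b) {
        return "()<>[]{}/%".indexOf(b) >= 0;
    }

    /**
     * Heuristic (hypothetical helper): true if the stream is empty or its last
     * token looks like an operator, i.e. a bare keyword that is not a number,
     * a name, true, false or null. Trailing comments are not handled.
     */
    public static boolean endsWithOperator(byte[] data) {
        int end = data.length;
        while (end > 0 && isWhitespace(data[end - 1])) {
            end--;
        }
        if (end == 0) {
            return true; // empty stream: nothing left pending
        }
        int start = end;
        while (start > 0 && !isWhitespace(data[start - 1]) && !isDelimiter(data[start - 1])) {
            start--;
        }
        if (start == end) {
            return false; // ends with a delimiter such as '[', ']' or ')'
        }
        if (start > 0 && data[start - 1] == '/') {
            return false; // a name operand such as /F1
        }
        String last = new String(data, start, end - start, StandardCharsets.ISO_8859_1);
        if (last.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
            return false; // a numeric operand
        }
        return !(last.equals("true") || last.equals("false") || last.equals("null"));
    }

    public static void main(String[] args) {
        String[] streams = {
            "/F1 10 Tf\nBT\n40 764.138 Td\n0 -12.138 Td\n[\n", // first stream from this message
            "(CD) ] TJ\nET\n",                                   // second stream from this message
            "BT\n0 -12.138\n"                                    // hypothetical: operands without operator
        };
        for (String s : streams) {
            System.out.println(endsWithOperator(s.getBytes(StandardCharsets.ISO_8859_1)));
        }
        // prints: false, true, false (one per line)
    }
}
```

One possible way to keep per-stream processing would be to merge a flagged stream with the following one(s) and parse and rewrite the merged bytes as a single unit; whether that is compatible with the transformations being applied is an assumption to verify.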


Esteban
