RE: copy entire stream of a page ignoring images

José Rodolfo Carrijo de Freitas Fri, 22 Oct 2010 09:14:27 -0700

Is there a fixed way in which an image is drawn with tokens in a PDStream?

After parsing some documents, I gathered that when a token sequence starts
with PDFOperator{q}, ends with PDFOperator{Q}, and has PDFOperator{Do} in
the middle, it is an image. So I extract all of those tokens to remove the
image from the page. For example, if I find this set of operators in a
stream:

PDFOperator{q}, COSInt{596}, COSInt{0}, COSInt{0}, COSInt{840}, COSInt{0},
COSInt{0}, PDFOperator{cm}, COSName{Im1}, PDFOperator{Do}, PDFOperator{Q}

I cut them all to remove the image.
But nothing is as easy as it seems: this approach is ruining some of the
fonts on the page.

Does anyone understand this better and could shed some light on this
problem?
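
A narrower alternative to cutting the whole q ... Q block can be sketched as
follows: drop only the Do operator and the name operand in front of it, and
leave q, cm and Q in place so the graphics state stays balanced. This is a
minimal sketch under stated assumptions, not PDFBox code: tokens are plain
strings instead of PDFOperator/COSName objects, and ImageDrawFilter,
dropImageDraws, the OPERATORS set and the imageNames parameter are
hypothetical names made up for the illustration. Whether this avoids the font
problem depends on the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for PDFBox tokens: every token is a plain string.
final class ImageDrawFilter {

    // Only the operators needed for this demo; a real stream has many more.
    private static final Set<String> OPERATORS = new HashSet<>(
            Arrays.asList("q", "Q", "cm", "Do", "BT", "ET", "Tf", "Tj"));

    // Returns the tokens without any "<name> Do" pair whose name is in imageNames.
    static List<String> dropImageDraws(List<String> tokens, Set<String> imageNames) {
        List<String> kept = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (!OPERATORS.contains(token)) {
                // Hold operands back until their operator is known.
                operands.add(token);
                continue;
            }
            boolean imageDraw = token.equals("Do")
                    && operands.size() == 1
                    && imageNames.contains(operands.get(0));
            if (!imageDraw) {
                kept.addAll(operands);
                kept.add(token);
            }
            operands.clear();
        }
        kept.addAll(operands); // trailing operands without an operator stay as they were
        return kept;
    }
}
```

With the sequence above written as strings and "/Im1" listed as an image
name, the call keeps q, the six numbers, cm and Q, and removes only
"/Im1" and "Do". A Do on a name that is not in imageNames (a form XObject,
for instance) is left untouched.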




Regards,
José Rodolfo Carrijo de Freitas
Systems Analyst
Softplan - Research and Development Department
Quality System Certified ISO 9001:2008
(48) 3027 8000 Ext. 8359
http://www.softplan.com.br

-----Original Message-----
From: José Rodolfo Carrijo de Freitas [mailto:[email protected]]
Sent: Friday, October 22, 2010 09:39
To: [email protected]
Subject: copy entire stream of a page ignoring images

Hello, 

I'm trying to write a function to copy the stream of a page to another page.

The thing is that it seems the PDFStreamParser is not parsing text, because
I'm not getting any text on my new page.

Besides, I'm getting a warning when opening the new pages in Adobe
Reader.

Has anyone written a similar function, or could anyone give me a little
help here?

PS: does anyone know of a PDF utility that can inspect the elements of a
stream?
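
Until such a utility turns up, the tokens can at least be printed in a
readable form, one operator per line with its operands in front of it. The
following is only a sketch of that idea, assuming the tokens have already
been converted to strings; TokenDump, groupByOperator and the OPERATORS set
are hypothetical names for the illustration and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: groups a flat token list into "operands operator" lines.
final class TokenDump {

    // Only the operators needed for this demo; a real stream has many more.
    private static final Set<String> OPERATORS = new HashSet<>(
            Arrays.asList("q", "Q", "cm", "Do", "BT", "ET", "Tf", "Tj"));

    static List<String> groupByOperator(List<String> tokens) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokens) {
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            if (OPERATORS.contains(token)) {
                // An operator closes the current line.
                lines.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // operands left over without an operator
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "q", "596", "0", "0", "840", "0", "0", "cm", "/Im1", "Do", "Q");
        for (String line : groupByOperator(tokens)) {
            System.out.println(line);
        }
    }
}
```

For the image sequence used in the example, this prints four lines: "q",
"596 0 0 840 0 0 cm", "/Im1 Do" and "Q".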

 

 

// Imports assume the PDFBox 1.x package names.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import org.apache.pdfbox.util.PDFOperator;

private void copyPageWithoutImage(PDPage page, PDPage newpage) throws IOException {
    PDStream contents = page.getContents();
    PDFStreamParser parser = new PDFStreamParser(contents.getStream());
    try {
        Map xobjects = page.getResources().getXObjects();
        List tokensNovos = new ArrayList(); // tokens kept for the new page
        List arguments = new ArrayList();   // operands waiting for their operator
        Iterator<Object> iter = parser.getTokenIterator();
        while (iter.hasNext()) {
            Object next = iter.next();
            if (!(next instanceof PDFOperator)) {
                // Hold operands back until the operator that consumes them is known.
                arguments.add(next);
                continue;
            }
            PDFOperator op = (PDFOperator) next;
            boolean isImageDraw = false;
            if (op.getOperation().equals("Do") && !arguments.isEmpty()
                    && arguments.get(0) instanceof COSName && xobjects != null) {
                COSName objectName = (COSName) arguments.get(0);
                isImageDraw = xobjects.get(objectName.getName()) instanceof PDXObjectImage;
            }
            if (!isImageDraw) {
                // Copy the operator together with its operands. The earlier version
                // skipped only "Do" and left the image name behind as a stray operand.
                tokensNovos.addAll(arguments);
                tokensNovos.add(next);
            }
            arguments.clear();
        }
        tokensNovos.addAll(arguments); // trailing operands, if any, are copied unchanged

        // Write the kept tokens into a fresh stream and attach it to the new page.
        PDStream updatedStream = new PDStream(this.pdf);
        ContentStreamWriter tokenWriter =
                new ContentStreamWriter(updatedStream.createOutputStream());
        tokenWriter.writeTokens(tokensNovos);
        newpage.setContents(updatedStream);

        // Assumption: the copied operators refer to fonts and XObjects by name, so
        // the new page needs the source page's resources to resolve them.
        newpage.setResources(page.getResources());
    } finally {
        parser.close();
    }
}
