Hi,
On 29.12.2010 03:33, Alan Thomas wrote:
> I used the ReplaceString example that comes with PDFBox on a PDF file I
> have. However, it does not find the text I want to replace.
> In looking at the code and putting in some debugging statements, I found
> out that the code was looking for a "PDFOperator" operation
> (from the getOperation() method) of "Tj" and "TJ".
Correct.
> However, my PDF file has neither.
> Question: Where can I find the list of all the operators that display
> strings in a PDF file? (Or is there an easier way to search and replace
> strings?)
Text content may be defined in different ways within PDFs. In most cases the
text is split into several chunks. A chunk often consists of one or more
characters, but not necessarily whole words or lines of text. Consequently, one
has to combine all these chunks to find a given string. The PDFTextStripper
class [1] works that way.
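A minimal sketch of the "combine the chunks first" idea, in plain Java so that it runs without PDFBox. The class ChunkSearch and its method find are hypothetical, not part of PDFBox. The sketch assumes the chunks have already been decoded to Unicode and are given in reading order (real PDFs need font decoding, which PDFTextStripper takes care of). It joins the chunks, searches the joined text, and reports which chunks each match spans, since a replacement would have to touch every one of those chunks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of PDFBox: searches for a string in text that a
 * PDF content stream has split into chunks (for example one chunk per Tj operand).
 * Assumes the chunks are already decoded to Unicode and given in reading order.
 */
class ChunkSearch {

    /**
     * Returns one {firstChunk, lastChunk} pair (inclusive chunk indices) per
     * occurrence of needle in the concatenation of all chunks.
     */
    static List<int[]> find(List<String> chunks, String needle) {
        StringBuilder text = new StringBuilder();
        // owner.get(i) is the index of the chunk that character i came from
        List<Integer> owner = new ArrayList<>();
        for (int c = 0; c < chunks.size(); c++) {
            String chunk = chunks.get(c);
            text.append(chunk);
            for (int k = 0; k < chunk.length(); k++) {
                owner.add(c);
            }
        }

        List<int[]> matches = new ArrayList<>();
        if (needle.isEmpty()) {
            return matches;
        }
        int at = text.indexOf(needle);
        while (at >= 0) {
            int last = at + needle.length() - 1;
            matches.add(new int[] {owner.get(at), owner.get(last)});
            at = text.indexOf(needle, at + 1); // also finds overlapping matches
        }
        return matches;
    }

    public static void main(String[] args) {
        // "World" is split across two chunks, as can happen in real PDFs.
        List<String> chunks = List.of("He", "llo W", "orld");
        for (int[] m : find(chunks, "World")) {
            System.out.println("match spans chunks " + m[0] + ".." + m[1]); // prints "match spans chunks 1..2"
        }
    }
}
```

Searching each chunk on its own would never find "World" here, because no single chunk contains it.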
Have a look at the PDF reference [2], section 9.3 "Text State Parameters and
Operators", for further information; the text-showing operators themselves are
described in section 9.4.3 "Text-Showing Operators".
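The operators that actually display strings are Tj, TJ, ' (move to the next line, then show a string) and " (set word and character spacing, move to the next line, then show a string). Below is a hedged sketch of a check that covers all four, written in plain Java so it runs without PDFBox. The class and method names are hypothetical. In PDFBox 1.x the operator name would be the value returned by PDFOperator.getOperation(), as in the ReplaceString example.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of PDFBox. */
class TextOperators {

    // Text-showing operators of the PDF content-stream syntax:
    // Tj (show a string), TJ (show an array of strings and positioning adjustments),
    // ' (move to next line, then show a string),
    // " (set word and character spacing, move to next line, then show a string).
    static final Set<String> TEXT_SHOWING = Set.of("Tj", "TJ", "'", "\"");

    /** True if the named operator displays text. */
    static boolean showsText(String operatorName) {
        return TEXT_SHOWING.contains(operatorName);
    }

    /** Counts how many operators in a sequence of operator names display text. */
    static long countTextShowing(List<String> operatorNames) {
        return operatorNames.stream().filter(TextOperators::showsText).count();
    }

    public static void main(String[] args) {
        // Made-up sequence of operator names, in content-stream order.
        List<String> ops = List.of("BT", "Tf", "Td", "'", "\"", "ET");
        System.out.println(countTextShowing(ops)); // prints 2
    }
}
```

The operands differ between these operators (" expects the word spacing and character spacing before the string, and TJ takes an array), so replacement code would still have to handle each operator's operand list separately.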
BR
Andreas Lehmkühler
[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
[2] http://www.adobe.com/devnet/pdf/pdf_reference.html