Hi,
On 01.01.2011 03:18, Alan Thomas wrote:
I am trying to understand the PDFTextStripper class.
Here is where I get lost: Its processPage method calls
PDFStreamEngine's processStream method, which calls the processSubStream
method of PDFStreamEngine. The processSubStream method calls the
processOperator method, which uses the process method (among others) of the
OperatorProcessor class. However, the OperatorProcessor class is abstract,
and the process method is declared abstract. I cannot find where
this abstract class is subclassed.
Can anyone point me in the right direction?
You can find the implementations of all supported operators in the packages
org.apache.pdfbox.util.operator
org.apache.pdfbox.util.operator.pagedrawer (only needed for rendering)
The properties file PDFTextStripper.properties [1] lists all operators that are
needed for text extraction, and PageDrawer.properties [2] lists those that are
needed for rendering.
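
To illustrate how a properties file can tie operators to concrete subclasses of an abstract processor class, here is a minimal, self-contained sketch. It does not use PDFBox: OperatorDispatchSketch, Processor, BeginTextProcessor and ShowTextProcessor are names invented for this example. It also assumes that each property maps an operator name to the class that handles it and that the class is instantiated by reflection; please check the PDFStreamEngine source for the actual details.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only; none of these classes are part of PDFBox.
final class OperatorDispatchSketch {

    // Stand-in for the abstract OperatorProcessor: one concrete subclass per operator.
    abstract static class Processor {
        abstract String process(String operator, List<String> operands);
    }

    // Hypothetical handler for the "BT" (begin text) operator.
    static class BeginTextProcessor extends Processor {
        @Override
        String process(String operator, List<String> operands) {
            return "begin-text";
        }
    }

    // Hypothetical handler for the "Tj" (show text) operator.
    static class ShowTextProcessor extends Processor {
        @Override
        String process(String operator, List<String> operands) {
            return "text:" + String.join("", operands);
        }
    }

    private final Map<String, Processor> processors = new HashMap<>();

    // Assumption: each property maps an operator name to a handler class name.
    OperatorDispatchSketch(Properties mapping) {
        for (String operator : mapping.stringPropertyNames()) {
            String className = mapping.getProperty(operator);
            try {
                Object handler = Class.forName(className).getDeclaredConstructor().newInstance();
                processors.put(operator, (Processor) handler);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load handler for " + operator, e);
            }
        }
    }

    // Looks up the handler for an operator; returns null when none is registered.
    String processOperator(String operator, List<String> operands) {
        Processor processor = processors.get(operator);
        return processor == null ? null : processor.process(operator, operands);
    }

    public static void main(String[] args) {
        Properties mapping = new Properties();
        mapping.setProperty("BT", BeginTextProcessor.class.getName());
        mapping.setProperty("Tj", ShowTextProcessor.class.getName());

        OperatorDispatchSketch engine = new OperatorDispatchSketch(mapping);
        System.out.println(engine.processOperator("BT", List.of()));        // prints "begin-text"
        System.out.println(engine.processOperator("Tj", List.of("Hello"))); // prints "text:Hello"
        System.out.println(engine.processOperator("re", List.of()));        // prints "null"
    }
}
```

With this kind of design the abstract class is never subclassed in the engine itself, which is why searching the engine code for subclasses turns up nothing: the concrete classes are only named in the properties file.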
BR
Andreas Lehmkühler
[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PDFTextStripper.properties
[2] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PageDrawer.properties