Hi,

On 01.01.2011 03:18, Alan Thomas wrote:
I am trying to understand the PDFTextStripper class.

Here is where I get lost: Its processPage method calls
PDFStreamEngine's processStream method, which calls the processSubStream
method of PDFStreamEngine.  The processSubStream method calls the
processOperator method, which uses the process method (among others) of the
OperatorProcessor class.  However, the OperatorProcessor class is abstract,
and the process method is declared abstract.  I cannot find where
this abstract class is subclassed.

Can anyone point me in the right direction?

You can find all supported operators in the packages

org.apache.pdfbox.util.operator
org.apache.pdfbox.util.operator.pagedrawer (only needed for rendering)

The property file PDFTextStripper.properties [1] lists all operators needed for text extraction, and PageDrawer.properties [2] lists those needed for rendering. Each entry maps an operator to the OperatorProcessor subclass that handles it.
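
As a rough illustration of how a property file like this can tie operators to concrete subclasses, here is a minimal, self-contained sketch. It does not use PDFBox itself: OperatorDispatchSketch, the simplified process signature, and the nested BeginText and ShowText classes are stand-ins invented for this example, and the mapping is built in code instead of being read from the real property file. The idea is that each key is an operator, each value is a class name, and the engine instantiates those classes by reflection, which would explain why no subclass is referenced directly in the calling code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class OperatorDispatchSketch {

    // Stand-in for the abstract OperatorProcessor (hypothetical, simplified signature).
    abstract static class OperatorProcessor {
        abstract String process(String operator, List<String> operands);
    }

    // Hypothetical concrete subclass handling the "BT" operator.
    static class BeginText extends OperatorProcessor {
        @Override
        String process(String operator, List<String> operands) {
            return "begin text object";
        }
    }

    // Hypothetical concrete subclass handling the "Tj" operator.
    static class ShowText extends OperatorProcessor {
        @Override
        String process(String operator, List<String> operands) {
            return "show text " + operands;
        }
    }

    private final Map<String, OperatorProcessor> processors = new HashMap<>();

    // Reads operator=className pairs and instantiates each class by reflection.
    public OperatorDispatchSketch(Properties mapping) throws ReflectiveOperationException {
        for (String operator : mapping.stringPropertyNames()) {
            Class<?> type = Class.forName(mapping.getProperty(operator));
            Object instance = type.getDeclaredConstructor().newInstance();
            processors.put(operator, (OperatorProcessor) instance);
        }
    }

    // Looks up the processor registered for the operator; unknown operators are skipped.
    public String processOperator(String operator, List<String> operands) {
        OperatorProcessor processor = processors.get(operator);
        if (processor == null) {
            return "ignored " + operator;
        }
        return processor.process(operator, operands);
    }

    // Builds an in-memory mapping in the same key=value shape as a .properties file.
    public static Properties sampleMapping() {
        Properties mapping = new Properties();
        mapping.setProperty("BT", BeginText.class.getName());
        mapping.setProperty("Tj", ShowText.class.getName());
        return mapping;
    }

    public static void main(String[] args) throws Exception {
        OperatorDispatchSketch engine = new OperatorDispatchSketch(sampleMapping());
        System.out.println(engine.processOperator("BT", Collections.<String>emptyList()));
        System.out.println(engine.processOperator("Tj", Arrays.asList("(Hello)")));
        System.out.println(engine.processOperator("re", Collections.<String>emptyList()));
    }
}
```

In this sketch an operator without an entry in the mapping is simply skipped, so restricting the mapping to the operators relevant for one task (text extraction, say) keeps the rest of the content stream out of the way.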

BR
Andreas Lehmkühler

[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PDFTextStripper.properties
[2] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PageDrawer.properties
