[ 
https://issues.apache.org/jira/browse/PDFBOX-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758737#action_12758737
 ] 

Navendu Garg edited comment on PDFBOX-533 at 9/23/09 8:40 AM:
--------------------------------------------------------------

Mel,
Thanks for implementing the writeCharacters method. It is going to save me a 
lot of time.

I tried to use PDFTextStripper2. However, it is giving me the following 
info/error messages:

INFO: unsupported/disabled operation: BDC
Sep 23, 2009 10:35:54 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: g
Sep 23, 2009 10:35:54 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.pdfbox.encoding.EncodingManager.<clinit>(EncodingManager.java:38)
        at org.apache.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:518)
        at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:438)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:343)
        at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:66)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:516)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:229)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:188)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
        at org.apache.pdfbox.util.TestPDFTextStripperPerf.main(TestPDFTextStripperPerf.java:27)
Caused by: java.lang.NullPointerException
        at java.io.Reader.<init>(Reader.java:61)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:55)
        at org.apache.pdfbox.encoding.Encoding.loadGlyphList(Encoding.java:98)
        at org.apache.pdfbox.encoding.Encoding.<clinit>(Encoding.java:58)
        ... 12 more
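
One possible reading of the "Caused by" section, offered as an assumption and 
not a confirmed diagnosis: a NullPointerException raised in the Reader 
constructor is what the JDK throws when InputStreamReader is handed a null 
stream, and getResourceAsStream returns null when a resource is missing from 
the classpath. The sketch below reproduces only that JDK behaviour. The class 
name and the resource path are hypothetical and are not taken from PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class MissingResourceDemo {

    /**
     * Tries to open a classpath resource as a Reader and returns the class
     * name of the exception raised, or "none" if the reader opened cleanly.
     */
    public static String tryOpen(String resourcePath) {
        // getResourceAsStream does not throw for a missing resource;
        // it returns null instead.
        InputStream in = MissingResourceDemo.class.getResourceAsStream(resourcePath);
        try {
            // With in == null, the Reader superclass constructor throws
            // NullPointerException, as in the bottom frames of the trace.
            Reader reader = new InputStreamReader(in);
            reader.close();
            return "none";
        } catch (NullPointerException | IOException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical path that is assumed not to exist on the classpath.
        System.out.println(tryOpen("/no/such/glyphlist.txt"));
        // prints "java.lang.NullPointerException"
    }
}
```

When the same failure happens inside a static initializer, the JVM wraps it in 
an ExceptionInInitializerError, which would match the top of the trace. If 
this reading is right, checking that the PDFBox resource files are on the 
runtime classpath would be a reasonable first step.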


> PDFTextStripper.writeCharacters is called nowhere in the class
> --------------------------------------------------------------
>
>                 Key: PDFBOX-533
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-533
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Navendu Garg
>         Attachments: TestPDFTextStripperPerf.java
>
>
> It seems the writeCharacters method is not called anywhere in the 
> PDFTextStripper class. This makes it impossible to handle each character's 
> TextPosition or the line separator, because the processLineSeparator method 
> no longer exists and writeLineSeparator is called when the actual writing 
> happens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
