Hello

See perhaps this answer to a similar question:

http://www.mail-archive.com/[email protected]/msg01812.html
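
For what it's worth, the lines in question are logged at INFO level, so they are notices rather than errors. The operators they name are graphics operators (g, rg and RG set colours; m, l, h, re, n, S and W build, stroke or clip paths; BI and EI delimit an inline image), which text extraction presumably does not need. If the goal is only to hide the messages, here is a minimal sketch. It assumes that PDFBox's logging ends up in java.util.logging (the timestamp format of the output suggests so) and that the logger names follow the package name org.apache.pdfbox. The class name QuietPdfBoxLogging is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfBoxLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost again.
    // Assumption: PDFBox loggers are named after the org.apache.pdfbox package.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    // Raise the threshold so INFO messages from all child loggers are dropped.
    public static void silenceInfoMessages() {
        PDFBOX_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfoMessages();
        // Child loggers without their own level inherit WARNING from the parent.
        Logger engine = Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");
        engine.info("unsupported/disabled operation: g");   // filtered out
        engine.warning("warnings are still passed through"); // still logged
    }
}
```

Call silenceInfoMessages() once before PDDocument.load(...). If another logging backend such as log4j is on the classpath, the level would have to be set in that backend's configuration instead.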

Regards,
Patrick

Stephen Haggai wrote:
Hello,

I too tried to extract text from a PDF file, but I keep getting the
messages shown further down, even though the text seems to be fully
extracted (I have not verified this).

My code:

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*;

public class PDFTest {

    public static void main(String[] args) {
        PDDocument pd = null;
        BufferedWriter wr = null;
        try {
            File input = new File("C:\\invoice.pdf");
            File output = new File("C:\\SampleText.txt");
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages());
            System.out.println(pd.isEncrypted());
            //pd.save("new.pdf");
            PDFTextStripper stripper = new PDFTextStripper();
            //String text = stripper.getText(pd);
            wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)));
            stripper.writeText(pd, wr);
            //System.out.println(text);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the writer first so buffered text is flushed to the file.
            try {
                if (wr != null) {
                    wr.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            // Then release the document, even if extraction failed.
            try {
                if (pd != null) {
                    pd.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

The "error" or message that I get is:

--------------------Configuration: <Default>--------------------
5
false
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: g
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: rg
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: RG
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: n
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: re
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: W
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BI
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EI
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: m
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: l
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: h
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: S

Process completed.

Is my code wrong somewhere?

Thanks,
Stephen
