Hello,
I also tried to extract text from a PDF file, but I keep getting the
messages below, even though the text seems to be fully extracted (I have
not verified this).
My code:
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*;

public class PDFTest {
    public static void main(String[] args) {
        PDDocument pd = null;
        BufferedWriter wr = null;
        try {
            File input = new File("C:\\invoice.pdf");
            File output = new File("C:\\SampleText.txt");
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages());
            System.out.println(pd.isEncrypted());
            //pd.save("new.pdf");
            PDFTextStripper stripper = new PDFTextStripper();
            //String text = stripper.getText(pd);
            wr = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(output)));
            stripper.writeText(pd, wr);
            //System.out.println(text);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the writer so buffered text is flushed to the file,
            // then release the document.
            try {
                if (wr != null) wr.close();
                if (pd != null) pd.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
The "error" or message that I get is:
--------------------Configuration: <Default>--------------------
5
false
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: g
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: rg
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: RG
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: n
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: re
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: W
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BI
20/11/2009 2:17:24 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EI
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: m
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: l
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: h
20/11/2009 2:17:26 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: S
Process completed.
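For what it's worth, these lines are logged at INFO level rather than thrown as exceptions, and the operators they name (g, rg, RG, n, re, W, BI, EI, m, l, h, S) are PDF graphics operators for colours, paths, clipping and inline images, so they may be harmless for text extraction. Below is a minimal sketch of how the messages might be silenced. It assumes that PDFBox's logging ends up in java.util.logging here (the output format suggests it) and that the logger is named after the class shown in the log. The class name QuietPdfBoxLogging is only a hypothetical helper of mine.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfBoxLogging {
    // Assumption: the logger name matches the class name printed in the log.
    // A strong reference is kept because java.util.logging holds loggers
    // weakly, so a configured level could otherwise be lost.
    private static final Logger ENGINE_LOGGER =
            Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");

    // Raise the threshold so INFO messages from this logger are dropped.
    public static void silenceInfoMessages() {
        ENGINE_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfoMessages();
        // Show the level now in effect for this logger.
        System.out.println(ENGINE_LOGGER.getLevel());
    }
}
```

Calling QuietPdfBoxLogging.silenceInfoMessages() at the start of main, before PDDocument.load, would be the intended use. If a different logging backend (for example log4j) is on the classpath, this would have no effect and that backend's own configuration would be needed instead.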
Is my code wrong somewhere?
Thanks,
Stephen