Text extraction from PDF file fails

Chris Bamford Mon, 20 Feb 2012 06:40:33 -0800

Hi there,

Please can someone explain to me why I cannot extract text from the attached 
PDF?
I use PDFBox 1.6.0 and when I run the following code I get 0 pages returned:


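// PDFBox 1.x imports
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// "is" is the InputStream for the PDF; rtnBuffer is declared elsewhere
PDDocument document = PDDocument.load(is);
try {
    PDFTextStripper stripper = new PDFTextStripper();
    int pageCount = document.getDocumentCatalog().getAllPages().size();

    // Extract the text one page at a time (page numbers are 1-based)
    for (int i = 1; i <= pageCount; i++) {
        stripper.setStartPage(i);
        stripper.setEndPage(i);
        rtnBuffer.append(stripper.getText(document));
    }
} finally {
    document.close(); // release the resources held by the loaded document
}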

Is there something wrong with my code or is the document malformed?
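
To help separate the two possibilities, here is a rough sanity check that does not depend on PDFBox at all. It only looks for the "%PDF-" header near the start of the file and the "%%EOF" marker near the end. The class and method names are hypothetical (mine, not part of any library), and the 1024-byte search window is an assumption about how tolerant readers usually are, not something taken from PDFBox. Passing this check does not mean the page tree inside the file is valid; failing it would at least suggest that the bytes reaching the parser are truncated or not a PDF.

```java
import java.nio.charset.StandardCharsets;

public class PdfSanityCheck {

    // Assumed tolerance: look for the markers within this many bytes
    // of the start and end of the file.
    private static final int WINDOW = 1024;

    /** True if "%PDF-" appears within the first WINDOW bytes. */
    static boolean hasPdfHeader(byte[] data) {
        int limit = Math.min(data.length, WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary data is safe here
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    /** True if "%%EOF" appears within the last WINDOW bytes. */
    static boolean hasEofMarker(byte[] data) {
        int start = Math.max(0, data.length - WINDOW);
        String tail = new String(data, start, data.length - start,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    /** True only if both markers are present. */
    static boolean looksLikePdf(byte[] data) {
        return hasPdfHeader(data) && hasEofMarker(data);
    }
}
```

To use it, the whole file would be read into a byte array first (for example with java.nio.file.Files.readAllBytes) and passed to looksLikePdf before the same bytes are handed to PDDocument.load.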

Thanks,

- Chris


Chris Bamford
Software Engineer

2 - 8 Balfe Street
Kings Cross,
London, N1 9EG

mobile +44 7860 405292
tel: +44 (0) 207 843 2300
web www.mimecast.com

