Hi,

Sent: Mon, 07 Feb 2011
From: Yogesh <[email protected]>
> Hello,
>
> I am trying to extract Text from PDFs, mostly scientific literature.
> Average number of pages the documents have is 10.
> When I run the extraction code, I get text for only the 1st page. For the
> rest, I get the following error
>
> Feb 7, 2011 5:18:13 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont extractToUnicodeEncoding
> SEVERE: Error: Could not load embedded CMAP
> The handle is invalid
>
> What might be wrong. Please help. Thanks

What version of PDFBox are you using? Sounds like an issue which is already fixed in the current trunk.

BR
Andreas Lehmkühler

