Hello, I am trying to extract text from PDFs, mostly scientific literature. The documents average about 10 pages. When I run the extraction code, I get text for only the first page. For the remaining pages, I get the following error:
Feb 7, 2011 5:18:13 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont extractToUnicodeEncoding
SEVERE: Error: Could not load embedded CMAP
The handle is invalid

What might be wrong? Please help.

Thanks,
Yogesh

