Hi,
On 05.11.2011 02:05, Enio Lopes wrote:
Hello,
I'm using PDFBox in a .NET project, and so far it has worked really well for
reading the PDFs I've been using.
Now I need to identify the different fonts present in a PDF. I saw that the
stripper instance has a method called getFonts, but when I use it I get the
following error:
EmptyStackException was unhandled
Here is the code: https://gist.github.com/1340915
Your code can't work, as the doc and the stripper instance aren't connected in
any way. However, you won't get the fonts you're looking for even if you
initialize the stripper using the doc.
As every page most likely has its own resources, you should implement something
like the following:
- load the document
- get all pages by calling doc.getDocumentCatalog().getAllPages()
- iterate over the list containing the pages
- retrieve the resources from every page using page.getResources()
- get all fonts by calling resources.getFonts()
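The steps above can be sketched in Java roughly as follows. This is a minimal sketch, assuming the PDFBox 1.x API that was current when this thread was written (2.x replaced getAllPages() and changed how fonts are read from PDResources). The class name ListFonts and the helper collectFontNames are hypothetical. It only looks at each page's own resource dictionary, so fonts inherited from the page tree or used inside form XObjects are not picked up.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Hypothetical helper class; assumes PDFBox 1.x on the classpath.
public class ListFonts {

    // Collects the BaseFont names of the fonts listed in each page's
    // own /Resources dictionary.
    public static Set<String> collectFontNames(PDDocument doc) throws IOException {
        Set<String> names = new TreeSet<String>();
        // getAllPages() returns a raw List in PDFBox 1.x, hence the cast.
        for (Object o : doc.getDocumentCatalog().getAllPages()) {
            PDPage page = (PDPage) o;
            PDResources resources = page.getResources();
            if (resources == null) {
                // /Resources is optional on a page (it may be inherited).
                continue;
            }
            Map<String, PDFont> fonts = resources.getFonts();
            for (PDFont font : fonts.values()) {
                String name = font.getBaseFont();
                // Type 3 fonts have no BaseFont entry, so skip nulls.
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        PDDocument doc = PDDocument.load(args[0]);
        try {
            for (String name : collectFontNames(doc)) {
                System.out.println(name);
            }
        } finally {
            doc.close();
        }
    }
}
```

Usage: `java ListFonts some.pdf` prints one font name per line. A TreeSet is used so that a font shared by many pages is reported only once, in sorted order.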
You may want to have a look at the command-line tool ExtractImages [1], which
works similarly except that it extracts all images instead of all fonts.
Thank you.
BR
Andreas Lehmkühler
[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java