Dear Hesham, Thank you very much for your response!
The purpose of my question is: I need to find out whether all fonts used inside the PDF are embedded. But if a PDF contains only images and no text, I don't need to check for embedded fonts.

At the moment I'm doing this:

    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;

    public boolean containsText(String pdfFile) throws IOException {
        PDDocument document = PDDocument.load(pdfFile);
        try {
            // Extracts the text of the whole document before checking it
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(document);
            return text != null && text.length() > 0;
        } finally {
            document.close(); // release the document even if extraction fails
        }
    }

But if the document is very large, this method can take a while. As soon as some text is found I could already return true, but I couldn't figure out how to do that.

Best Regards,
Andreas

-----Original Message-----
From: Hesham G. [mailto:heshamgne...@gmail.com]
Sent: Thursday, February 4, 2010 07:47
To: users@pdfbox.apache.org
Subject: Re: PDF contains any text?

I remember there was some way in PDFBox to read some resources from the PDF and skip others. I don't remember how, but I think there is a way to skip parsing the images in the PDF.

Best regards,
Hesham

--------------------------------------------------
From: "Erik Scholtz, ArgonSoft GmbH" <escho...@argonsoft.de>
Sent: Wednesday, February 03, 2010 6:03 PM
To: <users@pdfbox.apache.org>
Subject: Re: PDF contains any text?

> Andreas,
>
> telling what a document contains without parsing its content sounds to
> me like you are looking for the PDDocument.oracle_of_delphi() method :)
>
> But to answer your question: No. You have to look at the resources of
> each page to find out whether there are text resources or not.
> There is no "central resource_available dictionary" in PDF.
>
> Best regards,
> Erik
>
> Roeder, Andreas wrote:
>> Hi,
>>
>> Is there a way to find out if a PDF contains any text without parsing
>> the whole document?
>> Some PDFs contain just images.
>>
>> Best Regards,
>>
>> Andreas
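
One possible way to get the "return true as soon as some text is found" behaviour asked about above is sketched below. It is only a sketch under assumptions, not something confirmed in this thread: it assumes that PDFTextStripper.writeText(PDDocument, Writer) hands the extracted text to the Writer page by page and that an IOException thrown by the Writer propagates out of writeText. The names TextDetectingWriter and TextFoundException are made up for illustration and are not part of PDFBox. The class itself uses only java.io.

```java
import java.io.IOException;
import java.io.Writer;

/**
 * Hypothetical helper (not part of PDFBox): a Writer that discards its input
 * and aborts with an exception as soon as it receives a non-whitespace
 * character.
 *
 * Assumed usage inside containsText(), in place of stripper.getText(document):
 *
 *     try {
 *         stripper.writeText(document, new TextDetectingWriter());
 *         return false; // whole document written, nothing but whitespace
 *     } catch (TextDetectingWriter.TextFoundException e) {
 *         return true;  // stopped at the first real character
 *     }
 */
class TextDetectingWriter extends Writer {

    /** Signals "text was found"; used as a control-flow marker, not an error. */
    static class TextFoundException extends IOException {
        TextFoundException() {
            super("text found");
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        // Writer routes write(String) and write(int) through this method,
        // so checking here covers every way text can arrive.
        for (int i = off; i < off + len; i++) {
            if (!Character.isWhitespace(buf[i])) {
                throw new TextFoundException();
            }
        }
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```

Ignoring whitespace is a design choice here, so that line or page separators alone do not count as text. If this works as assumed, it only cuts the text extraction short; loading the document still takes as long as before.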