Hi,

Apache PDFBox can't help you here, I'm afraid. What you're after is OCR functionality (http://en.wikipedia.org/wiki/Optical_character_recognition) and PDFBox doesn't provide that. The only thing you can do is extract the bitmap images using PDFBox and then attempt to decipher the text contained in them using an external OCR process.

Just a warning: don't expect an OCR process to be 100% accurate.
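
To make the "external OCR process" step a bit more concrete, here is a minimal sketch in plain Java (9 or later). It assumes the page image has already been written to disk and that an OCR command-line tool is installed and on the PATH. I've used the `tesseract` CLI as the example, with its `stdout` output base and `-l` language option. The class and method names (`TesseractOcrSketch`, `buildCommand`, `ocr`) are made up for illustration, and nothing here is part of PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: runs an external OCR process on an image file.
// Assumes the "tesseract" executable is installed and on the PATH.
final class TesseractOcrSketch {

    // Builds the command line: tesseract <image> stdout -l <lang>
    // ("stdout" tells tesseract to print the text instead of writing a file).
    static List<String> buildCommand(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Starts the OCR process and returns the text it prints.
    static String ocr(Path image, String lang) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, lang))
                // Discard diagnostics so an unread stderr pipe cannot block the process.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String text = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("OCR process exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java TesseractOcrSketch.java <image-file>");
            return;
        }
        // "eng" is an assumed language code; use whichever language data is installed.
        System.out.println(ocr(Paths.get(args[0]), "eng"));
    }
}
```

You would run this once per extracted image and concatenate the results. Treat the returned text as a best guess that needs checking, for the accuracy reason given above.
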
If you're looking for an open source OCR engine, Tesseract is probably the most popular one:
http://en.wikipedia.org/wiki/Tesseract_%28software%29

HTH
Jeremias Maerki

On 12.10.2012 15:47:40 Kishore Babu wrote:
> Hi All,
> Is it possible to extract text from an image (JPEG) using pdfbox or is there
> any open source java code for this?
>
> When I try to convert pdf to text, it is showing blank output. Then I
> converted into JPEG image. The image contains the text properly, which I am
> failing to extract.
>
> For normal pdf documents I am extracting text nicely using the standard
> process but when the pdf document is an image, I am failing to extract the
> text that is present in the image.
>
> Can anyone give directions on this, please?
>
> Thanks in advance.
>
> Regards,
> Kishore Babu I Developer