Hi Yogesh,

First and foremost, the image samples that you are using are not good enough to extract all the data.

Coming to your problem statement: are you modifying the images before providing them to Tesseract? If not:

1. Rotate the images before providing them to the engine. You can get the rotation information by using *PSM mode "0"* (orientation and script detection only). It creates an "*.osd*" file that contains the rotation information.
2. *Deskew* the image to align the text angle properly.
3. Convert the image to *greyscale* and adjust the contrast so that the text is more visible. (Optional: you can invert the colors too, e.g. convert the text to white and the background to black.)
4. Increase the *DPI* of the image, as it can increase the accuracy of the detected text.

Note: A higher-quality image will also increase the accuracy of the detected text.

Furthermore, you can read about the Page Segmentation Modes (PSM) and OCR Engine Modes (OEM) in the official documentation. They can help you a lot too.

Also, if you can, try Google Cloud Vision. Its accuracy is much higher than Tesseract's. It is a paid API, but you can create a free account and OCR up to 1000 pages each month free of charge. After that you will be charged, but it is affordable. Upon signing up for a free account you also get 300 dollars of credit for a year from Google.

Regards
Lakshay Saini

On Thursday, May 28, 2020 at 11:20:14 AM UTC+5:30, YOGESH KUMBHARE wrote:
>
> Hi Team,
>
> I am planning to use the Tesseract OCR library to extract data from images ...
> but for some images the data is not extracted in the proper format. What is the
> solution for that? How can I resolve it?
> Please, guys, can anyone help me with those images? What do I have to do,
> and is any config needed for that in the Tesseract OCR library?
>
> Please let me know as soon as possible.
>
> sample code ...
>
> import java.io.File;
>
> import net.sourceforge.tess4j.ITesseract;
> import net.sourceforge.tess4j.Tesseract;
> import net.sourceforge.tess4j.TesseractException;
>
> public class Test {
>
>     public static void main(String[] args) {
>
>         try {
>             File imageFile = new File("Sample1_3.png");
>
>             ITesseract instance = new Tesseract(); // JNA Interface Mapping
>             System.out.print(imageFile.canRead());
>
>             instance.setDatapath("tessdata");
>             instance.setTessVariable("user_defined_dpi", "300");
>             instance.setLanguage("eng");
>             //instance.setDatapath(tessDataFolder.getPath());;
>             String text = instance.doOCR(imageFile);
>             // path of your image file
>
>         } catch (TesseractException e) {
>             e.printStackTrace();
>
>         }
>     }
> }
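
P.S. For steps 3 and 4 of the reply (greyscale, contrast, optional inversion, upscaling), a rough sketch in plain Java might look like the following. It only uses java.awt, so it does not depend on any particular OCR library. The class name OcrPreprocess and its method names are made up for illustration, the min/max contrast stretch is just one simple way to "play with the contrast", and upscaling only resamples the pixels, so treat it as a starting point rather than a fix for poor scans. Rotation and deskewing (steps 1 and 2) are not covered here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper for illustration; not part of Tesseract or Tess4J. */
final class OcrPreprocess {

    /** Converts an image to 8-bit greyscale using Rec. 601 luma weights. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster out = gray.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                out.setSample(x, y, 0, (299 * r + 587 * g + 114 * b) / 1000);
            }
        }
        return gray;
    }

    /** Linearly stretches grey levels so the darkest pixel maps to 0 and the brightest to 255. */
    static BufferedImage stretchContrast(BufferedImage gray) {
        WritableRaster in = gray.getRaster();
        int min = 255;
        int max = 0;
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int v = in.getSample(x, y, 0);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        if (max == min) {
            return gray; // flat image, nothing to stretch
        }
        BufferedImage result = new BufferedImage(
                gray.getWidth(), gray.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster out = result.getRaster();
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int v = in.getSample(x, y, 0);
                out.setSample(x, y, 0, (v - min) * 255 / (max - min));
            }
        }
        return result;
    }

    /** Inverts grey levels (optional step: dark becomes light and vice versa). */
    static BufferedImage invert(BufferedImage gray) {
        BufferedImage result = new BufferedImage(
                gray.getWidth(), gray.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster in = gray.getRaster();
        WritableRaster out = result.getRaster();
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                out.setSample(x, y, 0, 255 - in.getSample(x, y, 0));
            }
        }
        return result;
    }

    /** Upscales by the given factor with bicubic interpolation; the result is greyscale. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage result = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = result.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return result;
    }
}
```

You could load the file with ImageIO.read(...), chain the calls, for example scale(stretchContrast(toGray(image)), 2.0), and hand the resulting BufferedImage to the OCR call. As far as I know, Tess4J's ITesseract has a doOCR overload that accepts a BufferedImage, but please check that against the version you are using.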