Hi Yogesh,

First and foremost, the quality of the image samples you are using is not 
good enough to extract all the data.

Coming to your problem statement: are you modifying the images before 
passing them to Tesseract?

If not:

1. You need to rotate the images before providing them to the engine. You 
can get the rotation information by running *page segmentation mode 0* 
(*--psm 0*, orientation and script detection only). From the command line 
it creates an "*.osd*" file that contains the rotation information.
2. *Deskew* the image so that the text lines are properly horizontal.
3. Convert the image to *grayscale* and adjust the contrast so that the 
text is more visible. (Optional: you can invert the colors too if that 
makes the text stand out better. Note that Tesseract 4.x generally expects 
dark text on a light background.)
4. Increase the *DPI* of the image, as this can improve the accuracy of the 
detected text. Tesseract generally works best at around 300 DPI or more.
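
For step 1, here is a minimal sketch in plain Java of how the rotation 
could be applied once you have the OSD report. It assumes the report 
contains a line such as "Rotate: 90" and that this value is the clockwise 
correction in degrees; please verify both on one of your own samples. The 
names OsdRotation, parseRotate and rotateClockwise are made up for this 
example and are not part of Tesseract or Tess4J.

```java
import java.awt.image.BufferedImage;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tesseract or Tess4J.
final class OsdRotation {

    // Assumption: the OSD report has a line of the form "Rotate: <degrees>".
    private static final Pattern ROTATE = Pattern.compile("(?m)^Rotate:\\s*(\\d+)");

    /** Returns the value of the "Rotate:" line, or 0 if there is none. */
    static int parseRotate(String osdText) {
        Matcher m = ROTATE.matcher(osdText);
        return m.find() ? Integer.parseInt(m.group(1)) % 360 : 0;
    }

    /** Rotates the image clockwise by a multiple of 90 degrees. */
    static BufferedImage rotateClockwise(BufferedImage src, int degrees) {
        int quarterTurns = ((degrees / 90) % 4 + 4) % 4;
        if (quarterTurns == 0) {
            return src;
        }
        int w = src.getWidth();
        int h = src.getHeight();
        boolean swap = quarterTurns % 2 == 1; // width and height swap for 90/270
        BufferedImage dst = new BufferedImage(
                swap ? h : w, swap ? w : h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                switch (quarterTurns) {
                    case 1:  dst.setRGB(h - 1 - y, x, rgb); break;         // 90 clockwise
                    case 2:  dst.setRGB(w - 1 - x, h - 1 - y, rgb); break; // 180
                    default: dst.setRGB(y, w - 1 - x, rgb); break;         // 270 clockwise
                }
            }
        }
        return dst;
    }
}
```

You would read the ".osd" file into a String, call parseRotate on it, and 
pass the result to rotateClockwise before running the actual OCR. If the 
text comes out upside down or sideways, the direction assumption is wrong 
for your setup and you should rotate by 360 minus the value instead.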

Note: The higher the image quality, the better the accuracy of the 
detected text will be.
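
The grayscale, contrast and DPI steps above could be sketched roughly like 
this with only the standard Java image classes. This is just one possible 
approach, not the only way to do it: the class and method names are made 
up for illustration, the luma weights and the "stretch around mid-gray" 
contrast formula are my own choices, and upscaling the pixels is only a 
stand-in for rescanning at a higher DPI.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical helper, not part of Tesseract or Tess4J.
final class OcrPreprocess {

    /** Converts any image to 8-bit grayscale using Rec. 601 luma weights. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster out = gray.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                out.setSample(x, y, 0, (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b));
            }
        }
        return gray;
    }

    /**
     * Stretches contrast around mid-gray (gain > 1 increases contrast) and
     * optionally inverts. Expects a single-band grayscale image from toGray.
     */
    static BufferedImage adjust(BufferedImage gray, double gain, boolean invert) {
        BufferedImage result = new BufferedImage(
                gray.getWidth(), gray.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster in = gray.getRaster();
        WritableRaster out = result.getRaster();
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int v = in.getSample(x, y, 0);
                int s = (int) Math.round((v - 128) * gain + 128);
                s = Math.max(0, Math.min(255, s)); // clamp to the 8-bit range
                out.setSample(x, y, 0, invert ? 255 - s : s);
            }
        }
        return result;
    }

    /** Enlarges the image by the given factor with bicubic interpolation. */
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }
}
```

A typical chain would be upscale(adjust(toGray(image), 1.5, false), 2.0), 
with the gain and the factor tuned by eye on your own samples. The result 
can be written out with ImageIO.write and then given to the engine.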

Furthermore, you can read about the Page Segmentation Modes (PSM) and 
OCR Engine Modes (OEM) in the official documentation. Choosing the right 
ones for your images can help a lot too.
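
To experiment with different PSM and OEM values quickly, one option is to 
call the tesseract command line from Java. Below is a small sketch that 
only builds the argument list. The class name TesseractCommand is made up 
for this example, and the accepted ranges (PSM 0 to 13, OEM 0 to 3) are 
what I know from Tesseract 4.x, so check "tesseract --help-extra" on your 
installed version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles a tesseract CLI invocation.
final class TesseractCommand {

    /** Builds: tesseract <image> <outputBase> -l <lang> --psm <psm> --oem <oem> */
    static List<String> build(String imagePath, String outputBase,
                              int psm, int oem, String lang) {
        // Ranges as assumed for Tesseract 4.x; adjust for your version.
        if (psm < 0 || psm > 13) {
            throw new IllegalArgumentException("psm must be between 0 and 13: " + psm);
        }
        if (oem < 0 || oem > 3) {
            throw new IllegalArgumentException("oem must be between 0 and 3: " + oem);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add("--psm");
        cmd.add(Integer.toString(psm));
        cmd.add("--oem");
        cmd.add(Integer.toString(oem));
        return cmd;
    }
}
```

The list can be passed to new ProcessBuilder(cmd) to run it. If you stay 
with Tess4J as in your sample code, the same two settings are available 
on the instance through setPageSegMode(int) and setOcrEngineMode(int).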

Also, if you can, try Google Cloud Vision too. Its accuracy is much higher 
than Tesseract's. It is a paid API, but you can create a free account and 
OCR up to 1000 pages each month free of charge. After that you will be 
charged, but it's affordable. And upon signing up for a free account you 
will get 300 dollars of credit for a year from Google itself.

Regards
Lakshay Saini

On Thursday, May 28, 2020 at 11:20:14 AM UTC+5:30, YOGESH KUMBHARE wrote:
>
> Hi Team,
>
> I am planning to use the Tesseract OCR engine for an image data 
> extraction library ...
> but for some images the data is not extracted in the proper format. What 
> is the solution for that?
> How can I resolve it? 
> Please, can anyone help me with those images? What should I do, and is 
> any config needed for that in the Tesseract OCR library?
>
> Please let me know as soon as possible.
>
> sample code ...
>
> import java.io.File;
>
> import net.sourceforge.tess4j.ITesseract;
> import net.sourceforge.tess4j.Tesseract;
> import net.sourceforge.tess4j.TesseractException;
>
> public class Test {
>
>     public static void main(String[] args) {
>         // path of your image file
>         File imageFile = new File("Sample1_3.png");
>         System.out.println(imageFile.canRead());
>
>         ITesseract instance = new Tesseract(); // JNA Interface Mapping
>         instance.setDatapath("tessdata");
>         instance.setTessVariable("user_defined_dpi", "300");
>         instance.setLanguage("eng");
>
>         try {
>             String text = instance.doOCR(imageFile);
>             System.out.println(text); // show the recognized text
>         } catch (TesseractException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
>
>
