[tesseract-ocr] OCR not behaving well on clean image | tessaract

boyapally srikanth Tue, 10 May 2022 05:12:59 -0700

 <https://stackoverflow.com/posts/72185956/timeline>


I have been working on a project that involves extracting text from an 
image. From what I have read, Tesseract is one of the best libraries 
available, so I decided to use it along with OpenCV, which is needed 
for image manipulation.

I have been experimenting a lot with the Tesseract engine, and it does not 
seem to be giving the expected results. I have attached a sample image as a 
reference. The output I got is:

1] =501 [

Instead, the expected output is:

TM10-50%L

What I have done so far:

   - Remove noise
   - Adaptive threshold
   - Sending it to the Tesseract OCR engine

Are there any other suggestions to improve the algorithm?

Thanks in advance.

Snippet of the code:
import sys

import cv2
import numpy as np
import pytesseract
from PIL import Image

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ocr_simple.py image.jpg')
        sys.exit(1)
    # Read image path from command line
    imPath = sys.argv[1]
    # Load the image as grayscale (flag 0)
    gray = cv2.imread(imPath, 0)
    # Blur
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    # Binarizing
    thresh = cv2.adaptiveThreshold(blur, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 5, 3)
    text = pytesseract.image_to_string(thresh)
    print(text)
