Hi Richard,
First of all, I would recommend visualizing each step of your pipeline
(threshold, median blur, ...).
I use matplotlib for that purpose; you can find an example below.
The problem is not Tesseract but your preprocessing: you shouldn't blur
text that small, and for the threshold you can just use
cv2.THRESH_BINARY.

import cv2
import pytesseract
from loguru import logger
from matplotlib import pyplot as plt

image = cv2.imread("test.png")

# Region of interest: top-left corner (x, y), height h, width w
x = 375
y = 0
h = 40
w = 160

# IN GREY
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
plt.figure(figsize=(40, 40))
plt.imshow(image, cmap="gray")
plt.title("gray")
plt.show()

# THRESH: pixels brighter than 200 become white (255), the rest black (0)
image = cv2.threshold(image, 200, 255, cv2.THRESH_BINARY)[1]
plt.figure(figsize=(40, 40))
plt.imshow(image, cmap="gray")
plt.title("threshold")
plt.show()

# CROP to the region of interest
image = image[y:y+h, x:x+w]
plt.figure(figsize=(40, 40))
plt.imshow(image, cmap="gray")
plt.title("crop")
plt.show()

cv2.imwrite("output.png", image)

# Try every page segmentation mode (--psm 1 to 13) and log each result
for arg in range(1, 14):
    try:
        text = pytesseract.image_to_string(image, config=f"-l eng --oem 1 --psm {arg}")
        logger.info(f"Parsed text [{text}] with [{arg}]")
    except Exception as exception:
        logger.exception(exception)

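To see why blurring hurts text this small, here is a minimal sketch in plain Python (no OpenCV needed). It is only an illustration under my own assumptions: the 5x7 patch and the helper names median3x3 and binary_threshold are made up for the example, and median3x3 is a simplified stand-in for a 3x3 median blur, not OpenCV's implementation. The threshold rule is the THRESH_BINARY one: a pixel above the threshold becomes the maximum value, everything else becomes 0.

```python
from statistics import median


def median3x3(img):
    """3x3 median filter with replicated borders (simplified stand-in for a median blur)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Collect the 3x3 neighbourhood, clamping coordinates at the borders
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def binary_threshold(img, thresh=200, maxval=255):
    """THRESH_BINARY rule: pixels strictly above thresh become maxval, the rest 0."""
    return [[maxval if p > thresh else 0 for p in row] for row in img]


# Hypothetical patch: dark background (30) with a one-pixel-wide bright stroke (230)
patch = [[230 if c == 3 else 30 for c in range(7)] for _ in range(5)]

direct = binary_threshold(patch)              # threshold only
blurred = binary_threshold(median3x3(patch))  # blur first, then threshold

# Count the white pixels that survive each pipeline
stroke_direct = sum(p == 255 for row in direct for p in row)
stroke_blurred = sum(p == 255 for row in blurred for p in row)
print(stroke_direct, stroke_blurred)
```

With the threshold alone the stroke is kept; with the blur in front, every 3x3 window holds more background than stroke pixels, so the median wipes the stroke out before the threshold ever sees it. Thin strokes in small text behave the same way.
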
On Monday, January 13, 2020 at 4:18:40 PM UTC+1, Richard wrote:
>
> Hello,
>
> I'm starting work on software that would analyze eSport matches, mining
> statistics from screenshots (i.e. video).
> I've started simple: I'm trying to extract team names.
>
> I've tried to crop/gray but it's not effective (i.e. the text doesn't match).
>
> Any tips to improve text recognition?
>
> Code sample used:
>
> import cv2
> import pytesseract
> from loguru import logger
>
>
> image = cv2.imread("test.png")
>
> x = 375
> y = 0
> h = 40
> w = 160
>
> # IN GREY
> image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>
> # BLUR
> image = cv2.medianBlur(image, 3)
>
> # THRESH
> #image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
>
> #image = image[y:y+h, x:x+w]
> cv2.imwrite("output.png", image)
>
> for arg in range(1, 14):
>     try:
>         text = pytesseract.image_to_string(image, config=f"-l eng --oem 1 --psm {arg}")
>         logger.info(f"Parsed text [{text}] with [{arg}]")
>     except Exception as exception:
>         logger.exception(exception)
>
>
>
>
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/be4deaf4-6932-437f-b889-e7854ec40cc5%40googlegroups.com.