Re: [tesseract-ocr] bad quality!?

Cyrus Yip Mon, 03 Jan 2022 18:06:36 -0800

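I tried learning some OpenCV and building the mask that way:

import cv2
import numpy as np
from PIL import Image

# assumption: im (PIL) and data (OpenCV BGR array) are the same screenshot
im = Image.open("drop12.png")
data = cv2.imread("drop12.png")

# vertical strips, one per card column
boxes = [
    (45, 0, 245, im.height),
    (320, 0, 515, im.height),
    (600, 0, 785, im.height),
]
if im.width > 1000:
    # wider images get a fourth column
    boxes.append((865, 0, 1065, im.height))

mask = np.zeros(data.shape[:2], np.uint8)
for box in boxes:
    # thickness -1 draws a filled rectangle
    cv2.rectangle(mask, (box[0], box[1]), (box[2], box[3]), 255, -1)

# horizontal bands where the text is
mask2 = np.zeros(data.shape[:2], np.uint8)
boxes = [
    (0, 58, im.width, 110),
    (0, 312, im.width, 360),
]
for box in boxes:
    cv2.rectangle(mask2, (box[0], box[1]), (box[2], box[3]), 255, -1)

# keep only the areas where a column and a band overlap
mask = cv2.bitwise_and(mask, mask2)

image_final = cv2.bitwise_and(data, data, mask=mask)
image_final = cv2.threshold(cv2.cvtColor(image_final, cv2.COLOR_BGR2GRAY),
                            0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# flood-fill the connected black area from the top-left corner with white
mask1 = np.zeros((image_final.shape[0] + 2, image_final.shape[1] + 2),
                 np.uint8)
cv2.floodFill(image_final, mask1, (0, 0), 255)

The results aren't that good, and I don't know if this is a good way to make
a mask.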
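The same column-by-band mask can be built without drawing rectangles, by setting NumPy slices. This is a minimal sketch assuming the box coordinates from the code above; `region_mask` is a hypothetical helper name and the image size is made up. Since `cv2.rectangle` with thickness -1 fills up to and including its second corner, the slices add 1 to the right and bottom edges so they cover the same pixels.

```python
import numpy as np


def region_mask(height, width, column_spans, row_spans):
    """Return a uint8 mask that is 255 where a column span and a row span overlap.

    Spans are (start, end) pixel pairs with both ends inclusive, like the
    corner coordinates passed to cv2.rectangle with thickness -1.
    """
    columns = np.zeros(width, dtype=bool)
    for x0, x1 in column_spans:
        columns[x0:x1 + 1] = True
    rows = np.zeros(height, dtype=bool)
    for y0, y1 in row_spans:
        rows[y0:y1 + 1] = True
    # outer AND: a pixel is kept only if both its row and its column are kept
    return (rows[:, None] & columns[None, :]).astype(np.uint8) * 255


# Coordinates taken from the message above; the image size is an assumption.
height, width = 400, 1100
column_spans = [(45, 245), (320, 515), (600, 785)]
if width > 1000:
    column_spans.append((865, 1065))
row_spans = [(58, 110), (312, 360)]

mask = region_mask(height, width, column_spans, row_spans)
print(mask.shape, mask[80, 100], mask[80, 300])
```

The result is a single-channel 0/255 `uint8` array, the kind of array OpenCV accepts as the `mask=` argument of `cv2.bitwise_and`.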
On Monday, January 3, 2022 at 5:07:00 PM UTC-8 Cyrus Yip wrote:

> For this image
> [image: drop12.png]
> it still fails to get the text from the bottom-right cards:
> ['MasumiMushishiZokuShou', 'TamaoHino*Eyeshield21', "DiegoBrando~'sBizarreAdi:tocolBalRan", '']
>
> On Monday, January 3, 2022 at 10:50:42 AM UTC-8 zdenop wrote:
>
>> Increase the second value of the kernel size in getStructuringElement
>> from 4 to 5 when creating the mask:
>>
>> kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 5))
>>
>>
>> Zdenko
>>
>>
>> On Mon, 3 Jan 2022 at 0:08, Cyrus Yip <[email protected]> wrote:
>>
>>> OK, I will look into how to do that. But do you have any idea why some of
>>> the letters go missing?
>>>
>>> On Sunday, January 2, 2022 at 1:10:45 PM UTC-8 zdenop wrote:
>>>
>>>> All the images you presented have the same size, and the text is always
>>>> in the same regions. So you can create one mask for these regions and
>>>> apply it to the thresholded input images. This could give you extra speed,
>>>> as you do not need to create a mask for each image individually.
>>>>
>>>> Zdenko
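Following that suggestion, the mask only has to be computed once and can then be reused for every screenshot. The sketch below assumes the images are already binarized (single channel, 0/255) and that the mask is a `uint8` array of the same size, such as the one built with `cv2.rectangle` at the top of this thread; `apply_region_mask` is a hypothetical name. Painting everything outside the mask white with `np.where` is an alternative to the `bitwise_and` plus `floodFill` step used in this thread: it also whitens dark areas that are not connected to the top-left corner.

```python
import numpy as np


def apply_region_mask(binary, mask):
    """Keep binarized pixels inside the mask and make everything else white."""
    return np.where(mask > 0, binary, 255).astype(np.uint8)


# Hypothetical data: one fixed mask, computed once and reused for several images.
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 1:5] = 255                       # the "text region"

images = [
    np.zeros((4, 6), dtype=np.uint8),      # an all-black binarized image
    np.full((4, 6), 255, dtype=np.uint8),  # an all-white binarized image
]
cleaned = [apply_region_mask(img, mask) for img in images]
print(cleaned[0])
```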
>>>>
>>>>
>>>> On Sun, 2 Jan 2022 at 21:01, Cyrus Yip <[email protected]> wrote:
>>>>
>>>>> I tried the OpenCV version, but it fails with images like this:
>>>>> [image: drop12.png][image: hi.png]
>>>>>
>>>>> On Saturday, January 1, 2022 at 12:29:34 PM UTC-8 zdenop wrote:
>>>>>
>>>>>> And here is an OpenCV version with, IMO, better quality:
>>>>>>
>>>>>>
>>>>>> import cv2
>>>>>> import numpy as np
>>>>>> from PIL import Image
>>>>>> from IPython.display import display  # display() is for Jupyter
>>>>>>
>>>>>> data = cv2.imread("mina.png")
>>>>>> # select pixels that have exactly the text colour (51, 51, 51)
>>>>>> mask_text = cv2.inRange(data, (51, 51, 51), (51, 51, 51))
>>>>>>
>>>>>> # Morph open to remove noise
>>>>>> kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
>>>>>> morph = cv2.morphologyEx(mask_text, cv2.MORPH_OPEN, kernel, iterations=1)
>>>>>>
>>>>>> # dilate the text pixels into blocks that cover the text regions
>>>>>> kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 4))
>>>>>> dilate = cv2.dilate(morph, kernel, iterations=4)
>>>>>>
>>>>>> # binarize the whole image with Otsu's threshold
>>>>>> thresh = cv2.threshold(cv2.cvtColor(data, cv2.COLOR_BGR2GRAY),
>>>>>>                        0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
>>>>>> # keep the binarized pixels only inside the dilated text regions
>>>>>> image_final = cv2.bitwise_and(thresh, thresh, mask=dilate)
>>>>>> # replace background with white
>>>>>> mask1 = np.zeros((image_final.shape[0] + 2, image_final.shape[1] + 2),
>>>>>>                  np.uint8)
>>>>>> cv2.floodFill(image_final, mask1, (0, 0), 255)
>>>>>>
>>>>>> display(Image.fromarray(image_final))
>>>>>>
>>>>>>
>>>>>> [image: image.png]
>>>>>>
>>>>>>
>>>>>> Zdenko
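For readers wondering what `cv2.THRESH_OTSU` does in the code above: Otsu's method picks the threshold that maximizes the between-class variance of the two groups of gray levels it splits the histogram into. The NumPy sketch below illustrates that idea on a synthetic image; it is not OpenCV's implementation and is not guaranteed to match `cv2.threshold` exactly. Because the threshold depends on the whole histogram it is given, thresholding before applying a mask (as here) and thresholding an already-masked image can give different results.

```python
import numpy as np


def otsu_threshold(gray):
    """Return an Otsu threshold for a uint8 grayscale image (pixels > t are foreground)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                    # pixel count at or below each threshold
    w1 = hist.sum() - w0                    # pixel count above each threshold
    s0 = np.cumsum(hist * levels)           # sum of gray levels at or below
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = s0 / w0                       # mean of the lower class
        mu1 = (s0[-1] - s0) / w1            # mean of the upper class
        between = w0 * w1 * (mu0 - mu1) ** 2
    # thresholds that leave one class empty give NaN; treat them as 0
    return int(np.argmax(np.nan_to_num(between)))


def binarize(gray):
    """Mimic THRESH_BINARY with an Otsu threshold: 255 above t, 0 otherwise."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8), t


# Synthetic example: dark "text" (51) on a light background (200).
gray = np.full((10, 10), 200, dtype=np.uint8)
gray[4:6, 2:8] = 51
binary, t = binarize(gray)
print(t)
```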
>>>>>>
>>>>>>
>>>>>> On Sat, 1 Jan 2022 at 20:40, Zdenko Podobny <[email protected]> wrote:
>>>>>>
>>>>>>> What is your code? Does it work on your local computer?
>>>>>>>
>>>>>>> BTW: here is tested numpy code:
>>>>>>>
>>>>>>> import numpy as np
>>>>>>> from PIL import Image
>>>>>>>
>>>>>>> filter_colors = [(51, 51, 51), (69, 69, 65), (65, 64, 60), (59, 58, 56),
>>>>>>>                  (67, 66, 62), (67, 67, 63), (67, 67, 62), (53, 53, 53),
>>>>>>>                  (54, 54, 53), (61, 61, 58), (62, 62, 60), (55, 55, 54),
>>>>>>>                  (59, 59, 57), (56, 56, 55)]
>>>>>>>
>>>>>>> image = np.array(Image.open('mina.png').convert("RGB"))
>>>>>>>
>>>>>>> # A = [height, width], B = number of channels (3)
>>>>>>> *A, B = image.shape
>>>>>>> # a pixel is text if all 3 channels match any one of the filter colours
>>>>>>> mask = (image.reshape((-1, B)) ==
>>>>>>>         np.array(filter_colors)[:, None]).all(-1).any(0).reshape(A)
>>>>>>> img = Image.fromarray(~mask)
>>>>>>>
>>>>>>>
>>>>>>> Zdenko
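The one-liner above compares every pixel with every filter colour in a single broadcast operation. Below is the same computation split into steps, with the intermediate shapes in comments, on a tiny synthetic image (the two colours and the pixels are made up for illustration). The intermediate boolean array has one entry per colour, pixel, and channel, so its size grows with the number of filter colours.

```python
import numpy as np

filter_colors = np.array([(51, 51, 51), (69, 69, 65)])   # shape (K, 3), K = 2

# A 2 x 3 RGB image: two pixels use a filter colour, the others do not.
image = np.array([[(51, 51, 51), (0, 0, 0), (69, 69, 65)],
                  [(255, 255, 255), (51, 51, 52), (10, 20, 30)]], dtype=np.uint8)

*A, B = image.shape                       # A = [2, 3] (height, width), B = 3
pixels = image.reshape((-1, B))           # shape (N, 3), N = 6 pixels
colors = filter_colors[:, None]           # shape (K, 1, 3)
equal = pixels == colors                  # broadcast to (K, N, 3)
same_color = equal.all(-1)                # (K, N): all 3 channels match
is_text = same_color.any(0)               # (N,): matches at least one colour
mask = is_text.reshape(A)                 # back to (height, width)
print(mask)
```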
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 1 Jan 2022 at 19:49, Cyrus Yip <[email protected]> wrote:
>>>>>>>
>>>>>>>> I managed to install Tesseract 5, but the numpy mask doesn't work
>>>>>>>> now. It makes pictures like:
>>>>>>>> [image: image.png]
>>>>>>>> not:
>>>>>>>> [image: image.png]
>>>>>>>>
>>>>>>>>
>>>>>>>> Dockerfile:
>>>>>>>> # syntax=docker/dockerfile:1
>>>>>>>> ARG TOKEN
>>>>>>>> FROM ubuntu:18.04
>>>>>>>> RUN apt-get update
>>>>>>>> RUN apt-get install -y software-properties-common
>>>>>>>> RUN apt-get install -y python3.8
>>>>>>>> RUN apt-get install -y python3-pip
>>>>>>>> RUN apt-get update
>>>>>>>> RUN apt-get install -y build-essential
>>>>>>>> RUN apt-get install -y python3-pil
>>>>>>>> COPY requirements.txt requirements.txt
>>>>>>>> RUN pip3 install -r requirements.txt
>>>>>>>> RUN apt-get update
>>>>>>>> RUN add-apt-repository ppa:alex-p/tesseract-ocr5
>>>>>>>> RUN apt-get update
>>>>>>>> RUN apt-get install -y tesseract-ocr
>>>>>>>> COPY . .
>>>>>>>> CMD ["python3", "bot.py"]
>>>>>>>>
>>>>>>>> On Friday, December 31, 2021 at 10:29:59 AM UTC-8 Cyrus Yip wrote:
>>>>>>>>
>>>>>>>>> better link? 
>>>>>>>>> <https://www.toptal.com/developers/hastebin/nonepalihe>
>>>>>>>>>
>>>>>>>>> On Friday, December 31, 2021 at 10:27:41 AM UTC-8 Cyrus Yip wrote:
>>>>>>>>>
>>>>>>>>>> Right now I'm installing Tesseract 4 in Docker with
>>>>>>>>>> RUN apt-get install -y tesseract-ocr
>>>>>>>>>> That might be why it's way slower than on my computer. How can I
>>>>>>>>>> install Tesseract 5?
>>>>>>>>>>
>>>>>>>>>> Dockerfile:
>>>>>>>>>> # syntax=docker/dockerfile:1
>>>>>>>>>>
>>>>>>>>>> ARG TOKEN
>>>>>>>>>>
>>>>>>>>>> FROM python:3.8-slim-buster
>>>>>>>>>>
>>>>>>>>>> RUN apt-get update
>>>>>>>>>> RUN apt-get install -y software-properties-common
>>>>>>>>>> RUN apt-get update
>>>>>>>>>> RUN add-apt-repository ppa:alex-p/tesseract-ocr-devel
>>>>>>>>>>
>>>>>>>>>> RUN apt-get update
>>>>>>>>>> RUN apt-get install -y build-essential
>>>>>>>>>>
>>>>>>>>>> COPY requirements.txt requirements.txt
>>>>>>>>>> RUN pip3 install -r requirements.txt
>>>>>>>>>>
>>>>>>>>>> COPY . .
>>>>>>>>>>
>>>>>>>>>> RUN apt-get install -y tesseract
>>>>>>>>>>
>>>>>>>>>> CMD ["python3", "bot.py"]
>>>>>>>>>>
>>>>>>>>>> Build logs 
>>>>>>>>>> <https://appbuild-logs-ams3.ams3.digitaloceanspaces.com/a7609af2-64e1-4ba2-8555-87a4fac8a37f/9420eaef-131e-410f-8add-bbfb870b2693/981a4c35-45d7-41b5-8619-3d9125d60c25/build.log?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=2JPIHVK4OTM6S5VRFBCK%2F20211231%2Fams3%2Fs3%2Faws4_request&X-Amz-Date=20211231T182608Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=3ae248ce9fb9e6fef0c71955d9cd9496feb8311162bdda8921750a21544f79a6>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Friday, December 31, 2021 at 3:18:18 AM UTC-8 zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> You are right: np.isin works differently than I expected
>>>>>>>>>>> (it does not match tuples, but the individual values in the tuples),
>>>>>>>>>>> and by coincidence it produces results similar to your code.
>>>>>>>>>>>
>>>>>>>>>>> Here is updated code that produces the same result as PIL. It is
>>>>>>>>>>> faster, but it will get slower as the number of colors in
>>>>>>>>>>> filter_colors increases.
>>>>>>>>>>>
>>>>>>>>>>> import numpy as np
>>>>>>>>>>> from PIL import Image
>>>>>>>>>>>
>>>>>>>>>>> filter_colors = [(51, 51, 51), (69, 69, 65), (65, 64, 60), (59, 58, 56),
>>>>>>>>>>>                  (67, 66, 62), (67, 67, 63), (67, 67, 62), (53, 53, 53),
>>>>>>>>>>>                  (54, 54, 53), (61, 61, 58), (62, 62, 60), (55, 55, 54),
>>>>>>>>>>>                  (59, 59, 57), (56, 56, 55)]
>>>>>>>>>>>
>>>>>>>>>>> image = np.array(Image.open('mai.png').convert("RGB"))
>>>>>>>>>>> # start with "no pixel matches", then OR in one colour at a time
>>>>>>>>>>> mask = np.zeros(image.shape[:2], dtype=bool)
>>>>>>>>>>> for color in filter_colors:
>>>>>>>>>>>     # True where all 3 channels equal this colour
>>>>>>>>>>>     mask |= (image == color).all(-1)
>>>>>>>>>>> img = Image.fromarray(~mask)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
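The `np.isin` behaviour described above can be checked directly: it tests each channel value separately, so a pixel such as (51, 51, 53), which is not in the list, still matches because 51 and 53 both occur somewhere in `filter_colors`. One way to make `np.isin` compare whole colours (an alternative technique, not something proposed in this thread) is to pack each RGB triple into a single integer, r * 65536 + g * 256 + b, and test the packed values. Because it avoids a Python loop over the colours, this may scale better as `filter_colors` grows, though that has not been measured here; `pack_rgb` is a hypothetical helper.

```python
import numpy as np

filter_colors = [(51, 51, 51), (69, 69, 65), (53, 53, 53)]


def pack_rgb(rgb):
    """Pack (..., 3) RGB values into one int per pixel: r*65536 + g*256 + b."""
    rgb = np.asarray(rgb, dtype=np.int64)   # widen first so the shifts cannot overflow
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]


image = np.array([[(51, 51, 51), (51, 51, 53), (255, 255, 255)]], dtype=np.uint8)

# per-channel test (the pitfall): each channel value appears somewhere in the list
per_channel = np.isin(image, filter_colors).all(-1)
# whole-colour test: compare the packed integers instead
whole_color = np.isin(pack_rgb(image), pack_rgb(filter_colors))

print(per_channel)
print(whole_color)
```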
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 31 Dec 2021 at 1:45, Cyrus Yip <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For some reason, the numpy version gives a different result
>>>>>>>>>>>> than mine.
>>>>>>>>>>>>
>>>>>>>>>>>> Numpy array:
>>>>>>>>>>>>
>>>>>>>>>>>> [image: hi.png]
>>>>>>>>>>>> Loop through pixels:
>>>>>>>>>>>> [image: hi.png]
>>>>>>>>>>>> The second one is more accurate but way slower.
>>>>>>>>>>>> On Thursday, December 30, 2021 at 11:43:01 AM UTC-8 zdenop 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> try this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> import numpy as np
>>>>>>>>>>>>> from PIL import Image
>>>>>>>>>>>>>
>>>>>>>>>>>>> filter_colors = [(51, 51, 51), (69, 69, 65), (65, 64, 60), (59, 58, 56),
>>>>>>>>>>>>>                  (67, 66, 62), (67, 67, 63), (67, 67, 62), (53, 53, 53),
>>>>>>>>>>>>>                  (54, 54, 53), (61, 61, 58), (62, 62, 60), (55, 55, 54),
>>>>>>>>>>>>>                  (59, 59, 57), (56, 56, 55)]
>>>>>>>>>>>>> image = np.array(Image.open('mai.png').convert("RGB"))
>>>>>>>>>>>>> mask = np.isin(image, filter_colors, invert=True)
>>>>>>>>>>>>> img = Image.fromarray(mask.any(axis=2))
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 30 Dec 2021 at 18:14, Cyrus Yip <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also tried many things like cropping, colour changing,
>>>>>>>>>>>>>> colour replacing, and combinations of them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I settled on checking whether a pixel is one of these colours:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [(51, 51, 51), (69, 69, 65), (65, 64, 60), (59, 58, 56), (67, 66, 62),
>>>>>>>>>>>>>>  (67, 67, 63), (67, 67, 62), (53, 53, 53), (54, 54, 53), (61, 61, 58),
>>>>>>>>>>>>>>  (62, 62, 60), (55, 55, 54), (59, 59, 57), (56, 56, 55)]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and, if it is not, replacing it with white. It is pretty accurate,
>>>>>>>>>>>>>> but is there a way to do this with numpy arrays?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (code)
>>>>>>>>>>>>>> pixels = im.load()  # im is the screenshot converted to RGB
>>>>>>>>>>>>>> for y in range(im.height):
>>>>>>>>>>>>>>     for x in range(im.width):
>>>>>>>>>>>>>>         if pixels[x, y] not in [(51, 51, 51), (69, 69, 65), (65, 64, 60),
>>>>>>>>>>>>>>                                 (59, 58, 56), (67, 66, 62), (67, 67, 63),
>>>>>>>>>>>>>>                                 (67, 67, 62), (53, 53, 53), (54, 54, 53),
>>>>>>>>>>>>>>                                 (61, 61, 58), (62, 62, 60), (55, 55, 54),
>>>>>>>>>>>>>>                                 (59, 59, 57), (56, 56, 55)]:
>>>>>>>>>>>>>>             pixels[x, y] = (255, 255, 255)
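A vectorized version of the loop above, as a sketch: it builds a boolean "keep" mask with a broadcast comparison against the list of text colours, then writes white into every other pixel in one assignment. Unlike the binarized masks discussed later in the thread, this keeps the original RGB values of the text pixels. The image and the shortened colour list here are synthetic; with a real screenshot the input would be `np.array(im.convert("RGB"))`, converted back with `Image.fromarray`. `whiten_other_colours` is a hypothetical name.

```python
import numpy as np

text_colours = np.array([(51, 51, 51), (69, 69, 65), (65, 64, 60)], dtype=np.uint8)


def whiten_other_colours(rgb, keep_colours):
    """Return a copy of an (H, W, 3) uint8 image where pixels not in keep_colours are white."""
    # (H, W, 1, 3) compared with (K, 3) broadcasts to (H, W, K, 3)
    matches = (rgb[:, :, None, :] == keep_colours).all(-1).any(-1)   # (H, W)
    out = rgb.copy()
    out[~matches] = 255          # sets all three channels of the other pixels
    return out


image = np.array([[(51, 51, 51), (10, 10, 10)],
                  [(65, 64, 60), (200, 100, 0)]], dtype=np.uint8)
result = whiten_other_colours(image, text_colours)
print(result[:, :, 0])
```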
>>>>>>>>>>>>>> On Thursday, December 30, 2021 at 8:46:51 AM UTC-8 zdenop 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK. I played a little bit ;-):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tested the speed of your code with your image:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> import timeit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> pil_color_replace = """
>>>>>>>>>>>>>>> from PIL import Image
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> im = Image.open('mai.png').convert("RGB")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> pixdata = im.load()
>>>>>>>>>>>>>>> for y in range(im.height):
>>>>>>>>>>>>>>>     for x in range(im.width):
>>>>>>>>>>>>>>>         if pixdata[x, y] != (51, 51, 51):
>>>>>>>>>>>>>>>             pixdata[x, y] = (255, 255, 255)
>>>>>>>>>>>>>>> """
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> elapsed_time = timeit.timeit(pil_color_replace, 
>>>>>>>>>>>>>>> number=100)/100
>>>>>>>>>>>>>>> print(f"duration: {elapsed_time:.4} seconds")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I got an average time of 0.08547 seconds on my computer.
>>>>>>>>>>>>>>> On the internet I found a suggestion to use numpy for this, and
>>>>>>>>>>>>>>> I ended up with the following code:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> np_color_replace_rgb = """
>>>>>>>>>>>>>>> import numpy as np
>>>>>>>>>>>>>>> from PIL import Image
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> data = np.array(Image.open('mai.png').convert("RGB"))
>>>>>>>>>>>>>>> mask = (data == [51, 51, 51]).all(-1)
>>>>>>>>>>>>>>> img = Image.fromarray(np.invert(mask)) 
>>>>>>>>>>>>>>> """
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> elapsed_time = timeit.timeit(np_color_replace_rgb, 
>>>>>>>>>>>>>>> number=100)/100
>>>>>>>>>>>>>>> print(f"duration: {elapsed_time:.4} seconds")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I got an average time of 0.01774 seconds, i.e. about 4.8 times
>>>>>>>>>>>>>>> faster than the PIL code.
>>>>>>>>>>>>>>> It is a bit of a cheat, as it does not replace colors: it just
>>>>>>>>>>>>>>> takes a mask of the target color and returns it as a binarized
>>>>>>>>>>>>>>> image, which is exactly what you need for OCR ;-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, I would like to point out that the resulting OCR output
>>>>>>>>>>>>>>> is not perfect (compared to OCR of the unmodified text areas),
>>>>>>>>>>>>>>> as this kind of binarization is very simple.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 30 Dec 2021 at 11:19, Zdenko Podobny <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Just made your tests ;-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You can use tesserocr (installation may be quite difficult if
>>>>>>>>>>>>>>>> you are on Windows) instead of pytesseract, i.e. initialize the
>>>>>>>>>>>>>>>> Tesseract API once and use it multiple times. But it does not
>>>>>>>>>>>>>>>> provide DICT output.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, 29 Dec 2021 at 21:18, Cyrus Yip <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But won't multiple OCR runs and crops take a lot of time?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wednesday, December 29, 2021 at 10:15:26 AM UTC-8 
>>>>>>>>>>>>>>>>> zdenop wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> IMO, if the text is always in the same area, cropping and
>>>>>>>>>>>>>>>>>> running OCR on just that area will be faster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Zdenko
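Building on that idea, each card's text could be cropped separately instead of stacking whole-width bands. The sketch below only generates and applies the crop boxes; the column and band coordinates are borrowed from the mask code at the top of this thread and are assumptions about this particular layout, and `card_crops` is a hypothetical helper. Each crop could then be passed to `pytesseract.image_to_string` on its own, at the cost of one OCR call per crop, which is the time concern raised above.

```python
from PIL import Image


def card_crops(image, column_spans, row_spans):
    """Crop one image per (band, column) pair, band by band, left to right.

    Spans are (start, end) pixel pairs with both ends inclusive. PIL crop boxes
    are (left, upper, right, lower) with right and lower exclusive, so 1 is
    added to each end.
    """
    crops = []
    for y0, y1 in row_spans:
        for x0, x1 in column_spans:
            crops.append(image.crop((x0, y0, x1 + 1, y1 + 1)))
    return crops


# Synthetic stand-in for a screenshot; the real layout may differ.
screenshot = Image.new("RGB", (800, 400), "white")
columns = [(45, 245), (320, 515), (600, 785)]
bands = [(58, 110), (312, 360)]
crops = card_crops(screenshot, columns, bands)
print(len(crops), crops[0].size)
```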
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <[email protected]>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I played around a bit with replacing all colours except the
>>>>>>>>>>>>>>>>>>> text colour, and it works pretty well!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The only problem is that replacing colours with:
>>>>>>>>>>>>>>>>>>> im = im.convert("RGB")
>>>>>>>>>>>>>>>>>>> pixdata = im.load()
>>>>>>>>>>>>>>>>>>> for y in range(im.height):
>>>>>>>>>>>>>>>>>>>     for x in range(im.width):
>>>>>>>>>>>>>>>>>>>         if pixdata[x, y] != (51, 51, 51):
>>>>>>>>>>>>>>>>>>>             pixdata[x, y] = (255, 255, 255)
>>>>>>>>>>>>>>>>>>> is a bit slow. Do you know a better way to replace 
>>>>>>>>>>>>>>>>>>> pixels in python? I don't know if this is off topic.
>>>>>>>>>>>>>>>>>>> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 
>>>>>>>>>>>>>>>>>>> zdenop wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you properly crop the text areas, you get good output,
>>>>>>>>>>>>>>>>>>>> e.g.:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [image: r_cropped.png]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> > tesseract r_cropped.png - --dpi 300
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Rascal Does Not Dream
>>>>>>>>>>>>>>>>>>>> of Bunny Girl Senpai
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <[email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is an example of an image I would like to run OCR on:
>>>>>>>>>>>>>>>>>>>>> [image: drop8.png]
>>>>>>>>>>>>>>>>>>>>> I would like the results to be like:
>>>>>>>>>>>>>>>>>>>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does 
>>>>>>>>>>>>>>>>>>>>> Not Dream of Bunny Girl Senpai", "Keqing Genshin Impact"]
>>>>>>>>>>>>>>>>>>>>>
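>>>>>>>>>>>>>>>>>>>>> Right now I'm using:
>>>>>>>>>>>>>>>>>>>>> import pytesseract
>>>>>>>>>>>>>>>>>>>>> from PIL import Image
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> # im is the screenshot opened with Image.open
>>>>>>>>>>>>>>>>>>>>> # crop the two horizontal bands that contain the text
>>>>>>>>>>>>>>>>>>>>> region1 = im.crop((0, 55, im.width, 110))
>>>>>>>>>>>>>>>>>>>>> region2 = im.crop((0, 312, im.width, 360))
>>>>>>>>>>>>>>>>>>>>> # stack the bands vertically with a 20-pixel gap
>>>>>>>>>>>>>>>>>>>>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>>>>>>>>>>>>>>>>>>>> image.paste(region1)
>>>>>>>>>>>>>>>>>>>>> image.paste(region2, (0, region1.height + 20))
>>>>>>>>>>>>>>>>>>>>> results = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)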
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The processed image looks like
>>>>>>>>>>>>>>>>>>>>> [image: hi.png]
>>>>>>>>>>>>>>>>>>>>> but I'm getting results like:
>>>>>>>>>>>>>>>>>>>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai', 'iGenshinImpact']
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How do I optimize the image/configs so the OCR is more accurate?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed 
>>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving 
>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to 
>>>>>>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com
>>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>>>>>> .

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1013d21f-395b-47b8-a20f-f88bfd8aab2dn%40googlegroups.com.