Just run your own tests ;-)
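
A quick way to compare the variants (whole image vs. crop-then-OCR) is to time them. A minimal sketch using only the standard library; `best_time` is a hypothetical helper, and which OCR callables you pass in is up to you:

```python
import time
from typing import Any, Callable


def best_time(fn: Callable[..., Any], *args: Any, repeat: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) of fn(*args) over `repeat` runs.

    Taking the minimum rather than the mean filters out noise from other processes.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, compare `best_time(pytesseract.image_to_string, im)` on the full screenshot with the sum of the same measurement over each cropped region.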

You can use tesserocr (installation may be quite difficult if you are on
Windows) instead of pytesseract, e.g. to initialize the Tesseract API once
and use it multiple times. But it does not provide DICT output.
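
To illustrate the "initialize once, reuse" idea: a minimal sketch, assuming tesserocr's `PyTessBaseAPI` and its `SetImage`, `SetRectangle(left, top, width, height)` and `GetUTF8Text` methods. Instead of cropping, it hands the image to Tesseract once and restricts recognition to each box in turn. `ocr_boxes` is a hypothetical helper; the boxes use the same `(left, top, right, bottom)` order as the PIL crops further down in this thread.

```python
from typing import Any, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), same order as PIL's crop()


def ocr_boxes(api: Any, image: Any, boxes: Sequence[Box]) -> List[str]:
    """OCR several regions of one image with a single, already-initialized API object.

    `api` is expected to behave like tesserocr.PyTessBaseAPI: the image is set
    once, and each box only narrows the recognition rectangle.
    """
    api.SetImage(image)
    texts = []
    for left, top, right, bottom in boxes:
        # SetRectangle expects width and height, not right/bottom coordinates
        api.SetRectangle(left, top, right - left, bottom - top)
        texts.append(api.GetUTF8Text().strip())
    return texts
```

Usage would look roughly like `with PyTessBaseAPI() as api: texts = ocr_boxes(api, im, [(0, 55, im.width, 110), (0, 312, im.width, 360)])`. pytesseract starts a new `tesseract` process for every call, so reusing one `api` object across many images is where any saving would come from.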


Zdenko


On Wed, 29 Dec 2021 at 21:18, Cyrus Yip <[email protected]> wrote:

> But won't multiple OCR runs and crops take a lot of time?
>
> On Wednesday, December 29, 2021 at 10:15:26 AM UTC-8 zdenop wrote:
>
>> IMO, if the text is always in the same area, cropping it and running OCR
>> on just that area will be faster.
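
A minimal sketch of cropping fixed areas and OCRing each one separately, assuming Pillow. The `(top, bottom)` bands are the ones from the original question; `ocr_fixed_bands` is a hypothetical helper, and the OCR engine is passed in as the `ocr` callable.

```python
from typing import Callable, List, Sequence, Tuple

from PIL import Image

Band = Tuple[int, int]  # (top, bottom) in pixels, spanning the full image width


def ocr_fixed_bands(
    im: Image.Image,
    bands: Sequence[Band],
    ocr: Callable[[Image.Image], str],
) -> List[str]:
    """Crop each full-width band out of `im` and OCR it on its own."""
    results = []
    for top, bottom in bands:
        region = im.crop((0, top, im.width, bottom))
        results.append(ocr(region).strip())
    return results
```

For example: `ocr_fixed_bands(im, [(55, 110), (312, 360)], lambda r: pytesseract.image_to_string(r, config="--dpi 300"))`, which passes the same `--dpi 300` option as the `tesseract` command used in this thread.
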
>>
>> Zdenko
>>
>>
>> On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <[email protected]> wrote:
>>
>>> I played around a bit with replacing all colours except the text colour,
>>> and it works pretty well!
>>>
>>> The only problem is that replacing colours with:
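>>> from PIL import Image
>>>
>>> im = Image.open("drop8.png")  # e.g. the example screenshot from the original question
>>> im = im.convert("RGB")
>>> pixdata = im.load()
>>> # Turn every pixel that is not exactly the text colour (51, 51, 51) white
>>> for y in range(im.height):
>>>     for x in range(im.width):
>>>         if pixdata[x, y] != (51, 51, 51):
>>>             pixdata[x, y] = (255, 255, 255)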
>>> is a bit slow. Do you know a faster way to replace pixels in Python?
>>> Apologies if this is off topic.
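
A vectorized alternative to the per-pixel loop: a minimal sketch, assuming NumPy is available. It compares all pixels against the text colour at once and builds a white image that keeps only the matching pixels. `keep_only_colour` is a hypothetical name; (51, 51, 51) is the text colour from the loop above.

```python
import numpy as np
from PIL import Image

TEXT_RGB = (51, 51, 51)  # text colour used in the loop above


def keep_only_colour(im: Image.Image, rgb=TEXT_RGB) -> Image.Image:
    """Return a copy of `im` where every pixel not exactly equal to `rgb` is white."""
    arr = np.asarray(im.convert("RGB"))
    # True where all three channels match the text colour
    mask = np.all(arr == np.array(rgb, dtype=arr.dtype), axis=-1)
    out = np.full_like(arr, 255)  # start from an all-white image
    out[mask] = rgb               # copy the text-coloured pixels back in
    return Image.fromarray(out)
```

Because an exact match is required, anti-aliased edge pixels of the glyphs turn white too, just as in the original loop; comparing against a tolerance instead of `==` would keep them.
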
>>> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:
>>>
>>>> If you crop the text areas properly, you get good output, e.g.:
>>>>
>>>> [image: r_cropped.png]
>>>>
>>>> > tesseract r_cropped.png - --dpi 300
>>>>
>>>> Rascal Does Not Dream
>>>> of Bunny Girl Senpai
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <[email protected]> wrote:
>>>>
>>>>> Here is an example of an image I would like to run OCR on:
>>>>> [image: drop8.png]
>>>>> I would like the results to look like:
>>>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of Bunny Girl Senpai", "Keqing Genshin Impact"]
>>>>>
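>>>>> Right now I'm using:
>>>>> from PIL import Image
>>>>> import pytesseract
>>>>>
>>>>> im = Image.open("drop8.png")  # the example screenshot above
>>>>> # Crop the two full-width bands that contain text
>>>>> region1 = im.crop((0, 55, im.width, 110))
>>>>> region2 = im.crop((0, 312, im.width, 360))
>>>>> # Stack them with a 20 px gap (Image.new defaults to a black background)
>>>>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>>>> image.paste(region1)
>>>>> image.paste(region2, (0, region1.height + 20))
>>>>> results = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)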
>>>>>
>>>>>
>>>>> The processed image looks like
>>>>> [image: hi.png]
>>>>> but I'm getting results like:
>>>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai',
>>>>> 'iGenshinImpact']
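
On turning the DICT output into one string per entry: `image_to_data` returns one entry per recognized element, with keys such as `block_num`, `par_num`, `line_num` and `text`. A minimal sketch that joins the non-empty words of each block with spaces; whether Tesseract's blocks line up with the three entries you want is an assumption you would need to check, and `words_by_block` is a hypothetical helper.

```python
from typing import Dict, List


def words_by_block(data: Dict[str, list]) -> List[str]:
    """Join the recognized words of each Tesseract block into one string.

    `data` is the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    blocks: Dict[int, List[str]] = {}
    for block, word in zip(data["block_num"], data["text"]):
        word = word.strip()
        if word:  # skip empty entries (page/block/paragraph/line levels, blanks)
            blocks.setdefault(block, []).append(word)
    # dicts keep insertion order, so blocks come out in the order Tesseract reports them
    return [" ".join(words) for words in blocks.values()]
```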
>>>>>
>>>>> How can I optimize the image or the configuration so that the OCR is more accurate?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>

