Hi,
I tried an Otsu threshold and it didn't work very well. I still can't
recognize the word "Carolline", for example.
*Any other ideas, people?* :)
Here is the code I used:
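import numpy as np
from PIL import Image
from skimage import filters

# Load the image and convert it to 8-bit greyscale
img = Image.open("example_ocr_1.jpg").convert('L')
img_array = np.asarray(img)
print(img_array)

# Compute a single global Otsu threshold for the whole image
otsu_threshold = filters.threshold_otsu(img_array)
print(otsu_threshold)

# Map each pixel to black (0) or white (255)
def otsu_filter(x):
    if x < otsu_threshold:
        return 0
    else:
        return 255

otsu_filter = np.vectorize(otsu_filter)
img_otsu = otsu_filter(img_array)
img_otsu = Image.fromarray(np.uint8(img_otsu))
img_otsu.show()
img_otsu.save("example_ocr_1_otsu.jpg")
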
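For anyone wondering what the Otsu step actually computes, here is a minimal, dependency-free sketch (my own illustration, not the skimage implementation). It picks the grey level that maximizes the between-class variance of the two resulting pixel groups. The names otsu_threshold_from_pixels and binarize are hypothetical helpers, and I am assuming 8-bit grey values (0-255) and the convention that pixels above the threshold become white.

```python
def otsu_threshold_from_pixels(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    Assumes `pixels` is a flat sequence of 8-bit integer grey values.
    Class 0 holds values <= t, class 1 holds values > t.
    """
    # Histogram of grey levels
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of grey values in class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break     # class 1 empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map values above t to white (255) and everything else to black (0)."""
    return [255 if p > t else 0 for p in pixels]
```

With the array from the code above, the input could be built as img_array.ravel().tolist(). Note that this is one global threshold for the whole image, so it may behave poorly where the background brightness varies across the picture.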
On Thursday, 6 April 2017 18:35:36 UTC+2, Allistair C wrote:
>
> You might want to try preprocessing with a threshold filter (Otsu
> threshold) to harden the edges?
>
> On 6 Apr 2017, at 10:16, Javier Abascal <[email protected]> wrote:
>
> Hi everyone! :)
>
> I am having trouble correctly identifying the text in the attached
> images. In my opinion, *they are quite clear, but I am not sure how to help
> Tesseract identify them*. I have tried some other online OCR services
> and they seem to identify them correctly (without any configuration), so I
> believe Tesseract should be able to handle these images. I need an offline
> solution because the machine that will run this task won't have Internet
> access.
>
> So far, I have tried several of the main Tesseract tuning parameters
> (PSM, dictionary, language, increased DPI, etc.), but I haven't been
> successful yet. Could you please help me with this?
>
> Thank you very much in advance. *I would really appreciate any
> comments :)*
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/861cd975-a1da-4342-891f-325ae5d7f947%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>
> <example_ocr_1.jpg>
>
> <example_ocr_2.jpg>
>
> <example_ocr_3.jpg>
>
>