Hi,

I tried an Otsu threshold, but it didn't work very well. I am still 
not able to recognize the word Carolline, for example. Here is the code 
I used.

*Any other ideas people?* :):)

import numpy as np
from PIL import Image
from skimage import filters

# Load the image and convert it to 8-bit grayscale
img = Image.open("example_ocr_1.jpg").convert('L')
img_array = np.asarray(img)
print(img_array)

# Global Otsu threshold computed from the grayscale values
otsu_threshold = filters.threshold_otsu(img_array)
print(otsu_threshold)

# Pixels below the threshold become black (0), all others white (255)
img_otsu = np.where(img_array < otsu_threshold, 0, 255).astype(np.uint8)
img_otsu = Image.fromarray(img_otsu)
img_otsu.show()
img_otsu.save("example_ocr_1_otsu.jpg")
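
For anyone following along, here is a rough sketch of what the Otsu step computes, written with plain NumPy. The names `compute_otsu` and `binarize` are made up for illustration; this is not scikit-image's implementation and may differ from `filters.threshold_otsu` by a gray level or so. It assumes an 8-bit grayscale array. The point is that Otsu picks one global cutoff for the whole image, the one that maximizes the between-class variance of the histogram.

```python
import numpy as np


def compute_otsu(gray):
    """Return the gray level t that best separates pixels <= t from pixels > t.

    Illustrative sketch only; assumes `gray` is a uint8 array.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                 # fraction of pixels with value <= t
    w1 = 1.0 - w0                        # fraction of pixels with value > t
    cum_mean = np.cumsum(prob * levels)  # cumulative mean up to level t
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Thresholds that leave one class empty give 0/0; treat them as 0.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)

    return int(np.argmax(sigma_b))


def binarize(gray, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

Usage would look like `binarize(img_array, compute_otsu(img_array))`, with `img_array` being the grayscale array from the code above.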



On Thursday, 6 April 2017 18:35:36 UTC+2, Allistair C wrote:
>
> You might want to try preprocessing with a threshold filter (otsu 
> threshold) to harden the edges?
>
> On 6 Apr 2017, at 10:16, Javier Abascal <[email protected]> wrote:
>
> Hi everyone! :)
>
> I am having trouble correctly identifying the text in the attached 
> images. In my opinion, *they are quite clear, but I am not sure how to help 
> Tesseract identify them*. I have tried some other online OCR services, 
> and they seem to identify them correctly (without any configuration), so I 
> believe Tesseract should be able to handle these images. I need Tesseract 
> because the machine that will run this task won't have Internet access.
>
> So far, I have tried several of the "top" Tesseract tuning 
> parameters (PSM, dictionary, language, increasing DPI, etc.), but I 
> haven't been successful yet. Could you please help me with this?
>
> Thank you very much in advance. *I would really appreciate any kind of 
> comment :)*
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/861cd975-a1da-4342-891f-325ae5d7f947%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
> <example_ocr_1.jpg>
>
> <example_ocr_2.jpg>
>
> <example_ocr_3.jpg>
>
>
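
For reference, the tuning parameters mentioned in the quoted message (PSM, language, DPI) can be passed on the Tesseract command line. The sketch below only assembles the argument list; `build_tesseract_cmd` and `run_ocr` are hypothetical helper names, and the default values are placeholders rather than recommendations. It assumes a Tesseract 4.x or newer binary named `tesseract` on the PATH; 3.x releases spell the page segmentation flag `-psm` and, as far as I know, have no `--dpi` option.

```python
import subprocess


def build_tesseract_cmd(image_path, lang="eng", psm=7, dpi=None):
    """Assemble a tesseract command line (hypothetical helper).

    psm=7 asks Tesseract to treat the image as a single text line;
    psm=8 would treat it as a single word.
    """
    # "stdout" as the output base makes tesseract print the recognized text.
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if dpi is not None:
        # Assumption: --dpi is only understood by Tesseract 4.0 and later.
        cmd += ["--dpi", str(dpi)]
    return cmd


def run_ocr(image_path, **kwargs):
    """Run tesseract and return the recognized text (needs tesseract installed)."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like `run_ocr("example_ocr_1_otsu.jpg", psm=8, dpi=300)` would then try the thresholded image as a single word.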
