Here are some tips for total beginners who want to do OCR on natural scene 
text.

First: scene text detection
        Pass only the cropped text regions to the OCR engine rather than 
the whole picture; it saves a lot of time and resources.
        If you have OpenCV with the contrib modules, the "Class-specific 
Extremal Regions" detector in the "text" module will come in handy.
        If you want something really quick and don't want to compile the 
contrib modules yourself, this method will do the job:
<http://stackoverflow.com/questions/23506105/extracting-text-opencv/23556997#23556997>
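
The cropping step can be sketched in a few lines. This is only a minimal 
illustration, assuming the detector returns boxes as (x, y, width, height) 
tuples and the image is a plain list of pixel rows. The names pad_box and 
crop_region and the 4-pixel margin are made up for this example; they are 
not part of OpenCV or Tesseract.

```python
def pad_box(box, img_w, img_h, margin=4):
    """Grow an (x, y, w, h) box by `margin` pixels, clamped to the image.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x, y, w, h = box
    x0 = max(x - margin, 0)
    y0 = max(y - margin, 0)
    x1 = min(x + w + margin, img_w)
    y1 = min(y + h + margin, img_h)
    return x0, y0, x1, y1


def crop_region(image, box, margin=4):
    """Return the sub-image covering `box` plus a small margin.

    `image` is a list of rows (hypothetical layout for this sketch).
    """
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    x0, y0, x1, y1 = pad_box(box, img_w, img_h, margin)
    return [row[x0:x1] for row in image[y0:y1]]
```

With a NumPy array loaded through OpenCV, the same crop is the slice 
image[y0:y1, x0:x1]. A small margin around the box usually helps, because 
a crop that touches the letters tends to confuse the OCR engine.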

Second: image binarization before OCR
       Turning the text region into a black-and-white image will improve 
the OCR output, especially on natural scenes.
       The Tesseract engine does have its own binarization step, but I found 
it too sensitive to the noise in natural scenes.
       You can try:
           01 adaptive thresholding
           02 a special binarization method from here:
              <http://liris.cnrs.fr/christian.wolf/software/binarize/>
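
To show what option 01 does, here is a rough pure-Python sketch of adaptive 
mean thresholding: each pixel is compared with the mean of its own 
neighbourhood instead of one global threshold, which is why it copes better 
with uneven lighting. The function name and the default block size and 
offset are assumptions chosen for illustration, and the window is simply 
clipped at the image border.

```python
def adaptive_mean_threshold(gray, block=15, c=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes 255 if it is brighter than the mean of its
    block x block neighbourhood minus c, otherwise 0.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    r = block // 2

    # Integral image, padded with one leading row and column of zeros,
    # so any window sum costs four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        for x in range(w):
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # pixel > mean - c, multiplied through by area to stay in integers
            if gray[y][x] * area > total - c * area:
                out[y][x] = 255
    return out
```

In OpenCV the corresponding call is cv2.adaptiveThreshold(gray, 255, 
cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, 10). Results near the 
image border can differ slightly from this sketch because OpenCV pads the 
border instead of clipping the window. The block size and offset usually 
need tuning per image set.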
-----------------
If you have something better or smarter than the methods above, please do 
post it. 
I think it will benefit more people who are working in this field.
