[tesseract-ocr] Sharing some tips on Scene text OCR

Cid Chang Wed, 27 Apr 2016 20:41:00 -0700

Here are some tips for total beginners who want to do OCR on natural scene
text.

First: The scene text detection
Pass only the detected text area to the OCR engine rather than
the whole picture; it saves a lot of time and resources.
If you have OpenCV plus the Contrib modules, "Class-specific Extremal
Regions" in the "text" module will come in handy.
If you want something really quick and don't want to compile the
contrib modules yourself, this method
<http://stackoverflow.com/questions/23506105/extracting-text-opencv/23556997#23556997>
will do the job.
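
To make the "cut out the text area" step concrete, here is a minimal sketch in
plain Python. It assumes a detector has already produced bounding boxes as
(x, y, w, h) tuples; the function name crop_regions and the pad parameter are
made up for illustration and are not part of OpenCV or tesseract.

```python
def crop_regions(image, boxes, pad=2):
    """Crop each (x, y, w, h) box out of a row-major image (a list of rows).

    Hypothetical helper for illustration only; the boxes are assumed to come
    from whatever text detector you use.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for x, y, w, h in boxes:
        # Grow the box a little so character edges are not clipped,
        # then clamp it to the image bounds.
        x0 = max(x - pad, 0)
        y0 = max(y - pad, 0)
        x1 = min(x + w + pad, width)
        y1 = min(y + h + pad, height)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the image
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops
```

With an OpenCV image (a NumPy array) the same crop is just a slice,
image[y0:y1, x0:x1], and each crop is then what you hand to the OCR engine
instead of the full picture.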

Second: The image binarization before OCRing
Turning the text area image into a black-and-white pattern will improve
the output of the OCR, especially on natural scenes.
The tesseract engine does have its own binarization function, but I found
it to be too vulnerable to the noise from natural scenes.
You can try:
01 the adaptive thresholding
02 a special binarization method from here
<http://liris.cnrs.fr/christian.wolf/software/binarize/>.
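
For anyone unsure what adaptive thresholding actually does, below is a rough
sketch of the mean-based variant in plain Python: each pixel is compared with
the average of its neighbourhood minus a small constant, so uneven lighting
matters much less than with one global threshold. The name adaptive_threshold
and the defaults block=15 and c=10 are assumptions for illustration, not
tuned values.

```python
def adaptive_threshold(gray, block=15, c=10):
    """Binarize a grayscale image (rows of 0-255 ints) with a local mean.

    A pixel becomes 255 if it is brighter than (local mean - c), else 0.
    Windows are simply cut off at the image border in this sketch.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Integral image: integral[y][x] is the sum of all pixels above and
    # to the left of (x, y), so any window sum costs four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    r = block // 2
    out = []
    for y in range(h):
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        out_row = []
        for x in range(w):
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Same test as pixel > mean - c, kept in integer arithmetic.
            out_row.append(255 if gray[y][x] * area > total - c * area else 0)
        out.append(out_row)
    return out
```

In practice you would more likely call OpenCV's cv2.adaptiveThreshold with
ADAPTIVE_THRESH_MEAN_C, which works on the same principle (local mean minus a
constant), although its border handling differs from this sketch.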
-----------------
If you have something better and smarter than the above methods, please do
post it.
I think it will benefit more people who are working in this field.

