[tesseract-ocr] Assistance with OCR on frames from screen capture

Titus Barik Thu, 23 Jun 2016 23:19:47 -0700

Hi all,

We need to process the title bars from a set of screen recordings for a 
programming IDE. An example of a title is:

Java - commons-collections4/src/main/java/org/apache/commons/collections4/list/LazyList.java - Eclipse

The videos have already been recorded, so we are stuck with the quality of
the frames as is (I have attached an example title bar image).

Running tesseract on this image with stock settings instead produces:

> tesseract title_lazylist.png stdout

lava ,
(ammon5chIIemansA/src/msun/Java/arg/apame/wmmans/calIemansA/nst/Lazyust
Java , Eclipse

I expect recognition to be poor with default settings, but I'm unclear on
how to proceed in this particular case -- whether to apply some filter to
the image first as a pre-processing step, to use custom config settings
(such as "load_system_dawg 0"), or some combination of both.
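
To make the question concrete, here is a rough Python sketch of the
"combination of both" I have in mind. It is only a sketch under assumptions,
not something I have verified improves the result: the config file name
titlebar.cfg, the helper names, the 3x scale factor and the threshold of 128
are all placeholders I made up, and the pre-processing step assumes the
third-party Pillow library is installed. It also assumes tesseract accepts a
config file path as a trailing command-line argument.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def write_config(directory: Path) -> Path:
    """Write a config file that turns off the system dictionary."""
    config_path = directory / "titlebar.cfg"  # hypothetical file name
    # Same "name value" form as the setting quoted above.
    config_path.write_text("load_system_dawg 0\n")
    return config_path


def build_command(image_path: str, config_path: Path) -> List[str]:
    """Build: tesseract <image> stdout <configfile>."""
    return ["tesseract", image_path, "stdout", str(config_path)]


def upscale_and_binarize(src: str, dst: str, factor: int = 3) -> None:
    """Enlarge a grayscale copy of the frame and threshold it."""
    from PIL import Image  # third-party (Pillow), imported only when needed

    img = Image.open(src).convert("L")
    img = img.resize((img.width * factor, img.height * factor), Image.LANCZOS)
    # Guessed threshold; a real title bar may need a different value.
    img = img.point(lambda p: 255 if p > 128 else 0)
    img.save(dst)


def run(image_path: str) -> str:
    """Pre-process the frame, run tesseract on it, and return its text."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_config(Path(tmp))
        cleaned = str(Path(tmp) / "cleaned.png")
        upscale_and_binarize(image_path, cleaned)
        result = subprocess.run(
            build_command(cleaned, config_path),
            capture_output=True, text=True, check=True,
        )
        return result.stdout
```

With this, run("title_lazylist.png") would return whatever tesseract prints
for the cleaned-up frame. Whether the dictionary setting or the filtering
matters more here is exactly what I am unsure about.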

I'm not an expert in OCR, so any suggestions are appreciated.

The version of tesseract is:

tesseract 3.05.00dev
leptonica-1.73
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 :
libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

Thanks,

Titus

