Hi all,

We need to process the title bars from a set of screen recordings for a programming IDE. An example of a title is:

Java - commons-collections4/src/main/java/org/apache/commons/collections4/list/LazyList.java - Eclipse

The videos have already been recorded, so we are stuck with the quality of the frames as they are (I have attached an example image). When running it through tesseract with stock settings, the output is instead:

> tesseract title_lazylist.png stdout
lava , (ammon5chIIemansA/src/msun/Java/arg/apame/wmmans/calIemansA/nst/Lazyust Java , Eclipse

I expect that recognition will be poor with default settings, but I'm unclear on how to proceed in this particular case: whether to apply some filter to the image first as a pre-processing step, to use custom config settings (such as "load_system_dawg 0"), or some combination of both. I'm not an expert in OCR, so any suggestions are appreciated.

The version of tesseract is:

tesseract 3.05.00dev
 leptonica-1.73
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

Thanks,
Titus
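
For concreteness, here is a minimal Python sketch of the "combination of both" option mentioned above: a pre-processing step followed by tesseract with a custom config file. It is only a starting point under several assumptions that the post does not confirm: that ImageMagick's convert is installed, that a 300% grayscale upscale is a sensible filter for these frames, that page segmentation mode 7 (single text line) suits a title bar, and that turning off load_freq_dawg alongside load_system_dawg is wanted. The file names title_scaled.png and no_dict.cfg and all function names are hypothetical. It has not been tried on the attached image.

```python
import subprocess
from pathlib import Path

# Config-file lines in tesseract's "name value" format. The first is the
# setting quoted in the post; load_freq_dawg is an assumed companion switch.
NO_DICT_CONFIG = "load_system_dawg 0\nload_freq_dawg 0\n"


def write_config(path):
    """Write the dictionary-disabling config file and return its path as a string."""
    Path(path).write_text(NO_DICT_CONFIG)
    return str(path)


def preprocess_command(src, dst, scale_percent=300):
    """Build an ImageMagick command: convert to grayscale, then upscale."""
    # The 300% default is an arbitrary guess, not a tuned value.
    return ["convert", src, "-colorspace", "Gray",
            "-resize", f"{scale_percent}%", dst]


def tesseract_command(image, config_path):
    """Build a tesseract 3.x command that treats the image as one text line."""
    # "-psm 7" means a single text line; newer releases spell the flag "--psm".
    # Config files are passed as trailing arguments.
    return ["tesseract", image, "stdout", "-psm", "7", config_path]


def ocr_title(src, workdir="."):
    """Run both steps and return the recognized text (needs both tools on PATH)."""
    work = Path(workdir)
    scaled = str(work / "title_scaled.png")      # hypothetical file name
    config = write_config(work / "no_dict.cfg")  # hypothetical file name
    subprocess.run(preprocess_command(src, scaled), check=True)
    result = subprocess.run(tesseract_command(scaled, config), check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()
```

Calling ocr_title("title_lazylist.png") would run both steps on the example frame. Keeping the command builders separate from the runner makes it easy to try each step on its own, for instance running tesseract on the original image with only the config file, to see which change actually affects the output.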

