Thanks for your reply. I can't really use predefined patterns, since the code pattern and font can change over time. I like the idea of segmenting the characters myself and passing them to Tesseract one by one, but it looks time-consuming (to code, I mean). Isn't there any other suitable method? In particular for the 3rd issue, which I think must be easy to solve.
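
For reference, the per-character approach suggested in the quoted message might not need much code. Below is a rough sketch, not a tested solution. It assumes dark text on a light background, a single line of text, and characters that do not touch each other. Because the font and layout can change, it finds the characters by looking for empty columns rather than using fixed positions. The helper names (column_ink, split_columns, ocr_characters) and the threshold and padding values are made up for illustration. It also assumes Pillow and pytesseract are installed and a Tesseract version that accepts "--psm 10" (older 3.x releases spell it "-psm 10").

```python
from typing import List, Tuple


def column_ink(rows: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binarized image."""
    return [sum(col) for col in zip(*rows)]


def split_columns(ink: List[int], min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink.

    Runs narrower than min_width are dropped as likely noise.
    """
    spans = []
    start = None
    for x, count in enumerate(ink):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            if x - start >= min_width:
                spans.append((start, x))   # a character ended at x - 1
            start = None
    if start is not None and len(ink) - start >= min_width:
        spans.append((start, len(ink)))    # character touching the right edge
    return spans


def ocr_characters(path: str, threshold: int = 128, pad: int = 10) -> str:
    """Crop each character out of the image and OCR it on its own.

    Hypothetical helper: threshold and pad are guesses to tune per image set.
    """
    # Imported here so the pure-Python helpers above work without them.
    from PIL import Image, ImageOps
    import pytesseract

    gray = Image.open(path).convert("L")
    width, height = gray.size
    px = gray.load()
    # 1 = ink (dark pixel), 0 = background
    rows = [[1 if px[x, y] < threshold else 0 for x in range(width)]
            for y in range(height)]

    chars = []
    for left, right in split_columns(column_ink(rows)):
        glyph = gray.crop((left, 0, right, height))
        # A white margin around the glyph usually helps the classifier.
        glyph = ImageOps.expand(glyph, border=pad, fill=255)
        # PSM 10 = treat the image as a single character.
        text = pytesseract.image_to_string(glyph, config="--psm 10")
        chars.append(text.strip())
    return "".join(chars)
```

Usage would be something like ocr_characters("code.png"), where "code.png" is a placeholder file name. Since the digits and letters sit in known positions of the code, a character whitelist could probably be passed per position through the config string as well, but I have not verified that here.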

On Wednesday, May 20, 2015 at 12:29:08 PM UTC+2, Dmitri Silaev wrote:
>
> One no-brainer method to try out would be turning off all dictionaries and
> using your own custom "user-patterns" file. Since you mentioned "your
> application" I suppose you can program. So you can take a look at the
> comment preceding the read_pattern_list() declaration in "dict/trie.h" for
> more details.
>
> It seems all your strings are of the same format:
> \A\A\d\d\d\d\d\d\d\d\d\d
> (Tess understands very limited pattern syntax).
>
> But if accuracy is critical in your app, in the long run I would
> absolutely avoid using any parts of Tesseract except the char classifier.
> I.e. crop every single char out of your source image and run Tess in the
> single char PSM. I think it should be easy as long as the location of every
> character is quite stable among your source images. ImageMagick/shell
> scripts would suffice.
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com

