Timothy, did you ever get this working? It looks like you and I are working on a similar problem.
On Sunday, June 26, 2016 at 4:27:48 PM UTC-4, Timothy Korse wrote:
>
> I'm trying to configure Tesseract to recognize *alphanumeric strings* of
> 10 characters (all uppercase).
>
> This works pretty well, except that it often mixes up the following
> characters:
>
> - 2 and Z
> - 6 and G
>
> Examples of images are:
>
> <https://lh3.googleusercontent.com/-20dr7dBmT9c/V2_eMKE7TtI/AAAAAAAAAKw/ENcZMZogPws1elcz7BV0WRsE4B8M22IWgCKgB/s1600/X2JR6XK6VGMQP2L5.jpg>
>
> <https://lh3.googleusercontent.com/-MysZA6TlqI0/V2_eQyVCOzI/AAAAAAAAAKw/LgUKmhGzsvcfod1bHLEIRfBtKO7-dCodQCKgB/s1600/X2LHV6KHPJ5TFTDK.jpg>
>
> <https://lh3.googleusercontent.com/-s6QuiuY_GK8/V2_eUtSCvBI/AAAAAAAAAKw/nM-vnz9SCvQ2OWPuwytKJirJMCS4kIGqgCKgB/s1600/X3K9V5XKQV3Z5QT5.jpg>
>
> <https://lh3.googleusercontent.com/-QVLjGd9Lcik/V2_eYvEDsJI/AAAAAAAAAKw/c_s5sYdtE0AbFZX8OqNiEAAvrnooYD6pwCKgB/s1600/X3P92TR7Q93F2G9F.jpg>
>
> <https://lh3.googleusercontent.com/-wfH5bpBqC5E/V2_egk0Sj3I/AAAAAAAAAKw/-da1JPAT_hUF5CEn6c9FkkZqANu3TDtngCKgB/s1600/X4NT7CFMH2GR7HXZ.jpg>
>
> <https://lh3.googleusercontent.com/-KHssFqw1XyE/V2_emEmR4yI/AAAAAAAAAK0/kftsbb0E65os-rdIlkHxpqT8Ip7gkWWbwCKgB/s1600/X4QGN9XQ3KP69YZX.jpg>
>
> These are preprocessed. I think the preprocessing was done successfully,
> but I'd be glad to hear otherwise.
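
Before changing the training, it may help to measure how often each substitution actually occurs. The following is only a minimal sketch: the class name `ConfusionTally` is hypothetical, the "recognized" strings are made up for illustration and are not real Tesseract output, and it assumes that the image file names (for example X2JR6XK6VGMQP2L5.jpg) carry the expected text, which the post does not state.

```java
import java.util.Map;
import java.util.TreeMap;

final class ConfusionTally {

    /**
     * Counts position-wise substitutions between expected text and OCR output.
     * Each element of pairs is {expected, recognized}. Keys of the result look
     * like "2->Z" (expected -> recognized). Pairs of unequal length are skipped
     * because their characters cannot be aligned by position.
     */
    static Map<String, Integer> tally(String[][] pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] pair : pairs) {
            String expected = pair[0];
            String recognized = pair[1];
            if (expected.length() != recognized.length()) {
                continue;
            }
            for (int i = 0; i < expected.length(); i++) {
                char e = expected.charAt(i);
                char r = recognized.charAt(i);
                if (e != r) {
                    counts.merge(e + "->" + r, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Expected strings come from the image file names (an assumption);
        // the recognized strings are invented sample data.
        String[][] pairs = {
            {"X2JR6XK6VGMQP2L5", "XZJR6XKGVGMQP2L5"},
            {"X3K9V5XKQV3Z5QT5", "X3K9V5XKQV325QT5"},
        };
        tally(pairs).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

A tally like this would show whether the errors are symmetric (2 read as Z and Z read as 2) or mostly one-directional, which could guide which glyphs need more training samples.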
>
> This is how I run Tesseract:
>
> tesseract = new Tesseract();
> tesseract.setOcrEngineMode(TessAPI.TessOcrEngineMode.OEM_TESSERACT_ONLY);
> tesseract.setPageSegMode(7);
> tesseract.setTessVariable("load_system_dawg", "0");
> tesseract.setTessVariable("load_freq_dawg", "0");
> tesseract.setTessVariable("load_punc_dawg", "0");
> tesseract.setTessVariable("load_number_dawg", "0");
> tesseract.setTessVariable("load_unambig_dawg", "0");
> tesseract.setTessVariable("load_bigram_dawg", "0");
> tesseract.setTessVariable("load_fixed_length_dawgs", "0");
>
> tesseract.setTessVariable("classify_enable_learning", "0");
> tesseract.setTessVariable("classify_enable_adaptive_matcher", "0");
>
> tesseract.setTessVariable("segment_penalty_garbage", "0");
> tesseract.setTessVariable("segment_penalty_dict_nonword", "0");
> tesseract.setTessVariable("segment_penalty_dict_frequent_word", "0");
> tesseract.setTessVariable("segment_penalty_dict_case_ok", "0");
> tesseract.setTessVariable("segment_penalty_dict_case_bad", "0");
>
> *Note that this is Java code, but my question is not limited to Java.*
>
> I am not really experienced with Tesseract and seem to find the
> documentation very unclear. I hope someone else can help me out.
> ------------------------------
>
> To give some more context:
>
> *How do I train Tesseract?*
>
> I train Tesseract by combining over 200 images into one image. Every image
> contains 10 alphanumeric characters. Also, I am sure the box file is
> correct.
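
The repeated `setTessVariable` calls in the quoted configuration could be collected into one map, which makes it easier to see and compare what is being set. This is a sketch only: the class `TessVariables` and its methods are hypothetical names, only the dictionary switches from the quoted code are reproduced (the other variables would be added the same way), and the `tessedit_char_whitelist` entry is an addition that is not in the original post. The whitelist restricts output to uppercase letters and digits; it is not expected to separate 2 from Z or 6 from G on its own, since all four stay allowed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

final class TessVariables {

    // Uppercase letters and digits, matching the strings described in the post.
    static final String UPPER_ALNUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    /** Builds the variable map in insertion order. */
    static Map<String, String> build() {
        Map<String, String> vars = new LinkedHashMap<>();
        // Dictionary switches taken from the quoted configuration.
        String[] dawgs = {"system", "freq", "punc", "number", "unambig", "bigram"};
        for (String dawg : dawgs) {
            vars.put("load_" + dawg + "_dawg", "0");
        }
        vars.put("load_fixed_length_dawgs", "0");
        // Assumption: not part of the quoted code, added for illustration.
        vars.put("tessedit_char_whitelist", UPPER_ALNUM);
        return vars;
    }

    /** Passes every name/value pair to the given setter. */
    static void apply(Map<String, String> vars, BiConsumer<String, String> setter) {
        vars.forEach(setter);
    }

    public static void main(String[] args) {
        // Prints the pairs instead of calling an OCR engine.
        apply(build(), (name, value) -> System.out.println(name + " = " + value));
    }
}
```

With the `tesseract` object from the quoted code, the call would presumably be `TessVariables.apply(TessVariables.build(), tesseract::setTessVariable);`.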
>
> I build the final language by executing the following batch script:
>
> tesseract qwe.combined.jpg qwe.combined.box nobatch box.train
>
> echo combined 1 0 0 0 0 > font_properties
>
> unicharset_extractor qwe.combined.box
>
> shapeclustering -F font_properties -U unicharset qwe.combined.box.tr
>
> mftraining -F font_properties -U unicharset -O qwe.unicharset qwe.combined.box.tr
>
> cntraining qwe.combined.box.tr
>
> copy inttemp qwe.inttemp
> copy normproto qwe.normproto
> copy pffmtable qwe.pffmtable
> copy shapetable qwe.shapetable
>
> combine_tessdata qwe.
>
> ------------------------------
>
> How can I make Tesseract discriminate better between 2, Z, 6 and G?
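
One thing that might be worth checking in the training data is how many samples of each confusable glyph the box file contains. The sketch below counts glyph labels, assuming the usual Tesseract 3.x box layout of one glyph per line followed by left, bottom, right, top and page numbers. The class name `BoxGlyphCount` is hypothetical, and the built-in sample lines use invented coordinates rather than data from qwe.combined.box.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BoxGlyphCount {

    /** Counts how often each glyph label (first token of a line) occurs. */
    static Map<String, Integer> count(List<String> boxLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : boxLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String glyph = trimmed.split("\\s+")[0];
            counts.merge(glyph, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that box file; otherwise use sample lines
        // whose coordinates are made up for illustration.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "X 10 12 30 44 0",
                        "2 34 12 54 44 0",
                        "Z 58 12 78 44 0",
                        "2 82 12 102 44 0");
        Map<String, Integer> counts = count(lines);
        for (String glyph : new String[] {"2", "Z", "6", "G"}) {
            System.out.println(glyph + ": " + counts.getOrDefault(glyph, 0));
        }
    }
}
```

If one glyph of a confused pair turns out to be much rarer than the other, adding more samples of it before rerunning the batch script would be a reasonable first experiment.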

