This was fine-tuned with 20+ monospaced fonts for 400 iterations, reaching a character error rate of 0.242%.
At iteration 44/400/400, Mean rms=0.258%, delta=0.076%, char train=0.242%, word train=0.761%, skip ratio=0%, New best char error = 0.242 wrote best model:/home/ubuntu/tesstutorial/engrestrict_from_full/engrestrict_plus0.242_44.checkpoint wrote checkpoint.

Finished! Error rate = 0.242

If you know the font used and customize the training text to your data, you will get better results.

On Sat, Mar 30, 2019 at 11:35 AM Shree Devi Kumar <[email protected]> wrote:

> try the finetuned traineddata from
>
> https://github.com/Shreeshrii/tessdata_shreetest/commit/0108263ad0c4c9bd11e0c8190a81fb36e2e4e56a
>
> On Sat, Mar 30, 2019 at 1:47 AM Martin Emmerson <[email protected]> wrote:
>
>> Yikes! Thanks for the reply, but I could barely follow the discussion
>> on that pull request. It seems the answer at least for now is that there
>> isn't a straightforward way to restrict character set without being
>> somewhat familiar with the code base and dev environment (which I'm not).
>> Thanks anyway; I'll try to figure out some external workarounds.
>>
>> On Thursday, March 28, 2019 at 11:03:59 PM UTC-7, shree wrote:
>>>
>>> See https://github.com/tesseract-ocr/tesseract/pull/2294
>>>
>>> On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:
>>>
>>>> Is there a way to restrict the character set that tesseract-ocr will
>>>> attempt to identify? I'm scanning USA-based receipts which have a fairly
>>>> simple set of monospaced characters but, for example, often '1' will get
>>>> misidentified as '|', and a whole host of other simple substitution
>>>> errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would
>>>> be an immediate boost to accuracy. (Hoping for a way that doesn't involve
>>>> having to retrain from scratch on the limited set.)
--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXAJbh1zuyZV80nwJkuqaiyGsOfxiwp1FCF6xbvWU-wOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
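
The thread mentions falling back to "external workarounds" for keeping OCR output within [-a-zA-Z0-9,.$()/]. As a rough illustration only, here is a minimal Python sketch of one such workaround: post-processing the recognized text after Tesseract has run. It is not taken from the thread or from the linked pull request. The function name `restrict_charset` and the substitution table are hypothetical; only the '|' to '1' confusion is reported in the thread, and the other pairs are guesses that would need checking against real receipts. Allowing whitespace is also an assumption, made so that words and lines stay separated.

```python
import re

# Allowed set proposed in the thread, [-a-zA-Z0-9,.$()/], plus whitespace
# (assumption: spaces and newlines should survive the filter).
ALLOWED = re.compile(r"[-a-zA-Z0-9,.$()/\s]")

# Hypothetical look-alike table. Only '|' -> '1' is mentioned in the thread;
# the bracket and brace mappings are illustrative guesses.
SUBSTITUTIONS = {
    "|": "1",
    "[": "(",
    "]": ")",
    "{": "(",
    "}": ")",
}


def restrict_charset(text: str) -> str:
    """Map known look-alikes into the allowed set, then drop anything else."""
    kept = []
    for ch in text:
        ch = SUBSTITUTIONS.get(ch, ch)  # replace a look-alike if one is listed
        if ALLOWED.fullmatch(ch):       # keep only characters in the allowed set
            kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    print(restrict_charset("TOTAL $|2.50 [tax]"))
```

Run on the sample string, this prints `TOTAL $12.50 (tax)`. A filter like this only cleans up text after recognition, so it cannot recover a character that was misread as another allowed character (for example 'O' versus '0'); the fine-tuned traineddata discussed above addresses the problem at the recognition step instead.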

