Thanks! This may still be a stretch for my current level of tesseract knowledge but definitely more within reach! I look forward to giving it a try.
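In case it helps anyone else reading this thread, here is a rough sketch of the kind of external workaround I had in mind: post-filtering the OCR text down to the [-a-zA-Z0-9,.$()/] set from my original question. It does not touch tesseract itself. The function name `restrict_charset` and the substitution table are hypothetical; only the '|' to '1' pair comes from this thread, and keeping spaces and newlines is my own assumption.

```python
import string

# Character set requested in the thread: [-a-zA-Z0-9,.$()/]
# Assumption: space and newline are also kept so the receipt layout survives.
ALLOWED = set(string.ascii_letters + string.digits + "-,.$()/" + " \n")

# Hypothetical look-alike table. Only '|' -> '1' is mentioned in the thread;
# the bracket pairs are illustrative guesses and should be tuned on real data.
SUBSTITUTIONS = {
    "|": "1",
    "[": "(",
    "]": ")",
    "{": "(",
    "}": ")",
}


def restrict_charset(text: str) -> str:
    """Map known look-alikes to allowed characters, then drop anything else."""
    out = []
    for ch in text:
        ch = SUBSTITUTIONS.get(ch, ch)
        if ch in ALLOWED:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Example OCR output with a '|' misread and a stray ';'.
    print(restrict_charset("TOTAL $|2.50;"))
```

This is a blind substitution, so it only makes sense for characters such as '|' that can never legitimately appear in the restricted set. It cannot fix confusions between two allowed characters (for example 'O' and '0'), which is where the finetuned model should do better.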
On Friday, March 29, 2019 at 11:12:44 PM UTC-7, shree wrote:
>
> This was finetuned with 20+ monospaced fonts for 400 iterations to an error
> rate of 0.242%.
>
> At iteration 44/400/400, Mean rms=0.258%, delta=0.076%, char train=0.242%,
> word train=0.761%, skip ratio=0%, New best char error = 0.242 wrote best
> model:/home/ubuntu/tesstutorial/engrestrict_from_full/engrestrict_plus0.242_44.checkpoint
> wrote checkpoint.
>
> Finished! Error rate = 0.242
>
> If you know the font used and customize training text to your data, you
> will get better results.
>
> On Sat, Mar 30, 2019 at 11:35 AM Shree Devi Kumar <[email protected]> wrote:
>
>> try the finetuned traineddata from
>>
>> https://github.com/Shreeshrii/tessdata_shreetest/commit/0108263ad0c4c9bd11e0c8190a81fb36e2e4e56a
>>
>> On Sat, Mar 30, 2019 at 1:47 AM Martin Emmerson <[email protected]> wrote:
>>
>>> Yikes! Thanks for the reply, but I could barely follow the discussion
>>> on that pull request. It seems the answer at least for now is that there
>>> isn't a straightforward way to restrict character set without being
>>> somewhat familiar with the code base and dev environment (which I'm not).
>>> Thanks anyway; I'll try to figure out some external workarounds.
>>>
>>> On Thursday, March 28, 2019 at 11:03:59 PM UTC-7, shree wrote:
>>>>
>>>> See https://github.com/tesseract-ocr/tesseract/pull/2294
>>>>
>>>> On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:
>>>>
>>>>> Is there a way to restrict the character set that tesseract-ocr will
>>>>> attempt to identify? I'm scanning USA-based receipts which have a fairly
>>>>> simple set of monospaced characters but, for example, often '1' will get
>>>>> misidentified as '|', and a whole host of other simple substitution
>>>>> errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it
>>>>> would be an immediate boost to accuracy.
>>>>> (Hoping for a way that doesn't involve having to retrain from scratch
>>>>> on the limited set.)
>>
>> --
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/31e4a00d-d75d-4aad-aab4-0bb03cf79741%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

