Thanks!  This may still be a stretch for my current level of Tesseract 
knowledge, but it's definitely more within reach.  I look forward to giving 
it a try.
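
For anyone following the "external workarounds" route mentioned further down 
the thread, one option is to post-filter the OCR text rather than touch 
Tesseract itself. Below is a minimal sketch in Python, not something from the 
finetuned model or the pull request. The allowed set is the one from the 
original question, with whitespace added as an assumption so the receipt 
layout survives. The names ALLOWED, CONFUSIONS and restrict() are hypothetical, 
and the only substitution taken from the thread is '|' for '1'; any others 
would have to come from your own data.

```python
import re

# Anything outside the set from the question ([-a-zA-Z0-9,.$()/]) is matched
# for removal. Whitespace is also kept (an assumption, so lines stay readable).
ALLOWED = re.compile(r"[^-a-zA-Z0-9,.$()/\s]")

# Hypothetical confusion table. Only '|' -> '1' is reported in the thread;
# extend it with substitutions you actually observe.
CONFUSIONS = str.maketrans({"|": "1"})


def restrict(text: str) -> str:
    """Apply known substitutions, then drop characters outside the allowed set."""
    return ALLOWED.sub("", text.translate(CONFUSIONS))


if __name__ == "__main__":
    print(restrict("TOTAL $|2.50"))
```

This only cleans up the recognized text after the fact; it cannot recover a 
character that was recognized as another allowed character, so it complements 
the finetuned traineddata rather than replacing it.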

On Friday, March 29, 2019 at 11:12:44 PM UTC-7, shree wrote:
>
> This was finetuned with 20+ monospaced fonts for 400 iterations to an error 
> rate of 0.242%. 
>
> At iteration 44/400/400, Mean rms=0.258%, delta=0.076%, char train=0.242%, 
> word train=0.761%, skip ratio=0%,  New best char error = 0.242 wrote best 
> model:/home/ubuntu/tesstutorial/engrestrict_from_full/engrestrict_plus0.242_44.checkpoint 
> wrote checkpoint.
>
> Finished! Error rate = 0.242
>
> If you know the font used and customize training text to your data, you 
> will get better results.  
>
> On Sat, Mar 30, 2019 at 11:35 AM Shree Devi Kumar <[email protected]> wrote:
>
>> Try the finetuned traineddata from
>>
>> https://github.com/Shreeshrii/tessdata_shreetest/commit/0108263ad0c4c9bd11e0c8190a81fb36e2e4e56a
>>
>> On Sat, Mar 30, 2019 at 1:47 AM Martin Emmerson <[email protected]> wrote:
>>
>>> Yikes!   Thanks for the reply, but I could barely follow the discussion 
>>> on that pull request.   It seems the answer at least for now is that there 
>>> isn't a straightforward way to restrict character set without being 
>>> somewhat familiar with the code base and dev environment (which I'm not).  
>>> Thanks anyway; I'll try to figure out some external workarounds.
>>>
>>> On Thursday, March 28, 2019 at 11:03:59 PM UTC-7, shree wrote:
>>>>
>>>> See https://github.com/tesseract-ocr/tesseract/pull/2294
>>>>
>>>> On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:
>>>>
>>>>> Is there a way to restrict the character set that tesseract-ocr will 
>>>>> attempt to identify?  I'm scanning USA-based receipts which have a fairly 
>>>>> simple set of monospaced characters but, for example, often '1' will get 
>>>>> misidentified as '|', and a whole host of other simple substitution 
>>>>> errors.  If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it 
>>>>> would be an immediate boost to accuracy.  (Hoping for a way that doesn't 
>>>>> involve having to retrain from scratch on the limited set.)
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>
>>
>> -- 
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
