Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-31 Thread Des Bw
Today's lesson: it is possible to *disable TARGET_ERROR_RATE*. If you find your training stopping prematurely because it is hitting the target error rate, you can disable it and train by epochs (iterations) only. Any negative value (such as *TARGET_ERROR_RATE=-1*) will disable the
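
For anyone who wants to try this, here is a minimal sketch of how the setting could be passed on the command line. It assumes training is driven by the tesstrain Makefile, where TARGET_ERROR_RATE, MAX_ITERATIONS and MODEL_NAME are make variables; the helper name `tesstrain_command` and the iteration count are made up for illustration, and the model name is borrowed from elsewhere in this thread.

```python
import shlex


def tesstrain_command(model_name, max_iterations, target_error_rate=-1):
    """Build a `make training` argument list for a tesstrain checkout.

    Assumption: a negative TARGET_ERROR_RATE disables the error-rate stop
    condition (as reported in this thread), so training runs until
    MAX_ITERATIONS is reached.
    """
    return [
        "make",
        "training",
        f"MODEL_NAME={model_name}",
        f"MAX_ITERATIONS={max_iterations}",
        f"TARGET_ERROR_RATE={target_error_rate}",
    ]


# Hypothetical values: the model name comes from the thread, the
# iteration count is only a placeholder.
cmd = tesstrain_command("micr_e13b", 10000)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell inside the tesstrain directory, or the list can be handed to `subprocess.run(cmd, check=True)`.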

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-31 Thread Des Bw
> training a new language or font, you should replace “foo” with the name of your language or font. The standard is to choose 3 letters, but that is not required. In fact, I have been training a font named “micr_e13b” a

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-31 Thread Des Bw
> a new language or font, you should replace “foo” with the name of your language or font. The standard is to choose 3 letters, but that is not required. In fact, I have been training a font named “micr_e13b” a

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-30 Thread Des Bw
> named “micr_e13b” and it is working technically for me (though the accuracy isn’t good enough yet). Note the underscore character between sections of the name. Intern

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-29 Thread Des Bw
> ally for me (though the accuracy isn’t good enough yet). Note the underscore character between sections of the name. Internal *F

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-29 Thread Des Bw
> orking technically for me (though the accuracy isn’t good enough yet). Note the underscore character between sections of the name. Internal *From: *tesser...@googlegroups.com on b

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-24 Thread Des Bw
> ically for me (though the accuracy isn’t good enough yet). Note the underscore character between sections of the name. Internal *From: *tesser...@googlegroups.com on behalf of René JM Clais *Date: *Sund

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-24 Thread René JM Clais
> enough yet). Note the underscore character between sections of the name. Internal *From: *tesseract-ocr@googlegroups.com on behalf of René JM Clais *Date: *Sunday, October 22, 2023 at 12:41 PM *To: *tesseract-ocr@googlegroups.com *Subject: *[EX

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-23 Thread Keith Smith
@googlegroups.com Subject: [EXTERNAL] Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks Hi Keith, The foo.traineddata does not exist, but do you mean

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-22 Thread Des Bw
I have updated the guide to explain how to train by cutting the top layer. You can check it out. I hope it is helpful. On Sunday, October 22, 2023 at 7:41:15 PM UTC+3 renec...@gmail.com wrote: > Hi Keith, The foo.traineddata does not exist, but do you mean: the traineddata I want to

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-22 Thread René JM Clais
Hi Keith, The foo.traineddata does not exist, but do you mean the traineddata I want to train, e.g. hye.traineddata? In my case I need to add a new character to hye.traineddata. It seems that I can do this using option 2! But how? Which command should I use to execute this function

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Des Bw
Another useful parameter to turn on would have been *perspective*, but that one is not working for me. On Saturday, October 21, 2023 at 10:45:31 PM UTC+3 Des Bw wrote: > I have been experimenting with the text2image script. Here are some of my observations so far: *

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Des Bw
I have been experimenting with the text2image script. Here are some of my observations so far: *'--strip_unrenderable_words=false':* The idea of this parameter seems to be to remove characters that are not covered by a certain font. But I am getting better results with the False
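
A rough sketch of what such a text2image call could look like, in case it helps others reproduce the experiment. The flags `--text`, `--outputbase`, `--font`, `--fonts_dir` and `--strip_unrenderable_words` are text2image options; the helper name `text2image_command` and every file name, font name and directory below are placeholders, not values taken from this thread.

```python
import shlex


def text2image_command(text_file, output_base, font, fonts_dir,
                       strip_unrenderable=False):
    """Build a text2image argument list.

    With strip_unrenderable=False the flag is passed as
    --strip_unrenderable_words=false, the setting described above.
    """
    flag = "true" if strip_unrenderable else "false"
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
        f"--strip_unrenderable_words={flag}",
    ]


# Placeholder inputs: substitute your own training text, font and paths.
cmd = text2image_command(
    text_file="training_text.txt",
    output_base="out/sample",
    font="DejaVu Sans",
    fonts_dir="/usr/share/fonts",
)
print(shlex.join(cmd))
```

Building the command as a list keeps font names that contain spaces intact when the list is passed to `subprocess.run(cmd, check=True)`.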

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Des Bw
That is a good starter, dear Keith. Very good idea. We can contribute texts and ideas, and develop it into a booklet or "getting started guide", adding explanatory comments, practical examples and elaborations on the official guide (which is very dense and incomplete). - the tips and

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Keith Smith
Thank you, Des, for your help in this community. It is greatly appreciated! As one who is struggling, may I make a suggestion? I have started a Google Doc here with a suggested format for a tutorial