Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread Shree Devi Kumar
I do not know about the internal algorithms used by Tesseract. If you are having accuracy issues with certain letters and digits, I suggest that you fine-tune for impact using your images or a similar font. Please see the wiki page on training 4.0 for the command; look for fine tuning for new
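A minimal Python sketch of the fine-tuning steps, assuming the tools and flags used in the "Fine Tuning for Impact" example on the Tesseract 4.0 training wiki (`combine_tessdata -e`, `lstmtraining --continue_from`, `lstmtraining --stop_training`). It only builds the command lines and does not run them. The paths, the `impact` output prefix and the iteration count are hypothetical placeholders, and `train_listfile` is assumed to be a text file listing your `.lstmf` training files; check the wiki for the exact options your version supports.

```python
from __future__ import annotations

from pathlib import Path


def fine_tune_commands(best_traineddata: Path, work_dir: Path,
                       train_listfile: Path, max_iterations: int = 400) -> list[list[str]]:
    """Return the commands for fine-tuning an existing LSTM model (not run here)."""
    lang = best_traineddata.stem              # "eng" from "eng.traineddata"
    lstm_file = work_dir / f"{lang}.lstm"
    model_output = work_dir / "impact"        # hypothetical output prefix

    # 1. Extract the LSTM model from the existing traineddata.
    extract = ["combine_tessdata", "-e", str(best_traineddata), str(lstm_file)]

    # 2. Continue training from that model on your own training files.
    train = [
        "lstmtraining",
        "--model_output", str(model_output),
        "--continue_from", str(lstm_file),
        "--traineddata", str(best_traineddata),
        "--train_listfile", str(train_listfile),
        "--max_iterations", str(max_iterations),
    ]

    # 3. Turn the resulting checkpoint into a usable .traineddata file.
    finish = [
        "lstmtraining", "--stop_training",
        "--continue_from", f"{model_output}_checkpoint",
        "--traineddata", str(best_traineddata),
        "--model_output", str(work_dir / f"{lang}_impact.traineddata"),
    ]
    return [extract, train, finish]
```

Each command can then be run in order with `subprocess.run(cmd, check=True)`.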

2018-08-10 Thread damon
Hi Shree, just a quick update. I've now looked further into this output tesseract.log and understand how it works: it goes through different choices and eventually decides on a "best choice". However, the output doesn't explain how it then decides what has overriding priority on

2018-08-10 Thread damon
I just realised some of the output underneath "Trying word using lang fo, oem 0" might be useful information! Here it is: Running NoDangerousAmbig() for 5 [35 ]0 3 [33 ]0 . [2e ]p Looking for replaceable ngrams starting with 5 [35 ]0: Looking for replaceable ngrams starting with 3 [33 ]0:
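To compare what each traineddata contributes, it can help to split tesseract.log into one block per "Trying word using lang X, oem N" attempt. A minimal Python sketch, assuming every attempt starts with a header line worded exactly like the one quoted above; other versions or debug levels may print it differently. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Header wording copied from the log excerpt quoted in this thread (assumed stable).
HEADER = re.compile(r"Trying word using lang (\S+), oem (\d+)")


def split_attempts(log_text: str) -> list[tuple[str, int, list[str]]]:
    """Group log lines under the most recent 'Trying word using lang' header."""
    attempts: list[tuple[str, int, list[str]]] = []
    for line in log_text.splitlines():
        match = HEADER.search(line)
        if match:
            # Start a new block for this (language, OEM) attempt.
            attempts.append((match.group(1), int(match.group(2)), []))
        elif attempts and line.strip():
            # Non-empty lines after a header belong to that attempt;
            # anything before the first header is ignored.
            attempts[-1][2].append(line.strip())
    return attempts
```

Feed it the file contents, e.g. `open("tesseract.log", encoding="utf-8", errors="replace").read()`, and print each block to see which choices were considered under each language.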

2018-08-10 Thread damon
Hi Shree, thanks for your patience and help! I have managed to produce the tesseract.log file with your help. Now I'm trying to understand it a bit more. Here is a quick snippet of the output I want to show you: *Rejecter: 5 [35 ]0 3 [33 ]0 . [2e ]p (word=n, case=y, unambig=y, multiple=y)*
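The word in these log lines is printed character by character. Reading the snippet, `5 [35 ]0` appears to be the character `5`, its UTF-8 byte in hex (0x35 is `5`, 0x2e is `.`) and a trailing letter marking it as a digit, with `p` marking `.` as punctuation. That reading, and the assumed `a`/`A`/`x` letters for alphabetic characters, is an assumption about the debug format rather than something stated in this thread. A minimal Python sketch (hypothetical helper) that decodes such a line and collects the y/n flags in parentheses without interpreting what they mean:

```python
from __future__ import annotations

import re

# Assumed layout of each character: the character itself, its UTF-8 bytes
# in hex inside brackets, then optional property letters.
TOKEN = re.compile(r"(\S+) \[([0-9a-fA-F ]+)\]([aAx0p]*)")
FLAG = re.compile(r"(\w+)=([yn])")
# Assumed meaning of the property letters.
PROPS = {"a": "lower", "A": "upper", "x": "alpha", "0": "digit", "p": "punct"}


def decode_rejecter(line: str) -> tuple[str, list[tuple[str, list[str]]], dict[str, bool]]:
    """Split a 'Rejecter:' line into the word, per-character properties and y/n flags."""
    head, sep, tail = line.rpartition(" (")
    if not sep:  # no "(key=y, ...)" part on this line
        head, tail = line, ""
    chars = []
    for _shown, hex_bytes, props in TOKEN.findall(head):
        # Rebuild the character from its hex bytes rather than trusting the display.
        char = bytes(int(h, 16) for h in hex_bytes.split()).decode("utf-8")
        chars.append((char, [PROPS[p] for p in props]))
    flags = {key: value == "y" for key, value in FLAG.findall(tail)}
    return "".join(c for c, _ in chars), chars, flags
```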

2018-08-10 Thread Damon Kwong
Hi Shree, I've tried running my commands again with logfile as the last argument; its contents have been changed to: *debug_file tesseract.log* *multilang_debug_level 3* *stopper_debug_level 3* When I entered the command with logfile at the end, it gave an output in cmd saying:
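For reference, a minimal Python sketch of the setup described above: it produces the text of the `logfile` config (one `name value` pair per line, using the three variables quoted in the message) and a command line with the config name placed last, after the image, output base and `-l` option. The language list `eng+fo` is an assumption pieced together from the log excerpt in this thread, and the image name, output name and helper functions are hypothetical.

```python
from __future__ import annotations

# Debug variables quoted in the thread.
DEBUG_VARS = {
    "debug_file": "tesseract.log",
    "multilang_debug_level": "3",
    "stopper_debug_level": "3",
}


def config_text(variables: dict[str, str]) -> str:
    """Render a Tesseract config file: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def tesseract_cmd(image: str, outbase: str, langs: list[str], configs: list[str]) -> list[str]:
    """Build a tesseract command line; config file names go at the very end."""
    return ["tesseract", image, outbase, "-l", "+".join(langs), *configs]
```

Save `config_text(DEBUG_VARS)` as `logfile` in `tessdata\configs` (where the thread keeps it), then run `subprocess.run(tesseract_cmd("page.png", "page", ["eng", "fo"], ["logfile"]), check=True)`.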

2018-08-09 Thread Shree Devi Kumar
The output tesseract.log file should be produced in the directory from which you are running the command, usually where your OCR output is created.
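Put differently, a relative `debug_file` value such as `tesseract.log` is resolved against the working directory of the tesseract process, not against `tessdata\configs` where the config file lives. A small Python sketch of that rule (the helper name is hypothetical); when launching tesseract from a script, the `cwd=` argument of `subprocess.run` decides where the log lands.

```python
import os
from pathlib import Path


def expected_log_path(debug_file: str, cwd: str) -> Path:
    """Where a debug_file setting should end up for a process started in cwd."""
    if os.path.isabs(debug_file):
        # An absolute path is used as-is.
        return Path(debug_file)
    # A relative path is opened relative to the process's working directory.
    return Path(cwd) / debug_file
```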

2018-08-09 Thread damon
Hello Shree, thank you for your prompt reply. I have now changed the logfile as instructed. Where can I find the output tesseract.log file? Will it be produced in the same location as the logfile, in C:\Program Files (x86)\Tesseract-OCR\tessdata\configs? I'm guessing the tesseract.log file

2018-08-08 Thread Shree Devi Kumar
I think this could happen if your new traineddata is not trained to as high an accuracy level as the eng traineddata. You can set up a debug log to verify this. See https://github.com/tesseract-ocr/tesseract/issues/1275#issuecomment-360367865 for details. On Wed, Aug 8, 2018 at 6:04 PM wrote: > i'm