[tesseract-ocr] Updating/improving tesseract-ocr

2018-08-10 Thread Lorenza Romano
Hi all, I'm new to Tesseract. I'm currently using tesseract 3.0.5 with the Italian language on newspaper pages in PDF format. I'm converting the PDFs to PNG (with ImageMagick) at 300 and 600 dpi, following all these suggestions
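
For readers following this thread, the workflow described in the preview above (rasterize a PDF at a fixed resolution, then OCR a page in Italian) might be sketched as follows. This is an illustration only, not the poster's actual commands: the file names and the `build_commands` helper are hypothetical, and it assumes ImageMagick's `convert` entry point (ImageMagick 7 installs typically use `magick` instead).

```python
from typing import List


def build_commands(pdf_path: str, page_prefix: str, dpi: int = 300,
                   lang: str = "ita") -> List[List[str]]:
    """Return argument lists for one rasterize step and one OCR step.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    # ImageMagick fills %03d with the page (scene) number, starting at 0.
    page_pattern = f"{page_prefix}-%03d.png"
    first_page = f"{page_prefix}-000.png"
    # -density must come before the input so the PDF is rendered at that dpi.
    rasterize = ["convert", "-density", str(dpi), pdf_path, page_pattern]
    # tesseract <image> <output base> -l <language>
    ocr = ["tesseract", first_page, f"{page_prefix}-000", "-l", lang]
    return [rasterize, ocr]


if __name__ == "__main__":
    for command in build_commands("newspaper.pdf", "newspaper", dpi=300):
        print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once ImageMagick and the Italian traineddata are installed.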

Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread damon
Hi Shree, just a quick update. I've now looked further into the tesseract.log output and understand how it works: it goes through different choices and eventually decides on a "best choice". However, the output doesn't explain how it then decides what has overriding priority on

Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread Shree Devi Kumar
I do not know about the internal algorithms used by tesseract. If you are having accuracy issues with certain letters and digits, I would suggest that you fine-tune for impact using the images or a similar font. Please see the wiki page on training 4.0 for the command - look for fine tuning for new

Re: [tesseract-ocr] cannot install new version, please help me

2018-08-10 Thread Shree Devi Kumar
Uninstall all versions of tesseract and libtesseract-dev, then install using the PPA from https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr On Sat, Aug 11, 2018 at 11:08 AM Kimchi wrote: > Environment > >- Tesseract Version: 3.04 >- Commit Number: 3.04 >- Platform: ubuntu

[tesseract-ocr] How to use Tesseract on Visual C++ 2015?

2018-08-10 Thread Thomas
Hello Marco, some time has passed since you posted this issue. I am running into the exact same problem. Did you find the root cause for your errors, and could you solve the issue? Greetings Thomas

Re: [tesseract-ocr] Re: tesseract training flags to rtl languages

2018-08-10 Thread Mohammad Moin
Tesseract 4. On Thu, Aug 9, 2018 at 7:56 PM Shree Devi Kumar wrote: > Are you training for tesseract 3 or tesseract 4 (LSTM training)? > > On Thu 9 Aug, 2018, 8:13 PM Mohammad Moin, wrote: > >> this is not very accurate, I am trying to develop my own traineddata from >> scratch, I have completed

[tesseract-ocr] Urdu language left to right output and no space recognize

2018-08-10 Thread moen . eqbal
I have trained my own model for the Urdu language, using jTessBoxEditor to create the tiff/box files and then Serak Tesseract Trainer to create the traineddata file. My model recognizes Urdu, but there are two main issues other than accuracy (accuracy will be tested after solving the following

Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread damon
I just realised some of the output underneath "Trying word using lang fo, oem 0" might be useful information! Here it is: Running NoDangerousAmbig() for 5 [35 ]0 3 [33 ]0 . [2e ]p Looking for replaceable ngrams starting with 5 [35 ]0: Looking for replaceable ngrams starting with 3 [33 ]0:

Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread damon
Hi Shree, thanks for your patience and help! I have managed to produce the tesseract.log file with your help. Now I'm trying to understand it a bit more. Here is a quick snippet of the output I want to show you: *Rejecter: 5 [35 ]0 3 [33 ]0 . [2e ]p (word=n, case=y, unambig=y, multiple=y)*

Re: [tesseract-ocr] Problem with using two trained.data files in combination for a better result.

2018-08-10 Thread Damon Kwong
Hi Shree, I've tried running my commands again with logfile as the last argument. Its contents have been changed to: *debug_file tesseract.log* *multilang_debug_level 3* *stopper_debug_level 3* When I entered the command with logfile at the end, it gave an output in cmd saying:
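
A minimal sketch of the setup described in this message, assuming the three settings are saved one per line in a plain-text config file named `logfile` and that file is passed as the last command-line argument. The image name, output base, language codes, and both helper functions are hypothetical; only the three variable names and values come from the message.

```python
import os
import tempfile
from typing import List

# Settings quoted in the message above, written as "name value" pairs.
DEBUG_SETTINGS = {
    "debug_file": "tesseract.log",
    "multilang_debug_level": "3",
    "stopper_debug_level": "3",
}


def write_debug_config(directory: str, name: str = "logfile") -> str:
    """Write the settings to a config file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        for key, value in DEBUG_SETTINGS.items():
            handle.write(f"{key} {value}\n")
    return path


def build_command(image: str, output_base: str, config_path: str,
                  languages: List[str]) -> List[str]:
    """Build (but do not run) a tesseract call with the config file last."""
    # Several languages are joined with "+" for the -l option.
    return ["tesseract", image, output_base,
            "-l", "+".join(languages), config_path]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    config = write_debug_config(workdir)
    print(" ".join(build_command("input.png", "output", config, ["eng"])))
```

Keeping the config file as the final argument matches the order the poster describes ("logfile at the end").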

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-08-10 Thread Mehul Bhardwaj
Hi, I went through this discussion thread and updated to Tesseract 3.05.02. Previously I was working with version 3.05. I was getting the same error of "FAILURE: Couldn't find a matching blob" for about 15% of my training characters. But even after updating, I am still getting the exact same