[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-06-16 Thread Bojidar Stanchev
Well, you could just run a simple program on tesseract's output to find and correct those mistakes. In your case, if you expect http:// and you see http:II, it should be a no-brainer to change it to http://. It's an easy case because those two slashes are always there. Another thing is that
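A minimal sketch of the kind of post-processing program suggested above, in Python. The function name fix_url_slashes and the set of look-alike characters (I, l, 1, |, \) are assumptions chosen for illustration and are not taken from the thread; adjust them to the misreads you actually see.

```python
import re

# Hypothetical helper, not part of tesseract: repairs URL schemes whose "//"
# was misread by OCR as look-alike glyphs such as "II", "ll" or "l/".
# The look-alike character set is an assumption; extend it as needed.
_SCHEME_RE = re.compile(r"\b(https?|ftp):[Il1|/\\]{2}(?=\S)")


def fix_url_slashes(text: str) -> str:
    """Rewrite e.g. 'http:II' to 'http://' wherever a known scheme precedes it."""
    return _SCHEME_RE.sub(lambda m: m.group(1) + "://", text)


if __name__ == "__main__":
    print(fix_url_slashes("Visit http:IIexample.com or https:l/example.org"))
```

Because the rule only fires right after a known scheme and a colon, ordinary words containing "II" or "ll" are left alone.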

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-06-15 Thread Diederik Hattingh
Hi Stef, Thanks for the reply (here and on SO). The fix mostly works, but unfortunately I am still seeing that tesseract sometimes ignores the unicharambigs file I set for it. For example I have the following two images:

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-06-03 Thread 'Stef' via tesseract-ocr
Here you are: SO answer. On Friday, June 3, 2016 at 18:31:47 UTC+2, John Muccigrosso wrote: > > On Thursday, June 2, 2016 at 5:21:51 PM UTC-4, Stef wrote: >> >> You can resolve

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-06-03 Thread John Muccigrosso
On Thursday, June 2, 2016 at 5:21:51 PM UTC-4, Stef wrote: > > You can resolve the ambiguity using the unicharambigs file, for details > see my SO answer to your SO question. > > Stef > I'm curious about this as well. Could you post a link to this discussion? Thanks.

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-06-02 Thread 'Stef' via tesseract-ocr
You can resolve the ambiguity using the unicharambigs file, for details see my SO answer to your SO question. Stef
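For readers unfamiliar with the unicharambigs file mentioned above, here is a rough Python sketch that writes one in the "v1" layout described in the Tesseract documentation: tab-separated fields giving the number of misread characters, those characters separated by spaces, the number of replacement characters, the replacement characters, and a type flag (1 for a mandatory substitution, 0 for optional). The helper names ambig_line and write_unicharambigs and the sample pair are hypothetical, and this is not the content of the SO answer. It also assumes every unichar is a single code point.

```python
from typing import Iterable, Tuple


def ambig_line(wrong: str, right: str, mandatory: bool = True) -> str:
    """Build one tab-separated v1 unicharambigs entry (hypothetical helper).

    Assumes each character of `wrong` and `right` is one unichar.
    """
    fields = [
        str(len(wrong)),
        " ".join(wrong),
        str(len(right)),
        " ".join(right),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def write_unicharambigs(path: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Write a v1 unicharambigs file containing the given (wrong, right) pairs."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("v1\n")
        for wrong, right in pairs:
            f.write(ambig_line(wrong, right) + "\n")


if __name__ == "__main__":
    # Illustrative pair only: a mandatory "II" -> "//" rule would also hit
    # legitimate "II" sequences, so treat this as a format demonstration.
    print(ambig_line("II", "//"))
```

How the file is then attached to a language's traineddata depends on the Tesseract version and is not covered by this sketch.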

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-05-31 Thread Ashish Goel
I would also like to find a way to avoid such cases. I am also seeing cases with extra white space, lower/upper case mismatches and wrongly detected characters... On Tuesday, May 31, 2016 at 11:40:28 PM UTC+5:30, Diederik Hattingh wrote: > > I have a case where my tesseract isn't