Re: [tesseract-ocr] Re: issue with simple reading of numbers 9 and 8

2017-04-21 Thread ShreeDevi Kumar
Which version of Tesseract? Which OS? If all your text is in Tungsten Semibold, have you tried training with just that font? On 22-Apr-2017 12:50 AM, "James Abney" wrote: The font is tungsten semibold On Friday, April 21, 2017 at

[tesseract-ocr] issue with simple reading of numbers 9 and 8

2017-04-21 Thread James Abney
I'm having issues with Tesseract reading the digits 9 and 8, especially when they are next to each other. This is really the only issue I have. Even when I OCR a TIFF file, it reads 123456789 as 123456788. I will link an example. Any help is appreciated. The following image is an example where
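For digits-only input like this, one thing that is commonly tried is restricting recognition to the characters 0-9. The following is a minimal sketch, not a confirmed fix for the 9/8 confusion: it only builds a command line and assumes the `tesseract` executable is on the PATH and that the image contains a single text line. The helper name `build_digits_command` and the file name are hypothetical. The page-segmentation flag is spelled `-psm` in 3.x releases and `--psm` in 4.x, and whether `tessedit_char_whitelist` is honoured depends on the version and engine in use.

```python
def build_digits_command(image_path, legacy_flags=True):
    """Build (but do not run) a tesseract command limited to digits.

    legacy_flags=True uses the 3.x spelling "-psm"; False uses "--psm".
    Page segmentation mode 7 treats the image as a single text line,
    which is an assumption about the input image.
    """
    psm_flag = "-psm" if legacy_flags else "--psm"
    return [
        "tesseract",
        image_path,
        "stdout",  # write recognized text to standard output
        psm_flag,
        "7",
        "-c",
        "tessedit_char_whitelist=0123456789",
    ]


if __name__ == "__main__":
    # Hypothetical file name, shown only to illustrate the call.
    print(" ".join(build_digits_command("digits.tif")))
```

The resulting list can be passed to `subprocess.run` once the flags have been checked against the installed version.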

[tesseract-ocr] Re: coordinates of a skewed image

2017-04-21 Thread Dominik Jesiolowski
Hi, I don't think Tesseract deskews the image internally. You can deskew it before loading it into Tesseract; have a look at the Leptonica source: prog/skewtest.c Regards, Dominik
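If the image is deskewed before OCR, coordinates reported for the deskewed image can be mapped back to the original by undoing the rotation. The sketch below is an illustration under stated assumptions, not something taken from Leptonica or Tesseract: it assumes the deskew was a pure rotation by a known angle about a known centre, that the image size did not change, and that positive angles follow the formula in `rotate_point`. Both function names are hypothetical.

```python
import math


def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate (x, y) about (cx, cy) by angle_deg using the standard 2D formula."""
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(theta) - dy * math.sin(theta),
        cy + dx * math.sin(theta) + dy * math.cos(theta),
    )


def map_box_to_original(box, deskew_angle_deg, cx, cy):
    """Map a (left, top, right, bottom) box from the deskewed image back
    to four corner points in the original image.

    Assumes the deskewed image was produced by rotating the original by
    deskew_angle_deg about (cx, cy); the inverse is a rotation by the
    negated angle about the same centre.
    """
    left, top, right, bottom = box
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    return [rotate_point(px, py, -deskew_angle_deg, cx, cy) for px, py in corners]
```

The result is four corner points rather than an axis-aligned rectangle, because a box that is upright in the deskewed image is tilted in the original.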

Re: [tesseract-ocr] Training tesseract-ocr unicharset_extractor, mftraining, cntraining

2017-04-21 Thread ShreeDevi Kumar
If you want to OCR an invoice like the sample you posted, just use eng.traineddata and OCR the page. You do not need to do any training. Here is the output I get: 8633 0410 NO RP 11 07122015 NYNN 01 01 0001 Page 2 Of 3 Did you know? Your Comcast Business Internet service gives

[tesseract-ocr] Re: Trying to combine files to form a single traineddata, having error in output

2017-04-21 Thread Alain Ghawi
Hello, I have opened many of your box files, and in each one every box is clearly labeled with the same character. For example, in the first box file all lines start with 0, and in the second box file all lines start with 1. Therefore, I think there is a problem with your box files. Secondly, the error suggests that the
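A quick way to spot box files of the kind described above is to count the distinct labels in each file. This is a minimal sketch, assuming the usual box-file layout of one box per line: symbol, left, bottom, right, top, page, separated by spaces. The function names are hypothetical, and a file with only one label is not necessarily wrong; it is only a hint that the labels may not match the image.

```python
from collections import Counter


def box_label_counts(box_text):
    """Count how often each symbol appears as the label of a box line."""
    counts = Counter()
    for line in box_text.splitlines():
        parts = line.split()
        # Expect: symbol + four coordinates + page number.
        if len(parts) < 6:
            continue
        counts[parts[0]] += 1
    return counts


def looks_single_label(box_text):
    """Return True if the file has several boxes that all carry one label."""
    counts = box_label_counts(box_text)
    return len(counts) == 1 and sum(counts.values()) > 1


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = "0 10 10 20 30 0\n0 25 10 35 30 0\n0 40 10 50 30 0\n"
    print(box_label_counts(sample), looks_single_label(sample))
```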