Re: [tesseract-ocr] "Empty Page" and incomplete text recognition

2015-10-28 Thread Daniel Kraft
Hi! On 2015-10-28 18:15, Tom Morris wrote: > In addition to the skew, which I didn't notice until Alistair mentioned > it, closer examination also reveals that the images are warped, almost > as if the text was displayed on the face of a curved CRT from the olden > days. You might try de-warping

[tesseract-ocr] Re: How to extract bounding box only? If I do not need the word/characters classifier.

2015-10-28 Thread umesh pandey
You need text detection for bounding boxes. One well-known algorithm for it is MSER (maximally stable extremal regions), and another is ERStats; both are available as modules in the text detection part of opencv_contrib. On Wednesday, October 28, 2015 at 10:18:56 AM UTC-7, Tom Morris wrote: > > On Wed

[tesseract-ocr] Re: How to extract bounding box only? If I do not need the word/characters classifier.

2015-10-28 Thread Tom Morris
On Wednesday, October 28, 2015 at 4:46:14 AM UTC-4, jinh...@google.com wrote: > > > First, I have very little knowledge about ocr/tesseract. > ... Please help. > If only you worked for Google, you could probably get help directly from the Google software engineers. Oh, wait. You DO work for

Re: [tesseract-ocr] "Empty Page" and incomplete text recognition

2015-10-28 Thread Tom Morris
On Tuesday, October 27, 2015 at 4:49:11 PM UTC-4, Daniel Kraft wrote: > > > On 2015-10-27 16:10, Allistair wrote: > > > I was able to get it reading everything by cropping it to the same > > amount as Working but then rotating it anti clockwise by just a few > > degrees - I tried this because

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
For Indian languages, also check out the OCR feature in Google Drive/Docs. - sent from my phone. excuse the brevity. On 28 Oct 2015 17:34, "ShreeDevi Kumar" wrote: > There is marathi traineddata. However that is not trained with cube engine > and hence may not be as accurate. > > http://packages.ubun

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
There is Marathi traineddata. However, it is not trained with the cube engine and hence may not be as accurate. http://packages.ubuntu.com/wily/tesseract-ocr-mar You can test with both hin and mar and report your experience. Thanks! - sent from my phone. excuse the brevity. On 28 Oct 2015 14:16, "B

[tesseract-ocr] No difference in normal OCR and OCR with user pattern

2015-10-28 Thread Bhushan Patil
Hello everyone, I want to do OCR on this image. It has a predefined format, i.e. the first five are characters, the next four are digits, and the last is a character. When I execute the following command $ *t
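
The format described in this message (five characters, four digits, one character) can also be checked on the OCR output after recognition. The following is only a minimal sketch, assuming that "characters" means ASCII letters; the function name matches_format is hypothetical, and this is a post-processing check on the recognized string, not Tesseract's user-pattern mechanism.

```python
import re

# Assumption: "characters" means ASCII letters A-Z / a-z.
# Five letters, then four digits, then one letter.
PATTERN = re.compile(r"[A-Za-z]{5}[0-9]{4}[A-Za-z]")


def matches_format(text: str) -> bool:
    """Return True if text, ignoring surrounding whitespace, fits the format."""
    return PATTERN.fullmatch(text.strip()) is not None
```

A result that fails this check could be flagged for manual review or re-run with different settings.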

[tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread Bhushan Patil
Hi Vaibhav, As far as I know, the Marathi language is not available in Tesseract. For Hindi: 1. Download the language data: sudo apt-get install tesseract-ocr-hin 2. Run: tesseract path/to/image stdout -l hin On Monday, October 26, 2015 at 4:21:33 PM UTC+5:30, vaibhav kurhe wrote: > > Hello everyo

[tesseract-ocr] How to extract bounding box only? If I do not need the word/characters classifier.

2015-10-28 Thread jinhuili
Hi, First, I have very little knowledge about OCR/tesseract. We use tesseract OCR to detect the text area of a given image, which is used for calculating image quality (the smaller the text area ratio, the better). We don't use the content result of the OCR, only the bounding boxes of words. And the probl
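
The text area ratio mentioned in this message can be computed from word bounding boxes alone, whatever produces them. The following is only a rough sketch: the (x, y, width, height) box layout and the function name text_area_ratio are assumptions made for illustration, not something stated in the thread, and overlapping boxes are counted twice.

```python
from typing import Iterable, Tuple

# Assumed box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def text_area_ratio(boxes: Iterable[Box], image_width: int, image_height: int) -> float:
    """Return the summed box area divided by the image area.

    Overlapping boxes are counted twice, so the result is capped at 1.0.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    # Negative widths or heights are treated as empty boxes.
    text_area = sum(max(w, 0) * max(h, 0) for _x, _y, w, h in boxes)
    return min(text_area / (image_width * image_height), 1.0)
```

For example, two boxes of 10x10 and 10x20 pixels on a 100x100 image cover 300 of 10000 pixels.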