Re: [tesseract-ocr] Not finding large text?

2017-09-03 Thread ashish goel
Providing a sample image might help us suggest something. On Sat, Sep 2, 2017 at 12:05 PM, George Erfesoglou wrote: > I have a small image 300x300 and there are some smaller fonts like the > size of this text here that it picks up but larger fonts like *JUST THIS > BIG *it

[tesseract-ocr] Major changes between stable 3.04.01 and 4.0

2017-03-02 Thread Ashish Goel
Can anyone please shed some light on the major differences between tesseract 3.04 and 4.0? For the last 4 months, I have been working on a framework using tesseract 3.04. Is it worthwhile moving to 4.0 now? Will it improve OCR accuracy? Any suggestions will be highly appreciated. Regards, Ashish

[tesseract-ocr] Re: Tesseract config parameter for interword space

2016-12-09 Thread Ashish Goel
Can anyone please give me suggestions on this? How can I tackle inter-word spaces? On Wednesday, December 7, 2016 at 5:57:21 PM UTC+5:30, Ashish Goel wrote: > > I have an image that should read Luz 4 l. Image quality is good. > > Tesseract reads it as Luz4l. (It fails to dete

[tesseract-ocr] Re: Tesseract bad result with japanes

2016-12-09 Thread Ashish Goel
The following options may help: 1. Image processing (resize, filter, etc.) 2. tessedit_char_whitelist On Friday, December 9, 2016 at 2:23:38 PM UTC+5:30, dinh van Chinh wrote: > > >

[tesseract-ocr] Re: Need help on improving text accuracy

2016-12-08 Thread Ashish Goel
estion. > > The challenge is how to automate this process, any thoughts? > > > On Wednesday, December 7, 2016 at 1:00:59 AM UTC-8, Ashish Goel wrote: > >> Crop image into sub images and then OCR. Crop it in different segments. >> >> On Saturday, December 3

[tesseract-ocr] Re: Tesseract config parameter for interword space

2016-12-07 Thread Ashish Goel
The image is attached, in case someone wants to look at it. It is in Spanish. On Wednesday, December 7, 2016 at 5:57:21 PM UTC+5:30, Ashish Goel wrote: > > I have an image that should read Luz 4 l. Image quality is good. > > Tesseract reads it as Luz4l. (It fails to determine spaces). >

[tesseract-ocr] Tesseract config parameter for interword space

2016-12-07 Thread Ashish Goel
I have an image that should read Luz 4 l. Image quality is good. Tesseract reads it as Luz4l. (It fails to determine spaces). I have tried resizing the image, passing tessedit_char_whitelist, etc., but it is not helping me. Can anyone please help me with how I can tell tesseract with number of

[tesseract-ocr] Re: Need help on improving text accuracy

2016-12-07 Thread Ashish Goel
Crop the image into sub-images and then OCR each one. Crop it into different segments. On Saturday, December 3, 2016 at 5:54:51 PM UTC+5:30, Marie wrote: > > Hi, > > We are trying to recognize receipt using Tesseract (v3.02 on > Windows). Tried to process the images but the words accuracy (comparing > with
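The cropping suggestion above could be automated along these lines. This is only a minimal sketch, assuming the image is split into equal horizontal strips; the function name `strip_boxes` and the `overlap` parameter are illustrative and do not come from the thread.

```python
from typing import List, Tuple

# A crop box in (left, top, right, bottom) pixel coordinates.
Box = Tuple[int, int, int, int]


def strip_boxes(width: int, height: int, strips: int, overlap: int = 0) -> List[Box]:
    """Split an image of the given size into horizontal strips.

    Each strip is extended by `overlap` pixels above and below (clamped to
    the image bounds), so a text line that straddles a boundary is more
    likely to appear whole in one of the strips.
    """
    if strips < 1:
        raise ValueError("strips must be at least 1")
    boxes = []
    for i in range(strips):
        top = max(0, i * height // strips - overlap)
        bottom = min(height, (i + 1) * height // strips + overlap)
        boxes.append((0, top, width, bottom))
    return boxes


if __name__ == "__main__":
    for box in strip_boxes(300, 900, strips=3, overlap=10):
        print(box)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the strips could be cropped and passed to tesseract one at a time.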

[tesseract-ocr] Re: tesseract crashes on windows for this image

2016-09-08 Thread Ashish Goel
Size is the problem. I reduced its size (using imagemagick) and the error went away. convert 1.tif -resize 70% 2.tif On Thursday, September 8, 2016 at 4:08:47 PM UTC+5:30, George Papadopoulos wrote: > > Hello, > > I am using the following version of tesseract on Windows 7. I have also >

[tesseract-ocr] Re: not recognizing simple image

2016-09-02 Thread Ashish Goel
Did you try increasing the size of the image? On Friday, September 2, 2016 at 12:03:51 PM UTC+5:30, ahs...@gmail.com wrote: > > So i'm trying to ocr the following images but it looks like it's not doing it > 100%. six is written as five. nine is written as 3. Any suggestions? > > >

[tesseract-ocr] Re: Improve electric meter

2016-08-29 Thread Ashish Goel
Can you be specific about what kind of image processing you did using imagemagick? Is this your original image? What image goes to tesseract? If this is your original image, then I would have to at least rotate, crop and resize this image to localize it to the meter reading area of the image. If

[tesseract-ocr] Re: How to make tesseract to recognize those numbers better?

2016-08-25 Thread Ashish Goel
Tesseract generally needs an image of at least about 300 dpi for good results. I would suggest resizing the image and applying a filter to improve detection. I generally use imagemagick for this purpose. On Thursday, August 25, 2016 at 3:28:33 PM UTC+5:30, Mikey wrote: > > I write java
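As a rough illustration of the resize advice, the scale factor needed to bring an image up to 300 dpi can be computed from its current resolution. This is a sketch under the assumption that the source dpi is known; `resize_percent` is a hypothetical helper, not part of tesseract or imagemagick.

```python
def resize_percent(current_dpi: float, target_dpi: float = 300.0) -> int:
    """Return the resize percentage that maps current_dpi onto target_dpi.

    The result is rounded to a whole percent so it can be used in an
    imagemagick argument such as "-resize 200%".
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return round(100 * target_dpi / current_dpi)


if __name__ == "__main__":
    for dpi in (72, 150, 300):
        print(f"{dpi} dpi -> -resize {resize_percent(dpi)}%")
```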

Re: [tesseract-ocr] Unable to recognise the text with the traineddata

2016-07-25 Thread ashish goel
Instead of retraining the font, you should focus on pre-processing the image. One option that worked in this particular case was resizing the image. After the following commands, tesseract was able to read the image: $ convert a.png -resize 170% b.png $ tesseract b.png stdout -l eng --tessdata-dir
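The two commands above can be wrapped in a small script. The following is a sketch, assuming `convert` and `tesseract` are on the PATH; the helper names are illustrative, and the `--tessdata-dir` option is left out because its path is cut off in the message.

```python
import shutil
import subprocess
from typing import List


def resize_command(src: str, dst: str, percent: int) -> List[str]:
    """Build the imagemagick call that rescales src into dst."""
    return ["convert", src, "-resize", f"{percent}%", dst]


def ocr_command(image: str, lang: str = "eng") -> List[str]:
    """Build the tesseract call that prints recognized text to stdout."""
    return ["tesseract", image, "stdout", "-l", lang]


def resize_and_ocr(src: str, resized: str, percent: int = 170, lang: str = "eng") -> str:
    """Resize the image, run tesseract on the result and return the text."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(resize_command(src, resized, percent), check=True)
    result = subprocess.run(
        ocr_command(resized, lang), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; call resize_and_ocr() to actually run them.
    print(" ".join(resize_command("a.png", "b.png", 170)))
    print(" ".join(ocr_command("b.png")))
```

The 170% default only mirrors the value that worked for this one image; other images may need a different factor.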

Re: [tesseract-ocr] Can any feature of tesseract auto detect language (or majority language) of the image?

2016-07-20 Thread Ashish Goel
Thanks for the reply. That narrows down my options. On Tuesday, July 19, 2016 at 8:07:09 PM UTC+5:30, zdenop wrote: > > No. For a correct OCR result, Tesseract needs the language of the > input image to be specified > > > Zdenko > > On Tue, Jul 19, 2016 at 8:47 AM, Ashish

[tesseract-ocr] Re: Can any feature of tesseract auto detect language (or majority language) of the image?

2016-07-19 Thread Ashish Goel
Thanks for the reply, but I am looking for a solution that I can integrate into my custom application. I have no idea whether I can use the Google Drive application for this purpose. On Tuesday, July 19, 2016 at 12:17:20 PM UTC+5:30, Ashish Goel wrote: > > I have 100s of images in dif

[tesseract-ocr] Can any feature of tesseract auto detect language (or majority language) of the image?

2016-07-19 Thread Ashish Goel
I have 100s of images in different languages that I need to OCR. Presently, I need to know the language of the image in advance and pass the language parameter (for example, -l deu or -l dan). Is there a way I can somehow figure out the language of the image automagically? It is weird but

[tesseract-ocr] Re: Getting a blank tessinput.tif file

2016-06-07 Thread Ashish Goel
Zdenko, Thanks for your reply. I will try with a standard distro and let you know if it works. Ashish On Monday, June 6, 2016 at 4:38:11 PM UTC+5:30, Ashish Goel wrote: > > Hello All, > > I am trying to do OCR on a bunch of images. Getting some failures, and I > want to analyse th

[tesseract-ocr] Re: Getting a blank tessinput.tif file

2016-06-07 Thread Ashish Goel
Ubuntu 12.04 On Monday, June 6, 2016 at 4:38:11 PM UTC+5:30, Ashish Goel wrote: > > Hello All, > > I am trying to do OCR on a bunch of images. Getting some failures, and I > want to analyse them. > So, to do that, I am trying to get the tessinput.tif file so that I can >

[tesseract-ocr] Re: Getting a blank tessinput.tif file

2016-06-07 Thread Ashish Goel
(libjpeg-turbo 1.2.0) : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4 but still tessinput.tif is blank. Is there anything else that I can try so that I can get tessinput.tif? Thanks Ashish On Monday, June 6, 2016 at 4:38:11 PM UTC+5:30, Ashish Goel wrote: > > Hello All, > > I am tryi

Re: [tesseract-ocr] Re: Why do I get such bad results?

2016-06-06 Thread ashish goel
I had the same problem with Swedish, and a temporary workaround helped me. I zoomed (re-scaled) the image to 400% and it recognized the letter (though it added other problems). Not sure, but it could improve results for you. Ashish On Mon, Jun 6, 2016 at 8:53 PM, Tom Morris

Re: [tesseract-ocr] Getting a blank tessinput.tif file

2016-06-06 Thread ashish goel
> > On Mon, Jun 6, 2016 at 1:08 PM, Ashish Goel <goelk...@gmail.com> wrote: > >> Hello All, >> >> I am trying to do OCR on a bunch of images. Getting some failures, and I >> want to analyse them. >> So, to do that, I am trying to get the tessin

[tesseract-ocr] Getting a blank tessinput.tif file

2016-06-06 Thread Ashish Goel
Hello All, I am trying to do OCR on a bunch of images. Getting some failures, and I want to analyse them. So, to do that, I am trying to get the tessinput.tif file so that I can find out what input actually goes to tesseract. I am passing "-c tessedit_write_images 1" along with my tesseract to
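For a batch of failing images, the debugging run described above might be scripted as follows. This is a sketch only: the helper name and the choice of output base are assumptions, and it uses the `-c name=value` form that tesseract documents for setting a config variable on the command line.

```python
import os
from typing import Iterable, List


def debug_commands(images: Iterable[str]) -> List[List[str]]:
    """Build one tesseract call per image with tessedit_write_images enabled.

    The output base is the image path without its extension, which is an
    illustrative choice rather than anything required by tesseract.
    """
    commands = []
    for image in images:
        base = os.path.splitext(image)[0]
        commands.append(["tesseract", image, base, "-c", "tessedit_write_images=1"])
    return commands


if __name__ == "__main__":
    for command in debug_commands(["fail1.png", "fail2.jpg"]):
        print(" ".join(command))
```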

[tesseract-ocr] Re: Why do I get such bad results?

2016-06-06 Thread Ashish Goel
If you can elaborate on what kind of failures you are experiencing, people might be able to help. On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote: > > Hi, > > I'm trying to train Tesseract to work with a large library of Hebrew > language documents. > They are all in good

[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

2016-05-31 Thread Ashish Goel
I would also like to find a way to avoid such cases. I am also facing cases where I get extra white spaces, lower/upper case mismatches and wrongly detected characters... On Tuesday, May 31, 2016 at 11:40:28 PM UTC+5:30, Diederik Hattingh wrote: > > I have a case where my tesseract isn't