Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-07 Thread Meenal Goyal
Hi Nick, I am using this technique for binarisation: http://liris.cnrs.fr/christian.wolf/software/binarize/ . Could you recommend anything better than this one? Thanks. On Friday, July 4, 2014 7:45:54 PM UTC+5:30, Nick White wrote: On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal
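The binarisation page linked above provides local (adaptive) thresholding methods, such as Niblack's, Sauvola's and Wolf and Jolion's. For experimenting with the parameters, here is a minimal pure-Python sketch of Sauvola's threshold, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation in a window around each pixel. The function name and the defaults (window=15, k=0.5, R=128) are assumptions for illustration, not values taken from this thread or from Wolf's software.

```python
def sauvola_binarize(img, window=15, k=0.5, r=128.0):
    """Binarise a grayscale image with Sauvola's local threshold (sketch).

    img: list of rows of grayscale values in 0..255 (0 = black).
    Returns a list of rows with 1 for ink (dark) and 0 for background.
    """
    h, w = len(img), len(img[0])
    # Integral images of values and squared values, padded with a zero row/column,
    # so any window sum can be read off in constant time.
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    sq = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        row_sq = 0.0
        for x in range(w):
            v = float(img[y][x])
            row_sum += v
            row_sq += v * v
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
            sq[y + 1][x + 1] = sq[y][x + 1] + row_sq

    half = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        out_row = []
        for x in range(w):
            # Window is clipped at the image border.
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            total_sq = sq[y1][x1] - sq[y0][x1] - sq[y1][x0] + sq[y0][x0]
            mean = total / n
            var = max(0.0, total_sq / n - mean * mean)  # guard tiny negative round-off
            std = var ** 0.5
            threshold = mean * (1 + k * (std / r - 1))
            out_row.append(1 if img[y][x] <= threshold else 0)
        out.append(out_row)
    return out
```

On a white 5x5 image with a single black pixel in the centre, `sauvola_binarize(img, window=3)` marks only that pixel as ink. For real page images a vectorised NumPy or OpenCV implementation would be much faster; this version is only meant to make the formula concrete.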

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-04 Thread Meenal Goyal
On Friday, July 4, 2014 2:48:41 AM UTC+5:30, Nick White wrote: On Wed, Jul 02, 2014 at 10:26:16PM -0700, Meenal Goyal wrote: The post "question about training tesseract" only suggests some pre-processing steps, which include binarisation, and I have already tried them. I wanted

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-04 Thread Nick White
On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote: If you're sure that all the words you will encounter will be in the dictionary, this should help somewhat: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
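As far as I recall, the FAQ entry linked above works by raising Tesseract's penalties for words that are not in, or are infrequent in, its dictionary, so that dictionary words win more often. A minimal sketch of passing such settings through a config file (one `name value` pair per line, with the file given at the end of the command line) follows. The parameter names are assumed from the Tesseract 3.x language model and the value `1` is purely illustrative, so check both against your version before relying on them; the helper names are hypothetical.

```python
import os
import tempfile

# ASSUMED Tesseract 3.x parameter names; values are illustrative only.
# Larger penalties make words outside the dictionary less likely to be chosen.
DICT_PENALTIES = {
    "language_model_penalty_non_freq_dict_word": "1",
    "language_model_penalty_non_dict_word": "1",
}


def write_config(path, params):
    """Write a Tesseract-style config file: one 'name value' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name, value in params.items():
            f.write(f"{name} {value}\n")


def tesseract_command(image, outbase, config_path, lang="eng"):
    """Build (but do not run) a tesseract command line that loads the config file."""
    return ["tesseract", image, outbase, "-l", lang, config_path]


if __name__ == "__main__":
    cfg = os.path.join(tempfile.gettempdir(), "strong_dict")
    write_config(cfg, DICT_PENALTIES)
    print(tesseract_command("page.png", "page", cfg))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with Tesseract installed. As the reply above says, this only helps when the words you expect really are in the dictionary.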

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-03 Thread Meenal Goyal
Hi Nick, The post "question about training tesseract" only suggests some pre-processing steps, which include binarisation, and I have already tried them. I wanted to know if anything can be done to improve the output at a later stage, something like adding the words to the dictionary used by

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-03 Thread Nick White
On Wed, Jul 02, 2014 at 10:26:16PM -0700, Meenal Goyal wrote: The post "question about training tesseract" only suggests some pre-processing steps, which include binarisation, and I have already tried them. I wanted to know if anything can be done to improve the output at a later stage,

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-02 Thread Meenal Goyal
Hi Nick, I have read that post earlier and also tried to preprocess the image. This is the input image http://imgur.com/yCxOvQS,GD38rCa which after preprocessing gives this http://imgur.com/JzrDkug . I wanted to know if there is some way to improve in post-processing phase. Right now I am

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-02 Thread Nick White
That's a tough thing to preprocess. Take a look at this recent thread on this list: "question about training tesseract". Nick On Tue, Jul 01, 2014 at 11:48:07PM -0700, Meenal Goyal wrote: Hi Nick, I have read that post earlier and also tried to preprocess the image. This is the input image

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-01 Thread Meenal Goyal
Hi Nick, When I try to ocr an image, it also produces some noise apart from the meaningful words. An example output for an image is: All women become like their’ mqthers. _ ' 1’ ' - —T at-{rs their tragedy. ” R-‘»“T‘*'-. ‘ . / N man does“ That's‘his. ‘ ' os'cAR»w;L'15E ‘ 9 So, I
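A crude post-processing step for noise like the output quoted above is to drop tokens that consist mostly of non-letters. This is a heuristic sketch, not a Tesseract feature; the 0.7 letter ratio and the set of quote characters stripped before counting are assumptions to tune. It will not catch plausible-looking misreadings such as "mqthers".

```python
# Punctuation and quote marks ignored when measuring a token (assumed set).
EDGE_CHARS = ".,;:!?'\"\u2018\u2019\u201c\u201d"


def looks_like_word(token, min_alpha_ratio=0.7):
    """True if most characters of the token (after trimming quotes) are letters."""
    core = token.strip(EDGE_CHARS)
    if not core:
        return False
    letters = sum(ch.isalpha() for ch in core)
    return letters / len(core) >= min_alpha_ratio


def drop_noise(text, min_alpha_ratio=0.7):
    """Keep only whitespace-separated tokens that look like words."""
    kept = [t for t in text.split() if looks_like_word(t, min_alpha_ratio)]
    return " ".join(kept)
```

Applied to the first part of the quoted output, `All women become like their’ mqthers. _ ' 1’ ' - —T at-{rs their tragedy.`, it returns `All women become like their’ mqthers. their tragedy.`: the isolated symbols and `at-{rs` are dropped, while the original punctuation on kept tokens is preserved.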

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-01 Thread Nick White
Hi Meenal, On Tue, Jul 01, 2014 at 02:04:36AM -0700, Meenal Goyal wrote: When I try to ocr an image, it also produces some noise apart from the meaningful words. An example output for an image is: All women become like their’ mqthers. _ ' 1’ ' - —T at-{rs their tragedy. ” R-‘»“T‘*'-.

[tesseract-ocr] retrieve words not matching the dictionary

2014-06-30 Thread Meenal Goyal
Hi, When I run tesseract on my image, it produces some words not present in the dictionary. Is there some way to directly get the list of these words and prevent tesseract from showing them in the output? Examples of such words are: fiJfifilnlflfiflhu-«fifllfllfilfi , neefls» , oscxmwxufis etc.
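Outside Tesseract itself, one way to get the list of words that are not in a dictionary is to check each token of the text output against a plain word list and keep only the matches. In this sketch the word-list file and its one-word-per-line format are assumptions: Tesseract stores its own dictionaries as DAWG files inside the traineddata, so they would need converting to a word list first (for example with the training tools shipped with your version). Lower-casing and the set of stripped punctuation are also choices of this sketch, and the helper names are hypothetical.

```python
# Punctuation trimmed from token edges before lookup (assumed set).
EDGE_CHARS = ".,;:!?'\"()\u2018\u2019\u201c\u201d"


def load_wordlist(path):
    """Read a plain word list (one word per line) into a lower-cased set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def split_by_dictionary(text, dictionary):
    """Split OCR output tokens into (known, unknown) lists, keeping original tokens."""
    known, unknown = [], []
    for token in text.split():
        word = token.strip(EDGE_CHARS).lower()
        if not word:
            continue  # token was only punctuation
        (known if word in dictionary else unknown).append(token)
    return known, unknown
```

`unknown` is the list of non-dictionary words, and `" ".join(known)` is the output with them removed. Note that this also throws away correct words that merely happen to be missing from the word list, such as names.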

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-06-30 Thread Nick White
Hi Meenal, On Mon, Jun 30, 2014 at 01:40:10AM -0700, Meenal Goyal wrote: When I run tesseract on my image, it produces some words not present in the dictionary. Is there some way to directly get the list of these words and prevent tesseract from showing them in the output? Example of such