I defined an ROI around each number, and that seemed to produce better results.
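
For anyone who wants to try the same idea, here is a minimal sketch of one way to get a region of interest per digit. The reply above does not say how the ROIs were defined, so this is an assumption: it takes an already binarized image (1 = ink, 0 = background), splits it wherever a column contains no ink, and pads each box slightly. The function name `digit_rois` and the `pad` parameter are made up for illustration and are not part of tesseract.

```python
def digit_rois(binary, pad=1):
    """Return one (left, top, right, bottom) box per run of inked columns.

    `binary` is a list of rows, where 1 means ink and 0 means background.
    `right` and `bottom` are exclusive. Each box is grown by `pad` pixels
    and clamped to the image bounds.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0

    # Vertical projection: mark every column that contains at least one ink pixel.
    inked = [any(binary[y][x] for y in range(height)) for x in range(width)]

    boxes = []
    x = 0
    while x < width:
        if not inked[x]:
            x += 1
            continue
        # Walk to the end of this run of inked columns.
        start = x
        while x < width and inked[x]:
            x += 1
        end = x  # exclusive

        # Tighten the box vertically to the rows that have ink in this run.
        rows = [y for y in range(height) if any(binary[y][start:end])]
        top, bottom = rows[0], rows[-1] + 1

        boxes.append((
            max(start - pad, 0),
            max(top - pad, 0),
            min(end + pad, width),
            min(bottom + pad, height),
        ))
    return boxes


if __name__ == "__main__":
    # Tiny hypothetical image with two separate blobs of ink.
    image = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    for box in digit_rois(image):
        print(box)
```

Each box can then be recognized on its own, for example by restricting recognition to that rectangle with `SetRectangle` in the C++ API, ideally with a single-character page segmentation mode. This simple split breaks down when digits touch or when noise bridges the gap between them, so treat it as a starting point only.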
On Wednesday, March 26, 2014 1:10:56 PM UTC-5, V.Lorz wrote:
>
> Hi All,
>
> I started integrating tesseract (version 3.2, EMGV) in a project for recognizing short texts in scanned images. Using some very simple image processing I extract the area of interest to speed up the process.
>
> The errors I get are related to recognition results: tesseract sometimes confuses the digits '6' and '5'. The image below is recognized as "443669 *5*" instead of "443669*6*". I'm using the default *eng.traineddata* file bundled with the library. Using some other trained data files from around the Internet I got the same results with the same two digits (5 and 6). Before processing the image I configure tesseract to process only digits.
>
> Does anyone know what could be causing this error? How could I solve it?
>
> I started reading the guide for training the engine (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3) as suggested in some other threads, but it is of little to no help for me. Is there any other guide around for 'dummies' like [presumably :(] me? In this case I want to train it using one image that I created from 40 sampled documents (attached here). Using jTessBoxEditor-1.0 I was able to generate and correct the box file. What should I do next?
>
> Thanks a lot in advance,
> V.Lorz

