On 12 July 2010 16:05, Rippalka <[email protected]> wrote:
> Hi,
>
> I'm currently developing an OCR application in C#, adapted to my
> type of files, to archive them in a database.
> I analyse character by character, which is not in line with
> Tesseract's philosophy.
> So I did something that is not very professional but is supposed to
> work: for each character, I add it to the main TIF file and to the
> box file.
> Then I have one procedure to train Tesseract (I followed a tutorial
> for that, and I use the executables) and another one (using tessnet2
> this time) to detect the characters.
> With a few characters it works pretty well, but once I have around
> 15 characters (sometimes with repetitions) it just doesn't recognise
> anything correctly anymore.
> The boxes are exactly sized and all the characters are on the same
> row. The characters to analyse are exactly the same as the ones I
> trained with, but with a very small margin.
>
> I just don't get it. Did I do something wrong? (I can't use any
> dictionary, and I checked with a box viewer and everything is okay.)

4th row, '50': those two characters are joined, so their bounding
boxes overlap, and overlapping boxes cause problems in training.
You'll need to space the characters out a little more.

Also, the characters in that TIFF file are quite poorly defined:
don't expect very good quality results from such text.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
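
If it helps, here is a rough way to scan a box file for boxes that overlap or touch, like the '50' case. This is only a minimal sketch: it assumes each box file line has the form `<glyph> <left> <bottom> <right> <top>` (optionally followed by a page number), with coordinates in pixels. The names `parse_boxes`, `overlapping_pairs` and `min_gap` are made up for illustration and are not part of Tesseract or tessnet2.

```python
# Sketch: flag overlapping or touching boxes in a Tesseract box file.
# Assumption: each line is "<glyph> <left> <bottom> <right> <top>",
# optionally followed by a page number, with integer pixel coordinates.

def parse_boxes(text):
    """Parse box file text into (glyph, left, bottom, right, top) tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append((parts[0], left, bottom, right, top))
    return boxes


def overlapping_pairs(boxes, min_gap=1):
    """Return pairs of boxes that intersect or are separated by
    fewer than min_gap pixels (hypothetical threshold)."""
    pairs = []
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            # Interval test on each axis, widened by min_gap.
            horizontal = a[1] < b[3] + min_gap and b[1] < a[3] + min_gap
            vertical = a[2] < b[4] + min_gap and b[2] < a[4] + min_gap
            if horizontal and vertical:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up sample data: '5' and '0' overlap, '7' stands apart.
    sample = "5 10 5 22 30\n0 20 5 32 30\n7 40 5 52 30\n"
    for a, b in overlapping_pairs(parse_boxes(sample)):
        print("boxes too close:", a, b)
```

Any pair it reports is a candidate for re-spacing in the training image before running the training tools again.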

