On 12 July 2010 16:05, Rippalka <[email protected]> wrote:
> Hi,
>
> I'm currently developing an OCR application in C#, adapted to my
> type of files, to archive them in a database.
> I analyse character by character, which is not in line with
> Tesseract's philosophy.
> So I did something, not very professional, but supposed to work. For
> each character, I add it to the main TIF file and to the box file.
> Then I have a procedure to train Tesseract (I followed a tutorial for
> that, and I use the executables) and another one (using tessnet2 this
> time) to detect the characters.
> With a few characters it works pretty well, but when I have around
> 15 characters (sometimes with repeats) it just doesn't find
> anything right anymore.
> The boxes are exactly sized and all the characters are on the same
> row. And the characters to analyse are exactly the same as the ones I
> just gave before, but with a very small margin.
>
> I just don't get it. Did I do something wrong? (I can't use any
> dictionary, and I checked with a box viewer and everything is okay.)


4th row, '50': those two characters are joined, so their bounding
boxes overlap, which causes problems in training. You'll need to
space the characters out a little more.
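
One way to catch this before training is to scan the box file for boxes on the same row that touch or overlap. The following is only a rough sketch in Python, not part of Tesseract or tessnet2: the names `Box`, `parse_box_lines`, `find_close_pairs` and the `min_gap` threshold are made up for illustration, and it assumes the usual box file layout of `char left bottom right top` per line, with the origin at the bottom-left of the image.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    """One box file entry (hypothetical helper type, not a Tesseract API)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse lines of the assumed form 'char left bottom right top [page]'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def find_close_pairs(boxes: List[Box], min_gap: int = 1) -> List[Tuple[Box, Box]]:
    """Return pairs of boxes on the same row whose horizontal gap is below min_gap.

    A negative gap means the two boxes overlap horizontally.
    """
    pairs = []
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            # Boxes share a row if their vertical extents intersect.
            same_row = a.bottom < b.top and b.bottom < a.top
            # Distance between the nearer vertical edges of the two boxes.
            gap = max(a.left, b.left) - min(a.right, b.right)
            if same_row and gap < min_gap:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates: '5' and '0' overlap by one pixel, '7' is well apart.
    sample = ["5 10 10 20 30", "0 19 10 30 30", "7 40 10 50 30"]
    for a, b in find_close_pairs(parse_box_lines(sample)):
        print(f"'{a.char}' and '{b.char}' are joined or too close")
```

Any pair it reports is a candidate for re-spacing in the training image; raising `min_gap` makes the check stricter.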

Also, the characters in that TIFF file are quite poorly defined: don't
expect very good quality results from such text.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
