Which version of Tesseract do you want to train? Tesseract 2.0 or the
Tesseract 3.0 series?

On Fri, Jul 20, 2012 at 2:33 PM, Nick White <[email protected]> wrote:

> Hi Nikola,
>
> I suggest you don't try training it. Training is mostly for adding
> new languages, or at least significantly different fonts. As your
> input is English, and a common font, I doubt it would help much over
> the standard English training file.
>
> The results I got from running Tesseract 3 on your sample were
> pretty good, though. I'll attach them here. Using -psm 6 made a big
> improvement as it meant the table cells were on the correct row. So
> I ran:
>
>   tesseract ocr1.png outtest2 -psm 6
>
> The problems remaining in the output are that 7 is consistently
> recognised as ?, and m is regularly misrecognised as r'n or r‘n. I have
> suggestions for this.
>
> If your input data will never contain a ?, create an ambig rule which
> always changes a ? to a 7 (and similar for the r'n issues). The best
> way to do this would be:
>
> 1) unpack the english training data:
>
>   combine_tessdata -u eng.traineddata eng.
>
> 2) add the following lines to the end of eng.unicharambigs:
>
> 1       ?       1       7       1
> 3       r ' n   1       m       1
> 3       r ‘ n   1       m       1
>
> 3) recombine the training data:
>
>   combine_tessdata eng.
>
> And the eng.traineddata file will contain the extra ambig rules.
>
> Hope this helps, and let us know how you get on.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
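
For anyone who wants to script the three steps Nick lists above, here is a
minimal sketch in POSIX shell. It assumes eng.traineddata has been copied
into the current directory and that the fields in eng.unicharambigs are
tab-separated (the quoted message shows them only with wide spacing). The
helper name append_ambig_rules and the backup copy are my own additions,
not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): append the three ambig
# rules from the quoted message to an unpacked unicharambigs file.
# Assumption: fields are tab-separated.
append_ambig_rules() {
    # If the file is non-empty and lacks a trailing newline, add one so
    # the first new rule does not get glued onto the last existing line.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
    {
        printf '%s\t%s\t%s\t%s\t%s\n' 1 '?' 1 7 1
        printf '%s\t%s\t%s\t%s\t%s\n' 3 "r ' n" 1 m 1
        printf '%s\t%s\t%s\t%s\t%s\n' 3 'r ‘ n' 1 m 1
    } >> "$1"
}

# Run the Tesseract steps only when the tool and the data file are present.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f eng.traineddata ]; then
    cp eng.traineddata eng.traineddata.orig    # backup (my addition)
    combine_tessdata -u eng.traineddata eng.   # 1) unpack
    append_ambig_rules eng.unicharambigs       # 2) add the rules
    combine_tessdata eng.                      # 3) recombine
fi
```

After it runs, the rebuilt eng.traineddata has to be put back wherever
Tesseract looks for its language data before re-running the
"tesseract ocr1.png outtest2 -psm 6" command from the quoted message.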



-- 
Regards
---------------------------------------------------------------------------------------
Ankur Rana
(ਅੰਕੁਰ ਰਾਣਾ)