Hi Shree,
I'm glad you found my article helpful. Apologies for the delay in my
reply to you. I'll answer your questions below.
> I have found that trying to improve recognition by adding more training data
> sometimes leads to worse recognition. I am currently trying with just one
> font.
> Using multiple fonts sometimes fails with:
>
> Font id = -1/2, class id = 96/2922 on sample 70292
> font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in
> file
> ..\..\clasne 622
I don't think I've seen that failure before. But yes, you're right
that adding more training data can produce worse results.
> I would like to try your testing suite so that I can see whether there is
> improvement in the training data- do you have a windows binary for the same?
I don't have Windows binaries for them. The tools themselves should
compile for Windows, but to handle anything beyond ASCII they need to
be run through a wrapper script ('ocrevalutf8'), which is Unix-only.
I would recommend setting up Cygwin; the tools should be easy to
compile and run from there.
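If setting up Cygwin is not practical right away, a rough check of
whether a training change helped can be done on any platform. Below is
a minimal Python sketch of that idea. It is not part of the evaluation
tools and will not necessarily report the same figures they do. It
assumes you have the ground-truth text and the OCR output as Unicode
strings, and the function names are hypothetical.

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Levenshtein distance between two Unicode strings (by code point)."""
    # previous[j] holds the distance between the truth prefix seen so far
    # and the first j characters of the OCR output.
    previous = list(range(len(ocr) + 1))
    for i, t_char in enumerate(truth, start=1):
        current = [i]
        for j, o_char in enumerate(ocr, start=1):
            cost = 0 if t_char == o_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from OCR output
                current[j - 1] + 1,      # extra character in OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(truth: str, ocr: str) -> float:
    """Fraction of ground-truth characters recognised correctly (assumed metric)."""
    if not truth:
        return 1.0 if not ocr else 0.0
    # Clamp at zero: very noisy output can contain more errors than characters.
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))
```

Running character_accuracy() on the same test page before and after a
change to the training data gives a quick indication of whether
recognition improved or got worse.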
> Is the recommended training process to train one font and then add another? Or
> train them separately then merge??
I'm not sure I understand the question. How do the two methods you
describe differ in the case of Tesseract training?
> Does the order in which tif/box files are given matter?
Not as far as I know.
> If I am trying to fix errors, should new training data be given at end of old
> training data or before?
I don't understand this question either. Can you expand on what you
mean, please?
Hope this helps, and I look forward to hearing back from you.
Nick
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en