As part of my attempt to improve Tesseract's accuracy with a language
model, I needed a better DangAmbigs file.  To get one, I wrote a utility
that generates them empirically from any collection of ground-truth
texts paired with the corresponding OCR output (such as the UNLV test
set and Tesseract's output on it).  I believe this should be useful to
other Tesseract users.  You can find it here, with a more detailed
description:

http://www.cs.toronto.edu/~mreimer/tesseract.html

To the project owners: I'm willing to support this long-term, and
would be pleased to see it put on the external add-ons page or
included in the training files.
