Should recognition quality improve significantly if new scans (same font and size, in fact the same book) are converted into .box files using the previous training results (via -l <langcode>), the resulting new training data is merged with the bulk, and the merged result is then used on the next new scan? What shape should the quality curve take (exponential, logarithmic, etc.)? And is there any reliably known quantity of training data that can be expected to produce, say, 95% accuracy?
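For concreteness, the bootstrap loop I have in mind looks roughly like this. This is only a sketch, assuming the 3.x-style legacy training tools; 'mylang' and the page file names are placeholders for my own data:

    # 1. Use the language trained so far to pre-generate a .box file
    #    for a fresh page (hypothetical names: pageN.tif, mylang).
    tesseract pageN.tif pageN -l mylang batch.nochop makebox

    # 2. Hand-correct pageN.box, then produce a .tr file from it.
    tesseract pageN.tif pageN box.train

    # 3. Merge with the bulk: re-run the trainers over ALL the
    #    .box/.tr files accumulated so far, not just the new page.
    unicharset_extractor *.box
    mftraining -F font_properties -U unicharset -O mylang.unicharset *.tr
    cntraining *.tr

    # 4. Repackage, and use the result as -l mylang in the next round.
    mv inttemp mylang.inttemp
    mv pffmtable mylang.pffmtable
    mv normproto mylang.normproto
    mv shapetable mylang.shapetable
    combine_tessdata mylang.

Step 3 is what I mean by "merged with the bulk": each round re-runs mftraining and cntraining over the whole accumulated set of .tr files before repackaging.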
I can't figure this out on my own, and so far I don't see it happening with my data. -Yury

