Hi, I'm using the code from the Mercurial repository and getting good results! One big problem is that words seem to run together in the output. (I'm putting the image and text together with hocrtopdf.)
For example, "are in an informal setting in a conference room, we must" is recognized as:
"areinaninformalset6nginaconferenceroom,wemust g"

See http://hero.com/ken/trainme.pdf

1) Can I fix this with training?
2) How do I generate the different file types in http://ocropus.googlegroups.com/web/lines.tgz to start training?
3) Is my problem related to the fact that the ground truth files, like lines/0001/0080.gt.txt, don't contain spaces either? E.g., "SNL,andF.HarveyDove,PNL,fortheirsuggestions" is the ground truth for an image whose text reads: "SNL, and F. Harvey Dove, PNL, for their suggestions"

Thanks,
Ken
