I'm dealing with font subsets, and I generate one image per font, so there is no reading order, though I have seen Latin and CJK in the same font subset. If OSD just gives reading, orientation, and text order, it is not going to give me anything useful. Plus I have the font, so I could get some of that info from the font itself; I just have no idea what language it is (though maybe I should go back and take another look...).
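As a rough illustration of getting script information out of the font itself, here is a minimal sketch that buckets a subset's codepoints by the first word of their Unicode character name (LATIN, CJK, HIRAGANA, and so on). The function name `script_histogram` is made up for this example, and it only hints at the script, not the language. It assumes you already have the list of codepoints; one way to get them, assuming fontTools is installed, is shown in the trailing comment, with a hypothetical file name.

```python
import unicodedata
from collections import Counter


def script_histogram(codepoints):
    """Count codepoints per rough script bucket.

    The bucket is the first word of the Unicode character name,
    e.g. "LATIN", "CJK", "HIRAGANA", "DIGIT". This is a heuristic,
    not a real script-detection algorithm.
    """
    counts = Counter()
    for cp in codepoints:
        try:
            name = unicodedata.name(chr(cp))
        except ValueError:
            continue  # no name: control, private-use, or unassigned codepoint
        counts[name.split()[0]] += 1
    return counts


# Possible usage with fontTools (assumption: fontTools is available,
# and "subset.ttf" is a placeholder path):
#   from fontTools.ttLib import TTFont
#   cmap = TTFont("subset.ttf").getBestCmap()  # dict: codepoint -> glyph name
#   print(script_histogram(cmap.keys()).most_common())
```

A subset that mixes Latin and CJK would show up as two large buckets, which at least tells you which traineddata files are worth trying.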
I've got training up and running on Ubuntu. I modified the text file you gave me, just adding some missing ligatures (ff, ffi, ffl), but my asc.traineddata is way worse than yours. *Do you have a list of the fonts you used to create asc.traineddata that I could start with*? For example, I think my fonts are missing the old ASCII drawing blocks that you include, and which work great on the fonts that use those (usually for bullets).
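To check whether a candidate training font is missing those glyphs, something like the sketch below might help. It is only a guess at what matters: it assumes the "drawing blocks" correspond to the Unicode Box Drawing block (U+2500 to U+257F) and that the ligatures are the precomposed forms U+FB00 (ff), U+FB03 (ffi), and U+FB04 (ffl). The function name `coverage_report` is made up for this example, and the input is the list of codepoints the font maps.

```python
# Assumption: the ligatures are the precomposed Unicode forms.
LIGATURES = {0xFB00: "ff", 0xFB03: "ffi", 0xFB04: "ffl"}
# Assumption: "drawing blocks" means the Unicode Box Drawing block.
BOX_DRAWING = range(0x2500, 0x2580)


def coverage_report(codepoints):
    """Report which ligatures are missing and how many box-drawing
    characters are covered by the given set of codepoints."""
    cps = set(codepoints)
    return {
        "missing_ligatures": [LIGATURES[cp] for cp in sorted(LIGATURES) if cp not in cps],
        "box_drawing_covered": sum(1 for cp in BOX_DRAWING if cp in cps),
    }
```

Running this over each font before training would show which ones actually contribute the ligatures and drawing characters.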