> > Serbian: actually both cyrillic and latin is interesting. I dont know
> > about the documents yet. Is both possible, at the same time?
>
> I don't see why not, provided you have data trained for it. In real
> world application, though, I don't think it would be all that helpful
> - I could be mistaken, but I was under the impression that documents
> are generally written in one or the other, or in a manner where there
> is a clean split (opposing pages/columns).

There will be Serbian documents, and I expect each of them to be either Cyrillic or Latin, but not mixed within a single document. Would it be helpful to run OCR with both and see which one was more successful, or is that considered "brute force"?

Is there, aside from the Google download section, any 'dictionary' of available Tesseract training data?

Best,
Hendrik
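To make the "run both and compare" idea concrete, here is a minimal Python sketch. It assumes each OCR pass can report some mean confidence score alongside the recognized text; the `run_ocr` callable, the function name `pick_best_script`, and the language codes `srp` and `srp_latn` are illustrative assumptions, not something confirmed in this thread. Whether the confidence values of two different language packs are directly comparable is also an assumption that would need checking on real pages.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical runner type: takes (image_path, lang) and returns
# (recognized_text, mean_confidence). How confidence is obtained depends
# on the OCR interface in use; it is an assumption of this sketch.
OcrRunner = Callable[[str, str], Tuple[str, float]]


def pick_best_script(
    image_path: str,
    langs: Iterable[str],
    run_ocr: OcrRunner,
) -> Tuple[str, str, Dict[str, Tuple[str, float]]]:
    """Run OCR once per candidate language and keep the most confident result.

    Returns (best_lang, best_text, all_results).
    """
    # One OCR pass per candidate language/script.
    results = {lang: run_ocr(image_path, lang) for lang in langs}
    if not results:
        raise ValueError("at least one language is required")
    # Choose the language whose pass reported the highest mean confidence.
    best_lang = max(results, key=lambda lang: results[lang][1])
    return best_lang, results[best_lang][0], results


if __name__ == "__main__":
    # Stand-in runner with made-up numbers, only to show the call shape.
    def fake_ocr(path: str, lang: str) -> Tuple[str, float]:
        canned = {"srp": ("текст", 91.0), "srp_latn": ("tekct", 48.5)}
        return canned[lang]

    lang, text, _ = pick_best_script("page.png", ["srp", "srp_latn"], fake_ocr)
    print(lang, text)
```

Since each document is expected to use only one script, the comparison could be done on the first page or two and the winning language reused for the rest, which keeps the extra cost small.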

