> > Serbian: actually, both Cyrillic and Latin are interesting. I don't know
> > about the documents yet. Are both possible at the same time?
>
> I don't see why not, provided you have data trained for it. In real-world
> applications, though, I don't think it would be all that helpful
> - I could be mistaken, but I was under the impression that documents
> are generally written in one or the other, or in a manner where there
> is a clean split (opposing pages/columns).

There will be Serbian documents, and I expect each of them to be
either Cyrillic or Latin, but not mixed within a single document.
Would it be helpful to run OCR with both and see which one was
more successful, or is this considered "brute force"?
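
To make the "run both and compare" idea concrete, here is a rough sketch.
It assumes a Tesseract build that can write TSV output with per-word
confidence (the "tsv" config on the command line) and that training data
is installed under the language codes "srp" and "srp_latn"; both are
assumptions, so check `tesseract --list-langs` first. The function names
are made up for illustration, and mean word confidence is only one
possible measure of "more successful".

```python
import csv
import io
import subprocess

# Assumed language codes; verify with `tesseract --list-langs`.
CANDIDATE_LANGS = ("srp", "srp_latn")


def run_tesseract_tsv(image_path, lang):
    """Run the tesseract CLI once and return its TSV output as text."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang, "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def mean_confidence(tsv_text):
    """Average the 'conf' column over rows that carry recognized text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    scores = []
    for row in reader:
        try:
            conf = float(row["conf"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed or truncated row
        # Layout rows (page, block, line) have conf < 0 and no text.
        if conf >= 0 and (row.get("text") or "").strip():
            scores.append(conf)
    return sum(scores) / len(scores) if scores else 0.0


def pick_best_lang(image_path, langs=CANDIDATE_LANGS, ocr=run_tesseract_tsv):
    """OCR the image once per language; return (best_lang, all_scores)."""
    scores = {lang: mean_confidence(ocr(image_path, lang)) for lang in langs}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling pick_best_lang("page.png") would OCR the page twice and return
the language with the higher average. Since each document is expected to
use a single script, testing only the first page or two and reusing the
winner for the rest should keep the extra cost small.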

Is there, aside from the Google download section, any 'dictionary' of
available Tesseract training data?

Best,
Hendrik



