Hello Team Ocropus :)

Over the summer, I did a research project OCRing an obscure Native American language (Chinook) with a rather complex alphabet. It was definitely slow going. We tested four software packages (OmniPage, ReadIRIS, the native app in the Xerox scanner we used, and the plugin for Acrobat), and ReadIRIS was the best. It was the only one that had both training and Unicode support. I tried to get Ocropus going, but my Linux knowledge wasn't good enough to configure all of the packages.

Now the time has come to write a paper for publication on this project, and I want to know more about OCR. I figure you folks are the experts, so I was wondering if I could talk to some people about how OCR works. I am specifically interested in the training aspect, as well as in how it parses individual characters. Furthermore, if anyone has any experience using OCR on complex alphabets, I'd love to talk to you.

A bit of background on Chinook:

- 30 characters
- several accents for vowels and consonants
- accent marks, prime marks, and glottalization marks

Thanks for any and all help.

---
micah stupak
[EMAIL PROTECTED]
