I'm training Tesseract to recognize the letters, numbers, symbols, etc. used for geometrical tolerancing and annotations (ASME Standard Y14.5). A lot of the characters used in the ASME standard come from all over the Unicode tables (although the characters/words are from the English language).
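To illustrate how scattered these symbols are, here is a minimal Python sketch that prints the code point and Unicode name of a few characters. The sample string is only my illustrative pick of symbols commonly associated with geometric tolerancing, not the actual training set, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Illustrative sample only (an assumption, not the real training set):
# a few symbols commonly seen in geometric tolerancing callouts.
SAMPLE = "\u2300\u23E5\u2316\u232D\u232F\u2330\u27C2\u24C2"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    for code, name in describe(SAMPLE):
        print(code, name)
```

In this sample, most code points fall in the U+2300 range, while U+27C2 and U+24C2 sit in different parts of the tables, which is the kind of spread a single small font often fails to cover.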
This is part of a data validation project, and I'm using OCR as part of the process. Since OCR is not 100% accurate, some of the validation will need to be done by hand (hopefully as little as possible). If the person checking the annotation sees a "little box" (i.e. an unprintable character), it will slow down their job. For the moment, I check for unprintable characters using the webapp which I posted above. Once this goes into production, there will be a font (purchased or home-brewed) which can correctly draw all the letters/numbers/symbols/etc.

On May 2, 7:04 am, 74yrs old <[email protected]> wrote:
> Hi Rob,
> I know about conversion.php which I am using for long time for Kannada
> project.
> Will you kindly explain by step by step of your experiment with sample if
> any. I wanted to have hands on experience. BTW which lang. you were training?
> Regards,
> sriranga(76yrs old)
>
> On Sat, May 2, 2009 at 6:37 AM, Rob H. <[email protected]> wrote:
>
> > Also, I got this e-mail from a someone named Albert
> > =========
> > Hi Rob,
> >
> > Reply to your "ps"....
> >
> > That doesn't make any sense to me. You are asking for a set of glyphs
> > that can represent every Unicode character in existence. Not
> > only would such a file be *HUGE* in size, but I can't see it as
> > serving any purpose to anyone (other than you, I guess)...
> >
> > So you should stop looking for it.
> >
> > -
> > Albert
> > =========
> >
> > Arial Unicode covers ~50K of the ~140K characters defined at
> > unicode.org. This font file is 22mb.
> > Wouldn't a complete unicode font be around 70mb?
> >
> > If you need a general text viewer which can legibly show documents
> > that contain any number of the valid ~140K characters,
> > then a complete font would be useful.
> >
> > Great advice Albert...*roll eyes*... "stop looking"... how about
> > something a little more constructive?
> > maybe you know a strategy of mixing fonts to enable an application to
> > view all the possible unicode characters?
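
On the font-mixing idea, one possible strategy is a simple fallback chain: try each font in a fixed order and use the first one that covers the character. The sketch below is only an illustration under stated assumptions. The font names and coverage sets are made up, and in a real application the coverage would have to be read from each font's character map. Characters that no font covers are the ones a reviewer would see as "little boxes".

```python
# Hypothetical coverage tables: font name -> set of code points it can draw.
# These values are invented for illustration; real coverage would come from
# each font file's character map.
FONT_COVERAGE = {
    "BaseFont": set(range(0x20, 0x7F)),       # printable ASCII
    "SymbolFont": {0x2300, 0x2316, 0x23E5},   # a few technical symbols
}
FONT_ORDER = ["BaseFont", "SymbolFont"]        # preferred font first


def assign_fonts(text, coverage=FONT_COVERAGE, order=FONT_ORDER):
    """Pair each character with the first font in `order` that covers it.

    The font is None when no font in the chain covers the character.
    """
    result = []
    for ch in text:
        font = next(
            (name for name in order if ord(ch) in coverage[name]), None
        )
        result.append((ch, font))
    return result


def uncovered(text, coverage=FONT_COVERAGE, order=FONT_ORDER):
    """Return the characters that no font in the chain can draw."""
    return [ch for ch, font in assign_fonts(text, coverage, order) if font is None]


if __name__ == "__main__":
    sample = "A\u2300\u232D"
    print(assign_fonts(sample))
    print("would render as boxes:", [f"U+{ord(c):04X}" for c in uncovered(sample)])
```

Running `uncovered` over the OCR output before it reaches the reviewer would flag any annotation that needs a different font, without requiring one font that covers everything.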

