I'm training Tess to recognize the letters, numbers, and symbols used for
geometrical tolerancing and annotations (ASME Standard Y14.5).
A lot of the characters used in the ASME standard come from all
over the Unicode tables (although the characters/words are from the
English language).

This is part of a data validation project and I'm using OCR as part of
the process.
Since OCR is not 100% accurate, some of the validation will need to be
done by hand (hopefully as little as possible).
If the person checking the annotation sees a "little box" (i.e., an
unprintable character), it will slow down their job.
For the moment, I check unprintable characters using the webapp which
I posted above.
Once this goes into production, there will be a font (purchased or
home-brewed) which can correctly draw all the letters, numbers, and symbols.
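
Until such a font exists, one way to help the person checking is to list
each non-ASCII character in the OCR output by code point and Unicode name,
so a "little box" can still be identified. Below is a minimal Python sketch
of that idea. It is not the webapp mentioned above; the function name
describe_non_ascii and the sample string are made up for illustration, and
it assumes the OCR output is already available as a Unicode string.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, 'U+XXXX', name) for each distinct non-ASCII character,
    in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    # Characters without a Unicode name are reported as "<unnamed>".
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in seen
    ]


if __name__ == "__main__":
    # Hypothetical OCR output containing a diameter sign and a plus-minus sign.
    sample = "\u2300 12.5 \u00b1 0.1"
    for ch, code, name in describe_non_ascii(sample):
        print(code, name)
```

The report only names the characters; whether a given font can draw them
still has to be checked against the font itself.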


On May 2, 7:04 am, 74yrs old <[email protected]> wrote:
> Hi Rob,
> I know about conversion.php, which I have been using for a long time for
> the Kannada project.
> Will you kindly explain your experiment step by step, with a sample if
> any? I wanted to have hands-on experience. BTW, which language were you
> training?
> Regards,
> sriranga(76yrs old)
>
> On Sat, May 2, 2009 at 6:37 AM, Rob H. <[email protected]> wrote:
>
> > Also, I got this e-mail from someone named Albert
> > =========
> > Hi Rob,
>
> > Reply to your "ps"....
>
> > That doesn't make any sense to me.  You are asking for a set of glyphs
> > that can represent every Unicode character in existence.  Not
> > only would such a file be *HUGE* in size, but I can't see it as
> > serving any purpose to anyone (other than you, I guess)...
>
> > So you should stop looking for it.
>
> > -
> > Albert
> > =========
>
> > Arial Unicode covers ~50K of the ~140K characters defined at
> > unicode.org. This font file is 22 MB.
> > Wouldn't a complete Unicode font be around 70 MB?
>
> > If you need a general text viewer which can legibly show documents
> > that contain any number of the valid ~140K characters,
> > then a complete font would be useful.
>
> > Great advice Albert...*roll eyes*... "stop looking"... how about
> > something a little more constructive?
> > Maybe you know a strategy for mixing fonts to enable an application to
> > view all the possible Unicode characters?