Hi Rob,

Oh, I'm sorry you didn't take my advice as constructive.  I can see it from
your point of view: you have a task, and I'm simply not helping.  So here's
a verbose version of my original answer.

What you are asking for is somewhat mysterious in purpose.  Allow me to
explain.  Unicode doesn't specify what characters should look like; fonts
specify how characters are visually represented.  Hence, I see no reason why
a font should exist that covers all of the Unicode specification, because
such a font would not be generally regarded as useful.  This is doubly true
when one considers that fonts are tied to operating systems (or, in the case
of Java, operating environments) and/or specific tasks (e.g. fixed-width
fonts).

Furthermore, the Unicode specification is an ever-evolving beast.  I may be
incorrect, but I believe they are currently working on extending the
specification to cover ancient Asian characters which are no longer used in
any vernacular.  Due to this disuse, font makers (in this case,
calligraphers) disagree on the exact visual representations.

Also, Unicode is not the only game in town (see GB18030).  Your alternative
font mapping might get a little messy at this point.

Moreover, you have indicated that you are currently using Arial Unicode MS.
It may be wrong, but Unicode.org states that "the Arial Unicode MS font ...
is the most complete"
[http://www.unicode.org/help/display_problems.html].  You may augment Arial
Unicode MS with "last resort"
[http://www.unicode.org/policies/lastresortfont_eula.html], but I think that
link points to a Mac-OS-X-only solution.

Of course, what you really need to do is string several fonts together.  This
probably must be done manually in the code, and usually involves knowledge of
the language being supplemented into Arial Unicode MS.  Oh, and there may be
font collisions (two fonts covering the same character), so watch out.
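
To make the idea of stringing fonts together a bit more concrete, here is a
rough sketch in Python.  It is only an illustration under stated assumptions:
the font names other than Arial Unicode MS, the coverage ranges, and the
function names are all made up for the example, and real coverage data would
have to be read from each font file.  The text is split into runs, and each
run is assigned to the first font in the chain that covers its characters;
checking fonts in a fixed order is one simple way to settle collisions.

```python
from itertools import groupby

# Hypothetical coverage tables: (font name, inclusive code point ranges).
# These ranges are illustrative only, not the real coverage of any font.
FONT_CHAIN = [
    ("Arial Unicode MS", [(0x0020, 0x007E), (0x00A0, 0x024F), (0x4E00, 0x9FA5)]),
    ("Supplementary Font", [(0x10000, 0x1007F)]),  # made-up second font
]
FALLBACK = "Last Resort"  # used when no font in the chain covers a character


def pick_font(ch):
    """Return the first font in the chain that covers this character."""
    cp = ord(ch)
    for name, ranges in FONT_CHAIN:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name  # earlier fonts win, which resolves collisions
    return FALLBACK


def split_into_runs(text):
    """Group consecutive characters that map to the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]


if __name__ == "__main__":
    for font, run in split_into_runs("ab\u4e2d\U00010000\u0e01"):
        print(font, [hex(ord(c)) for c in run])
```

A renderer would then draw each run with its assigned font.  Choosing the
order of the chain is where knowledge of the language comes in, since the
same code point can look quite different from one font to the next.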

You know what?  This is a problem already semi-solved (I believe there is no
full solution due to the ill-defined nature of the problem) by Adobe in
Acrobat PDF Reader.  PDF's purpose was originally printing, though, so they
"cheated" and embedded the fonts in the file.  You should talk to a PDF
expert and see how Adobe did it.

I hope you find this answer less of an eye-roller.  Unfortunately, my 
suggestion remains "stop looking".



-
Albert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Rob H.
Sent: Friday, May 01, 2009 21:08
To: tesseract-ocr
Subject: Re: Great tool for working with unicode


Also, I got this e-mail from someone named Albert
=========
Hi Rob,

Reply to your "ps"....

That doesn't make any sense to me.  You are asking for a set of glyphs
that can represent every Unicode character in existence.  Not
only would such a file be *HUGE* in size, but I can't see it as
serving any purpose to anyone (other than you, I guess)...

So you should stop looking for it.


-
Albert
=========

Arial Unicode covers ~50K of the ~140K characters defined at
unicode.org. This font file is 22mb.
Wouldn't a complete unicode font be around 70mb?

If you need a general text viewer which can legibly show documents
that contain any number of the valid ~140K characters,
then a complete font would be useful.

Great advice Albert...*roll eyes*... "stop looking"... how about
something a little more constructive?
maybe you know a strategy of mixing fonts to enable an application to
view all the possible unicode characters?







