Hi Rob,

Oh, I'm sorry you didn't interpret my advice as constructive. I can see it from your point of view: you have a task, and I'm simply not helping. So here's a verbose version of my original answer.
What you are asking for is somewhat mysterious in purpose. Allow me to explain.

Unicode doesn't specify what characters should look like; fonts specify how characters are visually represented. Hence, I see no reason why a font should exist that covers the entire Unicode specification, because such a font would not be generally regarded as useful. This is doubly true when one considers that fonts are tied to operating systems (or, in the case of Java, operating environments) and/or specific tasks (e.g. fixed-width font use).

Furthermore, the Unicode specification is an ever-evolving beast. I may be incorrect, but I believe they are currently working on extending the specification to cover ancient Asian characters which are no longer in any vernacular. Due to this disuse, font makers (in this case, calligraphers) disagree on the exact visual representations. Lastly, Unicode is not the only game in town (see GB18030). Your alternative font mapping might get a little messy at this point.

Moreover, you have indicated that you are currently using Arial Unicode MS. It may be wrong, but Unicode.org states that "the Arial Unicode MS font ... is the most complete" [http://www.unicode.org/help/display_problems.html]. You may augment Arial Unicode MS with "last resort" [http://www.unicode.org/policies/lastresortfont_eula.html], but I think that links to a Mac-OS-X-only solution.

Of course, what you really need to do is string several fonts together. This probably must be done manually in the code and usually involves knowledge of the language being supplemented into Arial Unicode MS. Oh, there may be font collisions (two fonts supplying different glyphs for the same character), so watch out.

You know what? This is a problem already semi-solved (I believe there is no full solution due to the ill-defined nature of the problem) by Adobe in Acrobat PDF Reader. Though, the PDF's purpose was originally for printing, so they "cheated" and had file-embedded fonts. You should talk to a PDF expert and see how Adobe did it.
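To make the "string several fonts together" idea a bit more concrete, here is a rough Python sketch of a fallback chain. Everything in it is an assumption for illustration only: the coverage ranges are invented rather than read from real font files, "Hypothetical Supplement Font" is a made-up name, and `pick_font` and `split_into_runs` are helper names I chose. A real application would read each font's actual coverage table and hand the resulting runs to its renderer.

```python
# Ordered fallback chain: the first font that covers a code point wins.
# NOTE: these coverage ranges are invented for illustration; a real
# application would read them from the font files themselves.
FALLBACK_CHAIN = [
    ("Arial Unicode MS", [(0x0020, 0x007E), (0x00A0, 0x024F), (0x4E00, 0x9FA5)]),
    ("Hypothetical Supplement Font", [(0x4E00, 0x9FFF), (0x10000, 0x1007F)]),
]

# Name used when no font in the chain covers the character.
LAST_RESORT = "Last Resort"


def pick_font(ch):
    """Return the name of the first font in the chain that covers ch."""
    cp = ord(ch)
    for name, ranges in FALLBACK_CHAIN:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return LAST_RESORT


def split_into_runs(text):
    """Group consecutive characters that resolve to the same font."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

Note how the chain order settles collisions: both made-up tables claim U+4E2D, and the font listed first wins. That is only one possible policy; choosing the order well is where knowledge of the language comes in.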
I hope you find this answer less of an eye-roller. Unfortunately, my suggestion remains "stop looking".

- Albert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rob H.
Sent: Friday, May 01, 2009 21:08
To: tesseract-ocr
Subject: Re: Great tool for working with unicode

Also, I got this e-mail from someone named Albert

=========
Hi Rob,

Reply to your "ps"....

That doesn't make any sense to me. You are asking for a set of glyphs that can represent every Unicode character in existence. Not only would such a file be *HUGE* in size, but I can't see it as serving any purpose to anyone (other than you, I guess)...

So you should stop looking for it.

- Albert
=========

Arial Unicode covers ~50K of the ~140K characters defined at unicode.org. This font file is 22 MB. Wouldn't a complete Unicode font be around 70 MB?

If you need a general text viewer which can legibly show documents that contain any number of the valid ~140K characters, then a complete font would be useful.

Great advice Albert...*roll eyes*... "stop looking"... how about something a little more constructive? Maybe you know a strategy of mixing fonts to enable an application to view all the possible Unicode characters?

