Dan Kogai <[EMAIL PROTECTED]> writes:
> >But that is not good enough for cases below because...
> >
> >>>> (Hiragana | Katakana | Han) => 'jisx0208.1990-0'
> >
> >This is very wrong because jisx0208.1990-0 only contains \p{Han} that
> >appears in Japanese (JIS X 0208, to be exact). On the other hand,
> >jisx0208.1990-0 does contain greek and cyrillic alphabets.

But cyrillic glyphs are likely double width :-(

This is one of the reasons I want to do _something_ in this area. I don't want to even try to read a big 16-bit Japanese font just to get Cyrillic (for a SPAMer's name) or Greek Sigma (for math).

The other thing that needs fixing is that Tk currently ignores any locale information that might be available. So for "unified" ideographs it will use any font that has the character, regardless of which "style" it is in. So for Japanese it is quite likely to find a simplified Chinese style font and use that for Han, then when it hits Katakana it will find an 8-bit (JIS X 0201?) font and use that for those, then when it finds a Hiragana it will find a JIS X 0208 font. The result looks a mess even to my occidental eyes.

What I am hoping to do for Tk804 is put in some kind of callback hook to perl, so that when Tk wants a font for a particular character it can call perl, and perl will give it a strong push in a particular direction. Thus for someone expecting Japanese, if asked for a Han character it will suggest a JIS font, while for someone expecting Chinese it will suggest a Big5 or gb2312 font as appropriate.

What gets really painful is the Unicode fonts: one has to look at which characters a font has to decide whether it is Japanese/Simplified Chinese/Traditional Chinese/Korean or just a grab-bag of glyphs the font designer had to hand.

> >One of so many reasons why Han Unification was a bad idea. When it
> >comes to Han Ideographs, Unicode's sense of charscript is almost
> >useless.
> >
> >\x{5c0f}\x{98fc} \x{5f3e}
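
For illustration only, the perl side of such a "which font for this character?" hook might look roughly like the sketch below. Nothing here is an existing Tk interface: the name suggest_charset, the language tags, and the preference table are all made up for the example, and the charset names other than jisx0208.1990-0 are assumptions about what the X server offers. The idea is simply that perl looks at the character's script and the language the user expects, and returns a charset to try first (or nothing, leaving Tk to its normal search).

```perl
use strict;
use warnings;

# Hypothetical table: preferred X charset for Han characters, keyed by
# the language the user is expecting. Names are illustrative only.
my %han_charset = (
    'ja'    => 'jisx0208.1990-0',
    'zh-cn' => 'gb2312.1980-0',
    'zh-tw' => 'big5-0',
);

# Hypothetical hook: given one character and a language tag, return a
# suggested charset, or nothing to let the caller fall back to its own
# font search.
sub suggest_charset {
    my ($char, $lang) = @_;
    $lang = defined $lang ? lc $lang : '';

    # Small alphabets: suggest an 8-bit font rather than the copies in a
    # big 16-bit CJK font, which are likely to be double width.
    return 'iso8859-5' if $char =~ /\p{Cyrillic}/;
    return 'iso8859-7' if $char =~ /\p{Greek}/;

    # For Japanese, keep kana and Han in the same font so styles match.
    if ($lang eq 'ja' && $char =~ /[\p{Hiragana}\p{Katakana}]/) {
        return 'jisx0208.1990-0';
    }

    # Unified ideographs: the expected language decides the style.
    if ($char =~ /\p{Han}/) {
        return $han_charset{$lang} if exists $han_charset{$lang};
    }

    return;    # no opinion
}

# Same Han character, different expectations.
for my $lang (qw(ja zh-cn zh-tw)) {
    my $cs = suggest_charset("\x{5c0f}", $lang);
    print "$lang: ", (defined $cs ? $cs : '(no suggestion)'), "\n";
}
```

The point of returning nothing for unknown cases is that the hook only gives a push; it does not replace whatever fallback search the toolkit already does.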