> I tried what you suggested with unipad, but for some reason it went to a
> location on a PUA character map, rather than CJK Unified Ideographs
> Extension B, where they are in fact located. I guess it is because Unipad
> doesn't support Extension B yet, or else I am doing something wrong. But
> thanks for directing me to the Unipad website, I'm sure it will be useful.

Code points above FFFF are represented by pairs of code values in the Surrogates Area, not in the Private Use Area. Unipad should be able to show those Surrogate Area values. The code points of the blanks on the website are really in the PUA.

I don't completely follow you here, but I can see the code points are as you say. However, I have (correct) hard copies of the text, and there is no doubt that the chars that should be there are U+2835C and U+283B9, in the Ext. B chart. The website is unlikely to have the wrong chars. But there's definitely something wrong somewhere; maybe it is their fonts. As with you, only two of their fonts seem to work for me.

> Here is a sample line of text with the two graphs as blanks (on my
> machine),
> second and third from the left. They are No. 2835C and 283B9 respectively,
> on p.152 of the Extension B pdf:
>
> 心而鮮歡。望天涯而佇念,擢雄劍而長歎。

The second and third char are U+E596 and U+E58E.

> The page this text is from is
> http://www.chant.org/scripts/frame.asp?t=b&id=000675 but I don't think
> you'll get into the site unless you or your university is a member.

I can access the fonts at: http://www.chant.org/info/download_font.asp

Only two of the fonts work on my Windows 98 SE: ICS3 and ICS4. If I copy the Chinese text to WordPad and change the font, the second and the third char become Chinese chars. But in ICS3 they look very different from ICS4. Neither is right.

ICS3 and ICS4 are both for the Oracle Bone Script database. With our sample text, ICS3 displays OBS graphs (i.e. not standard Chinese), and ICS4 displays Chinese gibberish. Our text is from the Pre-Han & Han and Six Dynasties databases, which use their ICS1, ICS2, and ICS6 fonts. As you say, it seems these fonts don't work. If I run a Windows search, the results show that they are all in the Windows/fonts folder. But when I look in the folder itself, or in the font box in Word2000, they aren't there. I guess this has something to do with them not working.
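
On the earlier point that code points above FFFF are represented by pairs of code values in the Surrogates Area: the arithmetic behind that can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 calculation, not anything Unipad itself is known to do, and the function name `to_surrogates` is made up for the example.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


# The two Extension B characters from the sample line.
for cp in (0x2835C, 0x283B9):
    high, low = to_surrogates(cp)
    print("U+%05X -> %04X %04X" % (cp, high, low))
# U+2835C -> D860 DF5C
# U+283B9 -> D860 DFB9
```

Both halves of each pair fall in D800..DFFF, which is a different range from the Private Use Area (E000..F8FF), so values like U+E596 cannot be one half of a surrogate pair.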

The following case might confirm it's a problem with the website's fonts:

http://www.chant.org/scripts/zj/scripts/frame.asp?t=b&id=000869
text (Shijing #57, last line): 庶姜,庶士有朅!

Unipad shows the code point of the third and fourth chars from the left (the same character twice) to be U+E053. But the character that belongs there is U+5B7D, as another website, http://210.69.170.100/s25/index.htm (Han Quan), shows in the same line of text: 庶姜孽孽,庶士有朅。 And this character is not even in Ext A or B, but in the regular Unicode CJK Unified Ideographs charset.

There are other cases where both these sites do not display the character (that is, if the problem is not at my end) (Shijing #40, line 11):

1) 室人交我。 http://www.chant.org/scripts/zj/scripts/frame.asp?t=b&id=000869
The fourth and fifth characters should be 徧 U+5FA7 and 讁 U+8B81, but Unipad shows they are U+E052 and U+E536.

2) 室人交遍謫我。 http://210.69.170.100/s25/index.htm (Han Quan)
One would also expect Han Quan, like CHANT, to be rigorous and precise. Here there are substitutions for the two characters in question, followed by a blank that Unipad indicates is U+F6B1. Such substitutions should only be necessary when the actual characters are unavailable. What is behind the blank I'm not sure, but it may be a note explaining the substitutions.

But again, all four of the characters can be found in the basic CJK charset, not even Ext A or B. I suppose the websites are not using Unicode charsets?

Thanks again for your remarks and suggestions.

--Allen
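
P.S. For checking which range each of the code points mentioned above falls into, here is a rough Python sketch. The table covers only the ranges relevant to this thread (it is not a full Unicode block list), and the names `BLOCKS` and `block_of` are made up for the example.

```python
# Only the ranges discussed in this thread; everything else is "other".
BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xD800, 0xDFFF, "Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def block_of(cp):
    """Return the name of the listed range containing cp, or 'other'."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


# Values Unipad reported on the two sites, then the expected characters.
reported = (0xE596, 0xE58E, 0xE053, 0xE052, 0xE536, 0xF6B1)
expected = (0x2835C, 0x283B9, 0x5B7D, 0x5FA7, 0x8B81)
for cp in reported + expected:
    print("U+%04X: %s" % (cp, block_of(cp)))
```

Every value Unipad reported comes out as Private Use Area, while the expected characters land in CJK Unified Ideographs or Extension B. That is consistent with the pages relying on their own fonts to assign glyphs to private-use code points, though it does not prove it.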

