Regarding characters in the SIP, maybe the Unihan IICore field could be useful? There are 62 Extension B characters which are listed as IICore. Of course, these may just be characters which *should* be supported by implementations, given that quite a lot of software has problems with supplementary characters in general...
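For anyone who wants to check that count against their own copy of the data, here is a minimal sketch. It assumes the usual tab-separated Unihan layout (`U+XXXX<TAB>field<TAB>value`) and the field name `kIICore`. The function name and the sample lines are made up for illustration, and the Extension B bounds are the block range U+20000..U+2A6DF.

```python
from typing import Iterable, List

# Block range of CJK Unified Ideographs Extension B.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF


def iicore_in_ext_b(lines: Iterable[str], field: str = "kIICore") -> List[int]:
    """Return the Extension B code points that carry the given Unihan field.

    Assumes each data line looks like 'U+XXXX<TAB>field<TAB>value';
    comment lines start with '#'. The field name is an assumption.
    """
    found = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split("\t")
        if len(parts) < 3 or parts[1] != field or not parts[0].startswith("U+"):
            continue  # not a record for the field we want
        code_point = int(parts[0][2:], 16)
        if EXT_B_FIRST <= code_point <= EXT_B_LAST:
            found.append(code_point)
    return found


# Hypothetical sample lines, NOT real Unihan data.
sample = [
    "# comment line",
    "U+4E00\tkIICore\tX",      # BMP, outside Extension B
    "U+20021\tkIICore\tX",     # inside Extension B
    "U+20021\tkTotalStrokes\t5",
    "U+2A6D6\tkIICore\tX",     # inside Extension B
]
hits = iicore_in_ext_b(sample)
print(len(hits), [f"U+{cp:04X}" for cp in hits])
```

Feeding it the lines of the real Unihan file that holds the field (for example via `open(path, encoding="utf-8")`) in place of `sample` should give the actual list.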
Uriah

On Tue, Jun 15, 2010 at 3:15 AM, Mark Davis ☕ <[email protected]> wrote:

> From a sampling of the web (about .7M docs), the most common supplementary
> characters are, curiously, private use. Top is [?] U+FEB85. For Han, the
> top few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎,... There are also, oddly,
> some Gothic and Shavian characters.
>
> However, the data gets pretty noisy; it would take a bigger sample to get
> more reliable data.
>
> Mark
>
> — Il meglio è l’inimico del bene —
>
>
> On Mon, Jun 14, 2010 at 09:10, John H. Jenkins <[email protected]> wrote:
>
>> Some characters in the SIP are more common in Chinese written in the HK
>> SAR than any character in Extension A, either because they are Hong Kong
>> toponyms (or the like), or are Cantonese-specific. (My own analysis of text
>> on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
>> U+28B4E, and U+2A568, which occur seven times each.)
>>
>> I imagine that the best data would come from Google.
>>
>> And there are some Web sites out there in Deseret and Shavian, as well.
>> (If nothing else, both Deseret and Shavian versions of xkcd are available.
>> I'm not aware of any Linear B translations.)
>>
>> On 2010/6/14, at 上午8:48, Frédéric Grosshans wrote:
>>
>> > Is there any data on the most commonly used characters which are not in
>> > BMP ?
>> >
>> > I have the impression that SMP characters are mainly used scholars
>> > (historic scripts and math symbols). However, I have no idea whether the
>> > SIP characters are mainly historical, or if they include not-so rare
>> > characters needed for name and/or chinese dialects.
>> >
>> > Frédéric Grosshans

