From a sampling of the web (about .7M docs), the most common supplementary characters are, curiously, private use. Top is [?] U+FEB85. For Han, the top few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎, ... There are also, oddly, some Gothic and Shavian characters.
However, the data gets pretty noisy; it would take a bigger sample to get more reliable data.

Mark

— Il meglio è l’inimico del bene —

On Mon, Jun 14, 2010 at 09:10, John H. Jenkins <[email protected]> wrote:

> Some characters in the SIP are more common in Chinese written in the HK SAR
> than any character in Extension A, either because they are Hong Kong
> toponyms (or the like), or are Cantonese-specific. (My own analysis of text
> on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
> U+28B4E, and U+2A568, which occur seven times each.)
>
> I imagine that the best data would come from Google.
>
> And there are some Web sites out there in Deseret and Shavian, as well.
> (If nothing else, both Deseret and Shavian versions of xkcd are available.
> I'm not aware of any Linear B translations.)
>
> On 2010/6/14, at 8:48 AM, Frédéric Grosshans wrote:
>
> > Is there any data on the most commonly used characters which are not in
> > the BMP?
> >
> > I have the impression that SMP characters are mainly used by scholars
> > (historic scripts and math symbols). However, I have no idea whether the
> > SIP characters are mainly historical, or if they include not-so-rare
> > characters needed for names and/or Chinese dialects.
> >
> > Frédéric Grosshans
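
For anyone who wants to repeat this kind of tally on their own corpus, here is a minimal Python sketch of counting characters outside the BMP and grouping them by plane. It is not the method used for either of the counts quoted in this thread; the function names and the sample string are made up for illustration, and the sample merely reuses two of the code points mentioned above plus one Deseret letter.

```python
from collections import Counter

def supplementary_counts(text):
    """Count every code point above U+FFFF (i.e. outside the BMP) in text."""
    return Counter(ch for ch in text if ord(ch) > 0xFFFF)

def plane_counts(counts):
    """Aggregate per-character counts by plane number (code point >> 16)."""
    by_plane = Counter()
    for ch, n in counts.items():
        by_plane[ord(ch) >> 16] += n
    return by_plane

# Illustrative sample only: two SIP ideographs cited in the thread,
# one Deseret letter (SMP), and some BMP text that should be ignored.
sample = "香港 \U00023D13\U00023D13 \U000282E2 \U00010400 abc"

counts = supplementary_counts(sample)
planes = plane_counts(counts)

for ch, n in counts.most_common():
    print(f"U+{ord(ch):04X}  plane {ord(ch) >> 16}  count {n}")
```

Plane 1 is the SMP, plane 2 the SIP, and planes 15 and 16 are the supplementary private use areas, so a private use character such as U+FEB85 would land in plane 15. On a real crawl the text would also need to be decoded correctly first; mis-decoded pages are one likely source of the noise mentioned above.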

