Re: Most commonly used characters not in BMP

Mark Davis ☕ Mon, 14 Jun 2010 18:24:26 -0700

From a sampling of the web (about 0.7M docs), the most common supplementary
characters are, curiously, private use. Top is [?] U+FEB85. For Han, the top
few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎,... There are also, oddly, some
Gothic and Shavian characters.


However, the data gets pretty noisy; it would take a bigger sample to get
more reliable data.
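
For anyone who wants to repeat this kind of count on their own corpus, here is
a minimal sketch in Python. It is not the tooling behind the figures above; the
function names and the sample string are assumptions made up for illustration.
It tallies every code point above U+FFFF and reports which plane each falls in
(1 = SMP, 2 = SIP, 15 and 16 = supplementary private use).

```python
from collections import Counter


def count_supplementary(text):
    """Tally code points above U+FFFF, i.e. everything outside the BMP."""
    return Counter(ch for ch in text if ord(ch) > 0xFFFF)


def plane_of(ch):
    """Return the Unicode plane number of a single character (0 = BMP)."""
    return ord(ch) >> 16


# Hypothetical sample: two SIP ideographs, one Gothic letter (SMP),
# and one plane-15 private-use code point.
sample = "abc \U00023D13 \U00023D13 \U000282E2 \U00010330 \U000FEB85"

counts = count_supplementary(sample)
for ch, n in counts.most_common():
    print(f"U+{ord(ch):04X}  plane {plane_of(ch)}  count {n}")
```

On a real crawl the same loop would run over decoded page text; with a small
sample the tail of the ranking is dominated by noise, as noted above.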

Mark

— Il meglio è l’inimico del bene —


On Mon, Jun 14, 2010 at 09:10, John H. Jenkins <[email protected]> wrote:

> Some characters in the SIP are more common in Chinese written in the HK SAR
> than any character in Extension A, either because they are Hong Kong
> toponyms (or the like), or are Cantonese-specific.  (My own analysis of text
> on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
> U+28B4E, and U+2A568, which occur seven times each.)
>
> I imagine that the best data would come from Google.
>
> And there are some Web sites out there in Deseret and Shavian, as well.
>  (If nothing else, both Deseret and Shavian versions of xkcd are available.
>  I'm not aware of any Linear B translations.)
>
> On 2010/6/14, at 上午8:48, Frédéric Grosshans wrote:
>
> > Is there any data on the most commonly used characters which are not in
> > BMP ?
> >
> > I have the impression that SMP characters are mainly used by scholars
> > (historic scripts and math symbols). However, I have no idea whether the
> > SIP characters are mainly historical, or if they include not-so-rare
> > characters needed for names and/or Chinese dialects.
> >
> >       Frédéric Grosshans
> >
> >
>
>
>
>
