Re: Usage stats?

Doug Ewell Sat, 28 Mar 2015 09:57:24 -0700

Michael Norton wrote:

Thanks Doug.  I did not know there exists a representative sample of
the world's text. :)


There is not, which was the point.

Thanks for reposting a private message back to the list, by the way. 💢

Your frequency chart is great.   The average char appearance is 2.91%.
Only 34% from your list exceed 10% of it.  Therefore, U+0020 is the
elephant in the room (i.e., 15.05% is far > 2.91%).   In fact, it's
almost >50% greater than the next most-appearing character.

Words in English are separated by spaces, and the average English word
is about 5 letters long. It follows that English text will contain a lot
of spaces. You can eyeball this.
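
For anyone who would rather count than eyeball, here is a minimal sketch of that arithmetic. It assumes one separating space per word and ignores punctuation and line breaks, so a 5-letter average word length predicts roughly 1 space in every 6 characters. The function names and the sample sentence are hypothetical, chosen only for illustration; they are not taken from the frequency chart discussed in this thread.

```python
from collections import Counter


def char_shares(text):
    """Return each character's share of the text as a fraction in [0, 1]."""
    total = len(text)
    if total == 0:
        return {}
    counts = Counter(text)
    return {ch: n / total for ch, n in counts.items()}


def expected_space_share(avg_word_len):
    """Predicted share of spaces, assuming one separator space per word."""
    # Each word contributes avg_word_len letters plus one space.
    return 1 / (avg_word_len + 1)


if __name__ == "__main__":
    # Hypothetical sample; substitute any English text you like.
    sample = "the quick brown fox jumps over the lazy dog"
    shares = char_shares(sample)
    print(f"observed space share: {shares.get(' ', 0.0):.3f}")
    print(f"expected for 5-letter words: {expected_space_share(5):.3f}")
```

Real text also contains punctuation, digits and line breaks, so the observed share of U+0020 will differ somewhat from this simple estimate.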

Only 34% from your list exceed 10% of the average percentile (2.9%).

This is serendipitously common (eg. the Earth:Moon albedo ratio is
.36).   A relationship about motion and other natural properties and
characteristics among the local texts begins to emerge.


Right.

--

Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
