Re: Usage stats?

2015-03-27 Thread Steven R. Loomis
Here's an analogy: it's more of a piano factory than a concert hall. S Sent from our iPhone. > On Mar 27, 2015, at 1:16 PM, Michael Norton wrote: > > (I know this is way too simplistic a response but it is kind of like giving > everyone an invisible cloak and an invisible da

Re: Usage stats?

2015-03-27 Thread Clive Hohberger
Interesting that you should bring up the ^ and tilde. Their OS independence and IBM mainframe compatibility is the reason in 1985 I chose them as the command prefixes for the now widely used ZPL label design programming language. ISO 646 IRV was chosen as the programming character set. On Friday,

Re: Usage stats?

2015-03-27 Thread David Starner
On Fri, Mar 27, 2015 at 2:03 PM, Michael Norton wrote: > > This is good because when the volumes of traffic begin to exponentially > increase over a space, if there are predominant formulations of Unicode for > each, they need to be recognized for a number of reasons depending on which > sector

Re: Usage stats?

2015-03-27 Thread Richard Wordingham
On Fri, 27 Mar 2015 16:27:26 -0400 Michael Norton wrote: > Easy example: what's the code for [blank space] U+020 across all > language sets of Unicode? Is it the same ie: 100%? No. In China, U+3000 IDEOGRAPHIC SPACE, which is the appropriate ordinary intra-line white space character for use wi
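
The point above, that U+0020 is not the only ordinary space in use, can be checked against the Unicode Character Database. What follows is only a rough sketch, assuming Python's standard unicodedata module and assuming that General_Category Zs (space separator) is a reasonable stand-in for "blank space"; the function name space_separators is made up for illustration, and the result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def space_separators():
    """List (label, name) for every code point with General_Category Zs."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


# Print each space separator known to this Python build.
for label, name in space_separators():
    print(label, name)
```

Both U+0020 SPACE and U+3000 IDEOGRAPHIC SPACE appear in the list, so a single "blank space" code shared by all writing systems is not a safe assumption.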

Re: Usage stats?

2015-03-27 Thread Eric Muller
Would a corpus like Wikipedia or Project Gutenberg be appropriate for your purpose? Both are freely and easily accessible. Eric.

Re: Usage stats?

2015-03-27 Thread Michael Norton
I'm trying to get a sense of the range and variance of the Unicode set in the same way I have with hypertext on the web: for every HTML or XHTML document URL, for example, there is going to be a >0 minimum of "<" and ">" characters. Depending on which markup set and schema(s) you are using,

Re: Usage stats?

2015-03-27 Thread Michael Norton
Doug Ewell's getting it. He sent this back to me, so I asked him if he could provide the same dataset drawn from his written reply to me: "For example, your original e-mail (327 characters) consists of: U+0020 - 14.07%, U+0065 - 10.09%, U+0061 - 7.03%, U+0074 - 6.73%, U+006F - 5.81%" This is g
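
A per-message breakdown like the one quoted above can be produced with a few lines of code. This is a minimal sketch, not the method Doug Ewell actually used: it assumes Python, counts code points rather than grapheme clusters, and the function name code_point_frequencies and the sample string are invented for illustration.

```python
from collections import Counter


def code_point_frequencies(text):
    """Return (label, percentage) pairs for each code point, most frequent first."""
    counts = Counter(text)
    total = len(text)
    return [
        (f"U+{ord(ch):04X}", 100.0 * n / total)
        for ch, n in counts.most_common()
    ]


# Illustrative sample only; substitute the text of any message.
sample = "Is there a list of usage statistics per character?"
for label, pct in code_point_frequencies(sample)[:5]:
    print(f"{label} - {pct:.2f}%")
```

Figures obtained this way describe only the text they were computed from; they say nothing about usage across the web or across languages, which is the distinction the replies in this thread keep returning to.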

Re: Usage stats?

2015-03-27 Thread Markus Scherer
On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton < michaelanortons...@gmail.com> wrote: > Easy example: what's the code for [blank space] U+020 across all language > sets of Unicode? Is it the same ie: 100%? > I don't understand what you are asking, and I have a hunch you haven't said it in a way

Re: Usage stats?

2015-03-27 Thread Michael Norton
Thank you. What's the count for "universal characters" at this time? Eg: [SP] On Fri, Mar 27, 2015 at 4:40 PM, Phillips, Addison wrote: > What you might be looking for would be the CLDR project's "exemplar > sets" (see for example [1]), which describes which characters are > customarily used

RE: Usage stats?

2015-03-27 Thread Phillips, Addison
What you might be looking for would be the CLDR project’s “exemplar sets” (see for example [1]), which describes which characters are customarily used for a given language and which are sometimes used. However, this is not the same thing as statistical distribution. One of the points of Unicode

Re: Usage stats?

2015-03-27 Thread Michael Norton
Easy example: what's the code for [blank space] U+020 across all language sets of Unicode? Is it the same ie: 100%? On Fri, Mar 27, 2015 at 4:24 PM, Michael Norton < michaelanortons...@gmail.com> wrote: > Just using the tools and formulations we have at present ought to allow > Unicode to produc

Re: Usage stats?

2015-03-27 Thread Michael Norton
Just using the tools and formulations we have at present ought to allow Unicode to produce a usage set without indexing the entire web which would provide implementors with an indication of variances for traffic, overflow, and override purposes relative to users of the standard. If the figure vari

Re: Usage stats?

2015-03-27 Thread John D. Burger
On Mar 27, 2015, at 15:57, Michael Norton <michaelanortons...@gmail.com> wrote: > Why wouldn't Unicode itself have it? Because as Ken explained, acquiring (and constantly updating) such statistics would require roughly the effort that Google puts into its crawler. And it wouldn't includ

Re: Usage stats?

2015-03-27 Thread Michael Norton
(I know this is way too simplistic a response but it is kind of like giving everyone an invisible cloak and an invisible dagger and not telling them what a cloak and dagger is for [cutting butter & keeping warm]). On Fri, Mar 27, 2015 at 3:57 PM, Michael Norton < michaelanortons...@gmail.com> wrot

Re: Usage stats?

2015-03-27 Thread Doug Ewell
Michael Norton wrote: > Why wouldn't Unicode itself have it? Probably because the Unicode Consortium isn't responsible for indexing the entire web. Would you expect it to be? -- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

Re: Usage stats?

2015-03-27 Thread Michael Norton
Why wouldn't Unicode itself have it? On Fri, Mar 27, 2015 at 1:07 PM, Ken Whistler wrote: > Search engine companies (and in particular, Google) have such > information squirreled away in their index databases, at least as > far as usage stats for Unicode characters on the web go -- but it > is p

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Ken Whistler
On 3/27/2015 8:15 AM, William_J_G Overington wrote: Or you could just redefine "&" and "<" as That encapsulates what I do not like about using markup other than in very precise limited circumstances such as designing a web page. The characters have defined meanings in Unicode: HTML cha

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Ilya Zakharevich
On Fri, Mar 27, 2015 at 01:00:09PM +, William_J_G Overington wrote: > >> Exact semantics of formatting characters aside, it is best to define plain > >> text as a stateless stream. The characters you're proposing require a > >> decoder to keep state, therefore they won't do. At most you may a

magnetic limit

2015-03-27 Thread Michael Norton
A little more on what I had posted earlier today--> Today's periodic table consists of melting, boiling, and freezing points for elements; this magnetic point may now be added for each element in order to identify characteristics of a given volume for practitioners across which, via surface distri

Re: Usage stats?

2015-03-27 Thread Ken Whistler
Search engine companies (and in particular, Google) have such information squirreled away in their index databases, at least as far as usage stats for Unicode characters on the web go -- but it is proprietary information, and they generally don't publish information about such statistics. Perhaps

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread William_J_G Overington
> Or you could just redefine "&" and "<" as U+0026 START HTML ENTITY and U+003C START HTML TAG and be done with it, and just incorporate HTML5 into Unicode forever, thus eliminating these discussions from this list, and moving them to the W3C and WHATWG lists... That encapsulates what

Usage stats?

2015-03-27 Thread Michael Norton
Hello and thank you for an incredible service (just joining the list). Is there a list of usage statistics per character of the Unicode set available somewhere? Cheers, -- Michael A. Norton, B.A. Cinema, M.P.A. My Cinema Home: http://www.NortonsNook.com "All great actors are mere mathematica

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Asmus Freytag (t)
On 3/27/2015 6:00 AM, William_J_G Overington wrote: So, if that were implemented, then to typeset, say, the word astrolabe within a plain text file, in italics, one would need to use nine instances of the COMBINING ITALICIZER, one instance after each letter of the word astrolabe. That woul
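
The "nine instances" figure above follows directly from placing one mark after each letter. A small sketch of that arithmetic, assuming Python: COMBINING ITALICIZER is only a proposal made in this thread, and the code point U+E1001 used here is the hypothetical value quoted in the discussion, not an assigned Unicode character. The helper name italicize is likewise made up.

```python
# Hypothetical code point from the proposal discussed in this thread;
# U+E1001 is not an assigned Unicode character.
COMBINING_ITALICIZER = "\U000E1001"


def italicize(word):
    """Append the hypothetical combining mark after every character of word."""
    return "".join(ch + COMBINING_ITALICIZER for ch in word)


marked = italicize("astrolabe")
print(len("astrolabe"), marked.count(COMBINING_ITALICIZER), len(marked))
```

Because each letter carries its own mark, a decoder needs no state to know which letters are italic, at the cost of doubling the length of the styled word in code points.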

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Doug Ewell
So one of the concerns I have is the implication that "interesting" styling comprises: 1. bold 2. italic If formatting characters were encoded to support these two styling options, right away there would be calls to expand the set with: 3. underlining 4. strikeout 5. superscript 6. subscript 7.

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread William_J_G Overington
>> Exact semantics of formatting characters aside, it is best to define plain >> text as a stateless stream. The characters you're proposing require a >> decoder to keep state, therefore they won't do. At most you may ask for *U+E1001 COMBINING ITALICIZER *U+E1003 COMBINING BOLDIFIER after all, w

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Neil Harris
On 26/03/15 23:27, Mark E. Shoulson wrote: On 03/26/2015 11:18 AM, William_J_G Overington wrote: > Blocks of boring plain text, no italics or effects any more complex than justification, simple notes written all in one font with no formatting to speak of etc. I am wondering if it is consider

Re: Plain text (from Re: Avoidance variants)

2015-03-27 Thread Michael Everson
On 27 Mar 2015, at 01:01, Leo Broukhis wrote: > Exact semantics of formatting characters aside, it is best to define plain > text as a stateless stream. The characters you're proposing require a decoder > to keep state, therefore they won't do. At most you may ask for > *U+E1001 COMBINING ITALI