On Wed, 13 Dec 2006 19:43:10 +0530, Mikko Rantalainen <[EMAIL PROTECTED]> wrote:

Charles McCathieNevile wrote:
On Wed, 13 Dec 2006 13:17:14 +0530, Henri Sivonen <[EMAIL PROTECTED]> wrote:
On Dec 13, 2006, at 08:32, Charles McCathieNevile wrote:
possible *and no simpler* - this is too simple. Maybe assuming you can parse numbers out of text is just a dumb idea as a normative part of a spec.
The attributes always work for any language. For English, the textContent works as a *bonus*. It isn't that the spec fails to work for non-English. It is just that a particular *redundant* bonus feature doesn't work for non-English.
The problem with this is that it means copying code the natural way doesn't work for some non-English speakers, and they have to read the spec or guess why. [...]

I think that "they have to read the spec" is a bonus, too.

Yeah, except it turns out to be wishful thinking of the kind WHATWG tries strenuously to avoid :( And the problem is that, among people who habitually use other conventions for numbers, it turns out that many don't really read English documents or mailing lists either...

Perhaps the parser could be specified as follows:

regexp for "numeric value" is [0-9 ,.]
scan the numeric value backwards from end
first character matching regexp [,.] is the decimal separator
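
A rough sketch of the three rules above in Python, for illustration only. The function name parse_numeric_value and the use of Decimal are assumptions of this sketch, not anything proposed in the thread or in the spec; it also assumes the input is just the numeric value, already extracted from the surrounding text.

```python
import re
from decimal import Decimal

# "numeric value": digits, spaces, commas and full stops only
NUMERIC_VALUE = re.compile(r"[0-9 ,.]+")


def parse_numeric_value(text: str) -> Decimal:
    """Hypothetical helper: treat the last ',' or '.' as the decimal separator."""
    text = text.strip()
    if not NUMERIC_VALUE.fullmatch(text) or not any(c.isdigit() for c in text):
        raise ValueError(f"not a numeric value: {text!r}")

    # Scanning backwards from the end, the first ',' or '.' met is the
    # decimal separator; that is the same as the rightmost one in the string.
    split_at = max(text.rfind(","), text.rfind("."))
    if split_at == -1:
        integer_part, fraction_part = text, ""
    else:
        integer_part, fraction_part = text[:split_at], text[split_at + 1:]

    # Everything else in [ ,.] is assumed to be a grouping separator and dropped.
    digits = re.sub(r"[ ,.]", "", integer_part)
    fraction = fraction_part.replace(" ", "")
    return Decimal(f"{digits or '0'}.{fraction or '0'}")
```

With this sketch, parse_numeric_value("453.436.346,235") and parse_numeric_value("453,436,346.235") give the same value. One consequence of the rule as stated is that a lone grouping separator, as in "1,234", is read as a decimal separator.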

This would correctly interpret numbers such as

1,251,152.124
634.46
453.436.346,235

This last is the important use case that the existing method fails on.

23 236 435 123,121

It would fail for numbers such as

1,234,456.789,012
1.234.456,789.012

but are such formats used in any locale?

Not that I know of. Formats I know of use ".", "," or " " as separators for integer amounts, and "," or "." for decimal separators. The only separators I know of inside the decimal part are "-", "e" and "E". I can imagine someone using the notation for web content in a meter, but I am not sure that it is likely.

Of course there are a handful of other types of numbers. One thing that is helpful is that in Hebrew and Arabic, numbers are written LTR even though the rest of the text isn't. I am not sure about other RTL languages - apparently there are a couple of Indic ones. On the other hand, since I am going to meet a handful of people this weekend who specialise in publishing for the Indian government, in at least their 22 constitutionally official languages, I will try to remember to ask. One thing that is unhelpful is that in some languages numbers are written using ordinary letters, although I suspect this use is very rare on the web, as I believe it is pretty much archaic in the relevant languages.

This is, of course, going down the path of specifying internationalised number picking - something that some people are just dead against.

cheers

Chaals

--
  Charles McCathieNevile, Opera Software: Standards Group
  hablo español  -  je parle français  -  jeg lærer norsk
[EMAIL PROTECTED]          Try Opera 9 now! http://opera.com
