2013/9/18 Stephan Stiller <[email protected]>

> On 9/18/2013 2:42 AM, Philippe Verdy wrote:
>> There are "scalar values" used in so many other unrelated domains [...]
> There is no risk for confusion with vectors or complex numbers or reals or
> whatnot.
Yes, there are such risks; I gave a meaningful example with formatting and parsing libraries. Historically, the term "scalar value" (often shortened to "scalar") has been, and still is, widely used in mathematics. And today mathematics is largely performed on computers, whether symbolically, numerically, with estimation heuristics, or through massive simulations. These mathematical operations need input (so they use parsers or data sensors) and output (so they use formatters, not limited to plain text; this could be rich text or colorful graphics as well). Most of this input and output data is textual, and so it also needs to be encoded. The more universal mathematical concept of scalar values will then collide heavily with the narrow internal definition of "scalar values" used only to delimit the domain of the standard UTF conversions.

That's why I would propose exactly the opposite of what you want: avoid using "scalar value" alone, and speak only of the "Unicode scalar value character property". But it could just as well be removed from the definitions entirely (including deprecating it completely as a "character property"). That would mean that every code point has an associated integer value within an unrestricted range (just large enough to distinguish all values between U+0000 and U+10FFFF); the restriction of ranges would apply ONLY in the internal description of the standard UTFs.

But I think the "scalar value character property" was defined only to factor out common text that would otherwise have to be replicated in the description of each standardized UTF: i.e. UTF-8, UTF-16(BE/LE), UTF-32(BE/LE), CESU, BOCU, SCSU... or even the (deprecated?) UTF-7. It should now also be usable with other "legacy" standards such as GB18030 (which should be compatible, at least for now in its latest version, with the standard UTFs published by Unicode and ISO/IEC/IETF).

IMHO, this definition should simply be moved into the chapter that presents these standard UTFs (or other non-standard UTFs that respect a minimum condition: being able to represent any code point reversibly, *including* unpaired surrogates, even if documents containing surrogate code points cannot be conforming, and with no guarantee that those surrogates are distinguished as paired or unpaired).

In my opinion, the term "scalar value" used alone is definitely confusing (even if we add "Unicode", because the Unicode Consortium also hosts the CLDR project, frequently speaks about interoperability with mathematics, and includes many direct references to mathematics in its core standard). I would simply prefer "interoperable code point", with a basic statement in the introduction to the description of each UTF fixing the interoperability conditions for all standard UTFs or for any other conforming UTF.
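To make the distinction concrete, here is a minimal sketch (Python 3; the helper names are my own illustration, not taken from the standard's text). Every integer in the codespace is a code point, but the "scalar value" property excludes the surrogate range, and that restricted set is exactly the domain over which the standard UTFs are defined. The last part shows the kind of reversible-but-non-conforming behaviour I mean, using Python's "surrogatepass" error handler:

    # Illustrative sketch only: the code point / scalar value distinction,
    # and reversible encoding of unpaired surrogates.

    SURROGATE_FIRST = 0xD800
    SURROGATE_LAST = 0xDFFF
    MAX_CODE_POINT = 0x10FFFF

    def is_code_point(n: int) -> bool:
        """Any integer in the Unicode codespace, surrogates included."""
        return 0 <= n <= MAX_CODE_POINT

    def is_scalar_value(n: int) -> bool:
        """A code point that is not a surrogate: the domain of the UTFs."""
        return is_code_point(n) and not (SURROGATE_FIRST <= n <= SURROGATE_LAST)

    assert is_code_point(0xD800) and not is_scalar_value(0xD800)

    # Standard UTF-8 refuses a lone surrogate, since it is not a scalar value:
    try:
        "a\ud800b".encode("utf-8")
    except UnicodeEncodeError:
        pass  # expected

    # A non-conforming but reversible extension (Python's "surrogatepass"
    # error handler) round-trips every code point, unpaired surrogates included:
    s = "a\ud800b"
    data = s.encode("utf-8", "surrogatepass")
    assert data.decode("utf-8", "surrogatepass") == s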

