Re: Code point vs. scalar value

Philippe Verdy Wed, 18 Sep 2013 17:07:32 -0700

The UCD is the "Unicode Character Database", not the "Unicode Code Point
Database", and we have very frequently used the term "character
properties". The expression is also found outside TUS, in the names of many
APIs, even if their input is a code point, a "character" in the meaning
of the programming language, or one or two code units.

APIs exist that are not limited to taking ONLY code points as input;
frequently they also take pointers or references to streams of code units,
and they can return properties from them (even if this requires an internal
conversion of the input). This is what the standard "string" APIs have
always done in C, C++, Java, JavaScript, BASIC, COBOL, Fortran, Lisp,
Prolog, PHP, Ruby, Python, Pascal, Ada, SQL, Eiffel... and many of their
dialects (in fact probably all programming languages we have ever heard of
that are capable of handling some text), and even in assembly languages.
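
To illustrate the "internal conversion" mentioned above, here is a minimal
sketch in Python of a property lookup that takes UTF-16 code units rather
than a code point. The function names code_point_at and general_category_at
are hypothetical, made up for this example; only unicodedata.category is a
real standard-library call.

```python
import unicodedata


def code_point_at(units, i):
    """Decode the code point that starts at UTF-16 code unit index i.

    Returns (code_point, units_consumed). An unpaired surrogate is
    returned as its own code point instead of being rejected.
    """
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
        v = units[i + 1]
        if 0xDC00 <= v <= 0xDFFF:
            # Well-formed surrogate pair: combine into a supplementary code point.
            return 0x10000 + ((u - 0xD800) << 10) + (v - 0xDC00), 2
    return u, 1


def general_category_at(units, i):
    """Hypothetical API: General_Category of the code point at units[i]."""
    cp, _ = code_point_at(units, i)
    return unicodedata.category(chr(cp))
```

With this sketch, general_category_at([0xD83D, 0xDE00], 0) looks up U+1F600
after combining the pair, while a lone 0xD800 is looked up as the surrogate
code point itself. The caller never sees a code point in either case.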

But none of them has been designed to take only "Unicode scalar values" as
input. (This could conceivably exist in OO or functional programming, if
the language enforces strong type safety at compile time, which avoids
constant checks of value ranges at runtime through internal debugging
assertions, extra return values, or events.)
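
As a rough sketch of what a "scalar values only" input type could look
like: Python has no compile-time checking, so this only approximates the
idea by validating once, at construction, so that functions taking the type
need no range checks of their own. ScalarValue and to_utf8 are hypothetical
names invented for this example, not part of any existing library.

```python
class ScalarValue:
    """A code point guaranteed to be a Unicode scalar value.

    Scalar values are the code points U+0000..U+10FFFF, excluding the
    surrogate range U+D800..U+DFFF.
    """

    __slots__ = ("value",)

    def __init__(self, cp):
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("not a Unicode code point: %r" % (cp,))
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point U+%04X is not a scalar value" % cp)
        self.value = cp


def to_utf8(sv):
    """Encode a ScalarValue as UTF-8; no range check is needed here."""
    return chr(sv.value).encode("utf-8")
```

For example, to_utf8(ScalarValue(0x20AC)) encodes U+20AC, while
ScalarValue(0xD800) raises ValueError before any encoder is reached.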

2013/9/19 Markus Scherer <[email protected]>

> On Wed, Sep 18, 2013 at 3:52 PM, Philippe Verdy <[email protected]> wrote:
>
>> But the UCD and contents of the standard text are listing... oh well...
>> only the so-called "character properties"
>>
>
> Untrue. There are definitely code point properties, and surrogates have
> non-trivial property values for Block, Derived_Age, General_Category,
> Grapheme_Cluster_Break, and Line_Break.
> APIs for Unicode properties normally take Unicode code points.
>
