Re: "Unicode in 'NFG' formation" ?

Austin Hastings Mon, 18 May 2009 06:35:28 -0700

Mark J. Reed wrote:

On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
<[email protected]> wrote:

If you haven't read the PDD, it's a good start.


<snip useful summary>

I get all that, really.  I still question the necessity of mapping
each grapheme to a single integer.  A single *value*, sure.
length($weird_grapheme) should always be 1, absolutely.  But why does
ord($weird_grapheme) have to be a *numeric* value?  If you convert to,
say, normalization form C and return a list of the scalar values so
obtained, that can be used in any context to reproduce the same
grapheme, with no worries about different processes coming up with
different assignments of arbitrary negative numbers to graphemes.

If you're doing arithmetic with the code points or scalar values of
characters, then the specific numbers would seem to matter.  I'm
looking for the use case where the fact that it's an integer matters
but the specific value doesn't.
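
The list-of-scalar-values alternative described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything from the PDD: the helper name `grapheme_scalars` is made up, and it assumes the caller has already isolated a single grapheme.

```python
import unicodedata

def grapheme_scalars(grapheme):
    """Return the NFC scalar values of one grapheme as a tuple of ints.

    Hypothetical helper: assumes `grapheme` is already a single
    grapheme cluster; no segmentation is done here.
    """
    return tuple(ord(ch) for ch in unicodedata.normalize("NFC", grapheme))

# e + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields one scalar.
composed = grapheme_scalars("e\u0301")

# q + COMBINING DOT BELOW + COMBINING DOT ABOVE has no precomposed form,
# so NFC leaves three scalars.
stacked = grapheme_scalars("q\u0323\u0307")
```

Any process that normalizes the same way gets the same tuple, which is the reproducibility argument being made here. The cost is that the result is variable length.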

There's a couple of cases. First of all, it doesn't have to be an
integer. It needs to be a fixed size, and it needs to be orderable, so
that we can store a bunch of them in an intelligent fashion, thus making
it easy to sort them.

With that said, integers meet the need exactly. Plus, there's the
benefit that Unicode already has an "escape hatch" built into it for
user-defined stuff. And that escape hatch is an integer.

The benefits are documented in the pod: they're fixed size, so we can
scan over them forward and backward at low cost. They're easily
distinguished (high bit set) so string code can special-case them
quickly. They're orderable, comparable, etc. And best of all they
contain no trans fat!
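
As a rough sketch of those properties, assuming nothing about how Parrot actually implements it: the `GraphemeTable` class and its methods below are hypothetical, the input is assumed to be pre-split into graphemes, and since Python integers have no fixed width, the sign stands in for the "high bit set" test.

```python
import unicodedata

class GraphemeTable:
    """Hypothetical per-process table: multi-codepoint graphemes get
    negative integers, single code points keep their own value."""

    def __init__(self):
        self._ids = {}       # NFC string -> negative id
        self._strings = []   # position (-id - 1) -> NFC string

    def encode(self, grapheme):
        nfc = unicodedata.normalize("NFC", grapheme)
        if len(nfc) == 1:
            return ord(nfc)              # ordinary code point, never negative
        if nfc not in self._ids:
            self._strings.append(nfc)
            self._ids[nfc] = -len(self._strings)
        return self._ids[nfc]

    def decode(self, value):
        if value >= 0:                   # the cheap special-case test
            return chr(value)
        return self._strings[-value - 1]

table = GraphemeTable()
# Input is assumed to be already segmented into graphemes.
cells = [table.encode(g) for g in ["a", "e\u0301", "q\u0323\u0307"]]
```

Every cell is one integer, so the sequence can be indexed or walked in either direction, and a sign check separates table entries from plain code points. The negative values depend on the order in which this one table saw the graphemes, which is exactly the cross-process concern raised in the quoted message.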


=Austin
