Mark J. Reed wrote:
> On Mon, May 18, 2009 at 9:11 AM, Austin Hastings wrote:
> > If you haven't read the PDD, it's a good start.
> <snip useful summary>
> I get all that, really. I still question the necessity of mapping
> each grapheme to a single integer. A single *value*, sure.
> length($weird_grapheme) should always be 1, absolutely. But why does
> ord($weird_grapheme) have to be a *numeric* value? If you convert to,
> say, normalization form C and return a list of the scalar values so
> obtained, that can be used in any context to reproduce the same
> grapheme, with no worries about different processes coming up with
> different assignments of arbitrary negative numbers to graphemes.
> If you're doing arithmetic with the code points or scalar values of
> characters, then the specific numbers would seem to matter. I'm
> looking for the use case where the fact that it's an integer matters
> but the specific value doesn't.
There are a couple of cases. First of all, it doesn't have to be an
integer. It needs to be fixed-size, and it needs to be orderable, so
that we can store a bunch of them compactly in an array and sort them
easily.
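A minimal sketch of the two properties named above, in Python. It assumes graphemes arrive as already-normalized tuples of code points, and it uses a hypothetical registry that hands out negative ids to multi-code-point graphemes in first-seen order (the "arbitrary negative numbers" Mark mentions). It is an illustration, not Parrot's actual allocator. `array('i')` stores C ints, so every slot has the same size.

```python
from __future__ import annotations

from array import array

# Hypothetical registry (not Parrot's real allocator): every grapheme made of
# more than one code point gets a negative id, assigned in first-seen order.
_synthetic_ids = {}


def grapheme_id(codepoints: tuple[int, ...]) -> int:
    """Map one grapheme, given as an already-normalized code point tuple,
    to a single fixed-size integer."""
    if len(codepoints) == 1:
        return codepoints[0]  # ordinary grapheme: the code point itself
    if codepoints not in _synthetic_ids:
        _synthetic_ids[codepoints] = -(len(_synthetic_ids) + 1)
    return _synthetic_ids[codepoints]


# 'a', then 'x' + U+0302 COMBINING CIRCUMFLEX ACCENT as one grapheme, then 'b'.
# Every slot in array('i') is text.itemsize bytes wide.
text = array("i", [grapheme_id((0x61,)),
                   grapheme_id((0x78, 0x0302)),
                   grapheme_id((0x62,))])

print(len(text))     # 3  (one slot per grapheme)
print(sorted(text))  # [-1, 97, 98]  (plain integer comparison orders them)
```

Because the ids depend on first-seen order, two processes can assign different ids to the same grapheme, which is exactly Mark's concern; the sketch only shows that such ids are fixed-size and orderable.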
With that said, integers meet the need exactly. Plus, there's the
benefit that Unicode already has an "escape hatch" built into it for
user-defined stuff, and that escape hatch is an integer.
The benefits are documented in the pod: they're fixed-size, so we can
scan over them forward and backward at low cost. They're easily
distinguished (high bit set), so string code can special-case them
quickly. They're orderable, comparable, etc. And best of all, they
contain no trans fat!
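A rough Python sketch of the scanning and special-casing benefits. It assumes grapheme ids are stored as signed 32-bit values with synthetic graphemes negative, so the high bit is set exactly for them. The id -1 below is hypothetical. With fixed-size slots the previous grapheme is always at index i - 1, so walking backward needs no boundary search, and one bit test separates ordinary code points from synthetic ones.

```python
from array import array


def high_bit_set(g: int) -> bool:
    """Treat g as a 32-bit two's-complement value and test bit 31.

    Negative (synthetic) ids have it set; real code points (<= 0x10FFFF)
    never do."""
    return (g & 0x80000000) != 0


def scan_backward(text: array) -> list:
    """Walk graphemes from last to first.

    Fixed-size slots mean the previous grapheme is always at index i - 1,
    so no decoding or boundary search is needed."""
    return [text[i] for i in range(len(text) - 1, -1, -1)]


def describe(text: array) -> list:
    """Special-case synthetic graphemes with a single bit test."""
    out = []
    for g in text:
        if high_bit_set(g):
            out.append(f"<synthetic {g}>")  # would need a table lookup
        else:
            out.append(chr(g))              # ordinary code point: fast path
    return out


# 'a', one synthetic grapheme (hypothetical id -1), 'b'
text = array("i", [0x61, -1, 0x62])
print(scan_backward(text))  # [98, -1, 97]
print(describe(text))       # ['a', '<synthetic -1>', 'b']
```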