Author: larry
Date: Sat Mar 10 09:42:53 2007
New Revision: 14327

Modified: doc/trunk/design/syn/S02.pod

Log:
Clarifications on StrPos and StrLen requested by putter++.

Modified: doc/trunk/design/syn/S02.pod
==============================================================================
--- doc/trunk/design/syn/S02.pod (original)
+++ doc/trunk/design/syn/S02.pod Sat Mar 10 09:42:53 2007
@@ -589,13 +589,23 @@
 graphemes, or characters in some language. For all builtin operations,
 all C<Str> positions are reported as position objects, not integers.
 These C<StrPos> objects point into a particular string at a particular
-location independent of abstraction level. The subtraction of two
-C<StrPos> objects gives a C<StrLen> object, which is still not an
-integer, because the string between two positions also has multiple
-integer interpretations depending on the units. A given C<StrLen>
-may know that it represents 18 bytes, 7 codepoints, and 3 graphemes,
-but it knows this lazily because it actually just hangs onto the two
-C<StrPos> objects. (It's much like a C<Range> object in that respect.)
+location independent of abstraction level, either by tracking the
+string and position directly, or by generating an abstraction-level
+independent representation of the offset from the beginning of the
+string that will give the same results if applied to the same string
+in any context. This is assuming the string isn't modified in the
+meanwhile; a C<StrPos> is not a "marker" and is not required to follow
+changes to a mutable string.
+
+The subtraction of two C<StrPos> objects gives a C<StrLen> object,
+which is also not an integer, because the string between two positions
+also has multiple integer interpretations depending on the units.
+A given C<StrLen> may know that it represents 18 bytes, 7 codepoints,
+3 graphemes, and 1 letter in Malayalam, but it might only know this
+lazily because it actually just hangs onto the two C<StrPos> endpoints
+within the string that in turn may or may not just lazily point into
+the string. (The lazy implementation of C<StrLen> is much like a
+C<Range> object in that respect.)
 
 If you use integers as arguments where position objects are expected,
 it will be assumed that you mean the units of the current lexically
@@ -607,6 +617,11 @@
 Of course, such a dimensional number will fail if used on a string
 that doesn't provide the appropriate abstraction level.
 
+If a C<StrPos> or C<StrLen> is forced into a numeric context, it will
+assume the units of the current Unicode abstraction level. It is
+erroneous to pass such a non-dimensional number to a routine that
+would interpret it with the wrong units.
+
 =item *
 
 A C<Buf> is a stringish view of an array of
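
The idea in the revised text, that a length keeps only its two endpoints and works out a unit-specific integer when asked, can be sketched roughly as follows. This is an illustration in Python, not Perl 6, and not anything the synopsis specifies: the class names mirror C<StrPos> and C<StrLen>, but the constructor arguments, the count method and the level names are all hypothetical. It also assumes UTF-8 for the byte count and approximates graphemes by skipping combining marks, which is cruder than real grapheme segmentation.

```python
import unicodedata


class StrPos:
    """Hypothetical position object: a string plus an offset into it."""

    def __init__(self, string, offset):
        self.string = string
        self.offset = offset  # stored as a codepoint index (Python's native unit)

    def __sub__(self, other):
        # Subtracting two positions yields a length object, not an integer.
        if self.string != other.string:
            raise ValueError("positions refer to different strings")
        return StrLen(other, self)


class StrLen:
    """Hypothetical length object: keeps only the two endpoints."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def _span(self):
        return self.start.string[self.start.offset:self.end.offset]

    def count(self, level):
        """Interpret the length in the requested units, computed on demand."""
        span = self._span()
        if level == "bytes":
            return len(span.encode("utf-8"))  # assumption: UTF-8 encoding
        if level == "codepoints":
            return len(span)
        if level == "graphemes":
            # Rough approximation: count codepoints that are not combining marks.
            return sum(1 for ch in span if unicodedata.combining(ch) == 0)
        raise ValueError(f"unknown abstraction level: {level}")


s = "cafe\u0301!"  # 'e' followed by COMBINING ACUTE ACCENT
length = StrPos(s, len(s)) - StrPos(s, 0)
print(length.count("bytes"), length.count("codepoints"), length.count("graphemes"))
# prints: 7 6 5
```

The explicit level argument stands in for the "current Unicode abstraction level" that the text says a numeric context would assume. Passing the resulting bare integer to code that expects a different unit is the error the last added paragraph warns about.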