On Mar 19, 2007, at 8:17 PM, [EMAIL PROTECTED] wrote:
UTF-8 and UTF-16 require one or more code units to represent a given scalar value. Since the number of code units depends on the scalar value being encoded, there's no algorithm that maps the i'th scalar value to the j'th code unit. If you want the i'th scalar value in a UTF-8 or UTF-16 string you have to search for it. And that, of course, is what string-ref is: a request for the i'th scalar value (returned as a character).
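For concreteness, that search looks something like the sketch below (my own illustration, with a made-up procedure name, in terms of the report's bytevector operations): walk the lead bytes from the start and skip one whole UTF-8 sequence per scalar value, so the cost grows with the index.

  ;; Sketch only: byte offset of the i'th scalar value in a bytevector
  ;; assumed to hold well-formed UTF-8.  Each step reads one lead byte
  ;; and skips the sequence it introduces, so the work is proportional
  ;; to i rather than constant.
  (define (utf8-index->offset bv i)
    (define (sequence-length lead)          ; code units in this sequence
      (cond ((< lead #x80) 1)               ; 0xxxxxxx
            ((< lead #xE0) 2)               ; 110xxxxx
            ((< lead #xF0) 3)               ; 1110xxxx
            (else          4)))             ; 11110xxx
    (let loop ((offset 0) (remaining i))
      (if (zero? remaining)
          offset
          (loop (+ offset (sequence-length (bytevector-u8-ref bv offset)))
                (- remaining 1)))))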
From what I understand, UTF-8, UTF-16, and UTF-32 are interchange formats. Unicode text encoded in any one of them can be converted to any other without loss of information (right?). Moreover, the internal representation of strings does not have to match the external representation. For example, you can read a UTF-32-encoded file into a variable-length buffer to save some space (sometimes), or you can read a UTF-8-encoded file into a fixed-length buffer to save time on random access (sometimes).
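As a small sketch of that round trip (just an illustration, assuming the report's bytevector conversion procedures):

  (import (rnrs))

  (define s "\x3BB;ambda")                              ; lambda char + "ambda"

  (define as-utf8  (string->utf8 s))                    ; variable-width external form
  (define as-utf32 (string->utf32 s (endianness big)))  ; fixed-width external form

  ;; Decoding either external form yields the same scalar values.
  (string=? (utf8->string as-utf8)
            (utf32->string as-utf32 (endianness big)))  ; => #t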
Is the following a valid summary of the issue? The existence of the string-ref and string-set! operations seems to imply that a variable-length internal representation is not an option, and a fixed-length representation wastes space and is therefore inefficient (mostly in an ASCII-centered world).
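(Rough arithmetic behind the space point, again using the report's conversions: for mostly-ASCII text a fixed-width 32-bit-per-character buffer is about four times the size of the UTF-8 form.)

  (define ascii-text (make-string 1000 #\a))
  (bytevector-length (string->utf8 ascii-text))                    ; 1000 bytes
  (bytevector-length (string->utf32 ascii-text (endianness big)))  ; 4000 bytes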
Aziz,,,