On Mar 19, 2007, at 8:17 PM, [EMAIL PROTECTED] wrote:

UTF-8 and UTF-16 require one or more code units to represent a given
scalar value. Since the number of code units depends on the scalar value
being encoded, there's no constant-time arithmetic that maps the i'th
scalar value to the j'th code unit. If you want the i'th scalar value in a
UTF-8 or UTF-16 string you have to search for it. And that, of course, is
what string-ref is: a request for the i'th scalar value (returned as a
character).
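
Just to make that cost concrete for myself: the "search" is a linear scan
over lead bytes. Here is a minimal sketch in R6RS-ish Scheme, assuming
well-formed UTF-8 held in a bytevector (utf8-index->offset is just a name
I made up, not anything from the draft):

  (import (rnrs))

  ;; Return the byte offset of the i'th scalar value in a bytevector holding
  ;; well-formed UTF-8.  Continuation bytes look like 10xxxxxx, so we step
  ;; only over lead bytes, advancing by the length each lead byte encodes.
  (define (utf8-index->offset bv i)
    (let loop ((offset 0) (remaining i))
      (cond ((>= offset (bytevector-length bv))
             (error 'utf8-index->offset "index out of range" i))
            ((zero? remaining) offset)
            (else
             (let ((b (bytevector-u8-ref bv offset)))
               (loop (+ offset (cond ((< b #x80) 1)   ; 0xxxxxxx
                                     ((< b #xE0) 2)   ; 110xxxxx
                                     ((< b #xF0) 3)   ; 1110xxxx
                                     (else 4)))       ; 11110xxx
                     (- remaining 1)))))))

Each lookup is O(i), and that is exactly the price string-ref would pay
over such a representation.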

From what I understand, UTF-8, UTF-16, and UTF-32 are interchange formats:
Unicode text encoded in any one of them can be converted to any other
without loss of information (right?). Moreover, the internal representation
of strings does not have to match the external representation. For example,
you can read a UTF-32-encoded file into a variable-length buffer to save
some space (sometimes); or, alternatively, you can read a UTF-8-encoded
file into a fixed-length buffer to save time on random access (sometimes).
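
For what it's worth, the (rnrs bytevectors) library already exposes the
conversions that make this a round trip; the sketch below is just me
convincing myself there is no information loss (the endianness arguments
are there only because the UTF-16/UTF-32 procedures need one):

  (import (rnrs))

  (define s "caf\xE9; \x3BB;")   ; "café λ" -- contains non-ASCII scalar values

  (define u8  (string->utf8  s))                   ; 1-4 bytes per scalar value
  (define u16 (string->utf16 s (endianness big)))  ; 2 or 4 bytes per scalar value
  (define u32 (string->utf32 s (endianness big)))  ; always 4 bytes per scalar value

  ;; each round trip recovers the original string
  (string=? s (utf8->string  u8))                    ; => #t
  (string=? s (utf16->string u16 (endianness big)))  ; => #t
  (string=? s (utf32->string u32 (endianness big)))  ; => #t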

Is the following a valid summary of the issue?

  The existence of the string-ref and string-set! operations seems to
  imply that a variable-length internal representation is not an option,
  while a fixed-length representation wastes space and is therefore
  inefficient (mostly in an ASCII-centered world).
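
To put a toy number on the space point: for ASCII-only text a fixed-width
UTF-32 buffer is four times the size of the UTF-8 form, e.g.

  (import (rnrs))

  (bytevector-length (string->utf8  "hello world"))   ; => 11
  (bytevector-length (string->utf32 "hello world"))   ; => 44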

Aziz,,,

_______________________________________________
r6rs-discuss mailing list
r6rs-discuss@lists.r6rs.org
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
