Re: [Scheme-reports] Sequence to sequence conversion

Marc Feeley Mon, 02 Jul 2012 05:34:40 -0700

On 2012-07-01, at 4:39 PM, Alex Shinn wrote:

> On Sun, Jul 1, 2012 at 10:19 PM, Marc Feeley <[email protected]> wrote:
>> The R5RS has the following sequence to sequence conversion procedures:
>> 
>>    list->string, and string->list
>>    list->vector, and vector->list
>> 
>> The R7RS is adding bytevector sequences, but it does not add the conversion 
>> procedures:
>> 
>>    list->bytevector, and bytevector->list
>> 
>> What is the rationale for this inconsistency?
>> 
>> Moreover, the R7RS is adding only the first set of these conversion 
>> procedures:
>> 
>>    vector->string, and string->vector
>>    bytevector->string, and string->bytevector  (not in R7RS)
>>    vector->bytevector, and bytevector->vector  (not in R7RS)
> 
> Actually, we have the second, it's just named
> utf8->string and string->utf8 to emphasize the
> encoding used to convert to and from a bytevector.


Not really.  I expected bytevector->string to be equal to

       (lambda (bv) (list->string (map integer->char (bytevector->list bv))))

which would correspond, I guess, to a latin1->string procedure under your 
naming scheme.
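
For reference, here is a minimal sketch of that definition in terms of the 
R7RS draft procedures.  bytevector->list is not in the draft, so it is written 
out as a helper; latin1->string is only the name suggested above, not a 
standard procedure.  The sketch assumes integer->char accepts every value from 
0 to 255, which is true of Unicode-capable implementations but not guaranteed 
by the report.

```scheme
(import (scheme base))

;; Helper assumed for illustration: not provided by R7RS.
;; Walks the bytevector from the end so the list comes out in order.
(define (bytevector->list bv)
  (let loop ((i (- (bytevector-length bv) 1)) (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1) (cons (bytevector-u8-ref bv i) acc)))))

;; Each byte becomes the character with the same code point,
;; which is exactly the ISO-8859-1 (Latin-1) interpretation.
(define (latin1->string bv)
  (list->string (map integer->char (bytevector->list bv))))
```

Given the two bytes #xC3 #xA9, latin1->string returns a two-character string, 
whereas utf8->string decodes them as the single character U+00E9.  That is the 
difference I have in mind.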

Concerning utf8->string and string->utf8, I dislike these procedures for many 
reasons:

1) Very minor point: the official name for this encoding is UTF-8, so it should 
be UTF-8->string and string->UTF-8.

2) The procedures specify in their names the character encoding to use.  But 
there are oodles of character encodings, so for easy extensibility to other 
encodings, it would be better to use a parameter as in (decode-string 
bytevector 'UTF-8) and (encode-string string 'UTF-8) instead of oodles of 
different procedures.

3) The main reason for character encodings is to perform I/O on byte-oriented 
streams.  Yet the only procedures having to do with character encodings in R7RS 
are utf8->string and string->utf8.  This seems wrong.  If textual output could 
be performed on binary ports and the character encoding could be specified when 
the port is opened (as was proposed in SRFI-91, 
http://srfi.schemers.org/srfi-91/srfi-91.html, and implemented in Gambit), then 
the procedures utf8->string and string->utf8 would be superfluous since they 
could be defined easily like this:

    (define (string->utf8 s)
      (let ((port (open-output-bytevector 'UTF-8)))
        (display s port)
        (get-output-bytevector port)))
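
The interface suggested in point 2 could be sketched along these lines.  The 
names decode-string and encode-string come from the text above; the set of 
supported encoding symbols, the Latin-1 helpers, and the error behaviour are 
assumptions made for illustration, not part of any report or SRFI.

```scheme
(import (scheme base))

;; Hypothetical helper: byte n maps to the character with code point n.
(define (latin1->string bv)
  (let* ((n (bytevector-length bv))
         (s (make-string n)))
    (do ((i 0 (+ i 1)))
        ((= i n) s)
      (string-set! s i (integer->char (bytevector-u8-ref bv i))))))

;; Hypothetical helper: it is an error if a character is above #xFF,
;; since bytevector-u8-set! only accepts bytes.
(define (string->latin1 s)
  (let* ((n (string-length s))
         (bv (make-bytevector n)))
    (do ((i 0 (+ i 1)))
        ((= i n) bv)
      (bytevector-u8-set! bv i (char->integer (string-ref s i))))))

;; One entry point per direction; the encoding is an ordinary argument,
;; so supporting a new encoding means adding a clause, not a new name.
(define (decode-string bv encoding)
  (case encoding
    ((UTF-8) (utf8->string bv))
    ((ISO-8859-1) (latin1->string bv))
    (else (error "decode-string: unsupported encoding" encoding))))

(define (encode-string s encoding)
  (case encoding
    ((UTF-8) (string->utf8 s))
    ((ISO-8859-1) (string->latin1 s))
    (else (error "encode-string: unsupported encoding" encoding))))
```

With this shape, (encode-string "hi" 'UTF-8) and (encode-string "hi" 
'ISO-8859-1) go through the same procedure, and an implementation can extend 
the set of encodings without growing the namespace.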

Marc


_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
