Alaric Snell-Pym wrote:

> The behaviour of read-char in terms of read-octet will need careful
> specifying for funny encodings, mind; some encodings have control
> characters that shift modes and the like, but aren't part of any
> character, so the byte on which a character boundary sits is a bit
> vague. I guess the best approach to that is to say that read-char
> reads 0 or more non-character octets, if present, then reads enough
> octets to decode one character, and anything it's buffered, it shares
> the buffer with read-octet.
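(For concreteness, here is a minimal Python sketch of the semantics the quote proposes. The Port class, the use of 0x0E/0x0F as stand-ins for mode-shifting control octets, and the ASCII-only decoding are all invented for illustration; a real Scheme port would look quite different.)

```python
SHIFT_BYTES = {0x0E, 0x0F}  # hypothetical non-character mode-shift octets

class Port:
    def __init__(self, data: bytes):
        self._buf = bytearray(data)   # one buffer shared by both readers

    def read_octet(self):
        # Byte-level read: takes the next octet straight from the shared
        # buffer, regardless of character boundaries.
        if not self._buf:
            return None  # end of input
        return self._buf.pop(0)

    def read_char(self):
        # 1. Consume zero or more non-character (mode-shift) octets.
        while self._buf and self._buf[0] in SHIFT_BYTES:
            self._buf.pop(0)
        # 2. Consume exactly enough octets to decode one character
        #    (ASCII here, so a single octet suffices).
        if not self._buf:
            return None
        return chr(self._buf.pop(0))

p = Port(b"\x0eA\x0fB")
assert p.read_char() == "A"    # shift octet skipped, then 'A' decoded
assert p.read_octet() == 0x0F  # the byte reader sees the next raw octet
assert p.read_char() == "B"
```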
That *sounds* good, but it's horribly slow in practice, and interpreters
(without JITs) will suffer especially badly from it.  Character
encoding/decoding needs to be done in big buffers, for the same reason
that actual I/O does.  Making those buffers the same buffer is horribly
messy: if the internal character format is UTF-16 and the file encoding
is ASCII, you need a decoding buffer twice the size of the I/O buffer to
get any decent efficiency at all.

> This will run into issues with any hypothetical character encoding
> that uses sub-octet character boundaries, but that can be dealt with
> too, I think: if you do a read-octet when the character reader is in
> mid-octet, then the spare bits are discarded and you get the next
> octet.

Character encodings can be weird, but not *that* weird.  Bit-level
compression, when present, is usually expanded/compressed by a layer
between binary I/O and character I/O.

-- 
A rose by any other name                John Cowan
may smell as sweet,                     http://www.ccil.org/~cowan
but if you called it an onion           [email protected]
you'd get cooks very confused.  --RMS

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
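(The "twice as big" figure in the reply follows from simple arithmetic: ASCII carries one character per octet, while UTF-16 stores two octets per character, so a full I/O buffer of ASCII decodes to twice as many bytes of UTF-16. A quick Python check, with the buffer size chosen purely as an example:)

```python
IO_BUFFER_SIZE = 4096                 # assumed bytes per low-level read
io_buffer = b"x" * IO_BUFFER_SIZE     # one full ASCII I/O buffer

# Decode the ASCII octets, then re-encode in the (hypothetical) internal
# UTF-16 character format: every input octet becomes two output octets.
decoded = io_buffer.decode("ascii").encode("utf-16-le")
assert len(decoded) == 2 * IO_BUFFER_SIZE
```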
