"tomer filiba" <[EMAIL PROTECTED]> writes:

>> Encoding conversion and newline conversion should be performed a
>> block at a time, below buffering, so not only I/O syscalls, but
>> also invocations of the recoding machinery are amortized by
>> buffering.
>
> you have a good point, which i also stumbled upon when implementing
> the TextInterface. but how would you suggest to solve it?
I've designed and implemented this for my language, but I'm not sure
that you will like it, because it's quite different from the Python
tradition.

The interface for block reading appends data to the end of the supplied
buffer, up to the specified size (or without limit), and also reports
whether it has reached the end of the data. The interface for block
writing removes data from the beginning of the supplied buffer, up to
the supplied size (or the whole buffer), and is told how to flush,
including whether this is the end of the data. Both functions are
allowed to read/write less than requested.

The recoding engine moves data from the beginning of an input buffer to
the end of an output buffer. The block recoding function takes size
parameters similar to those above, plus a flushing parameter. It
returns True on output overflow, i.e. when it stopped because it needs
more room in the output rather than more input. It leaves unconverted
data at the end of the input buffer if the data looks incomplete,
unless it is told that this is the last block - in that case it fails.

Both decoding input streams and encoding output streams keep a
persistent buffer in the format corresponding to their low end, i.e. a
byte buffer when this is the boundary between bytes and characters.

This design allows everything to be plugged together, including cases
where recoding changes sizes significantly
(compression/decompression). It also allows the reading/writing
process to be interrupted without breaking the consistency of the
buffer state, as long as each primitive reading/writing operation is
atomic, i.e. anything it removes from the input buffer is converted
and put in the output buffer. Data not yet processed by the remaining
layers remains in their respective buffers.

For example, reading a block from a decoding stream:

1. If there was no overflow previously, read more data from the
   underlying stream into the internal buffer, up to the supplied
   maximum size.

2. Decode data from the internal buffer into the supplied output
   buffer, up to the supplied maximum size. Tell the recoding engine
   that this is the last piece if there was no overflow previously
   and reading from the underlying stream reached the end.

3. Return True (i.e. end of input) if there is no overflow now and
   reading from the underlying stream reached the end.

Writing a block to an encoding stream is simpler:

1. Encode data from the supplied input buffer into the internal buffer.

2. Write data from the internal buffer to the output stream.

Buffered streams are typically put on top of the stack. They support
reading a line at a time, unlimited lookahead, and unlimited unreading,
and their writing guarantees that it won't leave anything in the
buffer it is writing from. Newlines are converted by a separate layer;
the buffered stream assumes "\n" endings.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
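As a rough illustration, the block reading/writing interfaces described
above could be sketched in Python roughly like this. All names here
(ByteSource, ByteSink, read_block, write_block) are invented for the
sketch and are not from any real library; the point is only the buffer
discipline: readers append to the end of the caller's buffer, writers
consume from its beginning.

```python
class ByteSource:
    """Block reader: appends to the end of the caller's buffer and
    reports whether the end of the data has been reached."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read_block(self, buf: bytearray, max_size: int) -> bool:
        """Append up to max_size bytes to buf; return True at end of data."""
        chunk = self._data[self._pos:self._pos + max_size]
        self._pos += len(chunk)
        buf += chunk
        return self._pos >= len(self._data)


class ByteSink:
    """Block writer: removes data from the beginning of the caller's
    buffer; `end` says whether this flush is the last one."""

    def __init__(self):
        self.received = bytearray()

    def write_block(self, buf: bytearray, size: int, end: bool) -> None:
        """Consume up to size bytes from the front of buf."""
        self.received += buf[:size]
        del buf[:size]
```

Because unconsumed data simply stays in the caller's buffer, an
interrupted operation leaves every layer's buffer in a consistent
state, which is the atomicity property described above.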
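The three-step read algorithm for a decoding stream might be sketched
in Python as below. The recoder here is a trivial identity transform
that only demonstrates the overflow semantics; a real codec would also
hold back incomplete sequences and fail on leftover input when told
the block is the last one. All names are hypothetical.

```python
def recode(inbuf: bytearray, outbuf: bytearray, max_out: int, last: bool) -> bool:
    """Move up to max_out bytes from the front of inbuf to the end of
    outbuf.  Return True on output overflow, i.e. when it stopped for
    lack of output room rather than lack of input.  `last` is unused
    because an identity transform never sees incomplete sequences."""
    n = min(len(inbuf), max_out)
    outbuf += inbuf[:n]
    del inbuf[:n]
    return len(inbuf) > 0          # stopped with input left over


class DecodingStream:
    """Decoding input stream with a persistent buffer at its low end."""

    def __init__(self, read_underlying):
        # read_underlying(buf, max_size) -> bool appends bytes to buf
        # and returns True at end of data.
        self._read = read_underlying
        self._inbuf = bytearray()   # persistent low-end (byte) buffer
        self._overflow = False
        self._eof = False

    def read_block(self, outbuf: bytearray, max_size: int) -> bool:
        # 1. If there was no overflow previously, refill the internal
        #    buffer from the underlying stream.
        if not self._overflow:
            self._eof = self._read(self._inbuf, max_size)
        # 2. Recode from the internal buffer into the caller's buffer;
        #    this is the last piece if there was no previous overflow
        #    and the underlying stream is exhausted.
        last = self._eof and not self._overflow
        self._overflow = recode(self._inbuf, outbuf, max_size, last)
        # 3. End of input: no overflow now and the underlying stream ended.
        return self._eof and not self._overflow
```

A caller loops until read_block returns True; whatever the recoder
could not place in the output stays in the internal buffer for the
next call, so an interruption between calls loses nothing.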