Reinhold Birkenfeld wrote:
> FWIW, I've already drafted a patch for the former. It lets you write to
> file.encoding and honors this when writing Unicode strings to it.

I don't like that approach. You shouldn't be allowed to change the
encoding mid-stream (except perhaps under very specific circumstances).

As I see it, the buffer of an encoded file becomes split, at least for
input: there are bytes which have been read but not yet decoded, and
there are characters which have been decoded but not yet consumed. If
you change the encoding mid-stream, you would have to undo the decoding
that was already done, resetting the stream to the real "current"
position.

For output, the situation is similar: before changing to a new
encoding, or before changing from Unicode output to byte output, you
have to flush the codec first. The codec may have buffered some state
which needs to be completely processed before a new codec can be
applied to the stream.

Another issue is seeking: given the many different kinds of buffers,
seeking becomes fairly complex. Ideally, seeking should apply to
application-level positions, i.e. when you tell the current position,
it should be in terms of data already consumed by the application.
Perhaps seeking in an encoded stream should not be supported at all.

Finally, you also have to consider Universal Newlines: you can apply
them either on the byte stream or on the character stream. I think the
conceptually right thing would be to apply universal newlines on the
character stream.

Regards,
Martin
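
To illustrate the split input buffer described above, here is a minimal sketch. It assumes the incremental decoder API of the standard `codecs` module as found in current Python versions; it is only an illustration and not part of the patch under discussion. The variable names are made up for the example.

```python
import codecs

# U+00E9 takes two bytes in UTF-8; feed them to the decoder one at a time.
data = "\u00e9".encode("utf-8")            # b'\xc3\xa9'
decoder = codecs.getincrementaldecoder("utf-8")()

# After the first byte the sequence is incomplete: no character comes out.
first = decoder.decode(data[:1])

# The byte has been read from the stream but is still held by the decoder.
pending, _flags = decoder.getstate()

# The second byte completes the sequence and the character is produced.
second = decoder.decode(data[1:])

print(repr(first), pending, repr(second))
```

Switching to a different codec between the two `decode` calls would leave the pending byte stranded inside the old decoder, which is the situation the reply warns about.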