Greg Ewing wrote:
> If the protocol has been sensibly designed, that shouldn't
> happen, since everything up to the coding marker should
> be ascii (or some other protocol-defined initial coding).

XML, for one, requires you to start over. The initial sequence could be
UTF-16, or it could be EBCDIC. You read a few bytes (up to four), and then
you know which of these it is. Then you start over, reading further if it
looks like an ASCII superset, to find out the real encoding. You would
normally start over once more at that point, although switching encodings
in mid-stream could also work.

> For protocols that are not sensibly designed (or if you're
> just trying to guess) what you suggest may be needed. But
> it would be good to have a nicer way of going about it
> for when the protocol is sensible.

There might already be buffering of decoded strings (i.e. beyond the point
up to which you have read), so you would need to unbuffer these and
reinterpret them. To support that, you really need to buffer both the
original bytes and the decoded ones, since the encoding might not
round-trip.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
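
P.S. To make the XML case concrete, here is a rough sketch of the "read a few bytes, then start over" approach in Python. It is only an illustration under several assumptions: the names sniff_family and decode_xml are made up for this example, cp037 stands in for "some EBCDIC code page", the declaration is assumed to fit in the first 200 bytes, and UTF-32 and other cases are left out entirely.

```python
import re

# Hypothetical helpers; a simplified take on XML encoding autodetection.
_DECL = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_family(head: bytes) -> str:
    """Guess the encoding family from the first (up to four) bytes."""
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"      # BOM present; the codec works out the byte order
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"   # "<?" without a BOM, big-endian
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"   # "<?" without a BOM, little-endian
    if head.startswith(b"\x4c\x6f\xa7\x94"):
        return "cp037"       # "<?xm" in EBCDIC; cp037 is an assumed flavour
    return "latin-1"         # some ASCII superset; latin-1 never fails to decode


def decode_xml(raw: bytes) -> str:
    """Decode a complete XML document held in memory as bytes."""
    family = sniff_family(raw[:4])
    if family.startswith("utf-16"):
        return raw.decode(family)
    # First pass: decode just enough to read the encoding declaration.
    prefix = raw[:200].decode(family, errors="replace")
    match = _DECL.search(prefix)
    if match:
        encoding = match.group(1)
    elif family == "cp037":
        encoding = family
    else:
        encoding = "utf-8-sig"  # no declaration: UTF-8, with or without a BOM
    # Second pass: start over from the original bytes with the real encoding.
    return raw.decode(encoding)
```

Note that the sketch sidesteps the buffering problem by keeping all the original bytes around and decoding them a second time; a stream reader that had already handed out decoded text would not have that luxury.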