> As far as Python 3 goes, I honestly have not yet familiarized myself
> with the changes to the IO infrastructure and what the new idioms are.
> At this time, I can't make any educated decisions with regard to how
> it should be done because I don't know exactly how bytes are supposed
> to work and what the common idioms are for other libraries in the
> stdlib that do similar things.

It's really very similar to 2.x: the "bytes" type is to be used in all
interfaces that operate on byte sequences that may or may not represent
characters; in particular, for interfaces where the operating system
deliberately uses bytes, i.e. low-level file IO and socket IO; also for
cases where the encoding is embedded in the stream that still needs to
be processed (e.g. XML parsing).

(Unicode) strings should be used where the data is truly text by nature,
i.e. where no encoding information is necessary to find out what
characters are intended. They are used on interfaces where the encoding
is known (e.g. text IO, where the encoding is specified on opening; XML
parser results, with the declared encoding; and GUI libraries, which
naturally expect text).

> Until I figure that out, someone else
> is better off making decisions about the Python 3 version.

Some of us can certainly explain to you how this is supposed to work.
However, we need you to check any assumption against the known use
cases: would the users of the module be happy if it worked one way or
the other?

> My guess is
> that it should work the same way as it does in Python 2.x: take bytes
> or unicode input in loads (which means encoding is still relevant). I
> also think the output of dumps should also be bytes, since it is a
> serialization, but I am not sure how other libraries do this in Python
> 3 because one could argue that it is also text.

This, indeed, had been an endless debate, and, in the end, the decision
was somewhat arbitrary.
Here are some examples:

- base64.encodestring expects bytes (naturally, since it is supposed
  to encode arbitrary binary data), and produces bytes (debatably)
- binascii.b2a_hex likewise (expects and produces bytes)
- pickle.dumps produces bytes (uniformly, both for binary and text
  pickles)
- marshal.dumps likewise
- email.message.Message().as_string produces a (unicode) string
  (see Barry's recent thread on whether that's a good thing; the
  email package hasn't been fully ported to 3k, either)
- the XML libraries (continue to) parse bytes, and produce
  Unicode strings
- for the IO libraries, see above

> If other libraries
> that do text/text encodings (e.g. binascii, mimelib, ...) use str for
> input and output

See above - most of them don't; mimetools no longer exists (replaced
by the email package).

> instead of bytes then maybe Antoine's changes are the
> right solution and I just don't know better because I'm not up to
> speed with how people write Python 3 code.

There isn't too much fresh end-user code out there, so we can't really
tell, either. As for standard library users - users will do whatever
the library forces them to do. This is why I'm so concerned about this
issue: we should get it right, or not do it at all. I still think you
would be the best person to determine what is right.

> I'll do my best to find some time to look into Python 3 more closely
> soon, but thus far I have not been very motivated to do so because
> Python 3 isn't useful for us at work and twiddling syntax isn't a very
> interesting problem for me to solve.

And I didn't expect you to - it seems people are quite willing to do
the actual work, as long as there is some guidance.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
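
For anyone who wants to check the standard-library examples listed in
this message, here is a minimal sketch, assuming a reasonably current
Python 3 interpreter. Two things in it are assumptions rather than part
of the discussion: it calls base64.encodebytes, since
base64.encodestring was a deprecated alias that was removed in Python
3.9, and the sample payload and the tiny XML document are made up for
illustration.

```python
import base64
import binascii
import marshal
import pickle
import xml.etree.ElementTree as ET
from email.message import Message

# Hypothetical sample data: arbitrary bytes that are not valid text.
payload = b"\x00\xffbinary"

# base64 and binascii: bytes in, bytes out.
# (encodebytes stands in for encodestring, which was removed in 3.9.)
b64 = base64.encodebytes(payload)
hexed = binascii.b2a_hex(payload)

# pickle and marshal: bytes out, even for the "text" pickle protocol 0.
pickled_text = pickle.dumps({"a": 1}, protocol=0)
pickled_binary = pickle.dumps({"a": 1}, protocol=2)
marshalled = marshal.dumps([1, 2, 3])

# email: as_string() hands back a (unicode) string.
message_text = Message().as_string()

# XML: the parser consumes bytes (the encoding is declared inside the
# stream) and produces (unicode) strings.
xml_bytes = (
    '<?xml version="1.0" encoding="iso-8859-1"?><a>caf\xe9</a>'
).encode("iso-8859-1")
root = ET.fromstring(xml_bytes)

results = [
    ("base64.encodebytes", b64),
    ("binascii.b2a_hex", hexed),
    ("pickle.dumps (protocol 0)", pickled_text),
    ("pickle.dumps (protocol 2)", pickled_binary),
    ("marshal.dumps", marshalled),
    ("Message().as_string", message_text),
    ("XML element text", root.text),
]
for name, value in results:
    # Print only the type of each result, which is the point at issue.
    print(f"{name}: {type(value).__name__}")
```

The sketch only reports result types; it says nothing about which type
a new serialization module ought to pick.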