At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
>
>>Phillip J. Eby wrote:
>>>I was just pointing out that since byte strings are bytes by definition,
>>>then simply putting those bytes in a bytes() object doesn't alter the
>>>existing encoding. So, using latin-1 when converting a string to bytes
>>>actually seems like the One Obvious Way to do it.
>>
>>This is a misconception. In Python 2.x, the type str already *is* a
>>bytes type. So if S is an instance of 2.x str, bytes(S) does not need
>>to do any conversion. You don't need to assume it is latin-1: it's
>>already bytes.
>>
>>>In fact, the 'encoding' argument seems useless in the case of str
>>>objects, and it seems it should default to latin-1 for unicode objects.
>>
>>I agree with the former, but not with the latter. There shouldn't be a
>>conversion of Unicode objects to bytes at all. If you want bytes from
>>a Unicode string U, write
>>
>>    bytes(U.encode(encoding))
>
>I like it, it makes sense. Unicode strings are simply not allowed as
>arguments to the bytes constructor. Thinking about it, why would it be
>otherwise? And if you're mixing str-strings and unicode-strings, that
>means the str-strings you're sometimes giving are actually not byte
>strings, but character strings anyhow, so you should be encoding
>those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.
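For readers on Python 3, the explicit-encoding rule Martin and James describe can be tried directly. This is a minimal sketch, assuming Python 3's `str` stands in for the 2.x `unicode` type and the built-in `bytes` stands in for the proposed bytes type; `text_to_bytes` and `rejects_bare_text` are hypothetical helpers, not part of any proposal.

```python
# Assumption: Python 3's str plays the role of 2.x unicode, and the
# built-in bytes plays the role of the bytes type under discussion.

def text_to_bytes(text, encoding="utf-8"):
    """Hypothetical helper spelling out bytes(U.encode(encoding)).

    str.encode() already returns bytes in Python 3, so the outer bytes()
    call does not change the value.
    """
    return bytes(text.encode(encoding))


def rejects_bare_text(text):
    """Return True if bytes() refuses text passed without an encoding."""
    try:
        bytes(text)
    except TypeError:
        return True
    return False


utf8 = text_to_bytes("caf\u00e9")               # b'caf\xc3\xa9'
latin1 = text_to_bytes("caf\u00e9", "latin-1")  # b'caf\xe9'
print(utf8, latin1, rejects_bare_text("caf\u00e9"))
```

The same text produces different bytes under UTF-8 and Latin-1, which illustrates why the thread prefers making the caller name the encoding rather than relying on a silent default.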
Actually, I think you mean:

    if isinstance(s_or_U, str):
        s_or_U = s_or_U.decode('utf-8')
    b = bytes(s_or_U.encode('utf-8'))

Or maybe:

    if isinstance(s_or_U, unicode):
        s_or_U = s_or_U.encode('utf-8')
    b = bytes(s_or_U)

Which is why I proposed that the boilerplate logic get moved *into* the bytes constructor. I think this use case is going to be common in today's Python, but in truth I'm not as sure what bytes() will get used *for* in today's Python. I'm probably overprojecting based on the need to use str objects now, but bytes aren't going to be a replacement for str for a good while anyway.

>Kill the encoding argument, and you're left with:
>
>Python2.X:
>- bytes(bytes_object) -> copy constructor
>- bytes(str_object) -> copy the bytes from the str to the bytes object
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>  error on overflow
>
>Python3.X removes str, and most APIs that did return str return bytes
>instead. Now all you have is:
>- bytes(bytes_object) -> copy constructor
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>  error on overflow
>
>Nice and simple.

I could certainly live with that approach, and it certainly rules out all the "when does the encoding argument apply and when should it be an error to pass it" questions.  :)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
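Phillip's two normalization snippets, plus the constructor forms James lists, can be sketched in modern Python 3. This is an illustration under an assumption, not the thread's code: it maps the 2.x `unicode` type to Python 3 `str` and the 2.x byte-string `str` to Python 3 `bytes`. The helper names `to_bytes` and `to_bytes_validated` are hypothetical.

```python
# Assumption: 2.x unicode -> Python 3 str, 2.x str -> Python 3 bytes.

def to_bytes(s_or_U, encoding="utf-8"):
    """Hypothetical helper mirroring the second snippet:
    encode text, pass bytes through unchanged."""
    if isinstance(s_or_U, str):          # text must be encoded explicitly
        s_or_U = s_or_U.encode(encoding)
    return bytes(s_or_U)                 # bytes input is taken as-is


def to_bytes_validated(s_or_U, encoding="utf-8"):
    """Hypothetical helper mirroring the first snippet: bytes input is
    decoded first, so bytes that are invalid in `encoding` raise
    UnicodeDecodeError before being re-encoded."""
    if isinstance(s_or_U, bytes):
        s_or_U = s_or_U.decode(encoding)
    return bytes(s_or_U.encode(encoding))


# Constructor forms from the list in the message, as Python 3's bytes behaves:
same = bytes(b"abc")             # from a bytes object: equal value
from_ints = bytes([104, 105])    # from a sequence of ints: b'hi'
try:
    bytes([256])                 # int outside range(256): error on overflow
    overflow_rejected = False
except ValueError:
    overflow_rejected = True

print(to_bytes("h\u00e9"), to_bytes(b"h\xc3\xa9"))
print(same, from_ints, overflow_rejected)
```

The first variant is the stricter of the two: it rejects byte strings that are not valid UTF-8, whereas the second passes any bytes through untouched. Note also that Python 3 as released still accepts `bytes(text, encoding)` as a shorthand; what it refuses is text given with no encoding at all.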