Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>>
>>> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>
>>>> On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>>>>
>>>> What would bytes("abc\xf0", "latin-1") *mean*?
>>> I'm saying that XXX would be the same encoding as you specified. i.e.,
>>> including an encoding means you are encoding the *meaning* of the string.
>
> No, this is wrong. As I understand it, the encoding
> argument to bytes() is meant to specify how to *encode*
> characters into the bytes object. If you want to be able
> to specify how to *decode* a str argument as well, you'd
> need a third argument.
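For reference, the encode/decode distinction in the quoted text can be illustrated with the bytes type as it later shipped in Python 3. That API postdates this thread, so treat the following as a sketch under that assumption rather than as what was being proposed at the time. The encoding argument says how to turn characters into bytes; going the other way is a separate, explicit decode step.

```python
# Sketch using Python 3's str/bytes (an assumption: this API postdates the thread).
text = "abc\xf0"                     # four characters; the last one is U+00F0

latin1 = bytes(text, "latin-1")      # Latin-1 stores each character in one byte
utf8 = bytes(text, "utf-8")          # UTF-8 needs two bytes for U+00F0

print(latin1)                        # b'abc\xf0'
print(utf8)                          # b'abc\xc3\xb0'

# Decoding is a separate step and has to name the encoding that was used.
assert latin1.decode("latin-1") == text
assert utf8.decode("utf-8") == text

# Decoding with a different encoding does not raise here; it just yields other text.
mismatched = utf8.decode("latin-1")
print(mismatched == text)            # False
```

The same four characters produce different byte sequences depending on the encoding, which is why the encoding has to be stated at both ends.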
I'm not sure I understand why this would be needed. But maybe it's still too early to pin anything down.

My first impression and thoughts (which now seem incorrect) were:

    bytes(object) -> byte sequence of the object's value

Basically a "memory dump" of the object's value. And so...

    object(bytes) -> copy of original object

This would reproduce a copy of the original object as long as the "from" and "to" objects are the same type, with no encoding needed. If they are different, then you would get garbage or an error. But that would be a programming error and not a language issue. It would be up to the programmer not to do that. Of course, this is one of those easier-said-than-done concepts, I'm sure.

And I was thinking a bytes argument of more than one item would indicate a byte sequence:

    bytes(1,2,3) -> bytes([1,2,3])

where any value above 255 would give an error. But it seems an explicit list is preferred, and that's fine because it creates a way for bytes to know how to handle everything else. (I think.)

    bytes([1,2,3]) -> bytes([1,2,3])

Which is fine... so ???

    b = bytes(0L) -> bytes([0,0,0,0])
    long(b) -> 0L          # convert it back to 0L

And...

    b = bytes([0L]) -> bytes([0])    # a single byte
    int(b) -> 0            # convert it back to 0
    long(b) -> 0L

It's up to the programmer to know if it's safe. Working with raw data is always a "programmer needs to be aware of what's going on" thing.

But would it be any different with strings? You wouldn't ever want to convert one type's bytes into a different type directly. It would be better to just convert them back to the original type, then use *its* encoding method to change it. So...

    b = bytes(s) -> bytes( raw sequence of bytes )

Whether you get a single byte per character or multiple bytes per character would depend on the string's encoding.

    s = str(bytes, encoding) -> original string

You need to specify it here, because there is more than one string encoding. To avoid encodings entirely we would need a type for each encoding.
(which isn't really avoiding anything)

And it's the "raw data, so the programmer needs to be aware" situation again. Don't decode to something other than what it is.

If someone needs automatic encoding/decoding, then they probably should write a class to do what they want. Something roughly like...

    class bytekeeper(object):
        b = None
        t = None
        e = None
        def __init__(self, obj, enc='bytes'):   # or whatever encoding
            self.e = enc
            self.t = type(obj)
            self.b = bytes(obj)
        def decode(self):
            ...

Would we be able to subclass bytes?

    class bytekeeper(bytes): ?
        ...

Ok.. enough rambling... I wonder how much of this is way out in left field. ;)

cheers,
   Ronald Adam

And as fa

In this case the encoding argument would only be needed not to

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
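The "bytekeeper" idea above can be sketched in runnable form with the bytes type as it later shipped in Python 3. Everything here is an assumption made for illustration, not what the thread settled on: the names ByteKeeper and TaggedBytes are hypothetical, the fixed 8-byte integer layout is an arbitrary choice, and only str, int, bytes and bytearray are handled.

```python
import struct


class ByteKeeper:
    """Hypothetical helper: keeps raw bytes plus the type and encoding needed to undo them."""

    def __init__(self, obj, enc="utf-8"):
        self.t = type(obj)   # remember the exact original type
        self.e = enc         # only meaningful for str
        if self.t is str:
            self.b = obj.encode(enc)
        elif self.t is int:
            # Arbitrary layout for this sketch: signed, big-endian, 8 bytes.
            # struct.pack raises struct.error if the value does not fit.
            self.b = struct.pack(">q", obj)
        elif self.t in (bytes, bytearray):
            self.b = bytes(obj)
        else:
            raise TypeError("unsupported type: %s" % self.t.__name__)

    def decode(self):
        """Rebuild a copy of the original object from the stored bytes."""
        if self.t is str:
            return self.b.decode(self.e)
        if self.t is int:
            return struct.unpack(">q", self.b)[0]
        return self.t(self.b)   # bytes or bytearray


class TaggedBytes(bytes):
    """Hypothetical bytes subclass that remembers which encoding produced it."""

    def __new__(cls, text, enc="utf-8"):
        # bytes is immutable, so the content is fixed in __new__, not __init__.
        self = super().__new__(cls, text.encode(enc))
        self.enc = enc
        return self

    def to_text(self):
        return self.decode(self.enc)


keeper = ByteKeeper("h\xe9llo", "latin-1")
print(keeper.b)                      # b'h\xe9llo'
print(keeper.decode() == "h\xe9llo") # True
print(ByteKeeper(-5).decode())       # -5

tagged = TaggedBytes("abc\xf0", "latin-1")
print(tagged == b"abc\xf0")          # True
print(tagged.to_text() == "abc\xf0") # True
```

Dispatching on the exact type keeps the round trip honest: anything the sketch cannot restore faithfully is rejected up front instead of coming back as garbage.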