Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
> Except that ambiguates it even further.
>
> Is encodings.tounicode() encoding, or decoding? According to everything
> you have said so far, it would be decoding. But if I am decoding binary
> data, why should it be spending any time as a unicode string? What do I
> mean?

Encoding and decoding are relative concepts. It's all encoding from one
thing to another. Whether it's "decoding" or "encoding" depends on the
relationship of the current encoding to a standard encoding. The
confusion with "decode" arises when the 'default_encoding' changes, will
change, or is unknown.

> x = f.read() #x contains base-64 encoded binary data
> y = encodings.to_unicode(x, 'base64')
>
> y now contains BINARY DATA, except that it is a unicode string

No, that wasn't what I was describing. You get a Unicode string object as
the result, not a bytes object with binary data. See the toy example at
the bottom.

> z = encodings.to_str(y, 'latin-1')
>
> Later you define a str_to_str function, which I (or someone else) would
> use like:
>
> z = str_to_str(x, 'base64', 'latin-1')
>
> But the trick is that I don't want some unicode string encoded in
> latin-1, I want my binary data unencoded. They may happen to be the
> same in this particular example, but that doesn't mean that it makes any
> sense to the user.

If you want bytes, you would use the bytes() type to get them directly,
not encode or decode.

    binary_unicode = bytes(unicode_string)

The exact byte order and representation would need to be decided by the
Python developers in this case. The internal representation,
'unicode-internal', is UCS-2 I believe (UCS-4 on wide builds).

>> It's no more ambiguous than any math
>> operation where you can do it one way with one operations and get your
>> original value back with the same operation by using an inverse value.
>>
>> n2=n+1; n3=n2+(-1); n==n3
>> n2=n*2; n3=n2*(.5); n==n3
>
> Ahh, so you are saying 'to_base64' and 'from_base64'.
> There is one major reason why I don't like that kind of a system: I can't
> just say encoding='base64' and use str.encode(encoding) and
> str.decode(encoding), I necessarily have to use,
> str.recode('to_'+encoding) and str.recode('from_'+encoding) . Seems a
> bit awkward.

Yes, but the encodings API could abstract away the 'to_base64' and
'from_base64' names so you can just say 'base64' and have it work in
either direction.

Maybe a toy (incomplete) example will help.

    # in module bytes.py or someplace else.
    class bytes(list):
        """ bytes methods defined here """
        # recode() is the proposed conversion method; it is not
        # defined in this toy class.

    # in module encodings.py

    # Each codec name maps to its (from, to) recode names. This uses
    # a dict of tuples, but other solutions would work just as well.
    unicode_codecs = {
        'base64': ('from_base64', 'to_base64'),
    }

    def tounicode(obj, from_codec):
        b = bytes(obj)
        b = b.recode(unicode_codecs[from_codec][0])   # undo the codec
        return unicode(b)

    def tostr(obj, to_codec):
        b = bytes(obj)
        b = b.recode(unicode_codecs[to_codec][1])     # apply the codec
        return str(b)

    # in your application
    import encodings
    # ... a bunch of code ...
    u = encodings.tounicode(s, 'base64')

    # or if going the other way
    s = encodings.tostr(u, 'base64')

Does this help? Is the relationship between the bytes object and the
encodings API clearer here? If not, maybe we should discuss it further
offline.

Cheers,
Ronald Adam

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
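For comparison, the idea in the toy example (one codec name such as 'base64' selecting both directions, with the API picking the from/to step) can be sketched in modern Python 3, where the built-in str and bytes types replace the toy bytes(list) class. This is a minimal sketch under stated assumptions: the registry, the function names to_text and to_bytes, and the charset parameter (standing in for the default encoding that unicode(b) would use in the toy code) are hypothetical illustrations, not part of any proposed or existing API.

```python
import base64
import binascii

# Hypothetical registry: each codec name maps to a (from_codec, to_codec)
# pair of bytes-to-bytes functions, mirroring the unicode_codecs dict in
# the toy example. The names here are illustrative only.
BYTES_CODECS = {
    "base64": (base64.b64decode, base64.b64encode),
    "hex": (binascii.unhexlify, binascii.hexlify),
}


def to_text(data: bytes, from_codec: str, charset: str = "latin-1") -> str:
    """Undo `from_codec`, then interpret the raw bytes as text in `charset`.

    `charset` is an assumed stand-in for the 'default_encoding'.
    """
    undo, _ = BYTES_CODECS[from_codec]
    return undo(data).decode(charset)


def to_bytes(text: str, to_codec: str, charset: str = "latin-1") -> bytes:
    """Turn `text` into raw bytes with `charset`, then apply `to_codec`."""
    _, apply = BYTES_CODECS[to_codec]
    return apply(text.encode(charset))


if __name__ == "__main__":
    encoded = to_bytes("caf\u00e9", "base64")
    print(encoded)
    print(to_text(encoded, "base64"))
```

The caller only ever names 'base64'; which half of the pair runs is decided by whether the call goes toward text or toward encoded bytes. Since Python 3.4, the standard library's codecs.encode(data, 'base64') and codecs.decode(data, 'base64') offer a similar single-name interface for bytes-to-bytes transforms.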