On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:

>I like the idea of having encoding information carried with the data.
>I don't think that an ebytes type that can *optionally* have an encoding
>attribute makes the situation less confusing, though.

Agreed.  I think the attribute should always be there, but there probably
needs to be a magic value (perhaps None) that indicates an unknown, manual,
garbage, error, or broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is;
you concatenate two ebytes that have incompatible encodings.

>To me the biggest
>problem with python-2.x's unicode/bytes handling was not that it threw
>exceptions but that it didn't always throw exceptions.  You might test this
>in python2::
>    t = u'cafe'
>    function(t)
>
>And say, ah my code works.  Then a user gives it this::
>    t = u'café'
>    function(t)
>
>And get a unicode error because the function only works with unicode in the
>ascii range.

That's an excellent point.

>ebytes seems to have the same pitfall where the code path exercised by your
>tests could work with::
>    eb = ebytes(b)
>    eb.encoding = 'euc-jp'
>    function(eb)
>
>but the user exercises a code path that does this and fails::
>    eb = ebytes(b)
>    function(eb)
>
>What do you think of making the encoding attribute a mandatory part of
>creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1.  If 'ebytes is bytes' then
I'd probably want to default the second argument to the magical
"i-don't-know" marker.

-Barry
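
P.S. For concreteness, here is a rough sketch of the "always there, but
possibly None" idea in the separate-type flavor. Everything in it is an
assumption for illustration only: the thread has not settled whether ebytes
subclasses bytes, what its constructor looks like, or how concatenation
should behave, and the text() helper is hypothetical.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries an encoding.

    The encoding argument is mandatory; pass None explicitly to mean
    "unknown / garbage / broken".
    """

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __add__(self, other):
        # Plain bytes (or anything without the attribute) counts as unknown.
        other_encoding = getattr(other, "encoding", None)
        # Assumption: encodings are compared as plain strings, so
        # 'latin-1' and 'latin1' would be treated as incompatible here.
        if self.encoding == other_encoding:
            merged = self.encoding
        else:
            merged = None
        return ebytes(super().__add__(other), merged)

    def text(self):
        """Decode using the carried encoding, failing loudly if unknown."""
        if self.encoding is None:
            raise ValueError("encoding is unknown; decode explicitly")
        return self.decode(self.encoding)


a = ebytes("caf\u00e9".encode("latin-1"), "latin-1")
b = ebytes(b" au lait", "latin-1")
c = ebytes(b"abc", "euc-jp")

same = a + b    # both latin-1, so the encoding is kept
mixed = a + c   # incompatible encodings, so the result is marked None
```

With this shape, ``ebytes(b)`` without a second argument raises TypeError on
every code path, so the untested path fails at construction time instead of
somewhere inside function().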
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com