On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:

>I like the idea of having encoding information carried with the data.
>I don't think that an ebytes type that can *optionally* have an encoding
>attribute makes the situation less confusing, though.

Agreed.  I think the attribute should always be there, but there probably
needs to be a magic value (perhaps None) that indicates an unknown, manual,
garbage, error, or broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is; you
concatenate two ebytes that have incompatible encodings.
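
To make those two cases concrete, here is a rough sketch, not a proposed
implementation.  It assumes ebytes is a bytes subclass, that None is the
magic value, and that concatenating mismatched encodings degrades to None
rather than raising; none of that has been decided.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries an encoding attribute."""

    def __new__(cls, data=b"", encoding=None):
        self = super().__new__(cls, data)
        # None is the assumed "unknown/garbage/broken" marker.
        self.encoding = encoding
        return self

    def __add__(self, other):
        joined = bytes(self) + bytes(other)
        # Plain bytes carry no encoding, so treat them as unknown.
        other_encoding = getattr(other, "encoding", None)
        # Keep the encoding only when both operands agree on it.
        if self.encoding == other_encoding:
            return ebytes(joined, self.encoding)
        return ebytes(joined, None)


# Bytes read off a socket: nothing is known about the encoding.
from_socket = ebytes(b"\xa4\xa2")

# Two ebytes with incompatible encodings.
mixed = ebytes(b"abc", "euc-jp") + ebytes(b"def", "utf-8")
```

Here both from_socket.encoding and mixed.encoding end up as None, so the
attribute is always present but can still say "don't trust me".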

>To me the biggest
>problem with python-2.x's unicode/bytes handling was not that it threw
>exceptions but that it didn't always throw exceptions.  You might test this
>in python2::
>    t = u'cafe'
>    function(t)
>
>And say, ah my code works.  Then a user gives it this::
>    t = u'café'
>    function(t)
>
>And get a unicode error because the function only works with unicode in the
>ascii range.

That's an excellent point.
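
The same data-dependent failure can be reproduced in a small sketch.  This is
Python 3, with an explicit ASCII encode standing in for the implicit
conversion Python 2 performed; the body of function() is my assumption, since
the original only names it.

```python
def function(t):
    # Stand-in for code that silently assumes ASCII-only text, as Python 2
    # did when it implicitly converted between unicode and byte strings.
    return t.encode("ascii")


ok = function("cafe")       # the tested path works

try:
    function("café")        # the user's input takes the same path and fails
    failed = False
except UnicodeEncodeError:
    failed = True
```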

>ebytes seems to have the same pitfall where the code path exercised by your
>tests could work with::
>    eb = ebytes(b)
>    eb.encoding = 'euc-jp'
>    function(eb)
>
>but the user exercises a code path that does this and fails::
>    eb = ebytes(b)
>    function(eb)
>
>What do you think of making the encoding attribute a mandatory part of
>creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1.  If 'ebytes is bytes' then
I'd probably want to default the second argument to the magical "I-don't-know"
marker.
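
A sketch of the two constructor signatures, side by side.  The class names
StrictEbytes and LenientEbytes and the UNKNOWN constant are made up for
illustration, and using None as the marker is still only a guess.

```python
UNKNOWN = None  # assumed spelling of the "I-don't-know" marker


class StrictEbytes(bytes):
    """Separate-type variant: the encoding argument is mandatory."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


class LenientEbytes(bytes):
    """'ebytes is bytes' variant: the encoding defaults to the marker."""

    def __new__(cls, data=b"", encoding=UNKNOWN):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


explicit = StrictEbytes(b"abc", "euc-jp")
defaulted = LenientEbytes(b"abc")

try:
    StrictEbytes(b"abc")    # forgetting the encoding fails immediately
    rejected = False
except TypeError:
    rejected = True
```

With the mandatory form, the untested code path that forgets the encoding
fails at construction time instead of somewhere downstream.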

-Barry

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev