On Thu, Sep 16, 2010 at 8:42 AM, Toshio Kuratomi <a.bad...@gmail.com> wrote:
> On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
>> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
>>
>> >There are some APIs that should be able to handle bytes *or* strings,
>> >but the current use of string literals in their implementation means
>> >that bytes don't work. This turns out to be a PITA for some networking
>> >related code which really wants to be working with raw bytes (e.g.
>> >URLs coming off the wire).
>>
>> Note that email has exactly the same problem. A general solution -- even if
>> embodied in *well documented* best-practices and convention -- would really
>> help make the stdlib work consistently, and I bet third party libraries too.
>>
> I too await a solution with abated breath :-) I've been working on
> documenting best practices for APIs and Unicode and for this type of
> function (take bytes or unicode and output the same type), knowing the
> encoding seems like a requirement in most cases:
>
> http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type
>
> I'd love to add another strategy there that shows how you can robustly
> operate on bytes without knowing the encoding but from writing that, I think
> that anytime you simplify your API you have to accept limitations on the
> data you can take in. (For instance, some simplifications can handle
> anything except ASCII-incompatible encodings).

In all cases I can imagine where such polymorphic functions make sense, the
necessary and sufficient assumption should be that the encoding is a superset
of 7-bit(*) ASCII. This includes UTF-8, all Latin-N variants, and AFAIK also
the popular CJK encodings other than UTF-16. This is the same assumption made
by Python's bytes type when you use "character-based" methods like lower().

--Guido

__________
(*) In my mind ASCII and 7-bit are synonymous, but unfortunately there are
droves of naive users who believe that ASCII includes all 256 possible 8-bit
bytes using some encoding -- typically the default encoding of their DOS or
Windows box. :-(

-- 
--Guido van Rossum (python.org/~guido)
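
To make the ASCII-superset assumption concrete, here is a minimal sketch of a
polymorphic function that takes bytes or str and returns the same type. The
helper name split_scheme and the sample URLs are hypothetical, made up purely
for illustration; this is not code from the stdlib or from the kitchen
library. It relies only on the fact that bytes.lower() and bytes.partition()
exist and that bytes.lower() changes nothing but the ASCII letters A-Z.

```python
def split_scheme(url):
    """Hypothetical helper: split 'scheme:rest' for str or bytes input.

    Returns (scheme, rest) in the same type as the input. For bytes, the
    only assumption is that the encoding is a superset of 7-bit ASCII, so
    the ':' delimiter is always the single byte 0x3A.
    """
    if isinstance(url, bytes):
        colon, empty = b":", b""
    elif isinstance(url, str):
        colon, empty = ":", ""
    else:
        raise TypeError("expected str or bytes, got %s" % type(url).__name__)

    scheme, sep, rest = url.partition(colon)
    if not sep:
        # No delimiter found: report an empty scheme, keep the input intact.
        return empty, url
    # For bytes, lower() only maps the ASCII letters A-Z; bytes >= 0x80 are
    # passed through unchanged, whatever encoding they belong to.
    return scheme.lower(), rest


# str in, str out.
text_result = split_scheme("HTTP://example.com/")

# UTF-8 bytes in, bytes out; the non-ASCII tail is never decoded.
utf8_result = split_scheme(b"HTTP://example.com/caf\xc3\xa9")

# Latin-1 bytes work too: 0xC9 (E-acute in Latin-1) is left alone.
latin1_lowered = b"\xc9COLE".lower()

# UTF-16 is not an ASCII superset, so the same code quietly goes wrong:
# the "scheme" keeps its interleaved NUL bytes.
utf16_result = split_scheme("HTTP:x".encode("utf-16-le"))
```

The UTF-16 case does not raise an error; it just produces a scheme that is
not b"http", which is why the ASCII-superset requirement has to be documented
rather than detected.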