Re: [Python-Dev] bytes / unicode

Nick Coghlan Sat, 26 Jun 2010 19:54:16 -0700

On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby <[email protected]> wrote:
> The idea that I'm proposing is that the basic string and byte types should
> defer to "user-defined" string types for mixed type operations, so that
> polymorphism of string-manipulation functions is the *default* case, rather
> than a *special* case.  This makes tainting easier to implement, as well as
> optimizing and other special cases (like my "source string w/file and line
> info", or a string with font/formatting attributes).


Rather than building this into the base string type, perhaps it would
be better (at least initially) to add in a polymorphic str subtype
that worked along the following lines:

1. Has an encoded argument in the constructor (e.g. poly_str("/", encoded=b"/"))
2. If given objects with an encode() method, assumes they're strings
and uses its own parent class methods
3. If given objects with a decode() method, assumes they're encoded
and delegates to the encoded attribute
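The three rules above can be sketched in Python 3 roughly as follows. This is a minimal illustration under stated assumptions, not a worked-out design: `encoded` is a required argument, only `+` and `join()` are made polymorphic (every other method is inherited unchanged from str), and the `join_path` helper is hypothetical, there only to show how a str/bytes agnostic caller might use it.

```python
class poly_str(str):
    """Sketch of a str subtype that also carries an encoded (bytes) form.

    Only __add__, __radd__ and join() are polymorphic here; all other
    methods behave exactly like plain str methods.
    """

    def __new__(cls, text, encoded):
        self = super().__new__(cls, text)
        self.encoded = encoded  # rule 1: the bytes counterpart, e.g. b"/"
        return self

    @staticmethod
    def _is_encoded(obj):
        # Rule 2: anything with an encode() method is assumed to be text.
        if hasattr(obj, "encode"):
            return False
        # Rule 3: anything with a decode() method is assumed to be encoded.
        return hasattr(obj, "decode")

    def __add__(self, other):
        if self._is_encoded(other):
            return self.encoded + other  # delegate to the encoded attribute
        return str(self) + other         # ordinary str concatenation

    def __radd__(self, other):
        if self._is_encoded(other):
            return other + self.encoded
        return other + str(self)

    def join(self, iterable):
        items = list(iterable)
        # Decide from the first item; an empty iterable yields "" (text).
        if items and self._is_encoded(items[0]):
            return self.encoded.join(items)
        return str.join(self, items)


def join_path(*parts):
    # Hypothetical agnostic helper: works for all-str or all-bytes parts.
    return poly_str("/", encoded=b"/").join(parts)


print(join_path("usr", "lib"))    # usr/lib
print(join_path(b"usr", b"lib"))  # b'usr/lib'
```

Because poly_str is still a str, text-only code just sees an ordinary string; only code that deliberately constructs a poly_str gets the delegation to the encoded form.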

str/bytes agnostic functions would need to invoke poly_str
deliberately, while bytes-only and text-only algorithms could just use
the appropriate literals.

Third party types would be supported to some degree (by having either
an encode() or a decode() method), although they could still run into
trouble with some operations. (While full support for third party
string and byte sequence implementations is an interesting idea, I
think it's overkill for the specific problem of making it easier to
write str/bytes agnostic functions for tasks like URL parsing.)
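To make the "some degree" caveat concrete, here is a small sketch of the duck-typing test on its own. `classify` and `FakeBytes` are hypothetical names used only for illustration: `FakeBytes` stands in for a third party byte sequence that has a decode() method, so it passes the "encoded" check, but it does not implement the buffer protocol, so delegating an operation such as bytes.join() to the encoded form still fails with a TypeError.

```python
def classify(obj):
    """Duck-typing test assumed by the poly_str sketch."""
    if hasattr(obj, "encode"):
        return "text"
    if hasattr(obj, "decode"):
        return "encoded"
    return "unknown"  # neither rule applies


class FakeBytes:
    """Hypothetical third party byte sequence: it has decode(), but does
    not support the buffer protocol that bytes.join() relies on."""

    def __init__(self, data):
        self._data = bytes(data)

    def decode(self, encoding="utf-8", errors="strict"):
        return self._data.decode(encoding, errors)


if __name__ == "__main__":
    samples = ["a", b"a", bytearray(b"a"), memoryview(b"a"), FakeBytes(b"a")]
    for obj in samples:
        print(type(obj).__name__, classify(obj))
    # FakeBytes is classified as "encoded", yet delegating to bytes fails:
    try:
        b"/".join([FakeBytes(b"a"), FakeBytes(b"b")])
    except TypeError as exc:
        print("TypeError:", exc)
```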

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev