On Tue, Jan 11, 2011 at 3:24 AM, Ian Bicking <i...@colorstudy.com> wrote:
>
> The kind of object PJE was referring to is more like Ruby's strings, which
> do not embed the encoding inside the bytes themselves but have the encoding
> as a kind of annotation on the bytes, and do lazy transcoding when combining
> strings of different encodings.  The goal with respect to WSGI is that you
> could annotate bytes with an encoding but also change or fix that encoding
> if other out-of-band information implied that you got the encoding wrong
> (e.g., some data is submitted with the encoding of the page the browser was
> on, and so nothing inside the request itself will indicate the encoding of
> the data).  Latin1 is kind of the poor man's version of this -- it's a good
> guess at an encoding, that at worst requires transcoding that can be done in
> a predictable way.  (Personally I think Latin1 gets us 99% of the way there,
> and so bytes-of-a-known-encoding are not really that important to the WSGI
> case.)
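To make the quoted "Latin1 as a poor man's annotation" idea concrete, here is a minimal sketch. It assumes the raw request data arrives as bytes and that the real charset is only learned later from out-of-band information (e.g. the page the form was submitted from). The helper names provisional_decode and fix_encoding are hypothetical and not part of any WSGI API.

```python
# Hypothetical helpers illustrating Latin-1 as a lossless provisional decoding.

def provisional_decode(raw: bytes) -> str:
    """Decode bytes as Latin-1. Every byte maps to exactly one code point
    (U+0000..U+00FF), so this never fails and loses no information."""
    return raw.decode("latin-1")


def fix_encoding(text: str, actual_encoding: str) -> str:
    """Recover the original bytes exactly (Latin-1 round-trips) and
    re-decode them with the encoding we now believe is correct."""
    return text.encode("latin-1").decode(actual_encoding)


raw = "caf\u00e9".encode("utf-8")     # what the browser actually sent: b'caf\xc3\xa9'
guess = provisional_decode(raw)      # wrong text, but reversibly wrong
fixed = fix_encoding(guess, "utf-8")  # the intended text
print(repr(guess), repr(fixed))
```

The "predictable" transcoding is the encode("latin-1") step: because Latin-1 is a bijection between bytes and the first 256 code points, a wrong guess can always be undone.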

Having done the upgrade to urllib to support direct manipulation of
byte sequences, I don't think such a type would help as much as people
hoped anyway. Converting to Unicode, manipulating as text and
converting back really *is* the right way to do text manipulation
(however, providing bytes-in-bytes-out APIs that do the conversions
for you can also be quite convenient).
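A rough sketch of the two approaches described above. The first decodes, manipulates as text, and encodes back; replace_segment is a hypothetical helper and assumes the data is UTF-8. The second relies on urllib.parse, whose parsing functions (as of Python 3.2) accept bytes and return bytes, doing the ASCII conversion internally.

```python
from urllib.parse import urlsplit, urlunsplit


# Approach 1: decode -> manipulate as text -> encode.
# Hypothetical helper; assumes the bytes are in the given encoding.
def replace_segment(raw_path: bytes, old: str, new: str,
                    encoding: str = "utf-8") -> bytes:
    text = raw_path.decode(encoding)  # bytes -> str
    text = text.replace(old, new)     # ordinary text manipulation
    return text.encode(encoding)      # str -> bytes


# Approach 2: a bytes-in-bytes-out API. urlsplit() on bytes returns a
# SplitResultBytes (a named tuple), and urlunsplit() gives bytes back.
parts = urlsplit(b"http://example.com/old/page?q=1")
new_url = urlunsplit(parts._replace(path=b"/new/page"))
print(new_url)  # b'http://example.com/new/page?q=1'
```

In both cases the actual manipulation happens on text; the bytes API simply hides the conversion at the boundary.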

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
