On Tue, Jan 11, 2011 at 3:24 AM, Ian Bicking <i...@colorstudy.com> wrote:
> The kind of object PJE was referring to is more like Ruby's strings, which
> do not embed the encoding inside the bytes themselves but have the encoding
> as a kind of annotation on the bytes, and do lazy transcoding when combining
> strings of different encodings. The goal with respect to WSGI is that you
> could annotate bytes with an encoding but also change or fix that encoding
> if other out-of-band information implied that you got the encoding wrong
> (e.g., some data is submitted with the encoding of the page the browser was
> on, and so nothing inside the request itself will indicate the encoding of
> the data). Latin1 is kind of the poor man's version of this -- it's a good
> guess at an encoding, that at worst requires transcoding that can be done in
> a predictable way. (Personally I think Latin1 gets us 99% of the way there,
> and so bytes-of-a-known-encoding are not really that important to the WSGI
> case.)
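To make the "predictable transcoding" point concrete, here is a minimal sketch. It assumes the bytes were provisionally decoded as Latin-1 and that out-of-band information later shows they were really UTF-8. The helper name `redecode` is hypothetical and is not part of WSGI or the standard library.

```python
def redecode(text, assumed="latin-1", actual="utf-8"):
    """Undo a provisional decode and decode the same bytes properly.

    Hypothetical helper for illustration only.
    """
    # Encoding with the assumed codec recovers the original bytes exactly,
    # because Latin-1 maps each byte to exactly one code point.
    return text.encode(assumed).decode(actual)


raw = "caf\u00e9".encode("utf-8")      # bytes as they arrived on the wire
provisional = raw.decode("latin-1")    # cannot fail for any byte sequence
assert provisional.encode("latin-1") == raw  # lossless round trip

fixed = redecode(provisional)          # the text the sender intended
```

The provisional string is mojibake, but no information has been lost, which is what makes Latin-1 a workable guess.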
Having done the upgrade to urllib to support direct manipulation of byte
sequences, I don't think such a type would help as much as people hoped
anyway. Converting to Unicode, manipulating as text and converting back
really *is* the right way to do text manipulation (however, providing
bytes-in-bytes-out APIs that do the conversions for you can also be quite
convenient).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
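As a rough illustration of the two approaches mentioned in the reply, the sketch below shows a bytes-in-bytes-out wrapper that does the decode, manipulate, encode steps internally. The function `normalize_bytes` and its default encoding are assumptions made up for this example. The last lines rely on `urllib.parse.urlsplit` accepting ASCII bytes and returning bytes in Python 3.2 and later, which is my understanding of the urllib change referred to.

```python
from urllib.parse import urlsplit


def normalize_bytes(data: bytes, encoding: str = "utf-8") -> bytes:
    """Bytes-in-bytes-out wrapper (hypothetical, for illustration)."""
    text = data.decode(encoding)   # 1. convert to Unicode
    text = text.strip().lower()    # 2. manipulate as text
    return text.encode(encoding)   # 3. convert back to bytes


result = normalize_bytes("  CAF\u00c9 ".encode("utf-8"))

# urlsplit does the same kind of conversion internally:
# bytes in, bytes out.
parts = urlsplit(b"http://example.com/a%20b?x=1")
path = parts.path
```

The caller never sees a `str`, but the actual manipulation still happens on text, so case folding and whitespace handling follow Unicode rules rather than byte values.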