On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby <p...@telecommunity.com> wrote:
> At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:
>>
>> The problem comes exactly where you find it: when *porting* existing
>> code that uses aforementioned ways to alleviate the pain, you find
>> that the hacks no longer work and a properly layered design is needed
>> that clearly distinguishes between which variables contain bytes and
>> which text.
>
> Actually, I would say that it's more that (in the network protocol case) we
> *have* bytes, some of which we would like to *treat* as text, yet do not
> wish to constantly convert back and forth to full-blown unicode --
> especially since the protocols themselves designate ASCII or latin-1 at the
> transport layer (sometimes with odder encodings above, but these already
> have to be explicitly dealt with by existing code).
>
> While reading over this thread, I'm wondering whether at least my
> (WSGI-related) problems in this area would be solved by the availability of
> a type (say "bstr") that was simply a wrapper providing string-like behavior
> over an underlying bytes, byte array, or memoryview, that would produce
> objects of compatible type when combined with strings (by encoding them to
> match).
>
> Then, I could wrap bytes with it to pass them to string operations, and then
> feed them back into everything else.  The bstr type ideally would be
> directly compatible with bytes I/O, or at least have a .bytes attribute that
> would be.
>
> It seems like that would reduce WSGI porting issues quite a bit, since it
> would mostly consist of throwing extra bstr() calls in where things are
> breaking, and maybe grabbing the .bytes attribute for I/O.
>
> This approach would still be explicit as to what types you're working with,
> but would not require O(n) *conversions* at every interaction boundary.  It
> would be limited, of course, to single-byte encodings with all characters
> (0-255) valid.
>
> OTOH, maybe there should just be a bytestrings module with bytestrings.ascii
> and bytestrings.latin1, and between the two that should cover the network
> protocol needs quite well.
>
> Actually, if the Python 3 str() constructor could do O(1) conversion for the
> latin-1 case (i.e., just wrapped the underlying bytes), I would just put,
> "bstr = lambda x: str(x,'latin-1')" at the top of my programs and have
> roughly the same effect.
>
> This idea is still a bit half-baked, but a more baked version might be just
> the ticket for porting stuff that used str to work with bytes in 2.x, if
> only because writing, e.g.:
>
>     newurl = bstr(urljoin(bstr(base), 'subdir'))
>
> seems so much saner than writing *this* everywhere:
>
>     newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')
>
> It is perhaps a bit late to propose this idea, since ideally we would also
> want to use it in 2.x to aid porting.  But I'm curious if any other people
> here experiencing byte/unicode woes in relation to network protocols would
> find this a solution to their chief frustration.  (i.e., that the stdlib
> often insists now on strings, where effectively bytes were usable before,
> and thus one must do conversions both coming and going.)
>
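
For anyone trying the latin-1 round trip described in the quoted message, here is a minimal sketch of what it might look like on Python 3. The helper names as_text and as_bytes and the example URL are made up for illustration; they are not an existing API and not the proposed bstr type. Since str(x, 'latin-1') only accepts bytes-like input, the sketch uses separate helpers for the two directions.

```python
from urllib.parse import urljoin


def as_text(raw: bytes) -> str:
    """Decode bytes as latin-1; each byte value 0-255 maps to one code point."""
    return raw.decode("latin-1")


def as_bytes(text: str) -> bytes:
    """Encode text back to bytes; fails for code points above 255."""
    return text.encode("latin-1")


# Hypothetical base URL received as bytes from the transport layer.
base = b"http://example.com/app/"

# Decode on the way in, use the str-based API, encode on the way out.
newurl = as_bytes(urljoin(as_text(base), "subdir"))

# latin-1 round-trips arbitrary byte values without loss.
all_bytes = bytes(range(256))
roundtrip_ok = as_bytes(as_text(all_bytes)) == all_bytes
```

Note that both helpers copy the data, so each call is still O(n). The wrapper type proposed above is meant to avoid exactly that cost, and this sketch does not attempt it.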
I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people. Something that allowed you to use bytes with some of the string methods would go a really long way toward solving a lot of people's Python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of "things like this".

jesse
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com