On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough <chr...@plope.com> wrote:
> Existing APIs save for "quote" don't really need to deal with charset
> encodings at all, at least on any level that Python needs to care about.
> The potential already exists to emit garbage which will turn into
> mojibake from almost all existing APIs. The only remaining issue seems
> to be fear of making a design mistake while designing APIs.
>
> IMO, having a separate module for all urllib.parse APIs, each designed
> for only bytes input is a design mistake greater than any mistake that
> could be made by allowing for both bytes and str input to existing APIs
> and returning whatever type was passed. The existence of such a module
> will make it more difficult to maintain a codebase which straddles
> Python 2 and Python 3.

Failure to use quote/unquote correctly is a completely different problem
from using bytes with an ASCII incompatible encoding, or mixing bytes
with different encodings. Yes, if you don't quote your URLs you may end
up with mojibake. That's not a justification for creating a *new* way to
accidentally create mojibake.

Separating the APIs means that application programmers will be expected
to know whether they are working with data formatted for display to the
user (i.e. Unicode text) or transfer over the wire (i.e. ASCII
compatible bytes).

Can you give me a concrete use case where the application programmer
won't *know* which format they're working with? Py3k made the conscious
decision to stop allowing careless mixing of encoded and unencoded text.
This is just taking that philosophy and propagating it further up the
API stack (as has already been done with several OS facing APIs for
3.2).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
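
To illustrate the display-text versus wire-bytes distinction discussed above, here is a minimal sketch in Python 3. The sample path and the variable names are hypothetical and not taken from the thread; only `urllib.parse.quote` and `urllib.parse.unquote` are real library calls. It shows one way the boundary might look in application code, not a proposed API.

```python
from urllib.parse import quote, unquote

# Unicode text as it would be displayed to a user (hypothetical sample value).
display_path = "/caf\u00e9/menu"

# quote() percent-encodes non-ASCII characters (UTF-8 by default),
# producing an ASCII-only str that can be safely encoded for the wire.
quoted = quote(display_path)            # '/caf%C3%A9/menu'
wire_bytes = quoted.encode("ascii")     # b'/caf%C3%A9/menu'

# Python 3 refuses to combine encoded and unencoded data implicitly,
# so this kind of mistake raises TypeError instead of yielding mojibake.
try:
    wire_bytes + display_path
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Going back to display text requires an explicit decode step.
round_trip = unquote(wire_bytes.decode("ascii"))
```

Under these assumptions the programmer always knows which side of the boundary a value is on: `display_path` and `round_trip` are text, `wire_bytes` is ASCII-compatible bytes, and every crossing is an explicit `encode` or `decode` call.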