On 9/18/2010 10:03 PM, Nick Coghlan wrote:
> On Sun, Sep 19, 2010 at 4:18 AM, John Nagle <na...@animats.com> wrote:
>> On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote:
>>>
>>> Polymorphic best practices [was: (Not) delaying the 3.2 release]
>>
>> If you're hung up on this, try writing the user-level documentation
>> first. Your target audience is a working-level Web programmer, not
>> someone who knows six programming languages and has a CS degree.
>> If the explanation is too complex, so is the design.
>>
>> Coding in this area is quite hard to do right. There are
>> issues with character set, HTML encoding, URL encoding, and
>> internationalized domain names. It's often done wrong;
>> I recently found a Google service which botched it.
>> Python libraries should strive to deliver textual data to the programmer
>> in clean Unicode. If someone needs the underlying wire representation
>> it should be available, but not the default.
>
> Even though URL byte sequences are defined as using only an ASCII
> subset, I'm currently inclined to add raw bytes support to
> urllib.parse by providing parallel APIs (e.g. urllib.parse.urlsplitb,
> etc.) rather than doing it implicitly in the normal functions.
>
> My rationale is as follows:
> - while URLs are *meant* to be encoded correctly as an ASCII subset,
> the real world isn't always quite so tidy (i.e. applications treat as
> URLs things that technically are not because the encoding is wrong)
> - separating the APIs forces the programmer to declare that they know
> they're working with the raw bytes off the wire to avoid the
> decode/encode overhead that comes with working in the Unicode domain
> - easier to change our minds later. Adding implicit bytes support to
> the normal names can be done any time, but removing it would require
> an extensive deprecation period
>
> Essentially, while I can see strong use cases for wanting to
> manipulate URLs in wire format, I *don't* see strong use cases for
> manipulating URLs without *knowing* whether they're in wire format
> (encoded bytes) or display format (Unicode text). For some APIs that
> work for arbitrary encodings (e.g. os.listdir) switching based on
> argument type seems like a reasonable idea. For those that may
> silently produce incorrect output for ASCII-incompatible encodings,
> the os.environ/os.environb split seems like a better approach.
>
> I could probably be persuaded to merge the APIs, but the email6
> precedent suggests to me that separating the APIs better reflects the
> mental model we're trying to encourage in programmers manipulating
> text (i.e. the difference between the raw octet sequence and the text
> character sequence/parsed data).
>

That sounds pretty sane and coherent to me.
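
To make the parallel-API idea concrete, here is a minimal sketch of what a bytes-only counterpart to urlsplit might look like. The name urlsplitb is taken from the proposal quoted above; it is not an existing urllib.parse function, and the signature, return type, and error behaviour shown here are assumptions made for illustration. For brevity the sketch decodes to text internally, whereas a real implementation would presumably work on the bytes directly to avoid that overhead.

```python
from urllib.parse import urlsplit


def urlsplitb(url):
    """Hypothetical bytes-only counterpart to urllib.parse.urlsplit.

    Accepts only bytes-like input and returns a 5-tuple of bytes:
    (scheme, netloc, path, query, fragment).
    """
    # Refuse text outright: the caller has to state that this is wire data.
    if not isinstance(url, (bytes, bytearray)):
        raise TypeError(
            "urlsplitb() expects bytes, got %s" % type(url).__name__
        )
    # Strict ASCII decode, so wire data that is not valid ASCII fails
    # loudly here instead of silently producing an incorrect split.
    text = bytes(url).decode("ascii")
    # Re-encode each component so the caller stays in the bytes domain.
    return tuple(part.encode("ascii") for part in urlsplit(text))


if __name__ == "__main__":
    print(urlsplitb(b"http://example.com/a%20b?q=1#frag"))
```

Under these assumptions, passing a str raises TypeError and passing non-ASCII bytes raises UnicodeDecodeError, so the caller always knows whether they are handling wire format or display format.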

regards
 Steve
-- 
Steve Holden +1 571 484 6266 +1 800 494 3119
DjangoCon US September 7-9, 2010 http://djangocon.us/
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/