Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-23 Thread Baptiste Carvello
Stephen J. Turnbull wrote: What really saves the day here is not that common encodings just don't do that. It's that even in the case where only syntactically significant bytes in the representation are URL-encoded, they *are* URL-encoded. As long as the parsing library restricts itself to

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-22 Thread Nick Coghlan
On Wed, Sep 22, 2010 at 9:37 AM, Andrew McNamara andr...@object-craft.com.au wrote: Yeah, that's the original reasoning that had me leaning towards the parallel API approach. If I seem to be changing my mind a lot in this thread it's because I'm genuinely torn between the desire to make it easier

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-22 Thread Nick Coghlan
On Wed, Sep 22, 2010 at 12:59 PM, Stephen J. Turnbull step...@xemacs.org wrote: Neil Hodgson writes: Over time, the set of trail bytes used has expanded - in GB18030 digits are possible although many of the most important characters for parsing such as ''' #%.?/''' are still safe as

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Nick Coghlan
On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull step...@xemacs.org wrote: On the other hand, it is dangerous to provide a polymorphic API which does that more extensive parsing, because a less than paranoid programmer will have very likely allowed the parsed components to escape from the

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Paul Moore
On 21 September 2010 14:38, Nick Coghlan ncogh...@gmail.com wrote: On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull step...@xemacs.org wrote: On the other hand, it is dangerous to provide a polymorphic API which [...] Sorry if this is off-topic, but I don't believe I ever saw Stephen's

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Stephen J. Turnbull
Nick Coghlan writes: (Basically, while the issue of programmers assuming 'latin-1' or 'utf-8' or similar ASCII friendly encodings when they shouldn't is real, I don't believe a polymorphic API here will make things any *worse* than what would happen with a parallel API) That depends on

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Barry Warsaw
On Sep 21, 2010, at 04:01 PM, Paul Moore wrote: Sorry if this is off-topic, but I don't believe I ever saw Stephen's email. I have a feeling that's happened a couple of times recently. Before I go off trying to work out why gmail is dumping list mails on me, did anyone else see Stephen's mail via

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Antoine Pitrou
On Wed, 22 Sep 2010 00:10:01 +0900 Stephen J. Turnbull step...@xemacs.org wrote: But I don't know whether the web apps programmers will be satisfied with such a minimal API. Web app programmers will generally go through a framework, which handles encoding/decoding for them (already so in

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Ian Bicking
On Mon, Sep 20, 2010 at 6:19 PM, Nick Coghlan ncogh...@gmail.com wrote: What are the cases you believe will cause new mojibake? Calling operations like urlsplit on byte sequences in non-ASCII compatible encodings and operations like urljoin on byte sequences that are encoded with different

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Paul Moore
On 21 September 2010 16:23, Barry Warsaw ba...@python.org wrote: On Sep 21, 2010, at 04:01 PM, Paul Moore wrote: Sorry if this is off-topic, but I don't believe I ever saw Stephen's email. I have a feeling that's happened a couple of times recently. Before I go off trying to work out why gmail

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Chris McDonough
On Tue, 2010-09-21 at 23:38 +1000, Nick Coghlan wrote: And if this turns out to be a disaster in practice: a) on my head be it; and b) we still have the option of the DeprecationWarning dance for bytes inputs to the existing functions and moving to a parallel API In the case of urllib.parse,

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Nick Coghlan
On Wed, Sep 22, 2010 at 1:10 AM, Stephen J. Turnbull step...@xemacs.org wrote: Nick Coghlan writes: (Basically, while the issue of programmers assuming 'latin-1' or 'utf-8' or similar ASCII friendly encodings when they shouldn't is real, I don't believe a polymorphic API here will make

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Nick Coghlan
On Wed, Sep 22, 2010 at 1:57 AM, Ian Bicking i...@colorstudy.com wrote: All this is unrelated to the question, though -- a separate byte-oriented function won't help any case I can think of.  If the programmer is implementing something like

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Neil Hodgson
Ian Bicking: I think the use case everyone has in mind here is where you get a URL from one of these sources, and you want to handle it.  I have a hard time imagining the sequence of events that would lead to mojibake. Naive parsing of a document in bytes couldn't do it, because if you have a

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Andrew McNamara
On the other hand, it is dangerous to provide a polymorphic API which does that more extensive parsing, because a less than paranoid programmer will have very likely allowed the parsed components to escape from the context where their encodings can be reliably determined. Remember, *it is

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Stephen J. Turnbull
Neil Hodgson writes: Over time, the set of trail bytes used has expanded - in GB18030 digits are possible although many of the most important characters for parsing such as ''' #%.?/''' are still safe as they may not be trail bytes in the common double-byte character sets. That's just
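
The trail-byte point quoted above can be checked with a small sketch. The two sample characters are assumptions picked for illustration and do not come from the thread; the sketch only shows that ASCII-range bytes can occur inside multi-byte sequences while the URL delimiters named in the quote do not appear in these samples.

```python
# Illustrative characters chosen for this sketch (not taken from the thread).
sjis = "表".encode("shift_jis")   # a double-byte Shift_JIS character
gb = "\x80".encode("gb18030")     # U+0080 as a four-byte GB18030 sequence


def ascii_bytes(data):
    """Return only the bytes of data that fall in the ASCII range."""
    return bytes(b for b in data if b < 0x80)


sjis_ascii = ascii_bytes(sjis)    # trail byte is an ASCII backslash
gb_ascii = ascii_bytes(gb)        # second and fourth bytes are ASCII digits

# The delimiters that matter for URL parsing do not occur in either sample.
delimiters = set(b"#%.?/")
sjis_safe = not (set(sjis) & delimiters)
gb_safe = not (set(gb) & delimiters)
```

A byte-level parser that only looks for the delimiters would therefore split these samples correctly, but code that treats backslashes or digits as significant could not make the same assumption.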

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Nick Coghlan
On Mon, Sep 20, 2010 at 2:12 PM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: While I don't like the email6 precedent as such (that there would be different parsed objects, based on whether you started parsing with bytes or with strings), the idea that when you are working directly with bytes

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Chris McDonough
On Sun, 2010-09-19 at 12:03 +1000, Nick Coghlan wrote: On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote: On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote: Polymorphic best practices [was: (Not) delaying the 3.2 release] If you're hung up on this, try

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Nick Coghlan
On Mon, Sep 20, 2010 at 10:12 PM, Chris McDonough chr...@plope.com wrote: urllib.parse.urlparse/urllib.parse.urlsplit will never need to decode anything when passed bytes input. Correct. Supporting manipulation of bytes directly is primarily a speed hack for when an application wants to avoid
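
As a rough illustration of the claim that splitting a URL never requires decoding, here is a minimal sketch. It assumes Python 3.2 or later, where urllib.parse.urlsplit accepts bytes as well as str; the example URL is hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: Python 3.2+, where urlsplit accepts bytes input.
# The URL itself is a made-up example.
text_parts = urlsplit("http://example.com/a%20b?q=1#frag")
byte_parts = urlsplit(b"http://example.com/a%20b?q=1#frag")

# str in, str out; bytes in, bytes out. Percent-escapes are left untouched.
print(text_parts.netloc, text_parts.path)
print(byte_parts.netloc, byte_parts.path)

# Mixing the two types is rejected rather than silently coerced.
try:
    urlsplit("http://example.com/", scheme=b"http")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The components come back in the same type that went in, so no encoding has to be guessed at this level.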

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Chris McDonough
On Mon, 2010-09-20 at 23:23 +1000, Nick Coghlan wrote: On Mon, Sep 20, 2010 at 10:12 PM, Chris McDonough chr...@plope.com wrote: urllib.parse.urlparse/urllib.parse.urlsplit will never need to decode anything when passed bytes input. Correct. Supporting manipulation of bytes directly is

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Nick Coghlan
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote: Existing APIs save for quote don't really need to deal with charset encodings at all, at least on any level that Python needs to care about. The potential already exists to emit garbage which will turn into mojibake from
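
To see why quote is singled out as the one function that has to care about charset encodings, consider this small sketch. The sample character is an assumption chosen for illustration; the calls use the documented urllib.parse.quote signature.

```python
from urllib.parse import quote

# The same text produces different percent-escapes depending on the encoding.
utf8_quoted = quote("é")                        # default encoding is UTF-8
latin1_quoted = quote("é", encoding="latin-1")  # caller picks another encoding

# With bytes input there is nothing to encode: the bytes are escaped as given.
bytes_quoted = quote(b"\xe9")

print(utf8_quoted, latin1_quoted, bytes_quoted)
```

A receiver that assumes the wrong encoding when unquoting either result is where mojibake can enter, independent of how the URL was split.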

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Chris McDonough
On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote: On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote: Existing APIs save for quote don't really need to deal with charset encodings at all, at least on any level that Python needs to care about. The potential already

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Chris McDonough
On Tue, 2010-09-21 at 08:19 +1000, Nick Coghlan wrote: On Tue, Sep 21, 2010 at 7:39 AM, Chris McDonough chr...@plope.com wrote: On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote: On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote: Existing APIs save for quote don't

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-20 Thread Nick Coghlan
On Tue, Sep 21, 2010 at 7:39 AM, Chris McDonough chr...@plope.com wrote: On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote: On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote: Existing APIs save for quote don't really need to deal with charset encodings at all, at

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-19 Thread Glyph Lefkowitz
On Sep 18, 2010, at 10:18 PM, Steve Holden wrote: I could probably be persuaded to merge the APIs, but the email6 precedent suggests to me that separating the APIs better reflects the mental model we're trying to encourage in programmers manipulating text (i.e. the difference between the raw

[Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-18 Thread Nick Coghlan
On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote: On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote: Polymorphic best practices [was: (Not) delaying the 3.2 release] If you're hung up on this, try writing the user-level documentation first. Your target audience

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-18 Thread Steve Holden
On 9/18/2010 10:03 PM, Nick Coghlan wrote: On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote: On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote: Polymorphic best practices [was: (Not) delaying the 3.2 release] If you're hung up on this, try writing the user-level