Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Guido van Rossum
FWIW, the rest of this discussion is now happening in the tracker: http://bugs.python.org/issue3300. We could really use some feedback from Python users in Asian countries. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

2008-08-07 Thread Matt Giuca
Wow .. a lot of replies today! On Thu, Aug 7, 2008 at 2:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]>wrote: > It hasn't been given priority: There are currently 606 patches in the > tracker, many fixing bugs of some sort. It's not clear (to me, at least) > why this should be given priority over al

2008-08-06 Thread Guido van Rossum
On Wed, Aug 6, 2008 at 9:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >> Nobody's been >> assigned to look at it and it hasn't been given a priority, even though >> we all agree it's a bug (though we disagree on how to fix it). > > This I can explain (I think). Nobody is assigned to look: we

2008-08-06 Thread M.-A. Lemburg
On 2008-08-06 18:55, Antoine Pitrou wrote: Martin v. Löwis v.loewis.de> writes: URLs are just not made for non-ASCII characters. Perhaps they are not, but every non-English wiki (just to take a simple, generic example) potentially contains non-ASCII URLs. e.g. http://fr.wikipedia.org/wiki/%C3

2008-08-06 Thread Martin v. Löwis
>> Implement IRIs if you want non-ASCII characters; the rules are much clearer > for these. > > I think most people would expect something which works with the current World > Wide Web rather than a rigorous implementation of a specific RFC. Implementing > RFCs is fine but it does not magically el

2008-08-06 Thread Antoine Pitrou
Martin v. Löwis v.loewis.de> writes: > URLs are just not made for non-ASCII characters. Perhaps they are not, but every non-English wiki (just to take a simple, generic example) potentially contains non-ASCII URLs. e.g. http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant http://wiki.python.org/moin/J
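The fr.wikipedia path segment above is UTF-8 once it is percent-decoded. A minimal sketch using `urllib.parse.unquote` as it exists in today's Python 3 (not the 2008 code under discussion) shows what a UTF-8 default yields compared with decoding the same octets as Latin-1:

```python
from urllib.parse import unquote

# Percent-encoded path segment taken from the fr.wikipedia URL above.
segment = "%C3%89l%C3%A9phant"

# Python 3's unquote() decodes the octets as UTF-8 by default.
as_utf8 = unquote(segment)
# Forcing Latin-1 treats each octet as one character, producing mojibake.
as_latin1 = unquote(segment, encoding="latin-1")

print(as_utf8)           # Éléphant
print(repr(as_latin1))   # 'Ã\x89lÃ©phant'
```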

2008-08-06 Thread Martin v. Löwis
> Nobody's been > assigned to look at it and it hasn't been given a priority, even though > we all agree it's a bug (though we disagree on how to fix it). This I can explain (I think). Nobody is assigned to look: we usually don't do assignments of bugs or patches, except when there is a specific m

2008-08-06 Thread Bill Janssen
I suggest we continue this discussion, if at all, on the bug-tracker, where there's code, and more participants. http://bugs.python.org/issue3300 I've now posted my idea of how quote/unquote should work in py3K, there. Bill

2008-08-06 Thread Scott Dial
André Malo wrote: * Matt Giuca wrote: We've reached, to quote Guido, "as close as consensus as we can get on this issue". There are a lot of quotes around. Including "After the most recent flurry of discussion I've lost track of what's the right thing to do." But I don't talk for other peopl

2008-08-06 Thread Matt Giuca
> There are a lot of quotes around. Including "After the most recent flurry > of > discussion I've lost track of what's the right thing to do." > But I don't talk for other people. > OK .. let me compose myself a little. Sorry I went ahead and assumed this was closed. It's just frustrating to me

2008-08-06 Thread André Malo
* Matt Giuca wrote: > > This whole discussion circles too much, I think. Maybe it should be > > pepped? > > The issue isn't circular. It's been patched and tested, then a whole lot > of people agreed including Guido. Then you and Bill wanted the bytes > functionality back. So I wrote that in ther

2008-08-06 Thread Matt Giuca
> This whole discussion circles too much, I think. Maybe it should be pepped? The issue isn't circular. It's been patched and tested, then a whole lot of people agreed including Guido. Then you and Bill wanted the bytes functionality back. So I wrote that in there too, and Bill at least said that

2008-08-06 Thread André Malo
* Bill Janssen wrote: > > I'm far less concerned about > > the decision with regards to unquote_to_bytes/quote_from_bytes, as > > those are new features which can wait. > > Forgive me, but those are the *old* features, which must be there. This whole discussion circles too much, I think. Maybe

2008-08-05 Thread Bill Janssen
> I'm far less concerned about > the decision with regards to unquote_to_bytes/quote_from_bytes, as those are > new features which can wait. Forgive me, but those are the *old* features, which must be there. Bill

2008-08-05 Thread Matt Giuca
> After the most recent flurry of discussion I've lost track of what's > the right thing to do. I also believe it was said it should wait until > 2.7/3.0, so there's no hurry (in fact there's no way to check it -- we > don't have branches for those versions yet). > I assume you mean 2.7/3.1. I've

2008-08-05 Thread Guido van Rossum
After the most recent flurry of discussion I've lost track of what's the right thing to do. I also believe it was said it should wait until 2.7/3.0, so there's no hurry (in fact there's no way to check it -- we don't have branches for those versions yet). On Tue, Aug 5, 2008 at 5:47 AM, Matt Giuca

2008-08-05 Thread Matt Giuca
Has anyone had time to look at the patch for this issue? It got a lot of support about a week ago, but nobody has replied since then, and the patch still hasn't been assigned to anybody or given a priority. I hope I've complied with all the patch submission procedures. Please let me know if there

2008-07-31 Thread Matt Giuca
> so you can use quote_from_bytes on strings? Yes, currently. > I assumed Guido meant it was okay to have quote accept string/byte input and > have a function that was redundant but limited in what it accepted (i.e. > quote_from_bytes accepts only bytes) > > I suppose your implementation doesn'

2008-07-31 Thread Jeff Hall
> > > quote_from_bytes = quote >> > > So either name can be used on either input type, with the idea being that > you should use quote on a str, and quote_from_bytes on a bytes. Is this a > good idea or should it be rewritten so each function permits only one input > type? > > so you can use quote_
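For comparison, this is how the released Python 3 `urllib.parse` settled the question (a sketch against the shipped library, not against the patch being discussed): `quote` accepts either `str` or `bytes`, while `quote_from_bytes` rejects `str` input.

```python
from urllib.parse import quote, quote_from_bytes

# quote() encodes str input with UTF-8 by default...
from_text = quote("é")        # '%C3%A9'
# ...and hands bytes input straight to quote_from_bytes().
from_bytes = quote(b"\xe9")   # '%E9'

# quote_from_bytes() accepts only bytes-like input.
try:
    quote_from_bytes("é")
    rejected = False
except TypeError:
    rejected = True

print(from_text, from_bytes, rejected)  # %C3%A9 %E9 True
```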

2008-07-31 Thread Matt Giuca
Bill wrote: I'm not sure that's sufficient review, though I agree it's necessary. > The major consumers of quote/unquote are not in the Python standard > library. I figured that Python 3.0 is designed to fix things, with the breaking third-party code being an acceptable side-effect of that. So t

2008-07-31 Thread Matt Giuca
Alright, I've uploaded the new patch which adds the two requested bytes-oriented functions, as well as accompanying docs and tests. http://bugs.python.org/issue3300 http://bugs.python.org/file11009/parse.py.patch6 I'd rather have two pairs of functions, so that those who want to give > the readers
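Both pairs of functions did end up in Python 3's `urllib.parse`. A short sketch of the split as it exists in the released library (the exact behaviour of `parse.py.patch6` itself may have differed): the text pair works with a character encoding, the bytes pair involves none.

```python
from urllib.parse import quote, unquote, quote_from_bytes, unquote_to_bytes

# Text pair: str in, str out; characters are encoded as UTF-8 by default.
encoded = quote("naïve café")            # 'na%C3%AFve%20caf%C3%A9'
decoded = unquote(encoded)               # 'naïve café'

# Bytes pair: no character encoding is involved at all.
raw_encoded = quote_from_bytes(b"\x00\xff/")   # '%00%FF/' ('/' is safe by default)
raw_decoded = unquote_to_bytes(raw_encoded)    # b'\x00\xff/'

print(encoded, decoded, raw_encoded, raw_decoded)
```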

2008-07-31 Thread Stephen J. Turnbull
Bill Janssen writes: > > A quoting function that accepts bytes *must* have an encoding > > argument. > > Huh? What would it use it for? Ah, you're right. I was thinking in terms of an URI builder, where the quoter would do any required conversion (eg, if the bytes represented a string in J

2008-07-31 Thread Bill Janssen
Also see . Bill

2008-07-31 Thread Bill Janssen
> Of course, it's un-Pythonic to enforce pedantry, and we pedants can > use a string->string encoder correctly. Sure. All I was asking was that we not break the existing usage of the standard library "unquote" by producing a string by *assuming* a UTF-8 encoded string is what's in those percent-e

2008-07-31 Thread Bill Janssen
> Guido says: > > > Actually, we'd need to look at the various other APIs in Py3k before we can > > decide whether these should be considered taking or returning bytes or text. > > It looks like all other APIs in the Py3k version of urllib treat URLs as > > text. > > > Yes, as I said in the bug

2008-07-30 Thread Stephen J. Turnbull
Matt Giuca writes: > OK, for all the people who say URI encoding does not encode characters: yes > it does. This is not an encoding for binary data, it's an encoding for > character data, but it's unspecified how the strings map to octets before > being percent-encoded. In other words, it's a

2008-07-30 Thread Guido van Rossum
On Wed, Jul 30, 2008 at 8:49 PM, Matt Giuca <[EMAIL PROTECTED]> wrote: > >> Con: URI encoding does not encode characters. > > OK, for all the people who say URI encoding does not encode characters: yes > it does. This is not an encoding for binary data, it's an encoding for > character data, but it

2008-07-30 Thread Matt Giuca
> Con: URI encoding does not encode characters. OK, for all the people who say URI encoding does not encode characters: yes it does. This is not an encoding for binary data, it's an encoding for character data, but it's unspecified how the strings map to octets before being percent-encoded. From R
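To illustrate that the character-to-octet mapping is left open, the same characters percent-encode differently depending on which charset is applied first. A sketch using Python 3's `quote(..., encoding=...)`:

```python
from urllib.parse import quote

# The same characters yield different percent-encodings per charset.
samples = {
    ("é", "utf-8"): quote("é", encoding="utf-8"),           # '%C3%A9'
    ("é", "latin-1"): quote("é", encoding="latin-1"),       # '%E9'
    ("日", "utf-8"): quote("日", encoding="utf-8"),          # '%E6%97%A5'
    ("日", "shift_jis"): quote("日", encoding="shift_jis"),  # '%93%FA'
}
for (text, charset), result in samples.items():
    print(f"{text!r} as {charset}: {result}")
```

A receiver that sees only `%E9` or `%93%FA` cannot recover the characters without knowing which charset the sender used.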

2008-07-30 Thread Bill Janssen
> I think this is as close as consensus as we can get on this issue. Can > whoever wrote the patch adjust the patch to this outcome? (I think the > only change is to remove the encoding arguments and make separate > functions for bytes.) This is 2.7/3.1 only, right? I'm looking at the bales of co

2008-07-30 Thread Guido van Rossum
On Wed, Jul 30, 2008 at 12:49 PM, Bill Janssen <[EMAIL PROTECTED]> wrote: >> > unquote() -- takes string, produces bytes or string >> > >> > If optional "encoding" parameter is specified, decodes bytes with >> > that encoding and returns string. Otherwise, returns bytes. >> >> The default

2008-07-30 Thread Bill Janssen
> > unquote() -- takes string, produces bytes or string > > > > If optional "encoding" parameter is specified, decodes bytes with > > that encoding and returns string. Otherwise, returns bytes. > > The default of returning bytes will break almost all uses. Most code > will uses the unquo
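The objection to a bytes default can be seen in a small sketch. `unquote_proposed` is a hypothetical function written only for illustration (it is not part of any library); it mimics the proposal quoted above by returning `bytes` unless an `encoding` is passed, and is built on Python 3's `unquote_to_bytes`:

```python
from typing import Optional, Union
from urllib.parse import unquote_to_bytes

def unquote_proposed(s: str, encoding: Optional[str] = None) -> Union[bytes, str]:
    """Hypothetical: bytes by default, str only when an encoding is given."""
    raw = unquote_to_bytes(s)
    return raw if encoding is None else raw.decode(encoding)

default_result = unquote_proposed("caf%C3%A9")                    # b'caf\xc3\xa9'
decoded_result = unquote_proposed("caf%C3%A9", encoding="utf-8")  # 'café'

# Typical text-handling code then fails on the default return type.
try:
    found = "café" in default_result
except TypeError:
    found = None  # a str-in-bytes membership test raises TypeError
```

The return type here depends on the value of `encoding`, the pattern Jeff Hall objects to elsewhere in this thread; Python 3 instead shipped separate `unquote` (str result) and `unquote_to_bytes` functions.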

2008-07-30 Thread Jeff Hall
> > > (Aside: I dislike functions that have a different return type based on > the value of a parameter.) > > I wanted to stay out of the whole discussion as it's largely over my head... But I did want to express support for this idea which I think almost rises to the level of a standard... I see m

2008-07-30 Thread Guido van Rossum
On Wed, Jul 30, 2008 at 10:33 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: >> It looks like all other APIs in the Py3k version of >> urllib treat URLs as text. > > The URL is text, a string of ASCII characters. We're just talking > about urllib.quote() and urllib.unquote(), which are there to suppo

2008-07-30 Thread Bill Janssen
> It looks like all other APIs in the Py3k version of > urllib treat URLs as text. The URL is text, a string of ASCII characters. We're just talking about urllib.quote() and urllib.unquote(), which are there to support the text-ization of binary values, and the de-text-ization. > I think that wo

2008-07-30 Thread Bill Janssen
> Actually (as I pointed out before) the existing functions are not > string-in/string-out. They are something-in and bytes-out. Sorry, this is wrong. "quote" is clearly bytes-in and string-out. "unquote" is clearly string-in and bytes-out. The whole point of "quote" is to take an arbitrary seq
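Under this bytes-in, string-out view, percent-encoding is a lossless text-ization of arbitrary binary data. A sketch using Python 3's bytes-oriented pair, round-tripping every possible octet:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

blob = bytes(range(256))                 # every possible octet value
text = quote_from_bytes(blob, safe="")   # escape everything except unreserved characters
restored = unquote_to_bytes(text)

print(text[:12])          # %00%01%02%03
print(restored == blob)   # True
```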

2008-07-30 Thread Guido van Rossum
On Wed, Jul 30, 2008 at 9:52 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: >> On Wed, Jul 30, 2008 at 8:09 AM, André Malo <[EMAIL PROTECTED]> wrote: >> > I'm actually in favour of encoding bytes only back and forth. A useful >> > extension would be *another* function which wraps quote/unquote and enc

2008-07-30 Thread Bill Janssen
> On Wed, Jul 30, 2008 at 8:09 AM, André Malo <[EMAIL PROTECTED]> wrote: > > I'm actually in favour of encoding bytes only back and forth. A useful > > extension would be *another* function which wraps quote/unquote and encodes > > and decodes characters. > > I'd reverse this. By all means, ad

2008-07-30 Thread Bill Janssen
> For unquote, I think it will break a lot and surprise everyone. I > think that while this may be "purely" the best option, it's pretty > silly. I don't mind being silly to do the right thing. Happens to me a lot :-). Bill

2008-07-30 Thread Guido van Rossum
On Wed, Jul 30, 2008 at 8:09 AM, André Malo <[EMAIL PROTECTED]> wrote: > I'm actually in favour of encoding bytes only back and forth. A useful > extension would be *another* function which wraps quote/unquote and encodes > and decodes characters. I'd reverse this. By all means, add a new pair of

2008-07-30 Thread André Malo
[I was pretty busy these days, so sorry for jumping in late again] * Matt Giuca wrote: > 1. Leave it as it is. quote is Latin-1 if range(0,256), fallback to > UTF-8. unquote is Latin-1. > In favour: Anybody who doesn't reply to this thread > Pros: Already implemented; some existing code depends
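Option 1's asymmetry is easy to reproduce. The two functions below are hypothetical re-creations of the behaviour described in that option (Latin-1 when every code point is below 256, otherwise UTF-8; always Latin-1 when unquoting), not the actual 2008 urllib source:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def legacy_quote(s: str) -> str:
    """Hypothetical: Latin-1 if every code point < 256, else UTF-8."""
    try:
        raw = s.encode("latin-1")
    except UnicodeEncodeError:
        raw = s.encode("utf-8")
    return quote_from_bytes(raw)

def legacy_unquote(s: str) -> str:
    """Hypothetical: always decode the octets as Latin-1."""
    return unquote_to_bytes(s).decode("latin-1")

print(legacy_quote("é"))                         # %E9
print(legacy_quote("日"))                        # %E6%97%A5
print(legacy_unquote(legacy_quote("é")))         # é (round-trips)
print(repr(legacy_unquote(legacy_quote("日"))))  # 'æ\x97¥' (does not)
```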

2008-07-30 Thread Antoine Pitrou
Facundo Batista gmail.com> writes: > > 2008/7/30 Matt Giuca gmail.com>: > > > 2. Default to UTF-8. > > In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven > > Pros: Fully working and tested solution is implemented; recommended by > > RFC 3986 for all future schemes; recommended

2008-07-30 Thread Facundo Batista
2008/7/30 Matt Giuca <[EMAIL PROTECTED]>: > 2. Default to UTF-8. > In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven > Pros: Fully working and tested solution is implemented; recommended by > RFC 3986 for all future schemes; recommended by W3C for use with HTML; > UTF-8 used by al

2008-07-30 Thread Oleg Broytmann
On Thu, Jul 31, 2008 at 12:11:40AM +1000, Matt Giuca wrote: > 2. Default to UTF-8. > In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven Count me too: +1. Most sites I use these days use UTF-8 for URL encoding. Examples: Wikipedia: http://ru.wikipedia.org/wiki/%D0%93%D0%B2%D0%B

2008-07-30 Thread Matt Giuca
Arg! Damnit, why do my replies get split off from the main thread? Sorry about any confusion this may be causing.

2008-07-30 Thread Matt Giuca
Hi folks, This issue got some attention a few weeks back but it seems to have fallen quiet, and I haven't had a good chance to sit down and reply again till now. As I've said before this is a serious issue which will affect a great deal of code. However it's obviously not as clear-cut as I origin

2008-07-14 Thread Bill Janssen
>> Clearly the unquote is str->bytes, You can't pass a Unicode string >> back >> as the result of unquote *without* passing in an encoding specifier, >> because the character set is application-specific. > So for unquote you're suggesting that it always return a bytes object > UNLESS an encoding

2008-07-13 Thread Matt Giuca
On Mon, Jul 14, 2008 at 4:54 AM, André Malo <[EMAIL PROTECTED]> wrote: > > Ahem. The HTTP standard does ;-) > Really? Can you include a quotation please? The HTTP standard talks a lot about ISO-8859-1 (Latin-1) in terms of actually raw encoded bytes, but not in terms of URI percent-encoding (a di

2008-07-13 Thread Bill Janssen
> Ah there may be some confusion here. We're only dealing with str->str > transformations (which in Python 3 means Unicode strings). You can't put a > bytes in or get a bytes out of either of these functions. I suggested a > "quote_raw" and "unquote_raw" function which would let you do this. Ah, w

2008-07-13 Thread André Malo
* Matt Giuca wrote: > > This POV is way too browser-centric... > > This is but one example. Note that I found web forms to be the least > clear-cut example of choosing an encoding. Most of the time applications > seem to be using UTF-8, and all the standards I have read are moving > towards specif

2008-07-12 Thread Matt Giuca
> This POV is way too browser-centric... > This is but one example. Note that I found web forms to be the least clear-cut example of choosing an encoding. Most of the time applications seem to be using UTF-8, and all the standards I have read are moving towards specifying UTF-8 (from being unspeci

2008-07-12 Thread André Malo
* Matt Giuca wrote: > Well from what I've seen, the only time Latin-1 naturally appears on the > net is when you have a web page in Latin-1 (either explicit or inferred; > and note that a browser like Firefox will infer Latin-1 if it sees only > ASCII characters) with a form in it. Submitting the
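A form served from a Latin-1 page submits Latin-1 octets, so the server must know that charset to decode the query. A sketch with Python 3's `urlencode` and `parse_qs`, both of which accept an `encoding` argument (the field name `q` is only an example):

```python
from urllib.parse import parse_qs, urlencode

# Simulate a browser submitting a form from a Latin-1 page.
body = urlencode({"q": "café"}, encoding="latin-1")   # 'q=caf%E9'

# parse_qs() assumes UTF-8 (with errors='replace') unless told otherwise.
wrong = parse_qs(body)                                # {'q': ['caf\ufffd']}
right = parse_qs(body, encoding="latin-1")            # {'q': ['café']}

print(body, wrong, right)
```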

2008-07-12 Thread Matt Giuca
Thanks for all the replies, and making me feel welcome :) > > If what you are saying is true, then it can probably go in as a bug > fix (unless someone else knows something about Latin-1 on the Net that > makes this not true). > Well from what I've seen, the only time Latin-1 naturally appears on

2008-07-12 Thread Martin v. Löwis
> Very nice, I had this somewhere on my todo list to work on. I'm very much > in favour, especially since it synchronizes us with the RFCs (for all I > remember reading about it last time). I still think that it doesn't. The RFCs haven't changed, and can't change for compatibility reasons. The enc

2008-07-12 Thread Jeroen Ruigrok van der Werven
-On [20080712 19:27], Matt Giuca ([EMAIL PROTECTED]) wrote: >Basically, urllib.quote and unquote seem not to have been updated since Python >2.5, and because of this they implicitly perform Latin-1 encoding and decoding >(with respect to percent-encoded characters). I think they should default to >

2008-07-12 Thread Bill Janssen
> Basically, urllib.quote and unquote seem not to have been updated since > Python 2.5, and because of this they implicitly perform Latin-1 encoding and > decoding (with respect to percent-encoded characters). I think they should > default to UTF-8 for a number of reasons, including that's what oth

2008-07-12 Thread Brett Cannon
On Sat, Jul 12, 2008 at 10:27 AM, Matt Giuca <[EMAIL PROTECTED]> wrote: > Hi all, > > My first post to the list. In fact, first time Python hacker, long-time > Python user though. (Melbourne, Australia). > Welcome! > Some of you may have seen for the past week or so my bug report on Roundup, > ht

[Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-12 Thread Matt Giuca
Hi all, My first post to the list. In fact, first time Python hacker, long-time Python user though. (Melbourne, Australia). Some of you may have seen for the past week or so my bug report on Roundup, http://bugs.python.org/issue3300 I've spent a heap of effort on this patch now so I'd really lik