[issue1712522] urllib.quote throws exception on Unicode URL

2020-05-31 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Python 2.7 is no longer supported.

--
nosy: +serhiy.storchaka
resolution: accepted -> out of date
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1712522] urllib.quote throws exception on Unicode URL

2014-02-03 Thread Mark Lawrence

Changes by Mark Lawrence breamore...@yahoo.co.uk:


--
nosy:  -BreamoreBoy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1712522
___



[issue1712522] urllib.quote throws exception on Unicode URL

2013-03-29 Thread Mark Lawrence

Mark Lawrence added the comment:

A lot of work has already been done on this issue.  If this is likely to get 
into 2.7, then fine, leave it open; if not, can this be closed?

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-21 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

Reverted the checkin in revision 83045.

For the robotparser issue, one of these two can be adopted.

1. Fix it by encoding the unicode url using utf-8, strict.
2. Catch the KeyError exception and raise a TypeError exception from the 
robotparser module informing the user that Unicode URLs are not allowed, so 
that users can handle it at the application end and send 8-bit strings.

I prefer 2.

--
resolution: fixed -> accepted
status: closed -> open




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-21 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

If you're going the way of option 2, I would strongly advise against relying on 
the KeyError. The fact that a KeyError is raised by urllib.quote is not part of 
its specification, it's a bug/quirk in the implementation (which is now 
unlikely to be changed, but it's unsafe to rely on it).

Robotparser should encode the string, if and only if it is a unicode string, 
with ('ascii', 'strict'), catch the UnicodeEncodeError, and raise the TypeError 
you suggested. This will have precisely the same behaviour as your proposed 
option 2 (will work fine for byte strings and Unicode strings with ASCII-only 
characters, but raise a TypeError on Unicode strings with non-ASCII characters) 
without relying on the KeyError from urllib.quote.
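
The check described above could be sketched roughly as follows. This is only an illustration, not the robotparser code: the helper name ensure_ascii_url and the error message are made up here, and the text_type shim is an assumption that lets the same sketch run on Python 2 and Python 3.

```python
# Hypothetical helper; the name and the message are illustrative only.
try:
    text_type = unicode  # Python 2: the separate unicode type
except NameError:
    text_type = str      # Python 3: str is already Unicode text


def ensure_ascii_url(url):
    """Return url as a byte string, rejecting non-ASCII Unicode input."""
    if isinstance(url, text_type):
        try:
            # Encode only Unicode input, and only with ('ascii', 'strict').
            return url.encode('ascii', 'strict')
        except UnicodeEncodeError:
            raise TypeError(
                "Unicode URLs with non-ASCII characters are not allowed")
    # Byte strings pass through untouched.
    return url
```

Byte strings and ASCII-only Unicode strings go through, while a Unicode string with non-ASCII characters produces a TypeError, independent of whatever urllib.quote happens to raise.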

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-20 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

I agree with the points raised by Antoine. Also, yesterday on IRC, Eric Smith 
mentioned that if someone uses these new parameters in 2.7.1, his code may not 
work with 2.7 (that would obviously be undesirable behavior). So it is 
better to leave the exception being raised; py3k has the correct behavior anyway.

I shall revert the patch from 2.7.1.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Senthil, have you read my comment on python-checkins?
Couldn't this have been fixed without introducing a new API in a bugfix branch?

--
nosy: +pitrou




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

I just checked your comment in the checkins list.

I saw this as a bug fix that led to a change in the signature of the 
quote function, though in a backward-compatible way.
 
Should we still not do it? 

I understood only feature requests and behavior changes are disallowed in 
bug-fix branch.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> I understood only feature requests and behavior changes are disallowed
> in bug-fix branch.

Well, isn't it a new feature you're adding?

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

From http://mail.python.org/pipermail/python-checkins/2010-July/095350.html:
> Looking at the issue (which in itself was quite old), you could as well
> have fixed the robotparser module instead.

It isn't an issue with robotparser. The original reporter found it via 
robotparser, but it's nothing to do with that module. I found it independently 
and I would have reported it separately if it hadn't already been here.

It's definitely a bug in urllib (as shown by my extensive new test cases).

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

> Well, isn't it a new feature you're adding?

You had a function which raised a confusing and unintentional KeyError when 
given non-ASCII Unicode input. Now it doesn't. That's the bug fix part.

What I assume you're referring to as a new feature is the new arguments. I'd 
say they're unfortunately necessary in fixing this bug, as the fix requires 
encoding the non-ASCII unicode characters with some encoding, and it's 
(arguably) necessary to give the programmer the choice of encoding, with 
sensible defaults.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> It's definitely a bug in urllib

A bug in what way? Up to 2.6 (*), the docs state nothing about the type of the 
string parameter.
(*) http://docs.python.org/release/2.6.5/library/urllib.html#urllib.quote

I think everyone assumed that the parameter should be a str object and 
nothing else. Apparently some people used it accidentally with some unicode 
strings and it worked until these strings contained non-ASCII characters. But 
it's a side-effect of how 2.x unicode strings work, and it doesn't seem to me 
quote() was ever intended to accept unicode strings.

If we were following you, we would add encoding and errors arguments to any 
str-accepting 2.x function, so that it can also accept unicode strings. That's 
certainly not a reasonable solution.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

Well, my understanding was Type:behavior was a bug fix and Type: feature 
request was a new feature request, which may change some underlying behavior. I 
thought this issue was on the border.

The robotparser using this might be just one usage indicator, but having this 
capability in the quote definitely helps. And this could not have been done 
without changing the signature.

Ideally, this could have gone in 2.7, but I missed it.  Personally, I am still 
+1 in having this in 2.7.1. Is it undesirable? Does it need wider discussion?

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Well, my understanding was Type:behavior was a bug fix and Type:
> feature request was a new feature request, which may change some
> underlying behavior. I thought this issue was on the border.

The original issue is against robotparser, and clearly states a bug
(robotparser doesn't work in some cases).
But solving a bug by adding a feature isn't appropriate for a bugfix
release.

You shouldn't look at how the issue is classified. What's important is
what the actual *patch* does.

A patch doesn't have to change existing behaviour to be considered a
feature. That's a misconception. Feature releases try to be
forward-compatible as well (if I use urllib.quote() in 2.Y, it will
still work in 2.Y+1).

Adding API parameters, or accepting additional types in an existing API,
is clearly a new feature.

> Ideally, this could have gone in 2.7, but I missed it.  Personally, I
> am still +1 in having this in 2.7.1. Is it undesirable? Does it need
> wider discussion?

We can certainly make exceptions from time to time but only when there's
a strong argument for it (e.g. a security issue). There doesn't seem to
be an urgency to make urllib.quote() work with non-ASCII unicode strings
in 2.7.1, while it didn't before anyway.

Furthermore, the core issue is the automatic coercion between unicode
and 8-bit strings in 2.x. Many APIs are affected by this, urllib.quote()
shouldn't be considered a special case.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

On Mon, Jul 19, 2010 at 11:25:30AM +, Antoine Pitrou wrote:
> If we were following you, we would add encoding and errors
> arguments to any str-accepting 2.x function, so that it can also
> accept unicode strings. That's certainly not a reasonable solution.

I don't think Matt is suggesting that; only that the quote function should
accept a unicode string as perfectly valid string input.
In the original py3k bug too, there was a big discussion on this very
same topic.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

> I think everyone assumed that the parameter should be a str object
> and nothing else. Apparently some people used it accidentally with
> some unicode strings and it worked until these strings contained
> non-ASCII characters.

I don't consider use of Unicode strings in Python 2.7 to be accidental. In my 
experience with Python 2, pretty much everything already works with Unicode 
strings, and it's best practice to use them.

Now one of the major goals of Python 2.6/2.7 is to allow the writing of code 
which ports smoothly to Python 3. Unicode support is a major issue here. To 
quote "What's new in Python 3" (http://docs.python.org/py3k/whatsnew/3.0.html):
"To be prepared in Python 2.x, start using unicode for all unencoded text, and 
str for binary or encoded data only. Then the 2to3 tool will do most of the 
work for you."
Having functions in Python 2.7 which don't accept Unicode (or worse, raise 
random exceptions) runs against best practices for moving to Python 3.

> If we were following you, we would add encoding and errors arguments
> to any str-accepting 2.x function, so that it can also accept unicode
> strings. That's certainly not a reasonable solution.

No, that's certainly not necessary. You don't need an encoding or errors 
argument on any given function in order to support unicode. In fact, most code 
written to work with strings naturally works with Unicode because unicode 
strings support the same basic operations.

The need for an encoding and errors, and in fact the need to deal with 
string encoding at all with urllib.quote is due to the special nature of URLs. 
If URLs had a syntax like %u then there would be no need for encoding 
Unicode strings (as in UTF-8) at all. However, because the RFC specifies that 
Unicode strings are to be encoded into a byte sequence *using an unspecified 
encoding*, it is therefore necessary, for this specific function, to ask the 
programmer which encoding to use.
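
To illustrate why the choice of encoding matters, here is a small sketch using the Python 3 function urllib.parse.quote, which already takes the encoding and errors arguments discussed in this thread. The variable names are only for illustration; the Python 2 patch under discussion may have differed in detail.

```python
from urllib.parse import quote

# The same character percent-encodes differently depending on the encoding.
utf8_form = quote('ñ', encoding='utf-8')      # two bytes: C3 B1
latin1_form = quote('ñ', encoding='latin-1')  # one byte: F1

# 'errors' decides what happens when the encoding cannot represent a
# character. With 'replace', the character becomes '?', which is then quoted.
replaced = quote('ñ', encoding='ascii', errors='replace')

print(utf8_form, latin1_form, replaced)
```

Since the URI syntax does not fix the encoding, a receiver expecting Latin-1 would misread the UTF-8 form and vice versa, which is the reason for exposing the argument rather than hard-coding one encoding.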

Thus I assure you, this is not just one random function I have picked to add 
these arguments to. This is the only one (that I know of) that requires them to 
support Unicode.

> The original issue is against robotparser, and clearly states a bug
> (robotparser doesn't work in some cases).

I don't know why this keeps coming back to robotparser. The original bug was 
not against robotparser; it is called "quote throws exception on Unicode URL" 
and that is the bug. Robotparser was just one demonstrative piece of code which 
failed because of it.

Having said that, I don't expect to continue this argument. If you (the Python 
developers) decide that it's too late to accept this, then I won't object to 
reverting it.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Now one of the major goals of Python 2.6/2.7 is to allow the writing
> of code which ports smoothly to Python 3. Unicode support is a major
> issue here.

I understand the argument. But 2.7 is a bugfix branch and shouldn't
receive new features, even backports. If we wanted 2.x to converge
further into 3.x, we would do a 2.8, which we have decided not to do.

> I don't consider use of Unicode strings in Python 2.7 to be
> accidental. In my experience with Python 2, pretty much everything
> already works with Unicode strings, and it's best practice to use
> them.

Not true. From the urllib module itself:

$ touch /tmp/hé
$ python -c 'import urllib; urllib.urlretrieve("file:///tmp/hé")'
$ python -c 'import urllib; urllib.urlretrieve(u"file:///tmp/hé")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.6/urllib.py", line 93, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib64/python2.6/urllib.py", line 225, in retrieve
    url = unwrap(toBytes(url))
  File "/usr/lib64/python2.6/urllib.py", line 1027, in toBytes
    " contains non-ASCII characters")
UnicodeError: URL u'file:///tmp/h\xc3\xa9' contains non-ASCII characters

> Having functions in Python 2.7 which don't accept Unicode (or worse,
> raise random exceptions) runs against best practices for moving to
> Python 3.

There are lots of them, and urllib.quote() isn't an exception:

'x\x9c\xcbH\x04\x00\x013\x00\xca'
>>> zlib.compress(u"hà")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 1: ordinal not in range(128)

pwd.struct_passwd(pw_name='root', pw_passwd='x', pw_uid=0, pw_gid=0, 
pw_gecos='root', pw_dir='/root', pw_shell='/bin/bash')
>>> pwd.getpwnam(u"rooté")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)

> In fact, most code written to work with strings naturally works with
> Unicode because unicode strings support the same basic operations.

What should zlib compression of an unicode string result in?
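
For comparison, a small sketch of how Python 3 sidesteps this question: zlib.compress() there only accepts bytes-like input, so the caller has to pick an encoding explicitly. This is Python 3 behaviour, shown as an illustration rather than anything proposed for 2.x; the variable names are arbitrary.

```python
import zlib

text = "hà"

# Python 3 refuses to guess an encoding: compress() rejects str outright.
try:
    zlib.compress(text)
    accepted_text = True
except TypeError:
    accepted_text = False

# The caller chooses the encoding, and the compressed payload depends on it.
utf8_payload = zlib.decompress(zlib.compress(text.encode("utf-8")))
latin1_payload = zlib.decompress(zlib.compress(text.encode("latin-1")))

print(accepted_text, utf8_payload, latin1_payload)
```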

>> The original issue is against robotparser, and clearly states a bug
>> (robotparser doesn't work in some cases).
> 
> I don't know why this keeps coming back to robotparser. The original
> bug was not against robotparser; it is called "quote throws exception
> on Unicode URL" and that is the bug. Robotparser was just one
> demonstrative piece of code which failed because of it.

Well, there are two different concerns:
- robotparser fails on certain Web pages, which is a bug (unless the Web
pages are clearly malformed)
- urllib.quote() should accept any kind of unicode strings, and perform
appropriate encoding, with an ability to override default encoding
parameters: this is a feature request

The OP himself (John Nagle) said:
“The problem is down inside a library module. robotparser is calling
urllib.quote. One of those two library modules needs to be fixed.”

It seems to imply that the primary concern was robotparser not working.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Sorry, the email gateway of Roundup ate half of my snippets.
Here they are again:

>>> zlib.compress(u"ha")
'x\x9c\xcbH\x04\x00\x013\x00\xca'
>>> zlib.compress(u"hà")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 1: ordinal not in range(128)

>>> pwd.getpwnam(u"root")
pwd.struct_passwd(pw_name='root', pw_passwd='x', pw_uid=0, pw_gid=0, 
pw_gecos='root', pw_dir='/root', pw_shell='/bin/bash')
>>> pwd.getpwnam(u"rooté")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

OK sure, there are some other things broken, but they are mostly not dealing 
with string data, but binary data (for example, zlib expects a sequence of 
bytes, not characters).

Just one quick point:

> urllib.urlretrieve("file:///tmp/hé")
> UnicodeError: URL u'file:///tmp/h\xc3\xa9' contains non-ASCII characters

That's precisely correct behaviour. URLs are not allowed to contain non-ASCII 
characters (that's the whole point of urllib.quote). urllib.quote should accept 
non-ASCII characters (for conversion into ASCII strings). Other URL processing 
functions should not accept non-ASCII characters, since they aren't valid URIs.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

There are two points here:

First, is it a desired behavior of quote function in 2.7?

IMO, it is. In the discussions of issue3300, I think, it was decided that quote 
handling of unicode strings may be backported. Behaviour wise the modified 
version still returns a string which is correct for py2.

The forward compatibility on 2.7.1 version here is on the basis that someone in 
2.7 might be relying on Exception raised for Unicode string for the quote 
function.
 
But my guess is, when they are trying to quote non-ascii characters using quote 
function which is path component, they might be expecting that it gives them a 
correct output and this (now) modified function would be of help.

Of the many cases we have on Unicode being auto-coerced to 8-bit string, this 
particular case of using UTF-8 as default encoding for Unicode and returning a 
string seems to be fine (again discussed in the earlier issue). We might not 
have good answers for many other cases.

The Second point, as this is leading to an API change we should not have it 
2.7.1

It would be unfortunate, if we revert the patch on this account only. 

This can be classified a bug-fix producing the desirable behavior, just that it 
needs the API to change too. I don't know if we have never adopted this 
approach (of changing API in backward compatible manner) for anything other 
than the security bug fixes alone.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> The forward compatibility on 2.7.1 version here is on the basis that
> someone in 2.7 might be relying on Exception raised for Unicode string
> for the quote function.

Again, the problem isn't compatibility. It is, simply, that we shouldn't
add new features in a bugfix branch.

> The Second point, as this is leading to an API change we should not
> have it 2.7.1
> 
> It would be unfortunate, if we revert the patch on this account only. 

Let me put it differently: if this rule didn't exist, there would be no
point in having bugfix branches, since everyone would commit their
favourite new features to bugfix branches.

There are many things which were too late for 2.7, and nobody is trying
to make a case of adding them to 2.7.1.

> I don't know if we have never adopted this approach (of changing API
> in backward compatible manner) for anything other than the security
> bug fixes alone.

We have done it a couple of times in early 3.0 and even 3.1 versions,
but that was really exceptional, and 3.x allowed us to relax some of the
rules since it was so little used at the time.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-18 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

Thanks for doing that, Senthil.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-17 Thread Mark Lawrence

Mark Lawrence breamore...@yahoo.co.uk added the comment:

I've run an eye over this and don't see any problems, particularly in the light 
of msg101043.  Only 2.7 is affected as the fix has been backported from py3k.  
Please can we go forward with this.

--
nosy: +BreamoreBoy
versions:  -Python 2.6




[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-17 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

Incidentally, I was working on this yesterday. Some minor changes were 
required in the patch as quote had undergone changes. 

Fixed and committed in revision 82940.

Thanks to Matt Giuca for this.

--
resolution: accepted -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue1712522] urllib.quote throws exception on Unicode URL

2010-06-04 Thread AdamN

AdamN a...@varud.com added the comment:

Nudge.  Somebody with the authority needs to increment the stage to patch 
review.

--
nosy: +adamnelson




[issue1712522] urllib.quote throws exception on Unicode URL

2010-06-04 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
stage: unit test needed -> patch review




[issue1712522] urllib.quote throws exception on Unicode URL

2010-04-26 Thread Lino Mastrodomenico

Changes by Lino Mastrodomenico l.mastrodomen...@gmail.com:


--
nosy: +mastrodomenico




[issue1712522] urllib.quote throws exception on Unicode URL

2010-03-14 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

I've finally gotten around to a complete analysis of this code. I have a 
code/test/documentation patch which fixes the issue without any code breakage.

There is another bug in quote which I've found and fixed with this patch: If 
the 'safe' parameter is unicode, it raises a UnicodeDecodeError.

I have backported all of the 'quote' test cases from Python 3 (which I wrote) 
to Python 2. This exposed the reported bug as well as the above one. It's good 
to have a much larger set of test cases to work with. It tests things like all 
combinations of str/unicode, as well as non-ASCII byte string input and all 
manner of unicode inputs.

The bugfix itself comes from Python 3 (this has already been approved, over 
many months, by Guido, so I am hoping a similar change can get pushed through 
into Python 2 fairly easily). The solution is to add encoding and errors 
arguments to 'quote', and have quote encode the unicode string before anything 
else. 'encoding' defaults to 'utf-8'. So:

>>> quote(u'/El Niño/')
'/El%20Ni%C3%B1o/'

which is typically the desired behaviour. (Note that URI syntax does not cover 
Unicode strings; it merely says to encode them with some encoding, recommended 
but not required UTF-8, and then percent-encode those.)

With this patch, quote *always* returns a str, even on unicode input. I think 
that makes sense, because a URI is, by definition, an ASCII string. It could 
easily be made to return a unicode instead.

The other fix is for 'safe'. If 'safe' is a byte string we don't touch it. But 
if it is a Unicode string, we throw away all non-ASCII bytes. This means you 
can't make *characters* safe, only *bytes*, since URIs deal with bytes. In 
Python 3, we go further and throw away all non-ASCII bytes from 'safe' as well, 
so you can only make ASCII bytes safe. For this patch, I didn't go that far, 
for backwards compatibility reasons.
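
As a rough illustration of the 'safe' handling described for Python 3, the sketch below uses urllib.parse.quote; it shows the Python 3 behaviour only, not the 2.x patch, and the variable names are arbitrary.

```python
from urllib.parse import quote

# By default '/' is safe, so path separators survive quoting.
default_safe = quote('/El Niño/')

# With an empty 'safe', '/' is percent-encoded like any other reserved byte.
no_safe = quote('/El Niño/', safe='')

# Non-ASCII characters in 'safe' are discarded, so 'ñ' is still encoded.
non_ascii_safe = quote('ñ', safe='ñ')

print(default_safe, no_safe, non_ascii_safe)
```

The last call shows the "only bytes, not characters, can be made safe" point: listing a non-ASCII character in 'safe' has no effect on the output.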

Also updated documentation.

In summary, this patch makes 'quote' fully Unicode compliant. It does not 
change any existing behaviour which wouldn't previously have thrown an 
exception, so it can't possibly break any existing code (unless it's relying on 
the exception being thrown).

(A minor change I made was replacing the line "cachekey = (safe, always_safe)" 
with "cachekey = safe". This avoids unnecessary work of hashing always_safe and 
the tuple, since always_safe doesn't change. It doesn't affect the behaviour.)

Note: I've also backported the 'unquote' test cases from Python 3 and found a 
few more bugs. I'm going to report them separately, with patches.

--
keywords: +patch
Added file: http://bugs.python.org/file16539/urllib-quote.patch




[issue1712522] urllib.quote throws exception on Unicode URL

2010-03-14 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti




[issue1712522] urllib.quote throws exception on Unicode URL

2009-08-08 Thread Senthil

Changes by Senthil orsent...@gmail.com:


--
assignee:  -> orsenthil
resolution:  -> accepted
type:  -> behavior




[issue1712522] urllib.quote throws exception on Unicode URL

2009-05-26 Thread Matt Giuca

Matt Giuca matt.gi...@gmail.com added the comment:

The issue of urllib.quote was discussed at extreme length in issue 3300,
which was specific to Python 3.
http://bugs.python.org/issue3300

In the end, I rewrote the entire family of urllib.quote and unquote
functions; they're now Unicode compliant and accept additional encoding
and errors arguments to handle this.

They were never backported to the 2.x branch; maybe we should do so.
Note that the code is quite different and considerably more complex due
to all the new issues with Unicode strings.

--
nosy: +mgiuca




[issue1712522] urllib.quote throws exception on Unicode URL

2009-04-22 Thread Daniel Diniz

Changes by Daniel Diniz aja...@gmail.com:


--
keywords: +easy




[issue1712522] urllib.quote throws exception on Unicode URL

2009-04-22 Thread John Nagle

John Nagle na...@users.sourceforge.net added the comment:

Note that the problem can't be solved by telling end users to call a
different quote function.  The problem is down inside a library
module. robotparser is calling urllib.quote. One of those two
library modules needs to be fixed.

--




[issue1712522] urllib.quote throws exception on Unicode URL

2009-02-12 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
components: +Unicode
nosy: +haypo




[issue1712522] urllib.quote throws exception on Unicode URL

2009-02-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

quote() works in Python3 with any bytes string (not only ASCII) and 
any unicode string:

Python 3.1a0 (py3k:69105M, Feb  3 2009, 15:04:35)
>>> from urllib.parse import quote
>>> quote('é')
'%C3%A9'
>>> quote('\xe9')
'%C3%A9'
>>> quote('\xe9'.encode('utf-8'))
'%C3%A9'
>>> quote('\xe9'.encode('latin-1'))
'%E9'




[issue1712522] urllib.quote throws exception on Unicode URL

2009-02-12 Thread Daniel Diniz

Changes by Daniel Diniz aja...@gmail.com:


--
nosy: +orsenthil
stage:  -> test needed




[issue1712522] urllib.quote throws exception on Unicode URL

2009-02-08 Thread Daniel Diniz

Daniel Diniz aja...@gmail.com added the comment:

IMHO, the TypeError would be a bugfix for 2.6.x. A urllib.quote_unicode
could be provided (in 2.7) to match urllib.parse.quote in 3.0 and the OP
usecase.

I can provide a simple patch, but I'm afraid the OP's remarks about
ASCII-range and thread-safety wouldn't be handled at all.

--
nosy: +ajaksu2
versions: +Python 2.6, Python 2.7




[issue1712522] urllib.quote throws exception on Unicode URL

2008-12-21 Thread Valery

Valery khame...@gmail.com added the comment:

Hi, gurus, can anyone give a hint as to what we mortals should use in 
order to form a URL with non-ASCII symbols? We so loved the idea of 
feeding our national symbols to urllib.quote as a unicode string... and now 
we are quite disoriented... Thanks in advance for any comments!
Valery

--
nosy: +vak

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1712522
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1712522] urllib.quote throws exception on Unicode URL

2008-12-21 Thread Valery

Valery khame...@gmail.com added the comment:

(self-answer to msg78153)
the working recipe is:
http://www.nabble.com/Re:-Problem:-neither-urllib2.quote-nor-urllib.quote-encode-the--unicode-strings-arguments-p19823144.html
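
A commonly used workaround, which may or may not be exactly what the linked recipe does, is to encode the unicode string to bytes first and quote the result. The sketch below assumes UTF-8 as the encoding; the helper name quote_text is made up for illustration, and the import shim is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def quote_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then percent-quote it."""
    return quote(text.encode(encoding))


print(quote_text(u"/El Ni\xf1o/"))
```

Doing the encoding at the call site keeps the choice of encoding visible, at the cost of every caller having to remember it.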
