Bugs item #1396543, was opened at 2006-01-04 04:57
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1396543&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John Hansen (johnhansen)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse is confused by /

Initial Comment:
If the parameter field of a URL contains a '/', urlparse does not enter
date in the parameter field, but leaves it attached to the path.

The simplified example is:

>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')

>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')

The realworld case was:

>>> urlparse.urlparse("http://ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D', '', '', '')

What's odd is that the code specifically says to do this:

def _splitparams(url):
    if '/' in url:
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''

Is there a reason for the rfind?

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-02-06 01:09

Message:
Logged In: YES
user_id=261020

The urlparse.urlparse() code should not be changed, for
backwards compatibility reasons.

As the docs for module urlparse explain, you should instead
use urlparse.urlsplit(), then another function to parse
parameters (that other function is not supplied by the
stdlib, IIRC).

Also, note that RFC 3986 obsoletes RFC 2396 (see also RFC 3987).
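
The suggestion above (urlsplit() first, then a separate parameter parser) could look roughly like the sketch below. The helper name split_first_params is hypothetical and is not part of the stdlib; it assumes the caller wants everything after the first ';' in the path treated as parameters, even when that text contains '/'. The import fallback is only there so the sketch runs on both Python 2 (module urlparse, as in this thread) and Python 3 (urllib.parse).

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2, the module discussed here


def split_first_params(url):
    """Hypothetical helper, not a stdlib function.

    Returns a 6-tuple shaped like urlparse()'s result, but splits the
    path at its FIRST ';' instead of only looking in the last segment.
    """
    parts = urlsplit(url)  # urlsplit() leaves any ';params' inside the path
    # partition() yields an empty params string when the path has no ';'
    path, _sep, params = parts.path.partition(';')
    return (parts.scheme, parts.netloc, path, params,
            parts.query, parts.fragment)
```

With this sketch, split_first_params("http://f/adi;s=a;c=b/") gives ('http', 'f', '/adi', 's=a;c=b/', '', ''), while urlparse.urlparse() itself stays untouched, so existing callers are not affected.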
----------------------------------------------------------------------

Comment By: Peter van Kampen (pterk)
Date: 2006-01-14 21:19

Message:
Logged In: YES
user_id=174455

Actually section 3.3 of RFC2396 is relevant here and it
seems that it is indeed correctly implemented as is.

I'm not sure what the 'python policy' is on RFC vs The Real
World. My guess would be that RFCs carry some weight.
Following the 'real world' is too vague a reference. Your
world might be different than mine and tomorrow's world a
different world than today's.

You can always monkey-patch:

>>> def my_splitparams(url):
...     i = url.find(';')
...     return url[:i], url[i+1:]
...
>>> import urlparse
>>> urlparse._splitparams = my_splitparams
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi', 's=a;c=b/', '', '')

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-13 18:19

Message:
Logged In: YES
user_id=1418831

Well RFC2396, section 3.4 says "/" is reserved within a
query. However, the real world doesn't seem to follow
RFC2396... so I still think it's a bug: the class should be
useful, rather than try to enforce an RFC. A warning would
be fine.

----------------------------------------------------------------------

Comment By: Peter van Kampen (pterk)
Date: 2006-01-13 00:25

Message:
Logged In: YES
user_id=174455

Looking at the testcases it appears the answers must be in
RFCs 1808 or 2396.

http://www.ietf.org/rfc/rfc1808.txt and
http://www.ietf.org/rfc/rfc2396.txt

See for example section 5.3 of 1808. I don't see why
_splitparams does what it does but I didn't exactly
close-read the text either.

Also be sure to look at Lib/test/test_urlparse.py.
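
For readers trying to see what the rfind in the quoted snippet achieves, here is a small sketch of the same idea. It is a reimplementation under the hypothetical name last_segment_params, not the stdlib source (the snippet quoted in this thread is incomplete, so the no-';' handling below is an assumption). One possible reading is that parameters are only recognised on the last path segment, so a ';' that appears before the final '/' stays in the path.

```python
def last_segment_params(path):
    """Hypothetical sketch: split off ';params' from the last segment only."""
    # Begin the search for ';' at the last '/', so a ';' in an earlier
    # segment is never treated as the start of the parameters.
    start = max(path.rfind('/'), 0)  # rfind() returns -1 when there is no '/'
    i = path.find(';', start)
    if i < 0:
        # No ';' in the last segment: the whole path is returned unchanged.
        return path, ''
    return path[:i], path[i + 1:]
```

Applied to the two simplified paths from the initial comment, '/adi;s=a;c=b/' comes back unchanged with empty parameters (the last segment, after the trailing '/', is empty), while '/adi;s=a;c=b' splits into '/adi' and 's=a;c=b'.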
----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 16:31

Message:
Logged In: YES
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse
does not enter it into the parameter field, but leaves it
attached to the path.

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 05:00

Message:
Logged In: YES
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse
does not enter it into the parameter field, but leaves it
attached to the path.

----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list