Martin:
> It still should be possible to come up with examples for these as
> well, no? For example, if you pass a relative URI as the base
> URI, what would you like to see happen?

Until two days ago I didn't even realize that was an incorrect use of urljoin. I can't be the only one. Hence, raise an exception - just like 4Suite's Uri.py does.

> That's true. Actually, it's probably not true; it will only get fixed
> if some volunteer contributes a fix.

And it's not I. A true fix is a lot of work. I would rather use Uri.py, now that I see it handles everything I care about, and then some. Eg, file name <-> URI conversion.

> So do you think this patch meets your requirements?

# new
>>> uriparse.urljoin("http://spam/", "foo/bar")
'http://spam//foo/bar'
>>>
# existing
>>> urlparse.urljoin("http://spam/", "foo/bar")
'http://spam/foo/bar'
>>>

No. That was the first thing I tried. Also found

>>> urlparse.urljoin("http://blah", "/spam/")
'http://blah/spam/'
>>> uriparse.urljoin("http://blah", "/spam/")
'http://blah/spam'
>>>

I reported these on the patch page. Nothing else strange came up, but I only tried http URLs and not the others.

My "requirements", meaning my vague, spur-of-the-moment thoughts without any research or experimentation to determine their validity, are different than those for Python. My real requirements are met by the existing code. My imagined ones include support for edge cases, the idna codec, unicode, and real-world use on a variety of OSes. 4Suite's Uri.py seems to have this. Eg, lots of edge-case code like

    # On Windows, ensure that '|', not ':', is used in a drivespec.
    if os.name == 'nt' and scheme == 'file':
        path = path.replace(':','|',1)

Hence the uriparse.py patch does not meet my hypothetical requirements.

Python's requirements are probably to get closer to the spec. In which case yes, it's at least as good as, and likely generally better than, the existing module, modulo a few API naming debates and perhaps some rough edges which will be found when it is put into use.

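To make the "raise an exception" suggestion concrete, here is a minimal sketch of a wrapper that refuses a relative base URI. It is not 4Suite's Uri.py code and not part of the uriparse.py patch. The name strict_urljoin is hypothetical, and it assumes a modern Python, where the functions of the urlparse module live in urllib.parse. It also assumes that "absolute" simply means "has a scheme", which is a simplification of the RFC rules.

```python
from urllib.parse import urljoin, urlsplit


def strict_urljoin(base, reference):
    """Join reference onto base, rejecting a base that is not absolute.

    Hypothetical helper for illustration only.
    """
    # Assumption: a base URI counts as absolute if it carries a scheme.
    if not urlsplit(base).scheme:
        raise ValueError("base URI must be absolute, got %r" % (base,))
    # Delegate the actual resolution to the standard library.
    return urljoin(base, reference)
```

With this, strict_urljoin("http://spam/", "foo/bar") behaves like the existing urljoin, while strict_urljoin("foo/bar", "baz") raises ValueError instead of silently returning something.
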
There will perhaps also be various arguments about how bug-compatible it should be, and whether the old code should remain available alongside the new one for those who depend on the existing 1808-allowed, implementation-dependent behavior. For those I have neither the experience to guide me nor the care to push the debate.

I've decided I'm going to experiment using 4Suite's Uri.py for my code because it handles things I want which are outside of the scope of uriparse.py.

> This topic (URL parsing) is not only inherently difficult to
> implement, it is just as tedious to review. Without anybody
> reviewing the contributed code, it's certain that it will never
> be incorporated.

I have a different opinion. Python's url manipulation code is a mess. urlparse, urllib, urllib2. Why is "urlencode" part of urllib and not urllib2? For that matter, urllib is labeled 'Open an arbitrary URL' and not 'and also do manipulations on parts of URLs'.

I don't want to start fixing code because doing it the way I want to requires a new API and a much better understanding of the RFCs than I care about, especially since 4Suite and others have already done this.

Hence I would say to just grab their library. And perhaps update the naming scheme.

Also, urlgrabber and pycURL are better for downloading arbitrary URIs. For some definitions of "better".

Andrew
[EMAIL PROTECTED]

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev