Bugs item #1436428, was opened at 2006-02-22 07:03
Message generated for change (Comment added) made by zseil
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1436428&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Donovan Eastman (dpeastman)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib has trouble with Windows filenames

Initial Comment:
When you pass urllib the name of a local file that includes a Windows drive letter (e.g. 'C:\dir\My File.txt'), URLopener.open() incorrectly interprets the drive letter as the scheme of a URL. Of course, since there is no scheme 'C', this fails.

I have solved this in my own code by putting the following test before calling urllib.urlopen():

    if url[1] == ':' and url[0].isalpha():
        url = 'file:' + url

Although this works fine in my particular case, it seems like urllib should simply "do the right thing" without the caller having to worry about it. Therefore I propose that urllib should automatically assume that any URL beginning with a single letter followed by a colon is a local file. The only potential downside is that this would preclude the use of single-letter scheme names.

I did a little research on this. RFC 3986 suggests, but does not explicitly state, that scheme names must be more than one character (http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#scheme). That said, there are no currently recognized single-letter scheme names (http://www.iana.org/assignments/uri-schemes.html), and it seems very unlikely that there ever would be.

I would gladly write the code for this myself, but I suspect that it would take someone longer to review and integrate my changes than it would to just write the code.

Thanks,
Donovan

----------------------------------------------------------------------

Comment By: Žiga Seilnacht (zseil)
Date: 2006-04-13 02:12

Message:
Logged In: YES
user_id=1326842

There are already two platform-specific functions in the urllib module for exactly this purpose: pathname2url and url2pathname. See http://docs.python.org/lib/module-urllib.html#l2h-3193.

I agree that this should be closed as invalid.

----------------------------------------------------------------------

Comment By: Andrew Clover (bobince)
Date: 2006-03-20 18:41

Message:
Logged In: YES
user_id=311085

Filepaths aren't URIs, and attempting to hide the difference in the backend is doomed to fail (as it did for SAX). Throw filenames containing colons, network paths, Mac paths and RISC OS paths into the mix, and you've got a situation that is all but impossible to handle correctly.

In any case, the docs *don't* say you can pass in a filepath:

    If the URL does not have a scheme identifier, or if it has file: as its scheme identifier, this opens a local file

This means the string you pass in is unequivocally a URL, *not* a pathname; it's just that you can leave the scheme prefix off for file: URLs. Effectively this is a relative URL.

r'C:\spam' is *not* a valid way to refer to a local file using a relative URL. Pass it through pathname2url and you'll get '///C|/spam', which is okay; 'C|/spam' and '/C|/spam' will also work.

Even on Unix, a filepath won't always work when passed to urlopen. For example, filenames can contain percent signs, which have to be encoded in URLs. Always use pathname2url or you're going to trip up.

(Suggest setting status INVALID, and possibly clarifying the docs to warn against passing a filepath to urlopen?)
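
The percent-sign problem described above can be reproduced with Python 3's urllib.request.pathname2url and url2pathname, the modern homes of the two conversion functions mentioned in this thread. This is a minimal sketch assuming a POSIX system (Windows applies different, drive-letter-aware rules, so the output differs there); the file name is made up for illustration.

```python
from urllib.parse import unquote
from urllib.request import pathname2url, url2pathname

# Assumption: POSIX system. A hypothetical file whose name contains '%'.
path = '/tmp/%41 report.txt'

# Treating the raw path as a URL: '%41' is decoded as an escape for 'A',
# so the URL no longer names the file on disk.
print(unquote(path))                    # /tmp/A report.txt

# Converting first escapes the literal '%' (and the space) ...
url_path = pathname2url(path)
print(url_path)                         # /tmp/%2541%20report.txt

# ... and the conversion round-trips back to the original path.
print(url2pathname(url_path) == path)   # True

# An absolute file: URL suitable for urlopen().
file_url = 'file://' + url_path
print(file_url)                         # file:///tmp/%2541%20report.txt
```

The round trip is the point: only a path that went through pathname2url is guaranteed to come back unchanged when urllib decodes it as a URL.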

----------------------------------------------------------------------

Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 03:32

Message:
Logged In: YES
user_id=757799

OK - here's my suggested fix. This can be fixed with a single if statement (and a comment to explain it to confused Unix programmers). In splittype(), right after the line that reads:

    scheme = match.group(1)

add the following:

    # ignore single-char schemes to avoid confusion with win32 drive letters
    if len(scheme) > 1:

...and indent the next line. Alternatively, the if statement could read:

    if len(scheme) > 1 or sys.platform != 'win32':

...which would allow single-letter scheme names on non-Windows systems. I would argue that it is better to be consistent and have it work the same way on all OSes.

----------------------------------------------------------------------

Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 02:56

Message:
Logged In: YES
user_id=757799

Reasons why urllib should open local files:

1) This allows you to write code that handles local files and Internet files equally well, without having to do any special magic of your own.
2) The docs all say that it should.

I believe this works just fine under Unix: URLopener.open() looks for the protocol prefix, and if it can't find one, it assumes that it is a local file. The problem on Windows is those pesky drive letters. The form 'C:\location' ends up looking a lot like the form 'http://location', so urllib looks for a protocol called 'c', which obviously isn't going to work.

----------------------------------------------------------------------

Comment By: Koen van de Sande (shadowmorpher)
Date: 2006-03-13 20:19

Message:
Logged In: YES
user_id=270334

Why should the urllib module support opening local files? It already does so through the file: protocol prefix, and I do not see why it should support automatic detection of Windows filenames.
AFAIK it does not do automatic detection of Unix filenames (one could recognize those from a leading /home/ or similar), so why would Windows work differently? I'm not an expert, so I might be wrong.

----------------------------------------------------------------------
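
Donovan's proposal (the drive-letter test from the initial report, and the splittype() change in his later comment) can be sketched as a standalone Python 3 function. This is a minimal sketch, not the actual urllib patch: split_scheme() and its allow_single_letter parameter are hypothetical, and the regular expression is assumed to mirror the one Python 2's splittype() uses (a leading run of characters other than '/' and ':', followed by a colon). The last line uses urllib.parse.urlsplit only to show that the modern parser still reads a drive letter as a scheme.

```python
import re
from urllib.parse import urlsplit

# Assumption: same pattern as Python 2's urllib.splittype(), i.e. a leading
# run of characters other than '/' and ':' followed by ':'.
_TYPE_RE = re.compile(r'^([^/:]+):')


def split_scheme(url, allow_single_letter=False):
    """Hypothetical splittype() variant with the proposed drive-letter guard.

    Returns (scheme, rest); scheme is None when no scheme is recognised.
    """
    match = _TYPE_RE.match(url)
    if match:
        scheme = match.group(1)
        # Ignore single-char schemes to avoid confusion with win32 drive letters.
        if len(scheme) > 1 or allow_single_letter:
            return scheme.lower(), url[len(scheme) + 1:]
    return None, url


print(split_scheme('http://www.python.org/'))  # ('http', '//www.python.org/')
print(split_scheme(r'C:\dir\My File.txt'))      # (None, 'C:\\dir\\My File.txt')
print(split_scheme('/home/user/notes.txt'))     # (None, '/home/user/notes.txt')

# Python 3's urlsplit() has no such guard: the drive letter becomes the scheme.
print(urlsplit(r'C:\dir\My File.txt').scheme)   # c
```

Passing allow_single_letter=(sys.platform != 'win32') would reproduce the platform-dependent alternative; the default keeps the behaviour identical on every OS, which is the option Donovan argues for.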