Hi Micah,

You're right - this was raised before, and in fact it was a feature Mauro Tortonesi intended to implement for the 1.12 release, but it seems to have been forgotten somewhere along the line. I wrote to the list in 2006 describing what I consider a compelling reason to support file://. Here is what I wrote then:
At 03:45 PM 26/06/2006, David wrote:

In replies to the post requesting support of the "file://" scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason.

I have a CD with HTML content (it is a CD of abstracts from a scientific conference); however, for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create a complete local archive of the content by having wget retrieve everything and convert the links to point to the retrieved material. So when retrieving local files, wget should behave just as it does when the files come from a web server: the input local file needs to be processed, both local and remote content retrieved, and the links in all the resulting copies adjusted to refer to the local copies rather than the remote content. A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim.

Regarding where the local files should be copied, I suggest a default scheme similar to the current http functionality. For example, if the local source was /source/index.htm and I ran something like:

wget.exe -m -np -k file:///source/index.htm

this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user change into a new subdirectory and retry the command).

One additional problem this scheme needs to deal with is when one or more /../ components in the path specification would place the destination above the current parent directory; in that case the destination would have to be adjusted to ensure the file remains within the parent directory structure. For example, if I am in /dir/dest/ and ran

wget.exe -m -np -k file://../../source/index.htm

this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm).

-David.

At 08:49 AM 3/09/2008, you wrote:

Petri Koistinen wrote:
> Hi,
>
> It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it "Needs Discussion" and set it at low priority). I'd thought there was already an issue for this, but I can't find it (either open or closed). I know this has come up before, at least.

I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which "groks urls", Wget "W(eb)-gets", and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system "cp" command is likely to handle things much better than Wget. In particular, special OS-specific, extended file attributes, extended permissions and the like are among the things that existing system tools probably handle quite well, and that Wget is unlikely to.
I don't really want Wget to be in the business of duplicating the system "cp" command, but I might conceivably not mind "file://" support if it means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what "recursion" should mean for file://. Between ftp:// and http://, "recursion" currently means different things. In FTP, it means "traverse the file hierarchy recursively", whereas in HTTP it means "traverse links recursively". I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
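To make the destination mapping I described in my 2006 message a bit more concrete, here is a rough, illustrative sketch in C - this is not wget code, and the function name and fixed-size buffers are purely hypothetical; a real implementation would presumably reuse wget's existing URL and path machinery. It strips the file:// scheme, discards "." and ".." components so the copy always lands below the current directory, and turns a Windows drive letter such as c: into a plain "c" directory (using "/" as the separator for simplicity):

/* Sketch only: map a file:// URL (or a bare local path) onto a
 * destination path rooted in the current directory. */
#include <stdio.h>
#include <string.h>

static void map_file_url(const char *url, char *dest, size_t destlen)
{
    const char *p = url;
    if (strncmp(p, "file://", 7) == 0)
        p += 7;                              /* strip the scheme */

    strncpy(dest, ".", destlen - 1);         /* everything lands under "./" */
    dest[destlen - 1] = '\0';

    char component[256];
    size_t ci = 0;

    for (;; p++) {
        if (*p == '/' || *p == '\\' || *p == '\0') {
            component[ci] = '\0';
            /* Drop empty, "." and ".." components; ".." would otherwise
               let the destination escape the current directory tree. */
            if (ci > 0 && strcmp(component, ".") != 0
                       && strcmp(component, "..") != 0) {
                /* A Windows drive letter like "c:" becomes a plain "c". */
                if (ci == 2 && component[1] == ':')
                    component[1] = '\0';
                strncat(dest, "/", destlen - strlen(dest) - 1);
                strncat(dest, component, destlen - strlen(dest) - 1);
            }
            ci = 0;
            if (*p == '\0')
                break;
        } else if (ci < sizeof component - 1) {
            component[ci++] = *p;
        }
    }
}

int main(void)
{
    const char *examples[] = {
        "file:///source/index.htm",
        "file://../../source/index.htm",
        "c:\\test.htm",
    };
    char dest[1024];
    for (size_t i = 0; i < sizeof examples / sizeof *examples; i++) {
        map_file_url(examples[i], dest, sizeof dest);
        printf("%-35s -> %s\n", examples[i], dest);
    }
    return 0;
}

Run against the examples from my earlier message, this prints ./source/index.htm for both file:///source/index.htm and file://../../source/index.htm, and ./c/test.htm for c:\test.htm - which is the behaviour I had in mind.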