You're right - this was raised before, and in fact it was a feature Mauro
Tortonesi intended to implement for the 1.12 release, but it seems to have
been forgotten somewhere along the line. I wrote to the list in 2006 describing
what I consider a compelling reason to support file://. Here is what I wrote:
At 03:45 PM 26/06/2006, David wrote:
In replies to the post requesting support of the "file://" scheme, requests
were made for someone to provide a compelling reason to want to do this.
Perhaps the following is such a reason.
I have a CD with HTML content (it is a CD of abstracts from a scientific
conference), however for space reasons not all the content was included on the
CD - there remain links to figures and diagrams on a remote web site. I'd like
to create an archive of the complete content locally by having wget retrieve
everything and convert the links to point to the retrieved material. Thus the
wget functionality when retrieving the local files should work the same as if
the files were retrieved from a web server (i.e. the input local file needs to
be processed, both local and remote content retrieved, and the copies made of
the local and remote files all need to be adjusted to now refer to the local
copy rather than the remote content). A simple shell script that runs cp or
rsync on local files without any further processing would not achieve this aim.
Regarding where the local files should be copied, I suggest a default scheme
similar to current http functionality. For example, if the local source was
/source/index.htm, and I ran something like:
wget.exe -m -np -k file:///source/index.htm
this could be retrieved to ./source/index.htm (assuming that I ran the command
from anywhere other than the root directory). On Windows, if the local source
file is c:\test.htm, then the destination could be .\c\test.htm. It would
probably be fair enough for wget to throw an error if the source and
destination were the same file (and perhaps helpfully suggest that the user
change into a new subdirectory and retry the command).
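The mapping suggested above can be sketched as follows. This is an illustrative helper, not actual wget code; the function name and the drive-letter handling are assumptions based on the examples in this message.

```python
import os
from urllib.parse import urlparse, unquote

def file_url_to_dest(url):
    """Map a file:// URL to a destination under the current directory,
    mirroring wget's http:// mirroring layout (hypothetical helper)."""
    path = unquote(urlparse(url).path)
    # A Windows file URL like file:///c:/test.htm yields the path
    # "/c:/test.htm"; turn the drive letter into a top-level
    # directory so the copy lands in .\c\test.htm
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = "/" + path[1] + path[3:]
    # Strip the leading slash so the file is saved relative to
    # the directory wget was run from
    return os.path.join(".", path.lstrip("/"))

print(file_url_to_dest("file:///source/index.htm"))  # ./source/index.htm
print(file_url_to_dest("file:///c:/test.htm"))       # ./c/test.htm
```

As in the http:// case, running this from the root of the source tree itself would make source and destination collide, which is where the error suggested above would apply.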
One additional problem this scheme needs to deal with is when one or more /../
in the path specification results in the destination being above the current
parent directory; then the destination would have to be adjusted to ensure the
file remained within the parent directory structure. For example, if I am in
/dir/dest/ and ran
wget.exe -m -np -k file://../../source/index.htm
this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm)
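The adjustment described above amounts to resolving each path component and discarding any leading ".." that would climb out of the destination tree. A minimal sketch, with an assumed function name:

```python
def clamp_to_dest(rel_path):
    """Resolve ../ components in a relative source path, dropping any
    that would escape the current directory, so the saved copy always
    stays inside the destination tree (illustrative sketch)."""
    parts = []
    for part in rel_path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # pop a component if we can; a leading ".." with nothing
            # left to pop is simply discarded
            if parts:
                parts.pop()
        else:
            parts.append(part)
    return "./" + "/".join(parts)

print(clamp_to_dest("../../source/index.htm"))  # ./source/index.htm
```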
At 08:49 AM 3/09/2008, you wrote:
Petri Koistinen wrote:
> It would be nice if wget would also support file://.
Feel free to file an issue for this (I'll mark it "Needs Discussion" and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.
I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
"groks urls", Wget "W(eb)-gets", and file:// can't really be argued to
be part of the web.
That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.
Even the system "cp" command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
"cp" command, but I might conceivably not mind "file://" support if it
means simple _content_ transfer, and not actual file duplication.
Also in need of addressing is what "recursion" should mean for file://.
Between ftp:// and http://, "recursion" currently means different
things. In FTP, it means "traverse the file hierarchy recursively",
whereas in HTTP it means "traverse links recursively". I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
to be worked out.
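The FTP-style semantics guessed at above - recurse when the path is a directory, fetch a single file otherwise, with no link-following - could be sketched like this (an assumption about the proposed behaviour, not actual wget code):

```python
import os

def recurse_like_ftp(path):
    """Yield every file to fetch for a file:// path under FTP-style
    recursion: walk directories recursively, ignore HTML links
    entirely, and treat a plain file as a single retrieval."""
    if os.path.isdir(path):
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                yield os.path.join(dirpath, name)
    else:
        yield path
```

Under HTTP-style semantics the walk would instead be driven by parsing links out of each retrieved HTML file, which is exactly the difference in question.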
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq