-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Christopher G. Lewis wrote:
> Hmm - changing the rename schema would potentially create a HUGE issue with
> clobbering.
> 
> For example, and quite hypothetical...
> 
> Given a directory with the following:
>   index.html
>   index-1.html
>   index.1.html
> 
> All three are served by the server and rendered by the browser.  They are
> distinct files given the file system and the URL interpretation of the file
> system by the web server.
> 
> Now, Wget downloads index.html, then downloads it again.  Our choices for
> the second file are:
>   1) index.html.1
>   2) index-1.html
>   3) index.1.html
> 
> Of the three, only #1 is pretty much guaranteed *not* to exist on the web
> server.  Why?  Because by changing the extension, we've changed the content
> type.  So if our intentions are to not clobber (which, I believe, is the
> whole point) we are *much* better off sticking with the current schema and
> creating a file that most likely can't be served by the web server.

Of course you are 100% correct that it is the whole point.

However, while this is indeed a problem, I don't think it's a clobbering
problem. I believe Wget would then choose (or could be made to then
choose) index-2.html, etc, for the file which on the server is named
index-1.html.

Of course, while that would resolve clobbering, it would make it
virtually impossible to determine which remote file ended up under
which local name, which is entirely unacceptable.
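
A rough sketch of that fallback, in Python for brevity. The function
names are made up for illustration, a plain set stands in for the
local directory, and this is not how Wget actually implements its
renaming:

```python
import os

def infix_name(path, n):
    """Proposed scheme: index.html -> index-1.html (n goes before the extension)."""
    root, ext = os.path.splitext(path)
    return f"{root}-{n}{ext}"

def first_free_name(path, taken):
    """Return path if it is unused, else the first infix name not in taken."""
    if path not in taken:
        return path
    n = 1
    # Skip every candidate that already exists locally, so nothing is clobbered.
    while infix_name(path, n) in taken:
        n += 1
    return infix_name(path, n)

# Local directory from the example above: all three names are already present.
taken = {"index.html", "index-1.html", "index.1.html"}
print(first_free_name("index.html", taken))
```

Nothing gets overwritten, but the name that comes back says nothing
about whether it holds a second copy of index.html or some other page
from the server, which is the mapping problem described above.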

I wonder how Wget currently handles perverse cases like index.html.1
actually existing on the server and already on the local system. :)

> Note that this is quite a contrived example to illustrate the point.

Yeah. Unfortunately, though, something like page-1.html, page-2.html,
isn't quite so unlikely.

It's intended that Reget (I'll call it that for now, until we figure out
what the hell we're going to do with that whole cluster of
functionality) will have support for a database of download-session
metadata, that would handle mappings between the remote URI and the
local file. With that, it'd be possible to construct a simple utility
which could be invoked like, "reget-fmap http://example.com/foo.html";
and might spit out something like "./example.com/foo.html".
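
To make the idea concrete, here is a minimal sketch of such a mapping
store. Everything in it is an assumption for illustration: the use of
SQLite, the table name "fmap", and the function names are not part of
any existing Reget design.

```python
import sqlite3

def open_map(db_path=":memory:"):
    """Open (or create) a hypothetical URI -> local file mapping database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fmap ("
        " uri TEXT NOT NULL,"
        " local_path TEXT NOT NULL UNIQUE)"
    )
    return conn

def record(conn, uri, local_path):
    """Remember that uri was saved as local_path during a download session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO fmap (uri, local_path) VALUES (?, ?)",
            (uri, local_path),
        )

def lookup(conn, uri):
    """Return every local path recorded for uri, oldest first."""
    rows = conn.execute(
        "SELECT local_path FROM fmap WHERE uri = ? ORDER BY rowid", (uri,)
    )
    return [row[0] for row in rows]

conn = open_map()
record(conn, "http://example.com/foo.html", "./example.com/foo.html")
print(lookup(conn, "http://example.com/foo.html"))
```

A lookup utility along the lines of "reget-fmap" would then be little
more than a wrapper around lookup(). One URI can map to several local
files here, since repeated downloads produce renamed copies.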

This might couple quite well with providing a plugin hook to control the
renaming scheme.

Given your excellent points, and the fact that I didn't get the
overwhelmingly positive response to this suggestion that I had
anticipated, I'd better table this patch. :(

> However, my 2 cents on the behavior - It would be *wonderful* if wget could
> look at the local file system and rename each version to file.ext.n+1 so the
> new download is index.html, not index.html.1.  I've been caught a couple of
> times with this, so to me the default behavior is backwards (ie, new file
> s/b the URL, older files get versioned)

That would of course be substantially more work, and provide even
greater opportunity for race conditions/interoperability issues than we
already have, but I agree that it'd be nice-to-have.
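
For reference, a minimal sketch of what that rotation might look like.
The function name is hypothetical and this is not code from Wget; it
also shows where the extra work and the race windows come from:

```python
import os

def rotate_versions(path):
    """Shift path -> path.1, path.1 -> path.2, ... so that path becomes free."""
    if not os.path.exists(path):
        return
    # Find the first unused numbered slot.
    n = 1
    while os.path.exists(f"{path}.{n}"):
        n += 1
    # Rename from the highest number down so no existing copy is overwritten.
    # Not atomic: another process could create or remove one of these files
    # between the existence checks above and the renames below.
    for i in range(n - 1, 0, -1):
        os.rename(f"{path}.{i}", f"{path}.{i + 1}")
    os.rename(path, f"{path}.1")
```

After rotate_versions("index.html"), the new download can be written to
index.html while the previous copies sit at index.html.1, index.html.2,
and so on. Every download touches all older copies, not just one new
file.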

Unfortunately, I don't think there's any way we'll ever do this in Wget:
it'd be too confusing for people used to the current way. And while, as
Hrvoje pointed out, the currently proposed suffixes patch could
potentially break backwards compatibility, it's not likely to do so in a
harmful/destructive way, whereas any existing scripts that download
files and then erase the renamed ones will suddenly be destroying the
new data, rather than the old, if we reverse the renaming. :\

That problem is partly exacerbated by the fact that, from a certain
perspective, we ought to be able to "stick our noses in the air" and
claim that any scripts of that sort ought to have been telling Wget to
clobber files, rather than letting Wget rename them and trying to delete
them afterwards... but there is currently no way to tell Wget to do
that: it can't be asked to clobber files when it normally wouldn't.

However, with the proper hook in Reget, it'd be easy enough to have a
plugin that handles it this way. Actually, since Reget is looking to
probably be an entirely new beast, and we'll certainly have to break
compatibility with traditional Wget, we could consider making this the
default renaming mechanism for Reget; but I'm still concerned about the
extra work, race conditions, and potential for screwing with other
programs that may be operating on some of the files involved.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMKRh7M8hyUobTrERCFeGAJwM8yPR35j8rbsqkG8Vk8A1Bdm0YACggbBN
6s7EOEwhxCerjaeuQAblccw=
=rdpM
-----END PGP SIGNATURE-----
