Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote: > Yeah... of course they won't be able to edit the wiki that way. I doubt you'd get the slashdot effect from just the people who're interested in editing the wiki. You may get a handful of developers and a few thousand people who only want to read it :-)
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Josh Williams wrote: > On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote: >> Well, the trouble with that is that I'm running all of Wget's stuff >> (plus my own personal mail and whatnot) on a little VPS. I'm rather >> concerned that the traffic will kill me. I'm already worried about it >> potentially hitting SlashDot or Digg because it's the first Wget release >> in quite a while. D: > > Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network > > There's also archive.org. Yeah... of course they won't be able to edit the wiki that way. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFHT24z7M8hyUobTrERAprHAJ4gCaeiel8UPINXAa2wiept/ZsvFwCeLy0f 7SLzgXI6Jzcgmyy6GpyMH7k= =MZaQ -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote: > Well, the trouble with that is that I'm running all of Wget's stuff > (plus my own personal mail and whatnot) on a little VPS. I'm rather > concerned that the traffic will kill me. I'm already worried about it > potentially hitting SlashDot or Digg because it's the first Wget release > in quite a while. D: Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network There's also archive.org.
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Josh Williams wrote: > On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote: >> I dunno, man, I think our current wget2 roadmap goals are already pretty >> wild-and-crazy. ;) > > I agree. I think we should create an announcement asking for > developers to help and submit it to digg and slashdot. The new > features may get some excitement going and start rumors. :-P > > ^^ in all seriousness ^^ Well, the trouble with that is that I'm running all of Wget's stuff (plus my own personal mail and whatnot) on a little VPS. I'm rather concerned that the traffic will kill me. I'm already worried about it potentially hitting SlashDot or Digg because it's the first Wget release in quite a while. D: - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD4DBQFHT2vG7M8hyUobTrERAiOAAJd6Htrtd2i9oxjJoK5ww+DFafzkAJ4lSiJR qtT8LHghRuxYlkcdznnlmQ== =ddEY -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote: > I dunno, man, I think our current wget2 roadmap goals are already pretty > wild-and-crazy. ;) I agree. I think we should create an announcement asking for developers to help and submit it to digg and slashdot. The new features may get some excitement going and start rumors. :-P ^^ in all seriousness ^^
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tony Godshall wrote: > ... >> At the release of Wget 1.11, it is my intention to try to attract as >> much developer interest as possible. At the moment, and despite Wget's >> pervasive presence, it has virtually no user or developer community. >> Given the amount of work that needs to be done, this is not good. The >> announcement of the first new release of GNU Wget in two years seems a >> great opportunity to solicit help! > ... > > That's sort of the nature of older tools with a well-defined mission- > they do their > job so well there's little itch to tweak them. If it ain't broken, > you don't fix it. > Freshmeat lists wget as "mature", which basically means the same thing. Yeah, I imagine that's it. Except that Wget _is_ broken in several important ways... but I think it works for the vast majority of users. In particular, I think the most widespread use of Wget is for fetching single files, which Wget seldom has any problems doing. It's when you try tricky things that Wget can sometimes break your expectations. Even so, of course, I have rarely if ever run into problems using it, personally. > I guess wget will have to get a bit immature to get some buzz going. Some > pretty insane goals in a wget2 roadmap would probably do the trick. How > about announcing plans implement DHT and make bittorrent obsolete? That > should make slashdot ;-) I dunno, man, I think our current wget2 roadmap goals are already pretty wild-and-crazy. ;) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFHT2ni7M8hyUobTrERAvh2AJ4hEcCzAF5vdpuflFJ1P7GyzPzjxgCfeaHh /GVTxx+vFcm9PcE3a8P21qM= =Hkhj -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
... > At the release of Wget 1.11, it is my intention to try to attract as > much developer interest as possible. At the moment, and despite Wget's > pervasive presence, it has virtually no user or developer community. > Given the amount of work that needs to be done, this is not good. The > announcement of the first new release of GNU Wget in two years seems a > great opportunity to solicit help! ... That's sort of the nature of older tools with a well-defined mission: they do their job so well there's little itch to tweak them. If it ain't broken, you don't fix it. Freshmeat lists wget as "mature", which basically means the same thing. I guess wget will have to get a bit immature to get some buzz going. Some pretty insane goals in a wget2 roadmap would probably do the trick. How about announcing plans to implement DHT and make bittorrent obsolete? That should make slashdot ;-) Tony -- The above is not to be taken seriously.
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Josh Williams wrote: > On Nov 29, 2007 6:20 PM, David Ginger > <[EMAIL PROTECTED]> wrote: >> So can I ask is a wget2 actualy being developed ? > > Go ahead, but I'll answer that question before you do ;-) > > The answer is no - not at the moment. But we've been discussing it for > several months. It will be a while before any code is actually > written. Specifically, it will probably be years, unless we can get a much-needed influx of developers in here. The list of issues targeted at Wget 1.12 are many, and most of them really should be resolved before we begin work on the "beefier" Wget. And, as I am (1) by far the most active current Wget developer, and (2) not all that terribly active, given that it's all just in my spare time ;) - work is liable to be a bit slow. The good news is, once the Wget 1.12 stuff is out of the way, we can move almost all focus to the new thing, as Wget will be almost completely in bug-fixes-only mode. Given that's the case, one might argue that Wget 2.0 is in fact a reasonable name for the new package. I'm still thinking about that stuff, and will probably add a Wiki page for the purpose of names discussion soon. At the release of Wget 1.11, it is my intention to try to attract as much developer interest as possible. At the moment, and despite Wget's pervasive presence, it has virtually no user or developer community. Given the amount of work that needs to be done, this is not good. The announcement of the first new release of GNU Wget in two years seems a great opportunity to solicit help! - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFHT1FT7M8hyUobTrERAswMAJ9rNSv2kC1MIy3vErblMfcqBmcWdQCgjT2z C8kgh5b4msWnw0ORb8x0Jl8= =VMV+ -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
On Nov 29, 2007 6:20 PM, David Ginger <[EMAIL PROTECTED]> wrote: > So can I ask is a wget2 actualy being developed ? Go ahead, but I'll answer that question before you do ;-) The answer is no - not at the moment. But we've been discussing it for several months. It will be a while before any code is actually written.
Re: .1, .2 before suffix rather than after
> i totally agree with hrvoje here. also note that changing wget > unique-name-finding algorithm can potentially break lots of wget-based > scripts out there. i think we should leave these kind of changes for wget2 > - or wget-on-steroids or however you want to call it ;-) So can I ask, is a wget2 actually being developed?
Re: .1, .2 before suffix rather than after
On Sunday 04 November 2007 22:54:24 Hrvoje Niksic wrote: > Micah Cowan <[EMAIL PROTECTED]> writes: > > Christian Roche has submitted a revised version of a patch to modify > > the unique-name-finding algorithm to generate names in the pattern > > "foo-n.html" rather than "foo.html.n". The patch looks good, and > > will likely go in very soon. > > foo.html.n has the advantage of simplicity: you can tell at a glance > that .n is a duplicate of . Also, it is trivial to remove > the unwanted files by removing .*. Why change what worked so > well in the past? i totally agree with hrvoje here. also note that changing wget unique-name-finding algorithm can potentially break lots of wget-based scripts out there. i think we should leave these kind of changes for wget2 - or wget-on-steroids or however you want to call it ;-) -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: .1, .2 before suffix rather than after
"Tony Lewis" <[EMAIL PROTECTED]> writes: > Hrvoje Niksic wrote: >> > And how is .tar.gz renamed? .tar-1.gz? >> Ouch. > > OK. I'm responding to the chain and not Hrvoje's expression of pain. :-) > > What if we changed the semantics of --no-clobber so the user could specify > the behavior? I'm thinking it could accept the following strings: > - after: append a number after the file name (current behavior) > - before: insert a number before the suffix But see Andreas's post quoted above: the term "suffix" is ambiguous. In foo.tar.gz, what is the suffix? How about .emacs.el? And Heroes.S203.DivX.avi? Currently implemented name mangling is far from perfect, but it's easy to understand, to recognize, and to reverse. One other possibility that offers the same features would be to put the number before the file, such as "1.foo.html" instead of "foo.html.1"; but that seems hardly an improvement. > - new: change name of new file (current behavior) > - old: change name of old file It would be nice to be able to change the name of the old file, but when you start to consider the consequences, it gets trickier. What do you do when you have many files left over from previous runs, such as foo, foo.1, foo.2, etc.? Handling it correctly would trigger a flurry of renames, which would need to be carried out in the correct order, be prepared to handle a rename failing, and to detect changed conditions in mid-run. In general it seems like bad design to need to touch many files in order to simply download one. Maybe the improved end user experience makes it worth it, but at this point I'm not convinced of it. > Back to the painful point at the start of this note, I think we > treat ".tar.gz" as a suffix and if --no-clobber=before is specified, > the file name becomes ".1.tar.gz". But see my other examples above.
RE: .1, .2 before suffix rather than after
Hrvoje Niksic wrote: > > And how is .tar.gz renamed? .tar-1.gz? > > Ouch. OK. I'm responding to the chain and not Hrvoje's expression of pain. :-) What if we changed the semantics of --no-clobber so the user could specify the behavior? I'm thinking it could accept the following strings: - after: append a number after the file name (current behavior) - before: insert a number before the suffix - new: change name of new file (current behavior) - old: change name of old file With this scheme --no-clobber becomes equivalent to --no-clobber=after,new. If I want to change where the number appears in the file name or have the old file renamed then I can specify the behavior I want on the command line (or in .wgetrc). I think I would change my default to --no-clobber=before,old. I think it would be useful to have semantics in .wgetrc where I specify what I want my --no-clobber default to be without that meaning I want --no-clobber processing on each invocation. It would be nice if I could say that I want my default to be "before,old", but to only have that apply when I specify --no-clobber on the command line. Back to the painful point at the start of this note, I think we treat ".tar.gz" as a suffix and if --no-clobber=before is specified, the file name becomes ".1.tar.gz". Tony
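The question of what counts as "the suffix" in this exchange can be made concrete. What follows is a minimal sketch in C, not Wget's code and not the patch under discussion; the function names `name_after` and `name_before` are made up for illustration. It assumes "before the suffix" means "before the last dot", located with strrchr(), and it arbitrarily treats a leading dot (as in ".emacs") as part of the name rather than as an extension separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Current scheme as described in the thread: append ".n" to the whole name. */
void name_after(const char *name, int n, char *out, size_t outsz)
{
    assert(name != NULL && out != NULL && n > 0);
    snprintf(out, outsz, "%s.%d", name, n);
}

/* Hypothetical "before" scheme: insert "-n" in front of the LAST dot.
   Names with no dot, or whose only dot is the first character,
   simply get "-n" appended. */
void name_before(const char *name, int n, char *out, size_t outsz)
{
    const char *dot;

    assert(name != NULL && out != NULL && n > 0);
    dot = strrchr(name, '.');
    if (dot == NULL || dot == name)
        snprintf(out, outsz, "%s-%d", name, n);
    else
        snprintf(out, outsz, "%.*s-%d%s", (int)(dot - name), name, n, dot);
}
```

Under this last-dot assumption, "foo.html" becomes "foo-1.html", but "foo.tar.gz" becomes "foo.tar-1.gz". Treating ".tar.gz" as a single suffix, as suggested above, would need some extra rule (for example a list of known compound extensions), which this sketch does not attempt.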
Re: .1, .2 before suffix rather than after
Andreas Pettersson <[EMAIL PROTECTED]> writes: > And how is .tar.gz renamed? .tar-1.gz? Ouch.
Re: .1, .2 before suffix rather than after
Hrvoje Niksic wrote: > It just occurred to me that this change breaks backward compatibility. > It will break scripts that try to clean up after Wget or that in any > way depend on the current naming scheme. I'm also a bit hesitant about changing the way files get named. With a .1 at the absolute end of the filename I _know_ this file got its name because there already was a file with the same name. If the new file instead is named filename-1.jpg I cannot be certain if this is because of a file collision, or if the original file really had this name, which of course it might have had. If a script is supposed to restore the original filename of a downloaded file (perhaps for future downloads), it's easy to just cut the trailing number, if there is one. How could that be done in an easy and secure way if there may be a number before the extension, a number that I don't even know is part of the original filename or not? And already having local files named -1.ext is not so uncommon. What happens if there is a local file with that name? -2.ext could be the answer, but that makes it really difficult to find downloaded files programmatically. And how is .tar.gz renamed? .tar-1.gz? Sorry, but I'm not so sure about this.. -- Andreas
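The "just cut the trailing number" step mentioned above can be sketched in a few lines of C. This is an illustration only, not code from Wget or from any existing cleanup script; the function name `original_name_len` is made up. It assumes the current "name.n" scheme and that original names do not themselves end in a dot followed by digits (a name such as a man page "ls.1" would be misread).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns the length of NAME once a trailing ".<digits>" has been cut off,
   or strlen(name) if there is no such trailing number. */
size_t original_name_len(const char *name)
{
    size_t len, i;

    assert(name != NULL);
    len = strlen(name);
    i = len;

    /* Walk backwards over any trailing digits. */
    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;

    /* Require at least one digit, a dot right before the digits,
       and at least one character before that dot. */
    if (i < len && i >= 2 && name[i - 1] == '.')
        return i - 1;
    return len;
}
```

With a number placed before the extension there is no equally simple rule: "foo-1.html" could be a renamed "foo.html" or a file that was called "foo-1.html" on the server, which is the uncertainty described above.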
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Christopher G. Lewis wrote: > Hmm - changing the rename schema would potentially create a HUGE issue with > clobbering. > > For example, and quite hypothetical... > > Given a directory with the following: > index.html > index-1.html > index.1.html > > All three are served by the server and rendered by the browser. They are > distinct files given the file system and the URL interpretation of the file > system by the web server. > > Now, Wget downloads index.html, then downloads it again. Our choices for > the second file are: > 1) index.html.1 > 2) index-1.html > 3) index.1.html > > Of the three, only #1 is pretty much guaranteed *not* to exist on the web > server. Why? Because by changing the extension, we've changed the content > type. So if our intentions are to not clobber (which, I believe, is the > whole point) we are *much* better off sticking with the current schema and > creating a file that most can't be served by the web server. Of course you are 100% correct that it is the whole point. However, while this is indeed a problem, I don't think it's a clobbering problem. I believe Wget would then choose (or could be made to then choose) index-2.html, etc, for the file which on the server is named index-1.html. Of course, while that would resolve clobbering, that would make it virtually impossible to determine what file had what local name, which is entirely unacceptable. I wonder how Wget currently handles perverse cases like index.html.1 actually existing on the server and already on the local system. :) > Note that this is quite a contrived example to illustrate the point. Yeah. Unfortunately, though, something like page-1.html, page-2.html, isn't quite so unlikely. 
It's intended that Reget (I'll call it that for now, until we figure out what the hell we're going to do with that whole cluster of functionality) will have support for a database of download-session metadata, that would handle mappings between the remote URI and the local file. With that, it'd be possible to construct a simple utility which could be invoked like, "reget-fmap http://example.com/foo.html"; and might spit out something like "./example.com/foo.html". This might couple quite well with providing a plugin hook to control the renaming scheme. Given your excellent points, and the fact that I didn't get the overwhelmingly positive response to this suggestion that I had anticipated, I'd better table this patch. :( > However, my 2 cents on the behavior - It would be *wonderful* if wget could > look at the local file system and rename each version to file.ext.n+1 so the > new download is index.html, not index.html.1. I've been caught a couple of > times with this, so to me the default behavior is backwards (ie, new file > s/b the URL, older files get versioned) That would of course be substantially more work, and provide even greater opportunity for race conditions/interoperability issues than we already do, but I agree that it'd be nice-to-have. Unfortunately, I don't think there's any way we'll ever do this in Wget: it'd be too confusing for people used to the current way. And while, as Hrvoje pointed out, the currently proposed suffixes patch could potentially break backwards compatibility, it's not likely to do so in a harmful/destructive way, whereas any current scripts that currently download files and then erase the renamed ones will suddenly be destroying the new data, rather than the old, if we reverse the renaming. 
:\ That problem is partly exacerbated by the fact that, from a certain perspective, we ought to be able to "stick our noses in the air" and claim that any scripts of that sort ought to have been telling Wget to clobber files, rather than letting Wget rename them and trying to delete them afterwards... but there is not currently a way to tell Wget to do that. There is no way to ask Wget to clobber files when it normally wouldn't. However, with the proper hook in Reget, it'd be easy enough to have a plugin that handles it this way. Actually, since Reget is looking to probably be an entirely new beast, and we'll certainly have to break compatibility with traditional Wget, we could consider making this the default renaming mechanism for Reget; but I'm still concerned about the extra work, race conditions, and potential for screwing with other programs that may be operating on some of the files involved. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHMKRh7M8hyUobTrERCFeGAJwM8yPR35j8rbsqkG8Vk8A1Bdm0YACggbBN 6s7EOEwhxCerjaeuQAblccw= =rdpM -END PGP SIGNATURE-
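The index.html example from this exchange can be walked through with a small sketch. It is an assumption-laden illustration, not Wget's unique-name code: the helper names are made up, and the "existing files" are passed in as an array of strings instead of being looked up on disk, so that the choice of name is easy to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if NAME appears in EXISTING, 0 otherwise. */
static int taken(const char *name, const char *const existing[], size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (strcmp(name, existing[i]) == 0)
            return 1;
    return 0;
}

/* Current scheme: try "name.1", "name.2", ... and return the first free n. */
int first_free_after(const char *name, const char *const existing[],
                     size_t count, char *out, size_t outsz)
{
    int n;
    assert(name != NULL && out != NULL);
    for (n = 1; ; n++) {
        snprintf(out, outsz, "%s.%d", name, n);
        if (!taken(out, existing, count))
            return n;
    }
}

/* Hypothetical "before" scheme: try "stem-1.ext", "stem-2.ext", ...
   where the split is at the last dot. */
int first_free_before(const char *name, const char *const existing[],
                      size_t count, char *out, size_t outsz)
{
    const char *dot;
    int stem_len, n;

    assert(name != NULL && out != NULL);
    dot = strrchr(name, '.');
    stem_len = dot ? (int)(dot - name) : (int)strlen(name);
    for (n = 1; ; n++) {
        snprintf(out, outsz, "%.*s-%d%s", stem_len, name, n, dot ? dot : "");
        if (!taken(out, existing, count))
            return n;
    }
}
```

Given a local directory that already holds index.html, index-1.html and index.1.html, the first function picks index.html.1 and the second skips to index-2.html. Nothing is clobbered in either case, but in the second case the local name no longer says which remote file it came from, which is the mapping problem described above.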
RE: .1, .2 before suffix rather than after
Hmm - changing the rename schema would potentially create a HUGE issue with clobbering. For example, and quite hypothetical... Given a directory with the following: index.html index-1.html index.1.html All three are served by the server and rendered by the browser. They are distinct files given the file system and the URL interpretation of the file system by the web server. Now, Wget downloads index.html, then downloads it again. Our choices for the second file are: 1) index.html.1 2) index-1.html 3) index.1.html Of the three, only #1 is pretty much guaranteed *not* to exist on the web server. Why? Because by changing the extension, we've changed the content type. So if our intentions are to not clobber (which, I believe, is the whole point) we are *much* better off sticking with the current schema and creating a file that most can't be served by the web server. Note that this is quite a contrived example to illustrate the point. However, my 2 cents on the behavior - It would be *wonderful* if wget could look at the local file system and rename each version to file.ext.n+1 so the new download is index.html, not index.html.1. I've been caught a couple of times with this, so to me the default behavior is backwards (ie, new file s/b the URL, older files get versioned) Chris Christopher G. Lewis http://www.ChristopherLewis.com > -Original Message- > From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] > Sent: Sunday, November 04, 2007 4:19 PM > To: Wget > Cc: Christian Roche > Subject: Re: .1, .2 before suffix rather than after > > Hrvoje Niksic <[EMAIL PROTECTED]> writes: > > > Micah Cowan <[EMAIL PROTECTED]> writes: > > > >> Christian Roche has submitted a revised version of a patch > to modify > >> the unique-name-finding algorithm to generate names in the pattern > >> "foo-n.html" rather than "foo.html.n". The patch looks good, and > >> will likely go in very soon. > > > > foo.html.n has the advantage of simplicity: you can tell at a glance > > that .n is a duplicate of . 
Also, it is trivial to remove > > the unwanted files by removing .*. > > It just occurred to me that this change breaks backward compatibility. > It will break scripts that try to clean up after Wget or that in any > way depend on the current naming scheme.
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hrvoje Niksic wrote: > Micah Cowan <[EMAIL PROTECTED]> writes: > >>> It just occurred to me that this change breaks backward compatibility. >>> It will break scripts that try to clean up after Wget or that in any >>> way depend on the current naming scheme. >> It may. I am not going to commit to never ever changing the current >> naming scheme. > > Agreed, but there should be a very good reason for changing it, and > the change should be a clear improvement. How do those reasons differ? :) > In my view, neither is the > case here. It seems like a fairly clear improvement to me; at least, I believe that the improvement would outweigh the rather mild risk that it might break something. It's a mild improvement, but it's an even milder risk, AFAICT. > For example, the change to respect the Content-Disposition > header constitutes a good reason[1]. (I don't seem to have the footnote you seem to have intended to put there.) I'm not sure how good an example Content-Disposition is, though, given that the risk of backwards-incompatibility is probably virtually nil. In that this is a more general change, whereas that is a specific change (to a certain subset of URLs). Of course, your opinion is important to me, and to be honest, I didn't expect to find any resistance to this idea (there were no comments besides mine on the original post back in July). So I welcome further feedback. However, at the moment, I don't see any compelling reason not to apply the change, and do find reason to apply it (interoperability seems like a desirable trait). Hm... wget-patches seems not to be archived... There's a supposed link to a gmane archive, but it's apparently empty. :\ That makes it difficult to refer to the original post from July 13. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... 
http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHLuyC7M8hyUobTrERCJ1lAJ99AfiCPkjPra9UlBakgyKlUMhyFQCfY0ht 57y31BM4+6YFadFnhkVH62Q= =UQ5g -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
Micah Cowan <[EMAIL PROTECTED]> writes: >> It just occurred to me that this change breaks backward compatibility. >> It will break scripts that try to clean up after Wget or that in any >> way depend on the current naming scheme. > > It may. I am not going to commit to never ever changing the current > naming scheme. Agreed, but there should be a very good reason for changing it, and the change should be a clear improvement. In my view, neither is the case here. For example, the change to respect the Content-Disposition header constitutes a good reason[1].
Re: .1, .2 before suffix rather than after
I don't care particularly how this stuff works, but if you'd like to do me a favor, please make sure, whatever the final scheme is, that it's easy to add the #ifdef for VMS to bypass the whole mess, because the file version numbers on VMS obviate it. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hrvoje Niksic wrote: > Hrvoje Niksic <[EMAIL PROTECTED]> writes: > >> Micah Cowan <[EMAIL PROTECTED]> writes: >> >>> Christian Roche has submitted a revised version of a patch to modify >>> the unique-name-finding algorithm to generate names in the pattern >>> "foo-n.html" rather than "foo.html.n". The patch looks good, and >>> will likely go in very soon. >> foo.html.n has the advantage of simplicity: you can tell at a glance >> that .n is a duplicate of . Also, it is trivial to remove >> the unwanted files by removing .*. > > It just occurred to me that this change breaks backward compatibility. > It will break scripts that try to clean up after Wget or that in any > way depend on the current naming scheme. It may. I am not going to commit to never ever changing the current naming scheme. It is the responsibility of the upgrader to read the NEWS file, after all. Obviously I don't want to wantonly break backward compatibility, but this seems like a worthwhile change, and I can't imagine there being a particularly high number of such scripts. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHLlyk7M8hyUobTrERCD/XAJ9YQEoqdz4pFJi3OQlocjBFPz4ADwCfUu4D w+tkP1DrkvZxnosFcpV2jH4= =flxY -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
On 11/4/07, Hrvoje Niksic <[EMAIL PROTECTED]> wrote: > It just occurred to me that this change breaks backward compatibility. > It will break scripts that try to clean up after Wget or that in any > way depend on the current naming scheme. > You mean the scripts that fix the same problem this patch does? ;-)
Re: .1, .2 before suffix rather than after
Hrvoje Niksic <[EMAIL PROTECTED]> writes: > Micah Cowan <[EMAIL PROTECTED]> writes: > >> Christian Roche has submitted a revised version of a patch to modify >> the unique-name-finding algorithm to generate names in the pattern >> "foo-n.html" rather than "foo.html.n". The patch looks good, and >> will likely go in very soon. > > foo.html.n has the advantage of simplicity: you can tell at a glance > that .n is a duplicate of . Also, it is trivial to remove > the unwanted files by removing .*. It just occurred to me that this change breaks backward compatibility. It will break scripts that try to clean up after Wget or that in any way depend on the current naming scheme.
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hrvoje Niksic wrote: > Micah Cowan <[EMAIL PROTECTED]> writes: > >> Christian Roche has submitted a revised version of a patch to modify >> the unique-name-finding algorithm to generate names in the pattern >> "foo-n.html" rather than "foo.html.n". The patch looks good, and >> will likely go in very soon. > > foo.html.n has the advantage of simplicity: you can tell at a glance > that .n is a duplicate of . Also, it is trivial to remove > the unwanted files by removing .*. Why change what worked so > well in the past? Well, the original motivation for Chris was that it was actually interfering with the accept/reject rules; see the log.txt attachment at https://savannah.gnu.org/bugs/index.php?20482; this behavior is also related to the -nd/-r behavior I brought up yesterday. However, that's obviously not a good long-term fix for the problem; the real reason _I_ like it, is that it preserves the type of the files, on systems/applications that depend on the filename extension to identify it. Most browsers I've seen, including Lynx (though for Lynx you can specify a flag to override it, I think) depend on this, at least for HTML; and even for JPEGs and such on Unixen it is often beneficial to have an extension that matches the type. It automatically gives an "-E"-like benefit (for this instance; not for URLs that don't end with appropriate extensions). - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHLkQ47M8hyUobTrERCKpvAJkBzlvl9td1pRmzfZqJmRM9M8LtJQCcCHl6 yDVeZRljJ2QSISmTxVQ/oLI= =Z+7T -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
Micah Cowan <[EMAIL PROTECTED]> writes: > Christian Roche has submitted a revised version of a patch to modify > the unique-name-finding algorithm to generate names in the pattern > "foo-n.html" rather than "foo.html.n". The patch looks good, and > will likely go in very soon. foo.html.n has the advantage of simplicity: you can tell at a glance that .n is a duplicate of . Also, it is trivial to remove the unwanted files by removing .*. Why change what worked so well in the past? > A couple of minor detail questions: what do you guys think about using > "foo.n.html" instead of "foo-n.html"? Better, but IMHO not as good as foo.html.n. But I'm obviously biased. :-)
Re: .1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Josh Williams wrote: > On 11/4/07, Micah Cowan <[EMAIL PROTECTED]> wrote: >> Christian Roche has submitted a revised version of a patch to modify the >> unique-name-finding algorithm to generate names in the pattern >> "foo-n.html" rather than "foo.html.n". The patch looks good, and will >> likely go in very soon. > > That's something I had meant to submit a bug report for a while back, > but somehow never found the time to do it. I guess it wasn't my top > priority since GNU/Linux is usually smart enough to ignore the file > extensions anyways. I have not found that to be generally true; and particularly in the case of HTML files, which is most relevant here. >> A couple of minor detail questions: what do you guys think about using >> "foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would >> this naming convention affect DOS (and, BTW, how does the current one >> hold up on DOS)? > > Well, this problem is mainly for win32 users, so I think we need to > keep sloppy coding in mind. It's been my experience that *man* win32 > programs will treat everything after the first period as the file > extension. > > Honestly, I don't see any reason to risk the annoyance of these kinds > of bugs. Just go with the dash. Yeah, and that was probably the reason for it. > (On a side note, have you thought of running FreeDOS in a virtual machine?) I have, but haven't gotten around to it, and probably won't for a while. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHLizQ7M8hyUobTrERCACFAJ4oJ/y+EGLiRyCj+qLaxbAEFWkSSwCfc5pQ dS3sv26PHop1Hfz73FcpFRg= =lVrq -END PGP SIGNATURE-
Re: .1, .2 before suffix rather than after
On 11/4/07, Micah Cowan <[EMAIL PROTECTED]> wrote: > Christian Roche has submitted a revised version of a patch to modify the > unique-name-finding algorithm to generate names in the pattern > "foo-n.html" rather than "foo.html.n". The patch looks good, and will > likely go in very soon. That's something I had meant to submit a bug report for a while back, but somehow never found the time to do it. I guess it wasn't my top priority since GNU/Linux is usually smart enough to ignore the file extensions anyways. > A couple of minor detail questions: what do you guys think about using > "foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would > this naming convention affect DOS (and, BTW, how does the current one > hold up on DOS)? Well, this problem is mainly for win32 users, so I think we need to keep sloppy coding in mind. It's been my experience that *many* win32 programs will treat everything after the first period as the file extension. Honestly, I don't see any reason to risk the annoyance of these kinds of bugs. Just go with the dash. (On a side note, have you thought of running FreeDOS in a virtual machine?)
.1, .2 before suffix rather than after
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Christian Roche has submitted a revised version of a patch to modify the unique-name-finding algorithm to generate names in the pattern "foo-n.html" rather than "foo.html.n". The patch looks good, and will likely go in very soon. A couple of minor detail questions: what do you guys think about using "foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would this naming convention affect DOS (and, BTW, how does the current one hold up on DOS)? If I don't get an answer soon, I'll probably just go ahead and apply the patch, and plan to make any necessary adjustments later. I suspect that if DOS, Windows, or other systems need special treatment, they'll need to use their own version of unique_name_1 anyway. I've attached the patch for reference. The only beefs I currently have with it are that we should prefer strrchr() to a for-loop, and that I'd prefer more robust handling of the alloca'd buffer size (but these are easily fixed). - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHLhQx7M8hyUobTrERCEUoAJ9dO7OK6X8B4YraDTptgmjMrEYnTgCgirvE JVFv+RUdcwONlOf2/OKaAPM= =8nRY -END PGP SIGNATURE- diff -r ca1ba64545bc doc/ChangeLog --- a/doc/ChangeLog Tue Oct 23 12:34:10 2007 -0700 +++ b/doc/ChangeLog Sat Nov 03 12:49:25 2007 + @@ -1,3 +1,8 @@ 2007-10-13 Micah Cowan <[EMAIL PROTECTED] +2007-10-29 Christian Roche <[EMAIL PROTECTED]> + + * wget.texi: + Updated description of file renaming scheme. + 2007-10-13 Micah Cowan <[EMAIL PROTECTED]> * wget.texi : Replaced mention of no-longer diff -r ca1ba64545bc doc/wget.texi --- a/doc/wget.texi Tue Oct 23 12:34:10 2007 -0700 +++ b/doc/wget.texi Sat Nov 03 12:49:25 2007 + @@ -573,18 +573,18 @@ cases, the local file will be @dfn{clobb cases, the local file will be @dfn{clobbered}, or overwritten, upon repeated download. 
In other cases it will be preserved. -When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p}, -downloading the same file in the same directory will result in the -original copy of @var{file} being preserved and the second copy being -named @[EMAIL PROTECTED] If that file is downloaded yet again, the -third copy will be named @[EMAIL PROTECTED], and so on. When [EMAIL PROTECTED] is specified, this behavior is suppressed, and Wget will -refuse to download newer copies of @[EMAIL PROTECTED] Therefore, [EMAIL PROTECTED]'' is actually a misnomer in this mode---it's not -clobbering that's prevented (as the numeric suffixes were already -preventing clobbering), but rather the multiple version saving that's +When running Wget without @samp{-N}, @samp{-nc}, or @samp{-r}, downloading the +same file in the same directory will result in the original copy of @var{file} +being preserved and the second copy being named [EMAIL PROTECTED]@[EMAIL PROTECTED], assuming @var{file} = @var{prefix.suffix}. +If that file is downloaded yet again, the third copy will be named [EMAIL PROTECTED]@[EMAIL PROTECTED], and so on. When @samp{-nc} is specified, +this behavior is suppressed, and Wget will refuse to download newer copies of [EMAIL PROTECTED]@var{file}}. Therefore, [EMAIL PROTECTED]'' is actually a misnomer in +this mode---it's not clobbering that's prevented (as the numeric suffixes were +already preventing clobbering), but rather the multiple version saving that's prevented. - + When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N} or @samp{-nc}, re-downloading a file will result in the new copy simply overwriting the old. Adding @samp{-nc} will prevent this @@ -1611,7 +1611,7 @@ details. @item -l @var{depth} @itemx [EMAIL PROTECTED] Specify recursion maximum depth level @var{depth} (@pxref{Recursive -Download}). The default maximum depth is 5. +Download}). The default maximum depth is 5. Zero means infinite recursion. 
@cindex proxy filling @cindex delete after retrieval diff -r ca1ba64545bc src/ChangeLog --- a/src/ChangeLog Tue Oct 23 12:34:10 2007 -0700 +++ b/src/ChangeLog Sat Nov 03 12:52:17 2007 + @@ -1,3 +1,13 @@ 2007-10-22 Gisle Vanem <[EMAIL PROTECTED] +2007-10-29 Christian Roche <[EMAIL PROTECTED]> + + * utils.c (unique_name_1): + Modified filename generation scheme when avoiding clobbering to preserve file extensions. + + * recurc.c (download_child_p, point 6): + When checking whether a URL should be treated as HTML, use + link_expect_html flag instead of relying on the written file extension + by calling has_html_suffix_p. + 2007-10-22 Gisle Vanem <[EMAIL PROTECTED]> * mswindows.c: Move INHIBIT_WRAP macro definition up with wget.h diff -r ca1ba64545bc src/recur.c --- a/src/recur.c Tue Oct 23 12:34:10 2007 -0700 +++ b/src/recur.c Sat N