.1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Christian Roche has submitted a revised version of a patch to modify the
unique-name-finding algorithm to generate names in the pattern
foo-n.html rather than foo.html.n. The patch looks good, and will
likely go in very soon.

A couple of minor detail questions: what do you guys think about using
foo.n.html instead of foo-n.html? And (this one to Gisle), how would
this naming convention affect DOS (and, BTW, how does the current one
hold up on DOS)?

If I don't get an answer soon, I'll probably just go ahead and apply the
patch, and plan to make any necessary adjustments later. I suspect that
if DOS, Windows, or other systems need special treatment, they'll need
to use their own version of unique_name_1 anyway.

I've attached the patch for reference. The only beefs I currently have
with it are that we should prefer strrchr() to a for-loop, and that I'd
prefer more robust handling of the alloca'd buffer size (but these are
easily fixed).
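
The two points above (strrchr() instead of a hand-written loop, and a
buffer sized from the actual result rather than guessed) can be sketched
roughly as follows. This is only an illustration under stated
assumptions, not the attached patch and not Wget's unique_name_1: the
function name numbered_name is hypothetical, it uses malloc instead of
alloca, and the treatment of leading-dot names is a choice made for the
sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Wget's unique_name_1): build "prefix-N.suffix"
   from FILE and COUNT, or "file-N" when FILE has no usable extension.
   Returns a malloc'd string the caller must free, or NULL on failure.  */
char *
numbered_name (const char *file, int count)
{
  /* Only look for the period in the last path component.  */
  const char *slash = strrchr (file, '/');
  const char *base = slash ? slash + 1 : file;
  const char *dot = strrchr (base, '.');

  /* Assumption: a leading period (".bashrc") is not an extension.  */
  if (dot == base)
    dot = NULL;

  size_t prefix_len = dot ? (size_t) (dot - file) : strlen (file);
  const char *suffix = dot ? dot : "";

  /* Measure first, then allocate exactly what is needed.  */
  int needed = snprintf (NULL, 0, "%.*s-%d%s",
                         (int) prefix_len, file, count, suffix);
  if (needed < 0)
    return NULL;

  char *result = malloc ((size_t) needed + 1);
  if (result)
    snprintf (result, (size_t) needed + 1, "%.*s-%d%s",
              (int) prefix_len, file, count, suffix);
  return result;
}
```

With this sketch, numbered_name ("foo.html", 1) yields "foo-1.html",
while a name without an extension simply gets "-N" appended.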

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

diff -r ca1ba64545bc doc/ChangeLog
--- a/doc/ChangeLog Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/ChangeLog Sat Nov 03 12:49:25 2007 +
@@ -1,3 +1,8 @@ 2007-10-13  Micah Cowan  [EMAIL PROTECTED]
+2007-10-29  Christian Roche [EMAIL PROTECTED]
+
+   * wget.texi:
+   Updated description of file renaming scheme.
+
 2007-10-13  Micah Cowan  [EMAIL PROTECTED]
 
* wget.texi Mailing Lists: Replaced mention of no-longer
diff -r ca1ba64545bc doc/wget.texi
--- a/doc/wget.texi Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/wget.texi Sat Nov 03 12:49:25 2007 +
@@ -573,18 +573,18 @@ cases, the local file will be @dfn{clobb
 cases, the local file will be @dfn{clobbered}, or overwritten, upon
 repeated download.  In other cases it will be preserved.
 
-When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p},
-downloading the same file in the same directory will result in the
-original copy of @var{file} being preserved and the second copy being
-named @[EMAIL PROTECTED]  If that file is downloaded yet again, the
-third copy will be named @[EMAIL PROTECTED], and so on.  When
[EMAIL PROTECTED] is specified, this behavior is suppressed, and Wget will
-refuse to download newer copies of @[EMAIL PROTECTED]  Therefore,
[EMAIL PROTECTED]'' is actually a misnomer in this mode---it's not
-clobbering that's prevented (as the numeric suffixes were already
-preventing clobbering), but rather the multiple version saving that's
+When running Wget without @samp{-N}, @samp{-nc}, or @samp{-r}, downloading the
+same file in the same directory will result in the original copy of @var{file}
+being preserved and the second copy being named
[EMAIL PROTECTED]@[EMAIL PROTECTED], assuming @var{file} = @var{prefix.suffix}.
+If that file is downloaded yet again, the third copy will be named
[EMAIL PROTECTED]@[EMAIL PROTECTED], and so on. When @samp{-nc} is specified,
+this behavior is suppressed, and Wget will refuse to download newer copies of
[EMAIL PROTECTED]@var{file}}. Therefore, [EMAIL PROTECTED]'' is actually a misnomer in
+this mode---it's not clobbering that's prevented (as the numeric suffixes were
+already preventing clobbering), but rather the multiple version saving that's
 prevented.
-
+  
 When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}
 or @samp{-nc}, re-downloading a file will result in the new copy
 simply overwriting the old.  Adding @samp{-nc} will prevent this
@@ -1611,7 +1611,7 @@ details.
 @item -l @var{depth}
 @itemx [EMAIL PROTECTED]
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}).  The default maximum depth is 5.
+Download}).  The default maximum depth is 5.  Zero means infinite recursion.
 
 @cindex proxy filling
 @cindex delete after retrieval
diff -r ca1ba64545bc src/ChangeLog
--- a/src/ChangeLog Tue Oct 23 12:34:10 2007 -0700
+++ b/src/ChangeLog Sat Nov 03 12:52:17 2007 +
@@ -1,3 +1,13 @@ 2007-10-22  Gisle Vanem  [EMAIL PROTECTED]
+2007-10-29  Christian Roche [EMAIL PROTECTED]
+
+   * utils.c (unique_name_1):
+   Modified filename generation scheme when avoiding clobbering to preserve file extensions.
+   
+   * recur.c (download_child_p, point 6):
+   When checking whether a URL should be treated as HTML, use
+   link_expect_html flag instead of relying on the written file extension
+   by calling has_html_suffix_p.
+
 2007-10-22  Gisle Vanem  [EMAIL PROTECTED]
 
* mswindows.c: Move INHIBIT_WRAP macro definition up with wget.h
diff -r ca1ba64545bc src/recur.c
--- a/src/recur.c   Tue Oct 23 12:34:10 2007 -0700
+++ b/src/recur.c   Sat Nov 

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams
On 11/4/07, Micah Cowan [EMAIL PROTECTED] wrote:
> Christian Roche has submitted a revised version of a patch to modify the
> unique-name-finding algorithm to generate names in the pattern
> foo-n.html rather than foo.html.n. The patch looks good, and will
> likely go in very soon.

That's something I had meant to submit a bug report for a while back,
but somehow never found the time to do it. I guess it wasn't my top
priority since GNU/Linux is usually smart enough to ignore the file
extensions anyways.

> A couple of minor detail questions: what do you guys think about using
> foo.n.html instead of foo-n.html? And (this one to Gisle), how would
> this naming convention affect DOS (and, BTW, how does the current one
> hold up on DOS)?

Well, this problem is mainly for win32 users, so I think we need to
keep sloppy coding in mind. It's been my experience that *many* win32
programs will treat everything after the first period as the file
extension.

Honestly, I don't see any reason to risk the annoyance of these kinds
of bugs. Just go with the dash.
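
To make the concern concrete, here is a small sketch of the two ways a
program might pick out the extension. Both helper names are
hypothetical and do not come from Wget or any particular win32 program;
the first one models the "sloppy" behavior described above.

```c
#include <string.h>

/* Both helpers are hypothetical.  Each returns a pointer into NAME just
   past the chosen period, or NULL when NAME contains no period.  */

/* What a sloppy program might do: split at the FIRST period.  */
const char *
ext_after_first_dot (const char *name)
{
  const char *dot = strchr (name, '.');
  return dot ? dot + 1 : NULL;
}

/* The usual convention: split at the LAST period.  */
const char *
ext_after_last_dot (const char *name)
{
  const char *dot = strrchr (name, '.');
  return dot ? dot + 1 : NULL;
}
```

For "foo.1.html" the first helper sees the extension "1.html" and the
second sees "html"; for "foo-1.html" both agree on "html", which is the
argument for the dash.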

(On a side note, have you thought of running FreeDOS in a virtual machine?)


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Josh Williams wrote:
> On 11/4/07, Micah Cowan [EMAIL PROTECTED] wrote:
>> Christian Roche has submitted a revised version of a patch to modify the
>> unique-name-finding algorithm to generate names in the pattern
>> foo-n.html rather than foo.html.n. The patch looks good, and will
>> likely go in very soon.
>
> That's something I had meant to submit a bug report for a while back,
> but somehow never found the time to do it. I guess it wasn't my top
> priority since GNU/Linux is usually smart enough to ignore the file
> extensions anyways.

I have not found that to be generally true; and particularly in the case
of HTML files, which is most relevant here.

>> A couple of minor detail questions: what do you guys think about using
>> foo.n.html instead of foo-n.html? And (this one to Gisle), how would
>> this naming convention affect DOS (and, BTW, how does the current one
>> hold up on DOS)?
>
> Well, this problem is mainly for win32 users, so I think we need to
> keep sloppy coding in mind. It's been my experience that *many* win32
> programs will treat everything after the first period as the file
> extension.
>
> Honestly, I don't see any reason to risk the annoyance of these kinds
> of bugs. Just go with the dash.

Yeah, and that was probably the reason for it.

> (On a side note, have you thought of running FreeDOS in a virtual machine?)

I have, but haven't gotten around to it, and probably won't for a while.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: .1, .2 before suffix rather than after

2007-11-04 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

> Christian Roche has submitted a revised version of a patch to modify
> the unique-name-finding algorithm to generate names in the pattern
> foo-n.html rather than foo.html.n. The patch looks good, and
> will likely go in very soon.

foo.html.n has the advantage of simplicity: you can tell at a glance
that foo.n is a duplicate of foo.  Also, it is trivial to remove
the unwanted files by removing foo.*.  Why change what worked so
well in the past?

> A couple of minor detail questions: what do you guys think about using
> foo.n.html instead of foo-n.html?

Better, but IMHO not as good as foo.html.n.  But I'm obviously biased.
:-)


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Hrvoje Niksic wrote:
> Micah Cowan [EMAIL PROTECTED] writes:
>
>> Christian Roche has submitted a revised version of a patch to modify
>> the unique-name-finding algorithm to generate names in the pattern
>> foo-n.html rather than foo.html.n. The patch looks good, and
>> will likely go in very soon.
>
> foo.html.n has the advantage of simplicity: you can tell at a glance
> that foo.n is a duplicate of foo.  Also, it is trivial to remove
> the unwanted files by removing foo.*.  Why change what worked so
> well in the past?

Well, the original motivation for Chris was that it was actually
interfering with the accept/reject rules; see the log.txt attachment at
https://savannah.gnu.org/bugs/index.php?20482; this behavior is also
related to the -nd/-r behavior I brought up yesterday.

However, that's obviously not a good long-term fix for the problem; the
real reason _I_ like it, is that it preserves the type of the files, on
systems/applications that depend on the filename extension to identify
it. Most browsers I've seen, including Lynx (though for Lynx you can
specify a flag to override it, I think) depend on this, at least for
HTML; and even for JPEGs and such on Unixen it is often beneficial to
have an extension that matches the type. It automatically gives an
-E-like benefit (for this instance; not for URLs that don't end with
appropriate extensions).
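
As a rough illustration of why the position of the number matters to
anything that identifies a file by its extension, consider a
suffix check along these lines. The function names are hypothetical and
this is not Wget's has_html_suffix_p; it only assumes that the consumer
looks at the end of the file name, ignoring case.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: nonzero if NAME ends with SUFFIX, ignoring case.  */
static int
ends_with_nocase (const char *name, const char *suffix)
{
  size_t nlen = strlen (name), slen = strlen (suffix);
  if (slen > nlen)
    return 0;
  name += nlen - slen;
  for (; *suffix; name++, suffix++)
    if (tolower ((unsigned char) *name) != tolower ((unsigned char) *suffix))
      return 0;
  return 1;
}

/* Hypothetical stand-in for an application that trusts the extension.  */
int
looks_like_html (const char *name)
{
  return ends_with_nocase (name, ".html") || ends_with_nocase (name, ".htm");
}
```

Under this check "foo-1.html" and "foo.1.html" are still recognized as
HTML, while "foo.html.1" is not.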

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: .1, .2 before suffix rather than after

2007-11-04 Thread Hrvoje Niksic
Hrvoje Niksic [EMAIL PROTECTED] writes:

> Micah Cowan [EMAIL PROTECTED] writes:
>
>> Christian Roche has submitted a revised version of a patch to modify
>> the unique-name-finding algorithm to generate names in the pattern
>> foo-n.html rather than foo.html.n. The patch looks good, and
>> will likely go in very soon.
>
> foo.html.n has the advantage of simplicity: you can tell at a glance
> that foo.n is a duplicate of foo.  Also, it is trivial to remove
> the unwanted files by removing foo.*.

It just occurred to me that this change breaks backward compatibility.
It will break scripts that try to clean up after Wget or that in any
way depend on the current naming scheme.
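
A cleanup script of the kind meant here would typically rely on a shell
glob. The sketch below uses POSIX fnmatch() to show which names such
patterns catch; the wrapper name glob_matches and the specific patterns
are assumptions chosen for illustration, not taken from any real script.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: nonzero if NAME matches the shell-style PATTERN.  */
int
glob_matches (const char *pattern, const char *name)
{
  return fnmatch (pattern, name, 0) == 0;
}
```

A pattern such as "foo.html.[0-9]*" matches "foo.html.1" but not
"foo-1.html", so a script built around it would silently stop finding
duplicates under the new scheme. The broader "foo.*" also misses
"foo-1.html", and it matches the original "foo.html" as well.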


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams
On 11/4/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.


You mean the scripts that fix the same problem this patch does? ;-)


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Hrvoje Niksic wrote:
> Hrvoje Niksic [EMAIL PROTECTED] writes:
>
>> Micah Cowan [EMAIL PROTECTED] writes:
>>
>>> Christian Roche has submitted a revised version of a patch to modify
>>> the unique-name-finding algorithm to generate names in the pattern
>>> foo-n.html rather than foo.html.n. The patch looks good, and
>>> will likely go in very soon.
>> foo.html.n has the advantage of simplicity: you can tell at a glance
>> that foo.n is a duplicate of foo.  Also, it is trivial to remove
>> the unwanted files by removing foo.*.
>
> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.

It may. I am not going to commit to never ever changing the current
naming scheme. It is the responsibility of the upgrader to read the NEWS
file, after all.

Obviously I don't want to wantonly break backward compatibility, but
this seems like a worthwhile change, and I can't imagine there being a
particularly high number of such scripts.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: .1, .2 before suffix rather than after

2007-11-04 Thread Steven M. Schweda
   I don't care particularly how this stuff works, but if you'd like to
do me a favor, please make sure, whatever the final scheme is, that it's
easy to add the #ifdef for VMS to bypass the whole mess, because the
file version numbers on VMS obviate it.
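
One way such a bypass could look is sketched below. Everything here is
an assumption for illustration: the function name local_name_for is
hypothetical, the __VMS macro is assumed to be the one predefined by the
VMS C compiler, and the non-VMS branch uses the old "file.N" form only
to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: return a malloc'd local name for FILE, given
   that COUNT earlier copies exist.  Returns NULL on failure.  */
char *
local_name_for (const char *file, int count)
{
#ifdef __VMS
  /* Assumption: the file system keeps older versions itself, so the
     name is reused unchanged and COUNT is ignored.  */
  size_t len = strlen (file) + 1;
  char *copy = malloc (len);
  (void) count;
  if (copy)
    memcpy (copy, file, len);
  return copy;
#else
  /* Elsewhere, fall back to a numbered name.  */
  int needed = snprintf (NULL, 0, "%s.%d", file, count);
  char *name;
  if (needed < 0)
    return NULL;
  name = malloc ((size_t) needed + 1);
  if (name)
    snprintf (name, (size_t) needed + 1, "%s.%d", file, count);
  return name;
#endif
}
```

Keeping the whole numbering logic behind one function like this means
the platform-specific branch stays in a single place.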



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547