.1, .2 before suffix rather than after

Micah Cowan Sun, 04 Nov 2007 10:49:44 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Christian Roche has submitted a revised version of a patch to modify the
unique-name-finding algorithm to generate names in the pattern
"foo-n.html" rather than "foo.html.n". The patch looks good, and will
likely go in very soon.


A couple of minor detail questions: what do you guys think about using
"foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would
this naming convention affect DOS (and, BTW, how does the current one
hold up on DOS)?
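
For illustration only (not part of the patch): a minimal sketch of the three naming schemes under discussion, "foo.html.n", "foo-n.html", and "foo.n.html". The helper names numbered_before_suffix and numbered_after_suffix are hypothetical and do not exist in Wget. The sketch assumes its argument is a bare file name with no directory part.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Wget: write the N-th alternative
   name for NAME into OUT.  SEP is the character placed before the
   number ('-' gives foo-1.html, '.' gives foo.1.html).  NAME is assumed
   to be a bare file name with no directory part.  Returns 0 on success,
   -1 if OUT is too small.  */
static int
numbered_before_suffix (const char *name, int n, char sep,
                        char *out, size_t outsz)
{
  const char *dot = strrchr (name, '.');   /* last '.', or NULL */
  size_t plen = dot ? (size_t) (dot - name) : strlen (name);
  int len = snprintf (out, outsz, "%.*s%c%d%s",
                      (int) plen, name, sep, n, dot ? dot : "");
  return (len < 0 || (size_t) len >= outsz) ? -1 : 0;
}

/* The current scheme: the number goes after the whole name
   (foo.html.1).  Also hypothetical, for comparison only.  */
static int
numbered_after_suffix (const char *name, int n, char *out, size_t outsz)
{
  int len = snprintf (out, outsz, "%s.%d", name, n);
  return (len < 0 || (size_t) len >= outsz) ? -1 : 0;
}
```

With these helpers, "foo.html" and n = 1 yield "foo.html.1" under the current scheme, "foo-1.html" with '-' as the separator, and "foo.1.html" with '.' as the separator. A name with no dot simply gets the separator and number appended.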

If I don't get an answer soon, I'll probably just go ahead and apply the
patch, and plan to make any necessary adjustments later. I suspect that
if DOS, Windows, or other systems need special treatment, they'll need
to use their own version of unique_name_1 anyway.

I've attached the patch for reference. The only beefs I currently have
with it are that we should prefer strrchr() to a for-loop, and that I'd
prefer more robust handling of the alloca'd buffer size (but these are
easily fixed).
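
For illustration only (not the patch, and not what was eventually committed): a rough sketch of how the two adjustments mentioned above might look, using strrchr() and a buffer whose size is computed from its parts. The function name unique_name_sketch is hypothetical. As assumptions, POSIX access() stands in for Wget's file_exists_p, and snprintf/malloc stand in for number_to_string, alloca, and xstrdup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Illustrative stand-in for unique_name_1; not the patch itself.
   access() replaces Wget's file_exists_p, and snprintf/malloc replace
   number_to_string/alloca/xstrdup.  Returns a malloc'd name that the
   caller must free, or NULL on allocation failure.  */
static char *
unique_name_sketch (const char *s)
{
  size_t len = strlen (s);
  const char *dot = strrchr (s, '.');      /* last '.', or NULL */
  size_t plen = dot ? (size_t) (dot - s) : len;
  const char *suffix = s + plen;           /* ".html", or "" if no dot */
  /* prefix + '-' + decimal digits of an int + suffix + NUL.
     3 * sizeof (int) is an upper bound on the digit count.  */
  size_t size = plen + 1 + 3 * sizeof (int) + (len - plen) + 1;
  char *name = malloc (size);
  int count = 1;

  if (!name)
    return NULL;

  /* Try PREFIX-1SUFFIX, PREFIX-2SUFFIX, ... until one does not exist. */
  do
    snprintf (name, size, "%.*s-%d%s", (int) plen, s, count++, suffix);
  while (access (name, F_OK) == 0);

  return name;
}
```

Like the patch, this sketch looks for the last '.' anywhere in the string, so a dot in a directory component would be treated as the start of the suffix. Restricting the search to the final path component would be a separate change.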

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHLhQx7M8hyUobTrERCEUoAJ9dO7OK6X8B4YraDTptgmjMrEYnTgCgirvE
JVFv+RUdcwONlOf2/OKaAPM=
=8nRY
-----END PGP SIGNATURE-----

diff -r ca1ba64545bc doc/ChangeLog
--- a/doc/ChangeLog     Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/ChangeLog     Sat Nov 03 12:49:25 2007 +0000
@@ -1,3 +1,8 @@ 2007-10-13  Micah Cowan  <[EMAIL PROTECTED]
+2007-10-29  Christian Roche <[EMAIL PROTECTED]>
+
+       * wget.texi:
+       Updated description of file renaming scheme.
+
 2007-10-13  Micah Cowan  <[EMAIL PROTECTED]>
 
        * wget.texi <Mailing Lists>: Replaced mention of no-longer
diff -r ca1ba64545bc doc/wget.texi
--- a/doc/wget.texi     Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/wget.texi     Sat Nov 03 12:49:25 2007 +0000
@@ -573,18 +573,18 @@ cases, the local file will be @dfn{clobb
 cases, the local file will be @dfn{clobbered}, or overwritten, upon
 repeated download.  In other cases it will be preserved.
 
-When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p},
-downloading the same file in the same directory will result in the
-original copy of @var{file} being preserved and the second copy being
-named @[EMAIL PROTECTED]  If that file is downloaded yet again, the
-third copy will be named @[EMAIL PROTECTED], and so on.  When
[EMAIL PROTECTED] is specified, this behavior is suppressed, and Wget will
-refuse to download newer copies of @[EMAIL PROTECTED]  Therefore,
[EMAIL PROTECTED]'' is actually a misnomer in this mode---it's not
-clobbering that's prevented (as the numeric suffixes were already
-preventing clobbering), but rather the multiple version saving that's
+When running Wget without @samp{-N}, @samp{-nc}, or @samp{-r}, downloading the
+same file in the same directory will result in the original copy of @var{file}
+being preserved and the second copy being named
[EMAIL PROTECTED]@[EMAIL PROTECTED], assuming @var{file} = @var{prefix.suffix}.
+If that file is downloaded yet again, the third copy will be named
[EMAIL PROTECTED]@[EMAIL PROTECTED], and so on. When @samp{-nc} is specified,
+this behavior is suppressed, and Wget will refuse to download newer copies of
[EMAIL PROTECTED]@var{file}}. Therefore, [EMAIL PROTECTED]'' is actually a misnomer in
+this mode---it's not clobbering that's prevented (as the numeric suffixes were
+already preventing clobbering), but rather the multiple version saving that's
 prevented.
-
+  
 When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}
 or @samp{-nc}, re-downloading a file will result in the new copy
 simply overwriting the old.  Adding @samp{-nc} will prevent this
@@ -1611,7 +1611,7 @@ details.
 @item -l @var{depth}
 @itemx [EMAIL PROTECTED]
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}).  The default maximum depth is 5.
+Download}).  The default maximum depth is 5.  Zero means infinite recursion.
 
 @cindex proxy filling
 @cindex delete after retrieval
diff -r ca1ba64545bc src/ChangeLog
--- a/src/ChangeLog     Tue Oct 23 12:34:10 2007 -0700
+++ b/src/ChangeLog     Sat Nov 03 12:52:17 2007 +0000
@@ -1,3 +1,13 @@ 2007-10-22  Gisle Vanem  <[EMAIL PROTECTED]
+2007-10-29  Christian Roche <[EMAIL PROTECTED]>
+
+       * utils.c (unique_name_1):
+       Modified filename generation scheme when avoiding clobbering to preserve file extensions.
+       
+       * recur.c (download_child_p, point 6):
+       When checking whether a URL should be treated as HTML, use
+       link_expect_html flag instead of relying on the written file extension
+       by calling has_html_suffix_p.
+
 2007-10-22  Gisle Vanem  <[EMAIL PROTECTED]>
 
        * mswindows.c: Move INHIBIT_WRAP macro definition up with wget.h
diff -r ca1ba64545bc src/recur.c
--- a/src/recur.c       Tue Oct 23 12:34:10 2007 -0700
+++ b/src/recur.c       Sat Nov 03 12:49:25 2007 +0000
@@ -531,7 +531,7 @@ download_child_p (const struct urlpos *u
      automatically implies non-leaf because with -p we can, if
      necesary, overstep the maximum depth to get the page requisites.)  */
   if (u->file[0] != '\0'
-      && !(has_html_suffix_p (u->file)
+      && !(upos->link_expect_html
            /* The exception only applies to non-leaf HTMLs (but -p
               always implies non-leaf because we can overstep the
               maximum depth to get the requisites): */
diff -r ca1ba64545bc src/utils.c
--- a/src/utils.c       Tue Oct 23 12:34:10 2007 -0700
+++ b/src/utils.c       Sat Nov 03 12:49:25 2007 +0000
@@ -435,33 +435,48 @@ file_size (const char *filename)
 #endif
 }
 
-/* stat file names named PREFIX.1, PREFIX.2, etc., until one that
-   doesn't exist is found.  Return a freshly allocated copy of the
-   unused file name.  */
+/*
+ * Stat file names named PREFIX-1.SUFFIX, PREFIX-2.SUFFIX, etc., until
+ * one that doesn't exist is found. Return a freshly allocated copy of
+ * the unused file name.
+ */
 
 static char *
-unique_name_1 (const char *prefix)
+unique_name_1 (const char *s)
 {
   int count = 1;
-  int plen = strlen (prefix);
-  char *template = (char *)alloca (plen + 1 + 24);
-  char *template_tail = template + plen;
-
-  memcpy (template, prefix, plen);
-  *template_tail++ = '.';
-
-  do
-    number_to_string (template_tail, count++);
-  while (file_exists_p (template));
-
-  return xstrdup (template);
+  int p, l = strlen (s);
+  char *t, *filename = (char *) alloca (l + 26);
+  
+  /* Look for last '.' in filename */
+  
+  for(p = l; p >= 0 && s[p] != '.'; p--);
+
+  /* If none found, then prefix is the whole filename */
+  
+  if (p < 0)
+    p = l;
+
+  /* Copy constant prefix */
+
+  memcpy (filename, s, p);
+  filename[p] = '-';
+
+  /* Try indexed filenames until an unused one is found */
+
+  do {
+      t = number_to_string (filename+p+1, count++); /* Add index */
+      memcpy (t, s+p, l-p+1); /* Add suffix and trailing NUL */
+  } while (file_exists_p (filename));
+
+  return xstrdup (filename);
 }
 
 /* Return a unique file name, based on FILE.
 
-   More precisely, if FILE doesn't exist, it is returned unmodified.
-   If not, FILE.1 is tried, then FILE.2, etc.  The first FILE.<number>
-   file name that doesn't exist is returned.
+   More precisely, if FILE.SUF doesn't exist, it is returned unmodified.
+   If not, FILE-1.SUF is tried, then FILE-2.SUF etc.  The first
+   FILE-<number>.SUF file name that doesn't exist is returned.
 
    The resulting file is not created, only verified that it didn't
    exist at the point in time when the function was called.

new-patch-utils.txt.sig
Description: Binary data
