This patch makes URL file name generation a bit more flexible and,
hopefully, better for the end-user. It does two things:
* Decouples file name quoting from URL quoting. The conflation of the
two has been an endless source of annoyance for users. For example,
space *has* to be quoted in URLs, but you don't really want to quote
it in file names.
* Gives the user more control over the quoting mechanism. There are
now several quoting levels:
    --restrict-file-names=none - no restriction, only quote / and \0

    --restrict-file-names=unix - quote the above, plus chars in the
                                 0-31 and in the 128-159 range, which
                                 are not printable in the shell.

    --restrict-file-names=windows - quote the above, plus chars
                                    disallowed on Windows: \, |, <, >,
                                    ?, :, *, and ".

The default is windows under Windows and Cygwin, and unix elsewhere.
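
To make the three levels concrete, here is a minimal sketch of the classification they describe. It is not the code in the patch: the enum, the function name sketch_unsafe_char and the plain comparisons are hypothetical (the patch itself uses a lookup table in file_unsafe_char), and the Windows set used here (\, |, <, >, ?, :, *, ") is my reading of the list above and should be treated as an assumption.

```c
#include <stdio.h>

/* Hypothetical names for illustration; not the identifiers in the patch. */
enum restrict_level { RESTRICT_NONE, RESTRICT_UNIX, RESTRICT_WINDOWS };

/* Return nonzero if C should be escaped in a file name at LEVEL.  */
static int
sketch_unsafe_char (unsigned char c, enum restrict_level level)
{
  /* Every level quotes '/' and NUL: they cannot appear in a file name.  */
  if (c == '/' || c == '\0')
    return 1;
  if (level == RESTRICT_NONE)
    return 0;

  /* unix: also quote the control ranges 0-31 and 128-159.  */
  if (c < 32 || (c >= 128 && c <= 159))
    return 1;
  if (level == RESTRICT_UNIX)
    return 0;

  /* windows: also quote characters assumed to be disallowed there.  */
  switch (c)
    {
    case '\\': case '|': case '<': case '>':
    case '?': case ':': case '*': case '"':
      return 1;
    default:
      return 0;
    }
}
```

With this sketch, sketch_unsafe_char (' ', RESTRICT_WINDOWS) is 0 at every level, so spaces stay unquoted, while ':' is quoted only at the windows level.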
This patch should supersede the various patches that have been
floating around that fix the problem in a limited fashion. Please
test this patch and let me know if it works for you, and if something
else is needed.
2003-09-14 Hrvoje Niksic [EMAIL PROTECTED]
* url.c (append_uri_pathel): Use opt.restrict_file_names when
calling file_unsafe_char.
* init.c: New command restrict_file_names.
* main.c (main): New option --restrict-file-names[=windows,unix].
* url.c (url_file_name): Renamed from url_filename.
(url_file_name): Add directory and hostdir prefix here, not in
mkstruct.
(append_dir_structure): New function, does part of the work that
used to be in mkstruct. Iterates over path elements in u->path,
calling append_uri_pathel on each one to append it to the file
name.
(append_uri_pathel): URL-unescape a path element and reencode it
with a different set of rules, more appropriate for handling of
files.
(file_unsafe_char): New function, uses a lookup table to decide
whether a character should be escaped for use in file name.
(append_string): New utility function.
(append_char): Ditto.
(file_unsafe_char): New argument restrict_for_windows, decide
whether Windows file names should be escaped in run-time.
* connect.c: Include stdlib.h to get prototype for abort().
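
For readers who want the idea behind append_uri_pathel without reading the diff, the following is a rough sketch of "URL-unescape, then re-encode with file name rules". It is not the function from the patch: sketch_append_pathel, sketch_unsafe and hex_value are hypothetical names, only the unix level is modelled, and the caller-supplied output buffer is a simplification of my own.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for file_unsafe_char, "unix" level only:
   '/', NUL and the control ranges 0-31 and 128-159 are unsafe.  */
static int
sketch_unsafe (unsigned char c)
{
  return c == '/' || c < 32 || (c >= 128 && c <= 159);
}

/* Value of a hexadecimal digit; the caller checks isxdigit first.  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower ((unsigned char) c) - 'a' + 10;
}

/* Decode %XX sequences in the path element IN[0..LEN) and write the
   result to OUT, escaping only bytes that are unsafe in a file name.
   OUT must have room for 3 * LEN + 1 bytes.  Returns the number of
   bytes written, not counting the terminating NUL.  */
static size_t
sketch_append_pathel (const char *in, size_t len, char *out)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t i = 0, n = 0;

  while (i < len)
    {
      unsigned char c = (unsigned char) in[i];

      /* Step 1: undo URL quoting when a valid %XX sequence is present.  */
      if (c == '%' && i + 2 < len
          && isxdigit ((unsigned char) in[i + 1])
          && isxdigit ((unsigned char) in[i + 2]))
        {
          c = (unsigned char) (hex_value (in[i + 1]) * 16
                               + hex_value (in[i + 2]));
          i += 3;
        }
      else
        ++i;

      /* Step 2: quote again, but only by the file name rules.  */
      if (sketch_unsafe (c))
        {
          out[n++] = '%';
          out[n++] = hex[c >> 4];
          out[n++] = hex[c & 0xf];
        }
      else
        out[n++] = (char) c;
    }
  out[n] = '\0';
  return n;
}
```

Given the path element "my%20file%2Fa%3Ab", this sketch produces "my file%2Fa:b": the space and the colon are no longer quoted, while the embedded slash still is.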
Index: NEWS
===
RCS file: /pack/anoncvs/wget/NEWS,v
retrieving revision 1.38
diff -u -r1.38 NEWS
--- NEWS 2003/09/10 20:21:13 1.38
+++ NEWS 2003/09/14 21:45:48
@@ -7,8 +7,6 @@
* Changes in Wget 1.9.
-** The build process now requires Autoconf 2.5x.
-
** It is now possible to specify that POST method be used for HTTP
requests. For example, `wget --post-data="id=foo&data=bar" URL' will
send a POST request with the specified contents.
@@ -32,6 +30,15 @@
** The new option `--dns-cache=off' may be used to prevent Wget from
caching DNS lookups.
+
+** The build process now requires Autoconf 2.5x.
+
+** Wget no longer quotes characters in local file names that would be
+considered unsafe as part of URL. Quoting can still occur for
+control characters or for '/', but no longer for frequent characters
+such as space. You can use the new option --restrict-file-names to
+enforce even stricter rules, which is useful when downloading to
+Windows partitions.
* Wget 1.8.1 is a bugfix release with no user-visible changes.
Index: doc/wget.texi
===
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.68
diff -u -r1.68 wget.texi
--- doc/wget.texi 2003/09/10 19:41:50 1.68
+++ doc/wget.texi 2003/09/14 21:46:10
@@ -800,6 +800,39 @@
If you don't understand the above description, you probably won't need
this option.
+
+@cindex file names, restrict
+@cindex Windows file names
+@item --restrict-file-names=none|unix|windows
+Restrict characters that may occur in local file names created by Wget
+from remote URLs. Characters that are considered @dfn{unsafe} under a
+set of restrictions are escaped, i.e. replaced with @samp{%XX}, where
+@samp{XX} is the hexadecimal code of the character.
+
+The default for this option depends on the operating system: on Unix and
+Unix-like OS'es, it defaults to ``unix''. Under Windows and Cygwin, it
+defaults to ``windows''. Changing the default is useful when you are
+using a non-native partition, e.g. when downloading files to a Windows
+partition mounted from Linux, or when using NFS-mounted or SMB-mounted
+Windows drives.
+
+When set to ``none'', the only characters that are quoted are those that
+are impossible to get into a file name---the NUL character and @samp{/}.
+The control characters, newline, etc. are all placed into file names.
+
+When set to ``unix'',