Hi,
Jean-Marc MOLINA wrote:
> I have another opinion about that limitation. Could it be considered a
> bug? From the "Types of Files" section of the manual we can read: « Note
> that these two options do not affect the downloading of html files; Wget
> must load all the htmls to know where to go at all -- recursive retrieval would
> make no sense otherwise. » It means the accept and reject options don't
> work on HTML files. But I think they should because, especially in this case,
> you deliberately want to exclude them. Excluding them makes sense. So I
> don't really know what to do: consider the problem a bug, a new
> feature to implement, or an existing feature that should be redesigned.
> It's pretty tricky.
I just set up my build environment for Wget again.
When I added regex support, I ran into the same problem with exclusion, so I
introduced a new parameter, "--follow-excluded-html".
It is on by default, of course, but you can turn it off with
--no-follow-excluded-html.
See attached patch for current trunk.
TT
Index: trunk/src/init.c
===================================================================
--- trunk/src/init.c (revision 2133)
+++ trunk/src/init.c (working copy)
@@ -146,6 +146,7 @@
#endif
{ "excludedirectories", &opt.excludes, cmd_directory_vector },
{ "excludedomains", &opt.exclude_domains, cmd_vector },
+ { "followexcluded", &opt.followexcluded, cmd_boolean },
{ "followftp", &opt.follow_ftp, cmd_boolean },
{ "followtags", &opt.follow_tags, cmd_vector },
{ "forcehtml", &opt.force_html, cmd_boolean },
@@ -277,6 +278,7 @@
opt.cookies = true;
opt.verbose = -1;
+ opt.followexcluded = true;
opt.ntry = 20;
opt.reclevel = 5;
opt.add_hostdir = true;
Index: trunk/src/main.c
===================================================================
--- trunk/src/main.c (revision 2133)
+++ trunk/src/main.c (working copy)
@@ -158,6 +158,7 @@
{ "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
{ "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
{ "execute", 'e', OPT__EXECUTE, NULL, required_argument },
+ { "follow-excluded-html", 0, OPT_BOOLEAN, "followexcluded", -1 },
{ "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
{ "follow-tags", 0, OPT_VALUE, "followtags", -1 },
{ "force-directories", 'x', OPT_BOOLEAN, "dirstruct", -1 },
@@ -611,6 +612,9 @@
-X, --exclude-directories=LIST list of excluded directories.\n"),
N_("\
-np, --no-parent don't ascend to the parent directory.\n"),
+ N_("\
+ --follow-excluded-html turns on downloading of excluded files for\n\
+ inspection (this is the default).\n"),
"\n",
N_("Mail bug reports and suggestions to <[EMAIL PROTECTED]>.\n")
Index: trunk/src/recur.c
===================================================================
--- trunk/src/recur.c (revision 2133)
+++ trunk/src/recur.c (working copy)
@@ -511,13 +511,14 @@
&& !(has_html_suffix_p (u->file)
/* The exception only applies to non-leaf HTMLs (but -p
always implies non-leaf because we can overstep the
- maximum depth to get the requisites): */
- && (/* non-leaf */
+ maximum depth to get the requisites):
+ No exception if the user specified no-follow-excluded */
+ && (opt.followexcluded && (/* non-leaf */
opt.reclevel == INFINITE_RECURSION
/* also non-leaf */
|| depth < opt.reclevel - 1
/* -p, which implies non-leaf (see above) */
- || opt.page_requisites)))
+ || opt.page_requisites))))
{
if (!acceptable (u->file))
{