On Feb 22, 4:00pm, Dan Harkless wrote:
>
> > In my application, I wanted to apply the pattern to the *entire* URL,
> > inclusive of intervening directories. I made a small modification to
> > the source code to implement full-URL pattern matching:
> >
> > *** utils.c.orig Sun Jun 25 23:11:44 2000
> > --- utils.c Mon Feb 19 20:00:14 2001
> > ***************
> > *** 546,557 ****
> > --- 546,559 ----
> > int
> > acceptable (const char *s)
> > {
> > + #ifdef MATCH_ONLY_LAST_PART_OF_URL
> > int l = strlen (s);
> >
> > while (l && s[l] != '/')
> > --l;
> > if (s[l] == '/')
> > s += (l + 1);
> > + #endif
> > if (opt.accepts)
> > {
> > if (opt.rejects)
>
> I think we'll have to hold off on making a change like this until we
> determine whether we'll be adding full regexp matching ability or not.
Okay. A technical note: the modification above only works if the
Perl regex capability is enabled, because the file-globbing code in
the wget release will not match across `/'s unless the code that
calls fnmatch() is modified as well.
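
For reference, here is a minimal sketch of the distinction, written
against the POSIX fnmatch() interface. Whether the fnmatch() bundled
with wget behaves identically is an assumption on my part, and
glob_matches() is a hypothetical helper, not a wget function. Under
POSIX, a wildcard never matches `/' when FNM_PATHNAME is set, but may
do so when the flags are 0:

```c
#include <fnmatch.h>

/* Hypothetical helper, not part of wget: return 1 if `url' matches the
   shell pattern `pat', 0 otherwise.  When `slash_special' is nonzero,
   FNM_PATHNAME is passed, so `*' and `?' refuse to match '/'. */
static int
glob_matches (const char *pat, const char *url, int slash_special)
{
  int flags = slash_special ? FNM_PATHNAME : 0;

  return fnmatch (pat, url, flags) == 0;
}
```

So glob_matches ("*.html", "docs/a/index.html", 0) succeeds, while the
same call with slash_special set to 1 fails, because `*' would have to
span two `/' characters.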
>
> > Another suggestion: if the user supplies either --accept or --reject
> > options, it is possible that the resulting mirrored file hierarchy
> > will contain empty directories. Although a shell script could
> > easily clean them up, as in:
> > find . -type d -exec rmdir --ignore-fail-on-non-empty {} \;
A detail: this should be:
find . -depth -type d -exec rmdir --ignore-fail-on-non-empty {} \;
(the "-depth" switch makes find process each directory's contents
before the directory itself, so nested empty directories are removed
from the bottom up)
> > it would be nice if wget took that action as a final clean up.
>
> Okay, I'll add that to the TODO.
An implementation note: wget will likely need to remember the top-level
archive/mirror directories that it creates/traverses, so that it knows
which directory sub-trees must be searched.