Re: how to filter only certain URL's?

Gary Funck Fri, 23 Feb 2001 14:43:44 -0800
On Feb 22,  4:00pm, Dan Harkless wrote:
> 
> > In my application, I wanted to apply the pattern to the *entire* URL,
> > inclusive of intervening directories.  I made a small modification to
> > the source code to implement full-URL pattern matching:
> > 
> > *** utils.c.orig        Sun Jun 25 23:11:44 2000
> > --- utils.c     Mon Feb 19 20:00:14 2001
> > ***************
> > *** 546,557 ****
> > --- 546,559 ----
> >   int
> >   acceptable (const char *s)
> >   {
> > + #ifdef MATCH_ONLY_LAST_PART_OF_URL
> >     int l = strlen (s);
> >   
> >     while (l && s[l] != '/')
> >       --l;
> >     if (s[l] == '/')
> >       s += (l + 1);
> > + #endif
> >     if (opt.accepts)
> >       {
> >         if (opt.rejects)
> 
> I think we'll have to hold off on making a change like this until we
> determine whether we'll be adding full regexp matching ability or not.

Okay.  A technical note: the modification above only works if the
Perl regex capability is enabled.  This is because the file globbing
code in the wget release will not match across `/'s unless the code
that calls fnmatch() is modified as well.
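
To illustrate the `/' issue, here is a minimal sketch that uses the
POSIX fnmatch() from <fnmatch.h>.  It is not the wget source; I am
assuming the FNM_PATHNAME flag is a fair stand-in for the behaviour
described above.  The helper names last_component() and glob_matches()
are made up for this example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: return the text after the last '/' in URL,
   which is what acceptable() tests when the stripping code is kept. */
static const char *
last_component (const char *url)
{
  const char *slash;

  assert (url != NULL);
  slash = strrchr (url, '/');
  return slash ? slash + 1 : url;
}

/* Hypothetical helper: return 1 if PATTERN matches all of URL.
   With FLAGS == 0 a `*' may match '/'; with FNM_PATHNAME it may not. */
static int
glob_matches (const char *pattern, const char *url, int flags)
{
  return fnmatch (pattern, url, flags) == 0;
}
```

With flags of 0, a pattern such as "*/docs/*.html" can match a URL with
further directories below docs/.  With FNM_PATHNAME the same call does
not match, because no `*' is allowed to span a `/'.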

> 
> > Another suggestion: if the user supplies either --accept or --reject
> > options, it is possible that the resulting mirrored file hierarchy
> > will contain empty directories.  Although a shell script could
> > easily clean them up, as in:
> >    find . -type d -exec rmdir --ignore-fail-on-non-empty {} \;

A correction: this should be:
    find . -depth -type d -exec rmdir --ignore-fail-on-non-empty {} \;

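(the "-depth" switch tells find to process each directory's contents
before the directory itself, so that a directory left empty once its
sub-directories are removed gets removed too)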

> > it would be nice if wget took that action as a final clean up.
> 
> Okay, I'll add that to the TODO.

An implementation note: wget will likely need to remember the top-level
archive/mirror directories that it creates/traverses, so that it knows
which directory sub-trees must be searched.
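
As a rough sketch of what such a clean-up pass might look like in C,
the following uses the POSIX nftw() call with FTW_DEPTH, which visits a
directory after its contents (the same ordering as "find -depth").
This is not wget code.  The names prune_empty_dirs() and
remove_if_empty() are invented for the example, and ROOT stands for
one of the remembered top-level mirror directories.

```c
#define _XOPEN_SOURCE 700       /* expose nftw() and FTW_DEPTH */
#include <assert.h>
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callback: try to remove each directory below the root.
   rmdir() only succeeds on empty directories, so non-empty ones are
   left alone and the resulting error is ignored. */
static int
remove_if_empty (const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
  (void) sb;
  if (type != FTW_DP || ftw->level == 0)
    return 0;                   /* not a directory, or the root itself */
  if (rmdir (path) != 0 && errno != ENOTEMPTY && errno != EEXIST)
    perror (path);              /* report unexpected failures only */
  return 0;                     /* always continue the walk */
}

/* Remove every empty directory below ROOT, children before parents.
   Returns 0 if the whole tree was walked, -1 on a traversal error. */
static int
prune_empty_dirs (const char *root)
{
  assert (root != NULL);
  return nftw (root, remove_if_empty, 16, FTW_DEPTH | FTW_PHYS);
}
```

The root directory itself is never removed, and FTW_PHYS keeps the walk
from following symbolic links out of the mirror tree.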