[EMAIL PROTECTED] (Gary Funck) writes:
> Thanks for the clarification.  The 'info' page helped clear things up:
> 
> (topic: Types of Files)
> 
> `-A ACCLIST'
> `--accept ACCLIST'
> `accept = ACCLIST'
>      The argument to `--accept' option is a list of file suffixes or
>      patterns that Wget will download during recursive retrieval.  A
>      suffix is the ending part of a file, and consists of "normal"
>      letters, e.g. `gif' or `.jpg'.  A matching pattern contains
>      shell-like wildcards, e.g. `books*' or `zelazny*196[0-9]*'.
> 
>      So, specifying `wget -A gif,jpg' will make Wget download only the
>      files ending with `gif' or `jpg', i.e. GIFs and JPEGs.  On the
>      other hand, `wget -A "zelazny*196[0-9]*"' will download only files
>      beginning with `zelazny' and containing numbers from 1960 to 1969
>      anywhere within.  Look up the manual of your shell for a
>      description of how pattern matching works.
> 
> but the man page that I checked first, didn't talk about the pattern
> matching capabilities:

Wget hasn't been distributed with an official man page for some time.  Linux
vendors often include an old one.  Starting in 1.7, an official man page
(autogenerated from the .texi) will be included again.

> In my application, I wanted to apply the pattern to the *entire* URL,
> inclusive of intervening directories.  I made a small modification to
> the source code to implement full-URL pattern matching:
> 
> *** utils.c.orig        Sun Jun 25 23:11:44 2000
> --- utils.c     Mon Feb 19 20:00:14 2001
> ***************
> *** 546,557 ****
> --- 546,559 ----
>   int
>   acceptable (const char *s)
>   {
> + #ifdef MATCH_ONLY_LAST_PART_OF_URL
>     int l = strlen (s);
>   
>     while (l && s[l] != '/')
>       --l;
>     if (s[l] == '/')
>       s += (l + 1);
> + #endif
>     if (opt.accepts)
>       {
>         if (opt.rejects)

I think we'll have to hold off on making a change like this until we
determine whether we'll be adding full regexp matching ability or not.
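
For anyone following along, the difference between the two behaviors can
be sketched roughly as below.  This is only an illustration, assuming
POSIX fnmatch() for the shell-like wildcards; last_component() and
matches_pattern() are hypothetical helper names, not functions from
Wget's utils.c.

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: return the part of URL after the last '/',
   or the whole string when it contains no '/'.  */
static const char *
last_component (const char *url)
{
  const char *slash = strrchr (url, '/');
  return slash ? slash + 1 : url;
}

/* Hypothetical helper: return nonzero if PATTERN matches URL.
   When FULL_URL is nonzero the whole URL is matched, directories
   included; otherwise only the last component is matched.
   No FNM_PATHNAME flag is passed, so '*' may also match '/'.  */
static int
matches_pattern (const char *pattern, const char *url, int full_url)
{
  const char *subject = full_url ? url : last_component (url);
  return fnmatch (pattern, subject, 0) == 0;
}
```

With these, matches_pattern ("*/books/*.gif", url, 1) accepts
http://example.com/books/a.gif, while the same pattern with full_url
set to 0 sees only `a.gif' and rejects it.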

> Another suggestion: if the user supplies either --accept or --reject
> options, it is possible that the resulting mirrored file hierarchy
> will contain empty directories.  Although a shell script could
> easily clean them up, as in:
>    find . -type d -exec rmdir --ignore-fail-on-non-empty {} \;
> it would be nice if wget took that action as a final clean up.

Okay, I'll add that to the TODO.

---------------------------------------------------------------
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.
