Re: how to filter only certain URL's?

Dan Harkless Mon, 19 Feb 2001 18:56:52 -0800

[EMAIL PROTECTED] (Gary Funck) writes:
> > > - but that RPM URL that I mentioned,
> > > appeared to be a version that has pattern matching in it, but now it
> > > appears that this version has some sort of shell-like globbing,
> > > but doesn't have the regex stuff.  I actually would prefer the
> > > regex version for what I'm trying to do, and there are no docs
> > > on how the globbing works, how much of a pathname I can use it
> > > on, etc.  (i.e., does it only match the part of the URL after the
> > > rightmost slash?)
> 
> I experimented with the above mentioned "globbing", but couldn't
> figure out how it works (though admittedly I didn't try firing up
> the debugger to see what's going on).

Base wget supports some globbing on those parameters, so you may not have
been looking at anything added in the "gold" version.
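
For anyone unsure what "shell-like globbing" means next to regexes, here is a
minimal Python sketch. It is not Wget's code. It assumes "those parameters"
means the accept/reject lists, and it assumes (unverified for this version)
that a pattern is tested against the part of the URL after the rightmost
slash. The helper names and example URLs are made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit
import posixpath


def last_component(url):
    """Return the part of the URL path after the rightmost slash."""
    return posixpath.basename(urlsplit(url).path)


def glob_accepts(url, patterns):
    """True if any shell-style pattern matches the URL's last component.

    Assumption: matching is done on the file name only, not the full path.
    """
    name = last_component(url)
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the wildcard behaviour.
    for url in ("http://example.com/pub/foo-1.0.rpm",
                "http://example.com/pub/index.html"):
        print(url, glob_accepts(url, ["*.rpm"]))
```

Shell wildcards (`*`, `?`, `[...]`) are less expressive than regexes: there is
no alternation or repetition count, which is why a regex-capable build may
still be preferable for complicated filters.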

> One thing that the matching did appear to be doing, however, is
> first *downloading the entire page* before making the decision
> as to whether to keep the page or not.  This is decidedly not the
> preferred implementation -- it wastes bandwidth.

Again, this behavior is not specific to the "gold" version.  Wget
intentionally downloads all HTML files it encounters so that it can get the
links to the files you _do_ want.  Then if the HTML file was one you
explicitly did not desire, it deletes it.  All this only happens for HTML
files -- you shouldn't get any other types of files downloaded if you said
you didn't want them.
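
The fetch-then-delete behavior described above can be sketched roughly as
follows. This is a toy simulation in Python, not Wget's implementation: the
in-memory SITE table, the crawl() function, and the file-name-only matching
are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical site: URL path -> (is_html, links found in that page).
SITE = {
    "/index.html": (True, ["/a.rpm", "/b.txt", "/sub/index.html"]),
    "/sub/index.html": (True, ["/sub/c.rpm"]),
    "/a.rpm": (False, []),
    "/b.txt": (False, []),
    "/sub/c.rpm": (False, []),
}


def crawl(start, accept):
    """Return (downloaded, kept) for a breadth-first crawl from `start`."""
    downloaded, kept = [], []
    queue, seen = [start], {start}
    while queue:
        url = queue.pop(0)
        is_html, links = SITE[url]
        name = url.rsplit("/", 1)[-1]
        wanted = any(fnmatchcase(name, pattern) for pattern in accept)
        if not is_html and not wanted:
            continue  # unwanted non-HTML file: never fetched at all
        downloaded.append(url)
        if is_html:
            # HTML is always fetched so that its links can be followed.
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        if wanted:
            kept.append(url)
        # An unwanted HTML page is simply not kept (deleted after parsing).
    return downloaded, kept


if __name__ == "__main__":
    downloaded, kept = crawl("/index.html", ["*.rpm"])
    print("downloaded:", downloaded)
    print("kept:", kept)
```

In this sketch both HTML pages are fetched even though only "*.rpm" was
requested, because they are the only way to discover the .rpm links; b.txt
is skipped without costing any bandwidth.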

---------------------------------------------------------------
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.