Barnett, Rodney wrote:
>  
> 
>> -----Original Message-----
>> From: Matthias Vill [mailto:[EMAIL PROTECTED] 
>> Sent: Thursday, August 23, 2007 1:54 AM
>> To: [email protected]
>> Subject: Re: -R and HTML files
>>
>> Micah Cowan wrote:
>>> Josh Williams wrote:
>>>> On 8/22/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>>>>> What would be the appropriate behavior of -R then?
>>>> I think the default option should be to download the html files to 
>>>> parse the links, but it should discard them afterwards if 
>>>> they do not match the acceptance list.
>>> Heh, that _is_ the current default. But I'm not convinced 
>>> that's what the naïve user is going to expect the default
>>> to be. Especially since the manpage doesn't mention it, and
>>> the info page only mentions it if you dig into the details
>>> section.
>>>
>>> OTOH, it has a history, so choosing to change it is not a 
>>> small decision.
>>>
>> To me, downloading HTML files which match rejection patterns
>> makes no sense.
>> Of course, there is the case where you want "the whole
>> site, but", let's say, you don't want any of the pictures
>> because they are too big.
> 
> In my case, there's a web site that contains a lot of text and
> PDF files that I need to monitor so that I can process the new
> or changed ones.  The HTML pages merely reflect the directory
> structure.  I don't want them, but they have to be traversed to
> get to the files I do want.

OK, then maybe we need a special parse-and-delete filter.
Even in your case I believe you don't want to follow links to
HTML files outside the PDF tree, which will contain pictures and links
to even more HTML files you don't need. Downloading and parsing all of
them is quite an overhead if they reside in other paths, I would guess.

So maybe
-R *outside* -C *listing-html* -A *pdfs*
{-C meaning consider-for-links-only (and being one of the few
single-chars left)}

would help, and it could still be extended later to mime:url-pattern
pairs, which I guess would be really useful for some wikis that don't
append file-type extensions to their paths.

Also, -C could default to the -R value to provide compatibility with
previous versions, and something like "-C -" would hard-reject
everything in -R.
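
To make the intended semantics a bit more concrete, here is a rough
sketch in Python of how the three lists could interact. This is not
wget code: -C does not exist, the function and parameter names are made
up for illustration, and the precedence (-C before -R before -A) is
only my assumption about how it should work. It also ignores content
types and treats every pattern as a plain shell-style glob on the URL.

```python
from fnmatch import fnmatchcase


def matches(url, patterns):
    """True if the URL matches any of the shell-style patterns."""
    return any(fnmatchcase(url, p) for p in patterns)


def classify(url, accept, reject, consider=None):
    """Decide what to do with a URL (hypothetical -A/-R/-C semantics).

    Returns one of:
      'keep'       - download and keep the file
      'parse-only' - download, parse for links, then delete
      'skip'       - do not download at all

    consider=None models "-C not given": it defaults to the -R list,
    so rejected pages are still fetched and parsed, as they are today.
    consider=[] models "-C -": everything in -R is hard-rejected.
    """
    if consider is None:
        consider = reject  # compatibility default: -C takes the -R value

    if matches(url, consider):
        return "parse-only"
    if matches(url, reject):
        return "skip"
    # Assumption: an empty accept list means "accept everything".
    if not accept or matches(url, accept):
        return "keep"
    return "skip"


if __name__ == "__main__":
    accept = ["*.pdf"]
    reject = ["*/outside/*"]
    consider = ["*/listing/*.html"]
    for u in (
        "http://example.com/listing/index.html",
        "http://example.com/listing/doc.pdf",
        "http://example.com/outside/page.html",
    ):
        print(u, "->", classify(u, accept, reject, consider))
```

With the lists above, the listing page is fetched only for its links,
the PDF is kept, and the page under "outside" is never downloaded.
Leaving out the consider list makes that last page parse-only again,
which is the compatibility case.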

I hope that's a better suggestion now...

Matthias
