On 4 Aug 2001, at 3:25, Bao, Jiangcheng wrote:
> Suppose I have page a.html, which has a link to b.html. If a is not
> changed, and b is changed. When I process a, I have no way to check a so
> that I can process b too, without downloading a. -N will cause a not to be
> downloaded, but not processed either, so change of b will be ignored. If I
> will -nc, then a will be processed, and b too, but b's change will still
> be ignored. If a new page c is linked into b now, then c won't get
> noticed.
>
> Is this a feature or it's a bug? Thanks.

What version of wget are you using? I've just tried your scenario
in wget 1.7 and it worked correctly. I used the -N and -r options,
but you could just use the -m option instead, which turns on both
of them (timestamp checks and recursion).

If you use the -nc option then no new versions of pages that are
already stored locally will be downloaded, so you won't see the
changed versions and (if using recursion) wget will process the old,
existing versions. In this case, wget doesn't need to request
anything at all from the web server for files it already has.

If you use the -N option (or -m) and the local file exists, wget
will at first ask the web server for just the HTTP headers, rather
than the whole document. The "Last-Modified" and "Content-Length"
HTTP response headers will be compared with the last-modified time
and length of the local file. If the local file does not exist, the
lengths are different, or the local file is older than the new
content, another request will be sent to the web server to retrieve
the whole document.
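
To make that comparison concrete, here is a rough Python sketch of
the decision as I've described it. It is not wget's actual source:
FileInfo and needs_download are names I've made up for illustration,
and I'm assuming timestamps are expressed as seconds since the epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical record of a file's timestamp and length."""
    mtime: float  # last-modified time, seconds since the epoch (assumed)
    size: int     # length in bytes


def needs_download(local: Optional[FileInfo], remote: FileInfo) -> bool:
    """Decide whether the whole document should be fetched.

    `local` is None when no local copy exists; `remote` holds the values
    taken from the Last-Modified and Content-Length response headers.
    """
    if local is None:
        return True   # nothing stored locally yet
    if local.size != remote.size:
        return True   # lengths differ
    # Same length: fetch only if the local copy is older.
    return local.mtime < remote.mtime
```

Either way, once a page is available locally (freshly fetched or
not), recursion can still follow its links, which is why a changed
b.html gets noticed even when a.html is left alone.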

So when you say that -N causes files that are not downloaded to also
go unprocessed, that is incorrect, at least as far as wget 1.7 is
concerned.