Re: wget downloading a single page when it should recurse
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Philip Mateescu wrote:
>
>> A warning message would be nice when, for not-so-obvious reasons, wget
>> doesn't behave as one would expect.
>>
>> I don't know if there are other tags that could change wget's behavior
>> (like -r and meta name="robots" do), but if they happen it would be
>> useful to have a message.
>
> I agree that this is worth a notable mention in the wget output. At the
> very least, running with -d should provide more guidance on why the links
> it has appended to urlpos are not being followed. Buried in the middle of
> hundreds of lines of output is:
>
>     no-follow in index.php
>
> On the other hand, if other rules prevent a URL from being followed, you
> might see something like:
>
>     Deciding whether to enqueue "http://www.othersite.com/index.html".
>     This is not the same hostname as the parent's (www.othersite.com and
>     www.thissite.com).
>     Decided NOT to load it.

There's a practical reason for this discrepancy. All these other links are
examined one by one and rejected one by one. On the other hand, when
nofollow is specified, it causes Wget to not even *consider* any of the
links for download.

Another tweak that should be added (easily, I think): Wget should ignore
robots when downloading the page requisites.
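For what it's worth, robots processing can already be disabled wholesale today; the open question above is only whether Wget should do so implicitly for page requisites. A .wgetrc fragment (the equivalent one-off form is `-e robots=off` on the command line):

```
# ~/.wgetrc -- turn off all robots processing, i.e. both
# /robots.txt files and <meta name="robots"> tags
robots = off
```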
Re: wget downloading a single page when it should recurse
"Aaron S. Hawley" <[EMAIL PROTECTED]> writes:

> The HTML of those pages contains the meta-tag
>
> and Wget listened, and only downloaded the first page.
>
> Perhaps Wget should give a warning message that the file contained a
> meta-robots tag, so that people aren't quite so dumb-founded.

Good point. A message would be easy to add, and in this case enormously useful.
Re: wget downloading a single page when it should recurse
Philip Mateescu wrote:

> A warning message would be nice when, for not-so-obvious reasons, wget
> doesn't behave as one would expect.
>
> I don't know if there are other tags that could change wget's behavior
> (like -r and meta name="robots" do), but if they happen it would be
> useful to have a message.

I agree that this is worth a notable mention in the wget output. At the
very least, running with -d should provide more guidance on why the links
it has appended to urlpos are not being followed. Buried in the middle of
hundreds of lines of output is:

    no-follow in index.php

On the other hand, if other rules prevent a URL from being followed, you
might see something like:

    Deciding whether to enqueue "http://www.othersite.com/index.html".
    This is not the same hostname as the parent's (www.othersite.com and
    www.thissite.com).
    Decided NOT to load it.

Tony
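Until such a message exists, the needle can be pulled out of the -d haystack by filtering the captured debug output. A rough sketch; here `debug.log` is a hypothetical capture of `wget -d -r ... 2>&1`, seeded with the sample lines quoted above:

```shell
# Seed a stand-in debug log with the decision lines quoted in
# this thread (a real one would come from `wget -d ... 2>&1`).
cat > debug.log <<'EOF'
Deciding whether to enqueue "http://www.othersite.com/index.html".
This is not the same hostname as the parent's (www.othersite.com and
www.thissite.com).
Decided NOT to load it.
no-follow in index.php
EOF

# The single line that explains why recursion stopped entirely:
grep -n 'no-follow' debug.log
```

This prints the buried `no-follow in index.php` line with its position in the log.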
Re: wget downloading a single page when it should recurse
Thanks!

A warning message would be nice when, for not-so-obvious reasons, wget
doesn't behave as one would expect.

I don't know if there are other tags that could change wget's behavior
(like -r and meta name="robots" do), but if they happen it would be
useful to have a message.

Thanks again!

Aaron S. Hawley wrote:

> The HTML of those pages contains the meta-tag
>
> and Wget listened, and only downloaded the first page.
>
> Perhaps Wget should give a warning message that the file contained a
> meta-robots tag, so that people aren't quite so dumb-founded.
>
> /a
>
> On Fri, 17 Oct 2003, Philip Mateescu wrote:
>
>> Hi,
>>
>> I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
>> swear it once worked...
>>
>> I'm trying to download the php manual off the web using this command:
>>
>> $ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php

---
"Don't belong. Never join. Think for yourself. Peace"
---
Re: wget downloading a single page when it should recurse
The HTML of those pages contains the meta-tag

and Wget listened, and only downloaded the first page.

Perhaps Wget should give a warning message that the file contained a
meta-robots tag, so that people aren't quite so dumb-founded.

/a

On Fri, 17 Oct 2003, Philip Mateescu wrote:

> Hi,
>
> I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
> swear it once worked...
>
> I'm trying to download the php manual off the web using this command:
>
> $ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php

--
Consider supporting GNU Software and the Free Software Foundation
By Buying Stuff - http://www.gnu.org/gear/
(GNU and FSF are not responsible for this promotion nor
necessarily agree with the views of the author)
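Anyone hit by this can check the fetched page directly for a robots meta tag. A sketch using a stand-in sample file; the tag's exact `content` value on the real php.net page may differ:

```shell
# Stand-in for the downloaded page (not the real php.net file);
# "nofollow" is the directive that stops Wget's recursion.
cat > index.php <<'EOF'
<html><head>
<meta name="robots" content="noindex,nofollow">
<title>PHP Manual</title>
</head><body>...</body></html>
EOF

# Case-insensitive search for any robots meta tag:
grep -io '<meta name="robots"[^>]*>' index.php
```

If this prints a tag containing `nofollow`, that is why recursion stopped after one page.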
wget downloading a single page when it should recurse
Hi,

I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
swear it once worked...

I'm trying to download the php manual off the web using this command:

$ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php

Here's the result:

--10:12:15--  http://us4.php.net/manual/en/print/index.php
           => `index.php'
Resolving us4.php.net... done.
Connecting to us4.php.net[209.197.17.2]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Date: Fri, 17 Oct 2003 15:12:18 GMT
 3 Server: Apache/1.3.27 (Unix) Debian GNU/Linux PHP/4.3.2 mod_python/2.7.8 Python/2.2.3 mod_ssl/2.8.14 OpenSSL/0.9.7b mod_perl/1.27 mod_lisp/2.32 DAV/1.0.3
 4 X-Powered-By: PHP/4.3.2
 5 Content-language: en
 6 Set-Cookie: LAST_LANG=en; expires=Sat, 16-Oct-04 15:12:18 GMT; path=/; domain=.php.net
 7 Set-Cookie: COUNTRY=USA%2C65.208.59.73; expires=Fri, 24-Oct-03 15:12:18 GMT; path=/; domain=.php.net
 8 Status: 200 OK
 9 Last-Modified: Sat, 18 Oct 2003 06:12:28 GMT
10 Vary: Cookie
11 Connection: close
12 Content-Type: text/html;charset=ISO-8859-1

    [ <=>                                 ] 13,961        36.16K/s

10:12:17 (36.16 KB/s) - `index.php' saved [13961]

FINISHED --10:12:17--
Downloaded: 13,961 bytes in 1 files
Converting index.php... 3-183
Converted 1 files in 0.01 seconds.

I expected it to follow the links and download the rest of the manual.
Am I doing anything wrong?

Thank you very much,
philip

---
"Don't belong. Never join. Think for yourself. Peace"
---
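(As the rest of this thread worked out, the page carries a robots meta tag that Wget honors. For this one-off download, the same invocation with robots processing disabled should recurse; a sketch, not retested against that mirror:)

```
# Same command, plus -e robots=off so the robots meta tag in
# index.php no longer suppresses link-following.
wget -nd -nH -r -np -p -k -S -e robots=off \
    http://us4.php.net/manual/en/print/index.php
```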