I ran wget on this file:

<! ------------------------------------------------------ >
<A HREF="a.html">a</a>
<! ------------------------------------------------------ >
<A HREF="b.html">b</a>

It downloads b.html, but it does not download a.html.

However, with the following file, it does download a.html:

<! ------------------------------------------------------ >
<A HREF="a.html">a</a>
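One plausible explanation can be shown with a minimal Python sketch. It is an assumption and is not wget's html-parse.c. The model uses strict SGML rules: each "--" inside "<! ... >" toggles comment state, and only a ">" seen outside a comment ends the declaration. Suppose each rule line has 4k+2 dashes, which is an odd number of "--" pairs. Then the first line leaves a comment open and its ">" is ignored. The declaration runs on through the second rule line and swallows the a.html link. With only one rule line the declaration never closes. The sketch assumes the parser then falls back to treating "<!" as plain text, which would explain why a.html is found in the second file. The dash count of 54 and the names scan_declaration and extract_links are hypothetical.

```python
import re
from typing import List, Optional

# Matches the start of an anchor such as <A HREF="a.html"> (case-insensitive).
HREF = re.compile(r'<a\s+href="([^"]*)"', re.IGNORECASE)


def scan_declaration(text: str, start: int) -> Optional[int]:
    """Return the index just past the '>' that ends the declaration
    beginning with '<!' at text[start], or None if it never ends.

    Simplified strict-SGML model (an assumption, not wget's code):
    every '--' toggles comment state, and only a '>' seen outside a
    comment terminates the declaration.
    """
    i = start + 2  # skip '<!'
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return None


def extract_links(html: str) -> List[str]:
    """Collect HREF values, skipping anything inside <! ... > declarations."""
    links: List[str] = []
    i = 0
    while i < len(html):
        if html.startswith("<!", i):
            end = scan_declaration(html, i)
            if end is not None:
                i = end  # skip the whole declaration, including any links in it
            else:
                i += 2  # assumed fallback: an unterminated '<!' is plain text
            continue
        m = HREF.match(html, i)
        if m:
            links.append(m.group(1))
            i = m.end()
        else:
            i += 1
    return links


# Hypothetical reconstruction of the two test files; 54 dashes is an
# assumed count (27 "--" pairs, i.e. an odd number of pairs).
RULE = "<! " + "-" * 54 + " >"
FILE_1 = RULE + '\n<A HREF="a.html">a</a>\n' + RULE + '\n<A HREF="b.html">b</a>\n'
FILE_2 = RULE + '\n<A HREF="a.html">a</a>\n'

if __name__ == "__main__":
    print(extract_links(FILE_1))  # ['b.html']
    print(extract_links(FILE_2))  # ['a.html']
```

If the count is changed to 56 dashes, which is an even number of pairs, each rule line becomes a closed comment. The sketch then finds both links. Doing the same against a real wget 1.7 build would be one way to test this hypothesis.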

I ran this command:

~/wget-1.7/src/wget --recursive --debug http://ucsee.eecs.berkeley.edu/~bjacob/ >& output

The command "find ucsee.eecs.berkeley.edu/" outputs:

ucsee.eecs.berkeley.edu/
ucsee.eecs.berkeley.edu/%7Ebjacob
ucsee.eecs.berkeley.edu/%7Ebjacob/index.html
ucsee.eecs.berkeley.edu/%7Ebjacob/b.html

The first version of the file is at this URL:

http://ucsee.eecs.berkeley.edu/~bjacob/

I have no .wgetrc file.

The bug occurs in wget 1.7.

wget 1.6 always downloads a.html.

Here is the output file from the above wget command:

DEBUG output created by Wget 1.7 on linux-gnu.

parseurl ("http://ucsee.eecs.berkeley.edu/~bjacob/") -> host ucsee.eecs.berkeley.edu -> opath ~bjacob/ -> dir ~bjacob -> file  -> ndir ~bjacob
newpath: /%7Ebjacob/
Checking for ucsee.eecs.berkeley.edu in host_name_address_map.
Checking for ucsee.eecs.berkeley.edu in host_slave_master_map.
First time I hear about ucsee.eecs.berkeley.edu by that name; looking it up.
Caching ucsee.eecs.berkeley.edu <-> 128.32.138.93
Checking again for ucsee.eecs.berkeley.edu in host_slave_master_map.
--23:02:08--  http://ucsee.eecs.berkeley.edu/%7Ebjacob/
           => `ucsee.eecs.berkeley.edu/%7Ebjacob/index.html'
Connecting to ucsee.eecs.berkeley.edu:80... Found ucsee.eecs.berkeley.edu in host_name_address_map: 128.32.138.93
Created fd 3.
connected!
---request begin---
GET /%7Ebjacob/ HTTP/1.0
User-Agent: Wget/1.7
Host: ucsee.eecs.berkeley.edu
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Date: Thu, 05 Jul 2001 06:00:30 GMT
Server: Apache/1.3.2 (Unix)
Last-Modified: Thu, 05 Jul 2001 05:50:29 GMT
ETag: "297b1-a6-3b440025"
Accept-Ranges: bytes
Content-Length: 166
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html


Found ucsee.eecs.berkeley.edu in host_name_address_map: 128.32.138.93
Registered fd 3 for persistent reuse.
Length: 166 [text/html]

    0K                                                       100% @ 162.11 KB/s

23:02:09 (162.11 KB/s) - `ucsee.eecs.berkeley.edu/%7Ebjacob/index.html' saved [166/166]

parseurl ("http://ucsee.eecs.berkeley.edu/~bjacob/") -> host ucsee.eecs.berkeley.edu -> opath ~bjacob/ -> dir ~bjacob -> file  -> ndir ~bjacob
newpath: /%7Ebjacob/
Loaded ucsee.eecs.berkeley.edu/%7Ebjacob/index.html (size 166).
ucsee.eecs.berkeley.edu/%7Ebjacob/index.html: merge("http://ucsee.eecs.berkeley.edu/%7Ebjacob/", "b.html") -> http://ucsee.eecs.berkeley.edu/%7Ebjacob/b.html
no-follow in ucsee.eecs.berkeley.edu/%7Ebjacob/index.html: 0
parseurl ("http://ucsee.eecs.berkeley.edu/%7Ebjacob/b.html") -> host ucsee.eecs.berkeley.edu -> opath %7Ebjacob/b.html -> dir ~bjacob -> file b.html -> ndir ~bjacob
newpath: /%7Ebjacob/b.html
Checking for ucsee.eecs.berkeley.edu in host_name_address_map.
Found; ucsee.eecs.berkeley.edu was already used, by that name.
Comparing hosts ucsee.eecs.berkeley.edu and ucsee.eecs.berkeley.edu...
They are quite alike.
parseurl ("http://ucsee.eecs.berkeley.edu/%7Ebjacob/b.html") -> host ucsee.eecs.berkeley.edu -> opath %7Ebjacob/b.html -> dir ~bjacob -> file b.html -> ndir ~bjacob
newpath: /%7Ebjacob/b.html
Loading robots.txt; please ignore errors.
parseurl ("http://ucsee.eecs.berkeley.edu/robots.txt") -> host ucsee.eecs.berkeley.edu -> opath robots.txt -> dir  -> file robots.txt -> ndir 
newpath: /robots.txt
Checking for ucsee.eecs.berkeley.edu in host_name_address_map.
Found; ucsee.eecs.berkeley.edu was already used, by that name.
--23:02:09--  http://ucsee.eecs.berkeley.edu/robots.txt
           => `ucsee.eecs.berkeley.edu/robots.txt'
Found ucsee.eecs.berkeley.edu in host_name_address_map: 128.32.138.93
Reusing connection to ucsee.eecs.berkeley.edu:80.
Reusing fd 3.
---request begin---
GET /robots.txt HTTP/1.0
User-Agent: Wget/1.7
Host: ucsee.eecs.berkeley.edu
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... HTTP/1.1 404 Not Found
Date: Thu, 05 Jul 2001 06:00:30 GMT
Server: Apache/1.3.2 (Unix)
Connection: close
Content-Type: text/html


Closing fd 3
Invalidating fd 3 from further reuse.
23:02:09 ERROR 404: Not Found.

I've decided to load it -> parseurl ("http://ucsee.eecs.berkeley.edu/%7Ebjacob/b.html") -> host ucsee.eecs.berkeley.edu -> opath %7Ebjacob/b.html -> dir ~bjacob -> file b.html -> ndir ~bjacob
newpath: /%7Ebjacob/b.html
Checking for ucsee.eecs.berkeley.edu in host_name_address_map.
Found; ucsee.eecs.berkeley.edu was already used, by that name.
--23:02:09--  http://ucsee.eecs.berkeley.edu/%7Ebjacob/b.html
           => `ucsee.eecs.berkeley.edu/%7Ebjacob/b.html'
Connecting to ucsee.eecs.berkeley.edu:80... Found ucsee.eecs.berkeley.edu in host_name_address_map: 128.32.138.93
Created fd 3.
connected!
---request begin---
GET /%7Ebjacob/b.html HTTP/1.0
User-Agent: Wget/1.7
Host: ucsee.eecs.berkeley.edu
Accept: */*
Connection: Keep-Alive
Referer: http://ucsee.eecs.berkeley.edu/%7Ebjacob/

---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Date: Thu, 05 Jul 2001 06:00:31 GMT
Server: Apache/1.3.2 (Unix)
Last-Modified: Thu, 05 Jul 2001 05:52:10 GMT
ETag: "29818-2-3b44008a"
Accept-Ranges: bytes
Content-Length: 2
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html


Found ucsee.eecs.berkeley.edu in host_name_address_map: 128.32.138.93
Registered fd 3 for persistent reuse.
Length: 2 [text/html]

    0K                                                       100% @   1.95 KB/s

23:02:09 (1.95 KB/s) - `ucsee.eecs.berkeley.edu/%7Ebjacob/b.html' saved [2/2]

Loaded ucsee.eecs.berkeley.edu/%7Ebjacob/b.html (size 2).
no-follow in ucsee.eecs.berkeley.edu/%7Ebjacob/b.html: 0

FINISHED --23:02:09--
Downloaded: 168 bytes in 2 files
