Hi
I am sure you're aware that wget 1.5.3 does not properly handle
password-protected HTTP sites (even with Basic authentication). There are
several places in the code where the username/password are silently
"dropped", and wget then tries to access the same site with no password.
Furthermore, matters were complicated because my username contained the
character '@'. The character was handled correctly when retrieving the
first page (because it was escaped as %40), but upon redirection and in
the other cases described below, the password was dropped because the
code is written sloppily.
1. HTTP code 301 -- page permanently moved. The site I worked with
always redirected every page to http://site:80 and would not accept
http://site. Therefore, upon redirection, it's important to carry the
username/password over to the new URL, which wget does not do.
2. The same site referenced itself with fully qualified URLs. For example,
instead of saying href = "main.html" it would say href =
"http://site/directory/main.html". Wget would lose the password in that
case as well. Furthermore, wget would think that the URL belongs to a
*different* site and would not follow the link if the -L (i.e., local
files only) option is specified. This was apparently because the cur_url
contained the password, but the href did not (again, some patching was
needed to bypass the first @ as part of my username).
3. If the username contains @ (such as an email address), then after a few
iterations of the main code, the %40 would eventually get replaced by @
and upon future searches for the site name, the code would get stuck on
the first symbol @ instead of the second one, which separates the
password from the website. Consider this URL:
'[EMAIL PROTECTED]@www.site.com/main' -- once the %40 is expanded to the
first @, the code would NOT convert it back to %40 as required by
RFC 1738 (section 3.1), which says that any '@' inside the user and
password fields must be encoded.
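To make the three points above concrete, here is a rough sketch in Python
of the behaviour I would expect. This is not wget's code and not my patch;
the function names and the username "joe@example.org" are made up for
illustration, and the sketch assumes plain host names (no IPv6 literals)
and no ':' inside the username.

```python
from urllib.parse import quote, unquote, urljoin, urlsplit, urlunsplit


def split_userinfo(netloc):
    """Split 'user:pass@host:port' at the LAST '@', so that a literal '@'
    inside the username cannot be mistaken for the host separator."""
    userinfo, sep, hostport = netloc.rpartition("@")
    return (userinfo if sep else None), hostport


def encode_userinfo(userinfo):
    """Return the credentials with '@' (and other reserved characters)
    escaped, whether they arrived as '@' or already as '%40'."""
    user, sep, password = userinfo.partition(":")
    encoded = quote(unquote(user), safe="")
    if sep:
        encoded += ":" + quote(unquote(password), safe="")
    return encoded


def host_key(url):
    """Identify a site by scheme, host and port only, ignoring credentials
    and treating a missing port as the scheme's default one."""
    parts = urlsplit(url)
    _, hostport = split_userinfo(parts.netloc)
    host, _, port = hostport.partition(":")
    default_port = {"http": "80", "https": "443"}.get(parts.scheme, "")
    return parts.scheme, host.lower(), port or default_port


def same_site(url_a, url_b):
    """True when both URLs point at the same scheme/host/port."""
    return host_key(url_a) == host_key(url_b)


def inherit_credentials(base_url, new_url):
    """Resolve new_url (a redirect target or an href) against base_url and
    carry the credentials of base_url over when the site is the same."""
    target = urljoin(base_url, new_url)
    base_userinfo, _ = split_userinfo(urlsplit(base_url).netloc)
    parts = urlsplit(target)
    userinfo, hostport = split_userinfo(parts.netloc)
    if userinfo is None and same_site(base_url, target):
        userinfo = base_userinfo
    if userinfo is None:
        return target  # different site, or nothing to carry over
    netloc = encode_userinfo(userinfo) + "@" + hostport
    return urlunsplit(parts._replace(netloc=netloc))
```

With base = "http://joe%40example.org:secret@site/dir/index.html", calling
inherit_credentials(base, "http://site:80/dir/index.html") gives
"http://joe%40example.org:secret@site:80/dir/index.html" (case 1), and
same_site(base, "http://site/directory/main.html") is true even though only
one of the two URLs carries a password (case 2). Credentials are
deliberately not copied to a different host.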
It took me about 3 hours to patch the code, but I am not sure what other
functionality I might have disabled or affected. To tell the truth, it
is quite annoying that simple things like these were not thought of by
whoever wrote the code. Anyhow, thanks for writing it. :)
Dmitri