Please try the latest wget version, 1.6, or, even better, the CVS
development version (1.7-dev). Take a look at http://sunsite.dk/wget
for instructions on how to get it.

Some work has been done on improving wget's handling of passwords,
specifically the handling of '@' in passwords. But if not all of your
cases have been addressed, consider submitting your patch. The web site
also says how the wget development team prefers to receive such patches
(diff -u against the CVS source).
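
A minimal sketch of the '@' problem from point 3 below, written in Python
purely for illustration (wget itself is written in C, and this is not its
code). The idea is to split the userinfo from the host at the last '@',
since a host name cannot contain one, and to percent-encode the
credentials again whenever the URL is rebuilt. The function names
split_authority and join_authority are hypothetical.

```python
from urllib.parse import quote, unquote


def split_authority(authority: str):
    """Split 'user:password@host:port' into (user, password, hostport).

    The separator is the LAST '@': a host name never contains one, so a
    literal '@' left in the username (e.g. after %40 was decoded) cannot
    be mistaken for it. user and password are None when absent.
    """
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        return None, None, authority
    user, colon, password = userinfo.partition(":")
    return unquote(user), (unquote(password) if colon else None), hostport


def join_authority(user, password, hostport: str) -> str:
    """Rebuild the authority, percent-encoding '@', ':' and every other
    reserved character in the credentials so the result splits cleanly."""
    if user is None:
        return hostport
    userinfo = quote(user, safe="")
    if password is not None:
        userinfo += ":" + quote(password, safe="")
    return userinfo + "@" + hostport
```

With this, "me@example.com:s3cret@www.site.com" still splits into the
user "me@example.com" and the host "www.site.com" even after the %40 has
been decoded, and join_authority turns it back into
"me%40example.com:s3cret@www.site.com".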

Dmitri Loguinov wrote:
> 
> Hi
> 
> I am sure you're aware that wget 1.5.3 does not properly handle
> password-protected HTTP sites (even with Basic authentication). There
> are several places in the code where the username/password are silently
> "dropped", and wget then tries to access the same site with no password.
> Furthermore, matters were complicated because my username contained the
> character '@'. Handling of the character was OK when retrieving the
> first page (because it was encoded as %40), but upon redirection and in
> the other cases described below, the password was dropped because the
> code is written sloppily.
> 
> 1. HTTP code 301 -- page permanently moved. The site I worked with
> always redirected every page to http://site:80 and would not accept
> http://site. Therefore, upon redirection, it's important to keep the
> password, which wget does not do.
> 
> 2. The same site referenced itself with fully qualified URLs. For
> instance, instead of href="main.html" it would say
> href="http://site/directory/main.html". Wget would lose the password in
> that case as well. Furthermore, wget would think that the URL belongs to
> a *different* site and would not follow the link if the -L (i.e., local
> files only) option is specified. This was apparently because cur_url
> contained the password, but the href did not (again, some patching was
> needed to skip the first @, which is part of my username).
> 
> 3. If the username contains @ (such as an email address), then after a
> few iterations of the main code, the %40 would eventually get replaced
> by @, and on subsequent searches for the site name, the code would stop
> at the first @ instead of the second one, which separates the password
> from the host name. Consider this URL:
> '[EMAIL PROTECTED]@www.site.com/main' -- once the %40 is expanded to the
> first @, the code would NOT convert it back to %40 as required by one of
> the RFCs.
> 
> It took me about 3 hours to patch the code, but I am not sure what other
> functionality I might have disabled or affected. To tell the truth, it
> is quite annoying that simple things like these were not thought of by
> whoever wrote the code. Anyhow, thanks for writing it. :)
> 
> Dmitri
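
Points 1 and 2 can be sketched the same way, again in Python with
hypothetical helper names (site_key, follow_link) and no claim that wget
works like this. A link or 301 Location target is resolved against the
current URL; sites are compared by scheme, host and effective port while
ignoring any user:password@ prefix (so http://site and http://site:80
match); and the current credentials are copied onto a same-site target
that carries none of its own.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Default ports assumed for comparing sites; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}


def site_key(url: str):
    """Identify a site by scheme, host and effective port, ignoring any
    user:password@ prefix, so http://site/ and http://u:p@site:80/ match."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def follow_link(current_url: str, href: str) -> str:
    """Resolve href (a relative or absolute link, or a 301 Location value)
    against current_url, copying current_url's credentials onto the result
    when it is on the same site and carries none of its own."""
    target = urljoin(current_url, href)
    cur, tgt = urlsplit(current_url), urlsplit(target)
    same_site = site_key(current_url) == site_key(target)
    if cur.username is not None and tgt.username is None and same_site:
        # Reuse the raw userinfo so its %40 encoding is preserved as-is.
        userinfo = cur.netloc.rpartition("@")[0]
        tgt = tgt._replace(netloc=userinfo + "@" + tgt.netloc)
    return urlunsplit(tgt)
```

For example, following "http://site:80/dir/main.html" from
"http://me%40example.com:pw@site/dir/index.html" keeps the credentials,
while a link to another host is returned unchanged, so they are not sent
to a different server.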

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn               [EMAIL PROTECTED]
HackLine                     +45 2031 7799
