On Jun 26, 2007, at 11:50 PM, Micah Cowan wrote:

After running

  $ wget -H -k -p http://www.fdoxnews.com/

It downloaded all of the relevant files. However, the results were still
not viewable until I edited the link in www.fdoxnews.com/index.html,
replacing the "?" with "%3F" ("index.mas%3Fepl=..."). Wget probably
should have done that when converting the links, since it named the
file with a literal "?" but left the "?" unencoded in the converted
link. "?" is a special character in URIs and cannot appear in a
file-name reference unless it is percent-encoded. I'll make a note of
that in my buglist.

It appears that this is actually by design. If -E (--html-extension) is
not specified, `?' will not be replaced with `%3F'. From src/convert.c:

   We quote ? as %3F to avoid passing part of the file name as the
   parameter when browsing the converted file through HTTP.  However,
   it is safe to do this only when `--html-extension' is turned on.
   This is because converting "index.html?foo=bar" to
   "index.html%3Ffoo=bar" would break local browsing, as the latter
   isn't even recognized as an HTML file!  However, converting
   "index.html?foo=bar.html" to "index.html%3Ffoo=bar.html" should be
   safe for both local and HTTP-served browsing.

Running

   $ wget -E -H -k -p http://www.fdoxnews.com/

does the right thing.
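
To illustrate the reasoning in that comment, here is a rough Python sketch
(not wget's actual code). The helper names and the "epl=EXAMPLE" value are
made up for illustration, and the quoting helper only handles "?". It
shows which file a browser would look for given a raw versus a quoted
link, and why the trailing ".html" matters for local browsing:

```python
import os.path
from urllib.parse import unquote, urlsplit


def quote_question_mark(file_name: str) -> str:
    # Hypothetical, simplified stand-in for the link quoting: only "?" is handled.
    return file_name.replace("?", "%3F")


def file_requested_by(link: str) -> str:
    # Everything after an unencoded "?" is parsed as the query string,
    # so only the decoded path component is looked up on disk.
    return unquote(urlsplit(link).path)


saved = "index.mas?epl=EXAMPLE"  # hypothetical saved file name

print(file_requested_by(saved))                       # index.mas
print(file_requested_by(quote_question_mark(saved)))  # index.mas?epl=EXAMPLE

# Without -E the local name has no recognizable HTML extension;
# with -E the name ends in ".html".
print(os.path.splitext("index.html?foo=bar")[1])       # .html?foo=bar
print(os.path.splitext("index.html?foo=bar.html")[1])  # .html
```

The raw link points at "index.mas", which does not exist on disk, while
the quoted link resolves to the file that was actually saved.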

-Ben
