On Jun 26, 2007, at 11:50 PM, Micah Cowan wrote:
> After running
>   $ wget -H -k -p http://www.fdoxnews.com/
> It downloaded all of the relevant files. However, the results were still
> not viewable until I edited the link in www.fdoxnews.com/index.html,
> replacing the "?" with "%3F" ("index.mas%3Fepl=..."). Probably, wget
> should have done that when converting the links, considering that it
> named the file with a ?, but left it literally in the converted link; ?
> is a special character for URIs, and cannot be part of filenames unless
> they are encoded. I'll make note of that in my buglist.

It appears that this is actually by design. If -E (--html-extension)
is not specified, `?' will not be replaced with `%3F'. From
src/convert.c:

    We quote ? as %3F to avoid passing part of the file name as the
    parameter when browsing the converted file through HTTP. However,
    it is safe to do this only when `--html-extension' is turned on.
    This is because converting "index.html?foo=bar" to
    "index.html%3Ffoo=bar" would break local browsing, as the latter
    isn't even recognized as an HTML file! However, converting
    "index.html?foo=bar.html" to "index.html%3Ffoo=bar.html" should be
    safe for both local and HTTP-served browsing.

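
To illustrate the point of that comment, here is a rough Python sketch (not wget's code) of how a URL parser treats a link with a literal `?' versus one quoted as `%3F'. The helper names local_file_for_link and quote_question_marks are made up for this example; only urlsplit and unquote come from the standard library.

```python
from urllib.parse import urlsplit, unquote


def local_file_for_link(href):
    """Return the file name a URL parser would resolve a relative link to.

    Anything after a literal '?' is treated as the query string,
    so it never becomes part of the file name.
    """
    return unquote(urlsplit(href).path)


def quote_question_marks(name):
    """Hypothetical helper: quote '?' as '%3F', as the comment describes."""
    return name.replace("?", "%3F")


# File on disk is named "index.html?foo=bar" (no -E given).
raw_link = "index.html?foo=bar"
quoted_link = quote_question_marks(raw_link)

# Literal '?': the parser looks for "index.html", which is the wrong file.
print(local_file_for_link(raw_link))

# Quoted '?': the parser looks for the real file name, but that name
# does not end in ".html", which is the problem the comment warns about.
print(local_file_for_link(quoted_link))

# With -E the file is "index.html?foo=bar.html", so the quoted link
# resolves to a name that does end in ".html".
print(local_file_for_link(quote_question_marks("index.html?foo=bar.html")))
```

Whether a given browser then renders the file depends on how it guesses the content type, which this sketch does not model.
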
Running
$ wget -E -H -k -p http://www.fdoxnews.com/
does the right thing.
-Ben