Re: wget itself discards # and the rest in urls

Hrvoje Niksic Sun, 18 Sep 2005 05:32:09 -0700

"Martin Koniczek" <[EMAIL PROTECTED]> writes:

> my wget (GNU Wget 1.10) on a crux-based system simply truncates the
> # and everything after [...]


The part after the "#" in HTTP URLs is what some call a "fragment
identifier".  The browsers use it to position the page at the <a>
element whose NAME attribute matches the fragment name.  In other
words, when you type http://www.server.com/file.html#bla, a browser
will request "/file.html" from www.server.com and position the page at
the "bla" anchor, if one exists.  It will *not* ask for "/file.html#bla".

Since Wget doesn't display the page, it has no use for the fragment.
It stays compatible with the browsers by likewise dropping the # and
everything after it from the request.
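
To make the split concrete, here is a minimal sketch using Python's
standard urllib.parse module.  It is not Wget's own code (Wget is
written in C); it only illustrates how a client separates the part it
sends to the server from the fragment it keeps to itself.  The variable
names are made up for the illustration.

```python
from urllib.parse import urlsplit

url = "http://www.server.com/file.html#bla"
parts = urlsplit(url)

# What an HTTP client puts on the request line: the path, plus the
# query string if there is one.  The fragment is never included.
request_target = parts.path + ("?" + parts.query if parts.query else "")

print(parts.netloc)     # www.server.com
print(request_target)   # /file.html
print(parts.fragment)   # bla
```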

> in contrast to the faq (http://www.gnu.org/software/wget/faq.html):
>
> 3.3 How do I download a URL with funny characters in it?
[...]

The FAQ is very imprecise here with its use of the term "funny
characters".  There are characters that are specially processed by the
shell, and then there are characters with special meanings in URLs.
The former can be protected by shell quoting and the latter by URL
quoting (percent-encoding, so a literal "#" is written as "%23").
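
The two kinds of quoting can be sketched as follows, again with
Python's standard library rather than anything from Wget.  The file
name and the search URL are hypothetical examples, and whether a given
server actually serves a path containing "%23" is a separate question.

```python
import shlex
from urllib.parse import quote, urlsplit

# Hypothetical file name that contains a literal '#'.
filename = "file#bla.html"

# URL quoting: '#' is percent-encoded as %23, so it stays part of the
# path instead of starting a fragment.
url = "http://www.server.com/" + quote(filename)
quoted_parts = urlsplit(url)

print(url)                 # http://www.server.com/file%23bla.html
print(quoted_parts.path)   # /file%23bla.html

# Shell quoting: protects characters such as '?' and '&' from the
# shell only; it does not change what they mean inside the URL.
shell_arg = shlex.quote("http://www.server.com/search?a=1&b=2")
print(shell_arg)           # 'http://www.server.com/search?a=1&b=2'
```

Shell quoting alone would not help with "#" in the original report,
because the character reaches Wget intact and is then treated as the
start of a fragment.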
