Daniel J. Popowich wrote:
Now if I type the following into a telnet session (telnet localhost 8000):
GET http://foo:[EMAIL PROTECTED]:8000/~dpopowich/py/parsed?a=b&c=d#here HTTP/1.1
Authorization: Basic Zm9vOmJhcg==
Host: localhost:8000
Then the output is:
req.hostname: localhost
req.unparsed_uri: http://foo:[EMAIL PROTECTED]:8000/~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: ('http', 'foo:[EMAIL PROTECTED]:8000', 'foo', 'bar',
'localhost', 8000, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
Servers are required to respond to an absoluteURI, but when requesting a
resource from the origin server, most clients will use only the abs_path.
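To illustrate the two request-target forms, here is a minimal Python 3 sketch that uses the standard library's urllib.parse.urlsplit. It is not mod_python's own parser. The helper name describe_request_target and the example.org target are made up for illustration.

```python
from urllib.parse import urlsplit

def describe_request_target(target):
    """Split an HTTP request-target into the pieces a server might expose."""
    parts = urlsplit(target)
    return {
        # An absoluteURI carries both a scheme and a network location.
        "is_absolute": bool(parts.scheme and parts.netloc),
        "hostname": parts.hostname,   # None for an abs_path target
        "port": parts.port,           # None for an abs_path target
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

# Hypothetical targets: the first is the absoluteURI form, the second is
# the abs_path form most clients send to an origin server.
absolute = describe_request_target("http://example.org:8000/py/parsed?a=b&c=d#here")
relative = describe_request_target("/py/parsed?a=b&c=d")
```

With the abs_path form, the host and port fields come back empty, which matches the mostly null parsed_uri seen under ordinary browser use.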
o req.hostname is set by the contents of the full URI, or in the
  absence of a full URI, the value of the Host header (this is what is
  actually said in the mod_python docs). As mentioned before, in the
  case when HTTP/1.1 AND the full URI are not specified, req.hostname
  can be None.
I don't see where you confirmed this by using an absoluteURI but a
different hostname in the Host: header.
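One way to run that check is to send a request whose absoluteURI and Host header deliberately disagree, then see which one req.hostname reports. This is a rough Python 3 sketch. The helper names, the other.example host, and the server on localhost:8000 are assumptions, and nothing is sent unless the last line is uncommented.

```python
import socket

def build_request(absolute_uri, host_header):
    """Build a raw HTTP/1.1 GET whose request line and Host header may disagree."""
    lines = [
        "GET %s HTTP/1.1" % absolute_uri,
        "Host: %s" % host_header,
        "Connection: close",
        "",   # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def send_request(server, port, payload, timeout=5.0):
    """Send the payload to the server actually connected to; return the raw reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(payload)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# The URI names localhost, while the Host header names a different (made-up) host.
payload = build_request(
    "http://localhost:8000/~dpopowich/py/parsed?a=b&c=d",
    "other.example:8000",
)
# Assumes a test server is listening on localhost:8000:
# print(send_request("localhost", 8000, payload).decode("latin-1"))
```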
o When a full URI is specified with GET, the values of hostname and
  port can be bogus, i.e., the values in parsed_uri will be set to
  whatever the URI specifies, but this may not be the host or port
  the client actually connected to. While not explicitly a security
  risk, poor programming based on these values could lead to one,
  IMHO.
Therefore, I think we're stuck. There's no way we can guarantee
browsers will pass full URIs and none seem to do so.
They do if they are set up to use a proxy server. This is currently the
most common (if not only) use case for sending an absoluteURI. This
suggests that parsed_uri is behaving as expected, and developers should
recognize that it will contain mostly null values under common use,
unless the request comes from a proxy.
IOW, the unparsed URI is just another client-supplied string, like
Referer, and should be treated as an untrustworthy source that may
occasionally contain interesting information.
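As a sketch of what treating it as untrustworthy could look like in practice, the Python 3 snippet below checks the client-claimed host against a list the server operator controls before acting on it. ALLOWED_HOSTS and both helper names are hypothetical. The precedence (URI first, then Host header) follows the behaviour described earlier in the thread and is not taken from the mod_python source.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of host names this server is willing to answer for.
ALLOWED_HOSTS = frozenset({"localhost", "www.example.org"})

def client_claimed_host(request_target, host_header):
    """Return the host name the client claims (lowercased), or None."""
    try:
        hostname = urlsplit(request_target).hostname  # set only for an absoluteURI
        if hostname is None and host_header:
            # Fall back to the Host header, dropping any ":port" suffix.
            hostname = urlsplit("//" + host_header).hostname
    except ValueError:
        # Malformed input, e.g. an unterminated IPv6 literal.
        return None
    return hostname.lower() if hostname else None

def is_trusted_host(request_target, host_header, allowed=ALLOWED_HOSTS):
    """True only if the claimed host is one the operator has listed."""
    return client_claimed_host(request_target, host_header) in allowed
```

A handler would call is_trusted_host before using the claimed name to build redirects or absolute links, and fall back to a configured server name otherwise.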