mouad added the comment:

Hi Guido,

If I understand this correctly, the Host header was only added in HTTP/1.1. 
In HTTP/1.0, sending the absolute URI was the right behavior for clients 
behind a proxy, and the same behavior was kept in HTTP/1.1 for backward 
compatibility.

> In any case I would worry (a bit) that this might cause security issues if
> implemented as naively as shown in your patch,
> the other components of the URL should probably be validated against the
> configuration of the server.
> Also I am wondering whether specifying a different port or protocol (e.g.
> HTTPS) should be allowed or not.

If there should be validation, I think it should be done in 
BaseHTTPRequestHandler. FWIW the latter doesn't validate the Host header 
either: I just tested sending a request to "python -m http.server", which 
succeeded.

$ telnet 127.0.0.1 8000
GET /dummy HTTP/1.1
Host: google.com
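
The same check can be reproduced from Python. This is only a sketch: 
``EchoHandler`` and ``status_with_host`` are names made up for illustration, 
and it starts a throwaway server on a free loopback port instead of running 
"python -m http.server".

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers 200 and echoes the Host header it saw."""

    def do_GET(self):
        body = (self.headers.get("Host") or "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def status_with_host(host_header):
    """Send GET /dummy with an arbitrary Host header; return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        # skip_host=True stops http.client from adding its own Host header.
        conn.putrequest("GET", "/dummy", skip_host=True)
        conn.putheader("Host", host_header)
        conn.endheaders()
        response = conn.getresponse()
        result = (response.status, response.read().decode("ascii"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling ``status_with_host("google.com")`` against this handler shows that 
the request is dispatched normally even though the Host value has nothing to 
do with the address the server listens on.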

One thing to note for future work on the validation part: according to 
http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.2 (first point):

    If Request-URI is an absoluteURI, the host is part of the Request-URI.
    Any Host header field value in the request MUST be ignored.
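
A minimal sketch of that rule, assuming the request target and the Host 
header value are already available as strings. ``effective_host`` is a 
hypothetical helper, not something BaseHTTPRequestHandler provides:

```python
from urllib.parse import urlsplit


def effective_host(request_target, host_header):
    """Pick the host following the first rule of RFC 2616 section 5.2."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # absoluteURI: the host is part of the Request-URI,
        # so any Host header value is ignored.
        return parts.netloc
    # Otherwise fall back to the Host header (may be None if absent).
    return host_header
```

Note that ``netloc`` keeps the port (and any userinfo) if the URI has one, so 
a real validation step would probably want to split those out before 
comparing against the server configuration.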

> and what to do with extraneous parts of the path (in particular the
> "#fragment" identifier)

AFAIK clients are not supposed to send fragments to servers, and I didn't find 
anything in the HTTP spec about what should happen if they do. CherryPy, for 
example (the code is linked below), responds with 400 if the request URI 
includes a #fragment part.
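
To make the idea concrete, here is a rough sketch of such a check. It is not 
CherryPy's actual code; ``reject_fragment`` is a hypothetical name, and 
returning a status code (or None) is just a convenient convention for the 
example:

```python
def reject_fragment(request_uri):
    """Return 400 if the raw Request-URI carries a "#fragment", else None."""
    # The check must run on the raw request target, before any URL parser
    # has had a chance to silently split the fragment off.
    if "#" in request_uri:
        return 400
    return None
```

In a BaseHTTPRequestHandler subclass, a check like this could presumably run 
right after the request line is parsed and call ``send_error(400)`` when it 
fires.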

Another thing the CherryPy guys did is that ``HTTPRequest(...).path`` always 
returns a relative path, which is IMHO the right behavior. For backward 
compatibility, though, I hesitate to fix this directly in 
BaseHTTPRequestHandler. Or should we?
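
For illustration, reducing an absolute request target to a relative one could 
look roughly like this. ``relative_path`` is a hypothetical helper; it keeps 
the query string, since ``BaseHTTPRequestHandler.path`` includes it today, and 
it deliberately leaves the empty-path case untouched:

```python
from urllib.parse import urlsplit


def relative_path(request_target):
    """Strip scheme and host from an absolute request target."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # Absolute URI: keep only the path and, if present, the query.
        if parts.query:
            return parts.path + "?" + parts.query
        return parts.path
    # Already relative: return it unchanged.
    return request_target
```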

> You should probably also be careful with path-less domain names -- IIRC some
> URL parsers produce "" as the path for e.g. "http://python.org".

According to PEP 333, PATH_INFO can be empty:

    PATH_INFO
     The remainder of the request URL's "path", designating the virtual
     "location" of the request's target within the application. This may be an
     empty string, if the request URL targets the application root and does
     not have a trailing slash.
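
``urllib.parse.urlsplit`` is one parser that returns "" for a path-less URL. 
The sketch below shows both options, passing the empty string through (which 
PEP 333 allows for PATH_INFO) or normalizing it. ``path_info`` and its 
``default`` parameter are hypothetical names used only for this example:

```python
from urllib.parse import urlsplit


def path_info(url, default=""):
    """Return the path of url, or default when the URL has no path."""
    path = urlsplit(url).path
    return path if path else default
```

With the default, ``path_info("http://python.org")`` gives an empty string; 
passing ``default="/"`` normalizes it to the root instead.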

> Have you asked for the status of this particular feature?

I sent an email to the Python Web-SIG mailing list trying to gather 
information about this behavior, but no luck yet :(

At this point I would like to link to both CherryPy and Werkzeug, which 
already handle this behavior.

Werkzeug: 
https://github.com/mitsuhiko/werkzeug/blob/0.9.6/werkzeug/serving.py#L75-81
CherryPy: 
https://github.com/cherrypy/cherrypy/blob/cherrypy-3.2.2rc1/cherrypy/wsgiserver/wsgiserver3.py#L633-638

As a side note, I already gave kudos to the CherryPy guys in my email to the 
Web-SIG. The funny part is that I am starting to think they passed both of 
my tests because they didn't rely on the core Python implementation of 
BaseHTTPRequestHandler (I guess reinventing the wheel sometimes works :)), 
unlike the other frameworks, which showed these problems :).

Cheers,

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21472>
_______________________________________
_______________________________________________
Python-bugs-list mailing list