[issue37093] http.client aborts header parsing upon encountering non-ASCII header names

2019-12-23 Thread Tim Burke


Tim Burke  added the comment:

Note that because http.server uses http.client to parse headers [0], this can 
pose a request-smuggling vector depending on how you've designed your system. 
For example, you might have a storage system with a user-facing HTTP server 
that is in charge of

* authenticating and authorizing users,
* determining where data should be stored, and
* proxying the user request to the backend

and a separate (unauthenticated) HTTP server for actually storing that data. If 
the proxy and backend are running different versions of CPython (say, because 
you're trying to upgrade an existing py2 cluster to run on py3), they may 
disagree about where the request begins and ends -- potentially causing the 
backend to process multiple requests, only the first of which was authorized.

See, for example, https://bugs.launchpad.net/swift/+bug/1840507

For what it's worth, most http server libraries (that I tested; take it with a 
grain of salt) seem to implement their own header parsing. Eventlet was a 
notable exception [1].

[0] https://github.com/python/cpython/blob/v3.8.0/Lib/http/server.py#L336-L337
[1] https://github.com/eventlet/eventlet/pull/574

--

___
Python tracker 
<https://bugs.python.org/issue37093>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38216] Fix for issue30458 (HTTP Header Injection) prevents crafting invalid requests

2019-09-20 Thread Tim Burke


Tim Burke  added the comment:

Something like this for 3.7, say? I should probably go add some tests in 
test_httplib.py (for example, to demonstrate that http.client can still send a 
raw #, even if urllib appropriately drops the fragment), but I wanted some 
feedback on whether this is even an avenue worth pursuing.

--
keywords: +patch
Added file: 
https://bugs.python.org/file48618/0001-bpo-38216-Only-forbid-CR-LF-and-SP-in-http-URLs.patch

___
Python tracker 
<https://bugs.python.org/issue38216>



[issue38216] Fix for issue30458 prevents crafting invalid requests

2019-09-20 Thread Tim Burke


Tim Burke  added the comment:

> Since at least one project is known to have been impacted, it's not 
> unreasonable to expect that more will be.

I can confirm at least one other: OpenStack Swift's stable jobs have been 
broken by https://github.com/python/cpython/commit/bb8071a since 11 Sep; see 
https://bugs.launchpad.net/swift/+bug/1843816 . (*Why* we're testing against an 
untagged version of py27, I still need to investigate...)

Now, the broken tests *do* represent something of a long-standing bug in 
application code: our front-end server was lenient in what it accepted but not 
strict in what it emitted -- it would forward on whatever crazy query params 
the client sent, directly to the back-end servers. Of course, as long as the 
back-end servers are *also* lenient in what they accept, it doesn't actually 
present a problem. Still, I'm happy to fix the issue.

Actually *addressing* this has proven problematic, however, as we have numerous 
functional tests that send raw UTF-8 request paths -- to the point that I can 
only assume there exist clients sending such requests. Changing the tests to 
URL-encode the path would do a disservice to those clients, as my fix (or some 
future fix) may break them and we'd be none the wiser.

> We either "be strict in what we produce" or we ignore abuse of the APIs and 
> effectively close all CVEs filed against them as "not a bug, this library is 
> intended to allow abuse when given untrusted input."

I think this is a false dichotomy; in 
https://bugs.python.org/issue36274#msg351834 Jason proposed a few alternatives 
that allow for a secure and obvious default API while adding a new, explicitly 
unsafe API.

I'd like to add yet another option that may be useful specifically for 
maintenance releases: forbid only the problematic characters -- namely LF (and 
potentially CR and SP). This seems like a much more surgical fix for 
maintenance releases, allowing the null byte for CherryPy or the raw UTF-8 
bytes for Swift, while still mitigating the CVE.
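The narrower check could look something like this (a hypothetical sketch of the idea, not the actual patch; the helper name and the exact character set are my own):

```python
import re

# Hypothetical sketch of the "surgical" fix described above: reject only
# the characters that enable request splitting (LF, plus CR here), while
# still allowing null bytes or raw UTF-8 bytes in the request target.
_FORBIDDEN = re.compile(r'[\r\n]')

def check_request_target(url):
    if _FORBIDDEN.search(url):
        raise ValueError(
            "URL can't contain control characters. %r" % (url,))
    return url

# Raw UTF-8 (as a latin-1 str) and null bytes pass...
check_request_target('/\xe4\xbd\xa0\xe5\xa5\xbd')
check_request_target('/pa\x00th')
# ...but header injection is still rejected.
try:
    check_request_target('/a HTTP/1.1\r\nHost: evil')
except ValueError:
    pass
```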

3.8 and later can still have the more-robust check to ensure callers have done 
any requisite URL-escaping. Ideally, there would also be a reasonable hook in 
place to allow application developers to verify behavior in the presence of 
invalid requests; something short of writing raw bytes to a socket.

> For reference this is how urllib handled the same issue in their test suite : 
> https://github.com/urllib3/urllib3/pull/1673.

I think the test suites developed by clients (like urllib3) and servers (like 
CherryPy or Swift) inherently have very different goals -- where a client would 
want to ensure that it *produces valid bytes* on the wire, a server ought to 
check that it doesn't completely fall over when it *receives invalid bytes* on 
the wire.

> What I seek is to avoid the Go recommendation of "fork the implementation" ...

This is *exactly* the way we worked around https://bugs.python.org/issue36274 
in Swift: swap out putrequest() in our func tests. (See 
https://github.com/openstack/swift/commit/c0ae48b .) It looks like a similar 
change will be required to fix our stable branches.

I agree, I'd much rather *not* have to do that. :-/

--

___
Python tracker 
<https://bugs.python.org/issue38216>



[issue38216] Fix for issue30458 prevents crafting invalid requests

2019-09-18 Thread Tim Burke


Change by Tim Burke :


--
nosy: +tburke

___
Python tracker 
<https://bugs.python.org/issue38216>



[issue36274] http.client cannot send non-ASCII request lines

2019-09-11 Thread Tim Burke


Tim Burke  added the comment:

Fair enough. Seems kinda related to https://bugs.python.org/issue30458 -- looks 
like it was a fun one ;-)

I think either approach would work for me; my existing work-around doesn't 
preclude either, particularly since I want it purely for testing purposes.

For a bit of context, I work on a large-ish project (few hundred kloc if you 
include tests) that recently finished porting from python 2.7 to 3.6 & 3.7. As 
part of that process I discovered https://bugs.python.org/issue33973 and worked 
around it in https://github.com/openstack/swift/commit/93b49c5. That was a 
prerequisite for running tests under py2 against a server running py3, but this 
bug complicated us running the *tests* under py3 as well. I eventually landed 
on https://github.com/openstack/swift/commit/c0ae48b as a work-around.

I could probably take another stab at a fix if you like, though I'm not 
entirely sure when I'll get to it at the moment.

--

___
Python tracker 
<https://bugs.python.org/issue36274>



[issue37093] http.client aborts header parsing upon encountering non-ASCII header names

2019-06-03 Thread Tim Burke


Change by Tim Burke :


--
keywords: +patch
pull_requests: +13672
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/13788

___
Python tracker 
<https://bugs.python.org/issue37093>



[issue37093] http.client aborts header parsing upon encountering non-ASCII header names

2019-05-29 Thread Tim Burke

New submission from Tim Burke :

First, spin up a fairly trivial http server:

import wsgiref.simple_server

def app(environ, start_response):
    start_response('200 OK', [
        ('Some-Canonical', 'headers'),
        ('sOme-CRAzY', 'hEaDERs'),
        ('Utf-8-Values', '\xe2\x9c\x94'),
        ('s\xc3\xb6me-UT\xc6\x92-8', 'in the header name'),
        ('some-other', 'random headers'),
    ])
    return [b'Hello, world!\n']

if __name__ == '__main__':
    httpd = wsgiref.simple_server.make_server('', 8000, app)
    while True:
        httpd.handle_request()

Note that this code works equally well on py2 or py3; the interesting bytes on 
the wire are the same on either.

Verify the expected response using an independent tool such as curl:

$ curl -v http://localhost:8000
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8000 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Wed, 29 May 2019 23:02:37 GMT
< Server: WSGIServer/0.2 CPython/3.7.3
< Some-Canonical: headers
< sOme-CRAzY: hEaDERs
< Utf-8-Values: ✔
< söme-UTƒ-8: in the header name
< some-other: random headers
< Content-Length: 14
< 
Hello, world!
* Closing connection 0

Check that py2 includes all the same headers:

$ python2 -c 'import pprint, urllib; resp = 
urllib.urlopen("http://localhost:8000"); 
pprint.pprint((dict(resp.info().items()), resp.read()))'
({'content-length': '14',
  'date': 'Wed, 29 May 2019 23:03:02 GMT',
  'server': 'WSGIServer/0.2 CPython/3.7.3',
  'some-canonical': 'headers',
  'some-crazy': 'hEaDERs',
  'some-other': 'random headers',
  's\xc3\xb6me-ut\xc6\x92-8': 'in the header name',
  'utf-8-values': '\xe2\x9c\x94'},
 'Hello, world!\n')

But py3 *does not*:

$ python3 -c 'import pprint, urllib.request; resp = 
urllib.request.urlopen("http://localhost:8000"); 
pprint.pprint((dict(resp.info().items()), resp.read()))'
({'Date': 'Wed, 29 May 2019 23:04:09 GMT',
  'Server': 'WSGIServer/0.2 CPython/3.7.3',
  'Some-Canonical': 'headers',
  'Utf-8-Values': 'â\x9c\x94',
  'sOme-CRAzY': 'hEaDERs'},
 b'Hello, world!\n')

Instead, it is missing the first header that has a non-ASCII name as well as 
all subsequent headers (even if they are all-ASCII). Interestingly, the 
response body is intact.

This is eventually traced back to email.feedparser's expectation that all 
headers conform to rfc822 and its assumption that anything that *doesn't* 
conform must be part of the body: 
https://github.com/python/cpython/blob/v3.7.3/Lib/email/feedparser.py#L228-L236

However, http.client has *already* determined the boundary between headers and 
body in parse_headers, and sent everything that it thinks is headers to the 
parser: 
https://github.com/python/cpython/blob/v3.7.3/Lib/http/client.py#L193-L214
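The truncation is easy to reproduce directly against parse_headers (a sketch; exactly which headers get dropped depends on whether you're running one of the affected versions listed below):

```python
import io
import http.client

# Headers as http.client would have read them off the socket; the
# second name contains raw UTF-8 bytes and does not match the
# rfc822-style header pattern in email.feedparser.
raw = (b'Some-Canonical: headers\r\n'
       b's\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n'
       b'some-other: random headers\r\n'
       b'\r\n')
msg = http.client.parse_headers(io.BytesIO(raw))
# On affected versions, everything from the non-ASCII name onward is
# silently treated as body, so 'some-other' disappears as well.
print(list(msg.keys()))
```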

--
messages: 343942
nosy: tburke
priority: normal
severity: normal
status: open
title: http.client aborts header parsing upon encountering non-ASCII header 
names
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37093>



[issue36274] http.client cannot send non-ASCII request lines

2019-03-13 Thread Tim Burke


Change by Tim Burke :


--
pull_requests: +12289

___
Python tracker 
<https://bugs.python.org/issue36274>



[issue36274] http.client cannot send non-ASCII request lines

2019-03-13 Thread Tim Burke


Change by Tim Burke :


--
keywords: +patch
pull_requests: +12288
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue36274>



[issue36274] http.client cannot send non-ASCII request lines

2019-03-12 Thread Tim Burke

New submission from Tim Burke :

While the RFCs are rather clear that non-ASCII data would be out of spec,

* that doesn't prevent a poorly-behaved client from sending non-ASCII bytes on 
the wire, which means
* as an application developer, it's useful to be able to mimic such a client to 
verify expected behavior while still using stdlib to handle things like header 
parsing, particularly since
* this worked perfectly well on Python 2.

The two most-obvious ways (to me, anyway) to try to send a request for /你好 (for 
example) are

# Assume it will get UTF-8 encoded, as that's the default encoding
# for urllib.parse.quote()
conn.putrequest('GET', '/\u4f60\u597d')

# Assume it will get Latin-1 encoded, as
#   * that's the encoding used in http.client.parse_headers(),
#   * that's the encoding used for PEP-, and
#   * it has a one-to-one mapping with bytes
conn.putrequest('GET', '/\xe4\xbd\xa0\xe5\xa5\xbd')

both fail with something like

UnicodeEncodeError: 'ascii' codec can't encode characters in position ...

Trying to pre-encode like

conn.putrequest('GET', b'/\xe4\xbd\xa0\xe5\xa5\xbd')

at least doesn't raise an error, but still does not do what was intended; 
rather than a request line like

GET /你好 HTTP/1.1

(or

/你好

depending on how you choose to interpret the bytes), the server gets

GET b'/\xe4\xbd\xa0\xe5\xa5\xbd' HTTP/1.1

The trouble comes down to 
https://github.com/python/cpython/blob/v3.7.2/Lib/http/client.py#L1104-L1107 -- 
we don't actually have any control over what the caller passes as the url (so 
the assumption doesn't hold), nor do we know anything about the encoding that 
was *intended*.

One of three fixes seems warranted:

* Switch to using Latin-1 to encode instead of ASCII (again, leaning on the 
precedent set in parse_headers and PEP-). This may make it too easy to 
write an out-of-spec client, however.
* Continue to use ASCII to encode, but include errors='surrogateescape' to give 
callers an escape hatch. This seems like a reasonably high bar to ensure that 
the caller actually intends to send unquoted data.
* Accept raw bytes and actually use them (rather than their repr()), allowing 
the caller to decide upon an appropriate encoding.
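The second option would look roughly like this (a sketch of the idea, not stdlib code; the helper name is mine):

```python
# Sketch of the errors='surrogateescape' option: ASCII-only URLs encode
# exactly as before, while a caller that really wants raw bytes on the
# wire has to opt in explicitly through the surrogate escape hatch.
def encode_request_target(url):
    return url.encode('ascii', errors='surrogateescape')

raw = b'/\xe4\xbd\xa0\xe5\xa5\xbd'
# The caller opts in by decoding the raw bytes with surrogateescape...
url = raw.decode('ascii', errors='surrogateescape')
# ...and gets exactly those bytes back out on the wire.
assert encode_request_target(url) == raw
# Plain ASCII still round-trips untouched.
assert encode_request_target('/plain') == b'/plain'
```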

--
components: Library (Lib)
messages: 337802
nosy: tburke
priority: normal
severity: normal
status: open
title: http.client cannot send non-ASCII request lines
type: behavior
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue36274>



[issue34698] urllib.request.Request.set_proxy doesn't (necessarily) replace type

2018-09-15 Thread Tim Burke

New submission from Tim Burke :

Not sure if this is a documentation or behavior bug, but... the docs for 
urllib.request.Request.set_proxy 
(https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.set_proxy)
 say

> Prepare the request by connecting to a proxy server. *The host and type will 
> replace those of the instance*, and the instance’s selector will be the 
> original URL given in the constructor.

(Emphasis mine.) In practice, behavior is more nuanced than that:

>>> from urllib.request import Request
>>> req = Request('http://hostame:port/some/path')
>>> req.host, req.type
('hostame:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.host, req.type # So far, so good...
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostame:port/some/path')
>>> req.host, req.type
('hostame:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.host, req.type # Type doesn't change!
('proxy:other-port', 'https')

Looking at the source 
(https://github.com/python/cpython/blob/v3.7.0/Lib/urllib/request.py#L397) it's 
obvious why https is treated specially.
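Compressing the transcript above into assertions (behavior as of the versions discussed here):

```python
from urllib.request import Request

# http -> https: both host and type are replaced, as documented.
req = Request('http://hostame:port/some/path')
req.set_proxy('proxy:other-port', 'https')
assert (req.host, req.type) == ('proxy:other-port', 'https')

# https -> http: the host changes but the type silently stays 'https',
# contrary to the documented "host and type will replace" behavior.
req = Request('https://hostame:port/some/path')
req.set_proxy('proxy:other-port', 'http')
assert (req.host, req.type) == ('proxy:other-port', 'https')
```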

The behavior is consistent with how things worked on py2...

>>> from urllib2 import Request
>>> req = Request('http://hostame:port/some/path')
>>> req.get_host(), req.get_type()
('hostame:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostame:port/some/path')
>>> req.get_host(), req.get_type()
('hostame:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')


... but only if you're actually inspecting host/type along the way!

>>> from urllib2 import Request
>>> req = Request('https://hostame:port/some/path')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'http')

(FWIW, this came up while porting an application from py2 to py3; there was a 
unit test expecting that last behavior of proxying a https connection through a 
http proxy.)

--
components: Library (Lib)
messages: 325449
nosy: tburke
priority: normal
severity: normal
status: open
title: urllib.request.Request.set_proxy doesn't (necessarily) replace type

___
Python tracker 
<https://bugs.python.org/issue34698>



[issue33973] HTTP request-line parsing splits on Unicode whitespace

2018-06-26 Thread Tim Burke


Change by Tim Burke :


--
keywords: +patch
pull_requests: +7539
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue33973>



[issue33973] HTTP request-line parsing splits on Unicode whitespace

2018-06-26 Thread Tim Burke

New submission from Tim Burke :

This causes (admittedly, buggy) clients that would work with a Python 2 server 
to stop working when the server upgrades to Python 3. To demonstrate, run 
`python2.7 -m SimpleHTTPServer 8027` in one terminal and `curl -v 
http://127.0.0.1:8027/你好` in another -- curl reports

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8027 (#0)
> GET /你好 HTTP/1.1
> Host: 127.0.0.1:8027
> User-Agent: curl/7.54.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 404 File not found
< Server: SimpleHTTP/0.6 Python/2.7.10
< Date: Tue, 26 Jun 2018 17:23:25 GMT
< Content-Type: text/html
< Connection: close
<

Error response


Error response
Error code 404.
Message: File not found.
Error code explanation: 404 = Nothing matches the given URI.

* Closing connection 0

...while repeating the experiment with `python3.6 -m http.server 8036` and 
`curl -v http://127.0.0.1:8036/你好` gives

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8036 (#0)
> GET /你好 HTTP/1.1
> Host: 127.0.0.1:8036
> User-Agent: curl/7.54.0
> Accept: */*
>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">



Error response


Error response
Error code: 400
Message: Bad request syntax ('GET /ä½\xa0好 HTTP/1.1').
Error code explanation: HTTPStatus.BAD_REQUEST - Bad request 
syntax or unsupported method.


* Connection #0 to host 127.0.0.1 left intact

Granted, a well-behaved client would have quoted the UTF-8 '你好' as 
'%E4%BD%A0%E5%A5%BD' (in which case everything would have behaved as expected), 
but RFC 7230 is pretty clear that the request-line should be SP-delimited. 
While it notes that "recipients MAY instead parse on whitespace-delimited word 
boundaries and, aside from the CRLF terminator, treat any form of whitespace as 
the SP separator", it goes on to say that "such whitespace includes one or more 
of the following octets: SP, HTAB, VT (%x0B), FF (%x0C), or bare CR" with no 
mention of characters like the (ISO-8859-1 encoded) non-breaking space that 
caused the 400 response.

FWIW, there was a similar unicode-separators-are-not-the-right-separators bug 
in header parsing a while back: https://bugs.python.org/issue22233
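The root cause is visible in a couple of lines: after the latin-1 decode that http.server applies to the raw request line, a bare str.split() also treats U+00A0 (the last byte of the UTF-8 '你') as whitespace, where an explicit split on SP would not:

```python
# The request line as received, decoded the way http.server decodes it.
line = b'GET /\xe4\xbd\xa0\xe5\xa5\xbd HTTP/1.1'.decode('latin-1')

# Bare split() also splits on the U+00A0 inside the path, yielding four
# words and hence "Bad request syntax"...
assert len(line.split()) == 4
# ...while splitting only on SP keeps the request-target intact.
assert line.split(' ') == ['GET', '/\xe4\xbd\xa0\xe5\xa5\xbd', 'HTTP/1.1']
```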

--
components: Library (Lib), Unicode
messages: 320507
nosy: ezio.melotti, tburke, vstinner
priority: normal
severity: normal
status: open
title: HTTP request-line parsing splits on Unicode whitespace
type: behavior
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue33973>