Re: [issue42967] Web cache poisoning - `;` as a query args separator

2021-01-20 Thread M.-A. Lemburg
On 20.01.2021 12:07, STINNER Victor wrote:
> Maybe we should even go further in Python 3.10 and only split at "&" by 
> default, but let the caller opt in to the ";" separator as well.

+1.

Personally, I've never seen URLs encoded with ";" as query parameter
separator in practice on the server side.

The use of ";" was recommended in the HTML4 spec, but only in an
implementation side note:

https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

and not in the main reference:

https://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4.1

Browsers are also pretty relaxed about seeing non-escaped ampersands in
link URLs and do the right thing, so the suggested work-around for
avoiding escaping is not really needed.

-- 
Marc-Andre Lemburg
eGenix.com

___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42967] Web cache poisoning - `;` as a query args separator

2021-01-20 Thread STINNER Victor


STINNER Victor added the comment:

Oops, I missed this issue. I just marked my bpo-42975 issue as a duplicate of 
this one.

My message:

urllib.parse.parse_qsl() uses "&" *and* ";" as separators:

>>> urllib.parse.parse_qsl("a=1&b=2&c=3")
[('a', '1'), ('b', '2'), ('c', '3')]
>>> urllib.parse.parse_qsl("a=1&b=2;c=3")
[('a', '1'), ('b', '2'), ('c', '3')]

But the W3C standards have evolved and now advise against treating the
semicolon (";") as a separator:

https://www.w3.org/TR/2014/REC-html5-20141028/forms.html#url-encoded-form-data

"This form data set encoding is in many ways an aberrant monstrosity, the 
result of many years of implementation accidents and compromises leading to a 
set of requirements necessary for interoperability, but in no way representing 
good design practices. In particular, readers are cautioned to pay close 
attention to the twisted details involving repeated (and in some cases nested) 
conversions between character encodings and byte sequences."

"To decode application/x-www-form-urlencoded payloads (...) Let strings be the 
result of strictly splitting the string payload on U+0026 AMPERSAND characters 
(&)."

Maybe we should even go further in Python 3.10 and only split at "&" by 
default, but let the caller opt in to the ";" separator as well.
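
To make the suggestion concrete, here is a minimal sketch of the opt-in behaviour, assuming a hypothetical helper named `parse_pairs` (it is not the stdlib implementation and not a proposed patch). It splits strictly on "&" by default, as the HTML5 text quoted above describes, and only uses another separator when the caller asks for one:

```python
from urllib.parse import unquote_plus


def parse_pairs(query, separator="&"):
    """Hypothetical helper: split a query string on one separator only."""
    if not separator:
        raise ValueError("separator must be a non-empty string")
    pairs = []
    for field in query.split(separator):
        if not field:
            continue  # skip empty fields such as those produced by "a=1&&b=2"
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


# Default: ";" is ordinary data, so it stays inside the value of "b".
default_result = parse_pairs("a=1&b=2;c=3")
# Opt-in: the caller explicitly chooses ";" as the separator.
opt_in_result = parse_pairs("a=1;b=2", separator=";")
print(default_result)
print(opt_in_result)
```

With this shape, code that really needs ";" has to say so explicitly, and everything else gets the same splitting rule as the HTML5 algorithm.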

--
nosy: +vstinner

___
Python tracker 

___



[issue42967] Web cache poisoning - `;` as a query args separator

2021-01-19 Thread Adam Goldschmidt

New submission from Adam Goldschmidt:

The urllib.parse module (formerly urlparse) treats the semicolon as a
query-string separator
(https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L739),
whereas most proxies today only treat ampersands as separators. A blog
post explaining this vulnerability:
https://snyk.io/blog/cache-poisoning-in-popular-open-source-packages/

When an attacker can separate query parameters with a semicolon (;), they can
cause the proxy (running with its default configuration) and the server to
interpret the same request differently. This can result in malicious requests
being cached as completely safe ones: the proxy usually does not see the
semicolon as a separator, so whatever follows it stays part of the value of an
unkeyed parameter (such as the `utm_*` parameters, which are usually unkeyed)
and is left out of the cache key. Take the following example of a malicious
request:

```
GET /?link=http://google.com&utm_content=1;link='>alert(1) HTTP/1.1
Host: somesite.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close
```

urlparse sees 3 parameters here: `link`, `utm_content` and then `link` again. 
On the other hand, the proxy considers this full string: 
`1;link='>alert(1)` as the value of `utm_content`, which is why the 
cache key would only contain `somesite.com/?link=http://google.com`. 
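
The mismatch can be reproduced with a small sketch. Everything here is illustrative: `split_pairs`, `proxy_cache_key` and the `UNKEYED_PREFIXES` tuple are hypothetical names, and the "proxy" is just a model of a cache that splits on "&" only and drops `utm_*` parameters from its key, not the behaviour of any particular product.

```python
from urllib.parse import unquote_plus

# Assumption: the cache treats every parameter starting with "utm_" as unkeyed.
UNKEYED_PREFIXES = ("utm_",)


def split_pairs(query, separators):
    """Split a query string on any of the given single-character separators."""
    first = separators[0]
    for sep in separators[1:]:
        query = query.replace(sep, first)  # normalise to one separator
    pairs = []
    for field in query.split(first):
        if field:
            name, _, value = field.partition("=")
            pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def proxy_cache_key(host, path, query):
    """Model of a proxy that only splits on "&" and ignores unkeyed params."""
    keyed = [
        (name, value)
        for name, value in split_pairs(query, "&")
        if not name.startswith(UNKEYED_PREFIXES)
    ]
    return host + path + "?" + "&".join(f"{n}={v}" for n, v in keyed)


QUERY = "link=http://google.com&utm_content=1;link='>alert(1)"

cache_key = proxy_cache_key("somesite.com", "/", QUERY)
server_pairs = split_pairs(QUERY, "&;")  # server side: "&" and ";" both split

print(cache_key)     # what the proxy stores the response under
print(server_pairs)  # what the application actually receives
```

The proxy keys the response on the harmless `link` value only, while the application sees a second `link` parameter carrying the attacker's payload. If the application takes the last occurrence of a repeated parameter, the poisoned response is then served to everyone who requests the harmless URL.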

A possible solution could be to allow developers to specify a separator, like 
werkzeug does:

https://github.com/pallets/werkzeug/blob/6784c44673d25c91613c6bf2e614c84465ad135b/src/werkzeug/urls.py#L833
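
For reference, a sketch of what such an option looks like from the caller's side. To my knowledge, the Python releases that shipped a fix for this issue (3.10, plus security updates on older branches) added a `separator` keyword to `urllib.parse.parse_qsl()` that defaults to "&". The snippet assumes such a release and raises `TypeError` on older ones; the query strings are made-up examples.

```python
from urllib.parse import parse_qsl

# Assumes a Python release whose parse_qsl() accepts the `separator` keyword.
query = "link=http://google.com&utm_content=1;link=evil"

# Default behaviour: only "&" separates fields, so ";" stays in the value.
default_pairs = parse_qsl(query)

# Explicit opt-in for applications that really use ";" between fields.
opt_in_pairs = parse_qsl("a=1;b=2", separator=";")

print(default_pairs)
print(opt_in_pairs)
```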

--
components: C API
messages: 385266
nosy: AdamGold
priority: normal
severity: normal
status: open
title: Web cache poisoning - `;` as a query args separator
type: security
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9
