New submission from Steve Dower <steve.do...@python.org>:

URLs encoded with Punycode/IDNA use NFKC normalization to decompose characters 
[1]. This can result in some characters introducing new segments into a URL.

For example, \uFF03 is not equal to '#' under direct comparison, but normalizes 
to '#' which changes the fragment part of the URL. Similarly \u2100 normalizes 
to 'a/c' which introduces a path segment.
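
The two normalizations above can be checked directly with unicodedata. This is 
only an illustrative sketch; the helper name nfkc_introduces is hypothetical 
and is not part of urllib.

```python
import unicodedata

def nfkc_introduces(text, delimiters="/?#@:"):
    """Return the delimiters present only after NFKC-normalizing text.

    Hypothetical helper for illustration; not part of the standard library.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return [d for d in delimiters if d in normalized and d not in text]

# U+FF03 (fullwidth number sign) is not '#', but normalizes to it.
print("\uFF03" == "#")                             # False
print(unicodedata.normalize("NFKC", "\uFF03"))     # #
print(nfkc_introduces("\uFF03"))                   # ['#']

# U+2100 (account of) normalizes to the three characters 'a/c'.
print(unicodedata.normalize("NFKC", "\u2100"))     # a/c
print(nfkc_introduces("\u2100"))                   # ['/']
```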

Currently, urlsplit() does not normalize, so it may return a different netloc 
than the one a browser would use:

>>> u = "https://example.com\uf...@bing.com";
>>> urlsplit(u).netloc.rpartition("@")[2]
bing.com

>>> # Simulate the normalization a browser would apply
>>> u = "https://example.com\uf...@bing.com".encode("idna").decode("ascii")
>>> urlsplit(u).netloc.rpartition("@")[2]
example.com

(Note that .netloc includes user/pass and .rpartition("@") is often used to 
remove it.)
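
As a small illustration of that idiom, here is how .netloc, the rpartition("@") 
step and .hostname relate for a plain ASCII URL. The URL and credentials are 
made up for the example.

```python
from urllib.parse import urlsplit

# Made-up URL with credentials and a port, for illustration only.
parts = urlsplit("https://user:secret@example.com:8443/index.html")

# .netloc keeps the credentials and the port.
print(parts.netloc)                     # user:secret@example.com:8443

# The common idiom: drop everything up to the last '@'.
print(parts.netloc.rpartition("@")[2])  # example.com:8443

# .hostname drops credentials and port.
print(parts.hostname)                   # example.com
```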

This may be used to steal cookies or authentication data from applications that 
use the netloc to cache or retrieve this information.

The preferred fix for the urllib module is to detect and raise ValueError if 
NFKC-normalization of the netloc introduces any of '/?#@:'. Applications that 
want to avoid this error should perform their own decomposition using 
unicodedata or transcode to ASCII via IDNA.
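
A minimal sketch of such a check, written as a standalone function, might look 
like the following. The name check_netloc and the exact logic are assumptions 
made for illustration and are not necessarily what the eventual patch will do. 
Delimiters already present in the netloc are dropped first, so only characters 
introduced by normalization trigger the error.

```python
import unicodedata

DELIMITERS = "/?#@:"

def check_netloc(netloc):
    """Raise ValueError if NFKC normalization adds a URL delimiter.

    Hypothetical sketch of the proposed check, not the actual urllib code.
    """
    # Ignore delimiters that are already in the netloc (e.g. the '@' and
    # ':' of user:pass@host:port); only newly introduced ones matter.
    stripped = "".join(c for c in netloc if c not in DELIMITERS)
    normalized = unicodedata.normalize("NFKC", stripped)
    if normalized == stripped:
        return  # normalization changes nothing
    for delimiter in DELIMITERS:
        if delimiter in normalized:
            raise ValueError(
                "netloc %r contains invalid characters under NFKC "
                "normalization" % netloc
            )

check_netloc("user:secret@example.com:8443")    # passes silently

try:
    check_netloc("example.com\uFF03@bing.com")  # U+FF03 becomes '#'
except ValueError as exc:
    print("rejected:", exc)
```

Non-ASCII netlocs whose normalized form contains none of the five delimiters 
are accepted by this sketch.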

>>> # New behavior
>>> u = "https://example.com\uf...@bing.com";
>>> urlsplit(u)
...
ValueError: netloc 'example.com#@bing.com' contains invalid characters under 
NFKC normalization

>>> # Workaround 1
>>> u2 = unicodedata.normalize("NFKC", u)
>>> urlsplit(u2)
SplitResult(scheme='https', netloc='example.com', path='', query='', 
fragment='@bing.com')

>>> # Workaround 2
>>> u3 = u.encode("idna").decode("ascii")
>>> urlsplit(u3)
SplitResult(scheme='https', netloc='example.com', path='', query='', 
fragment='@bing.com')

Note that we do not address other characters, such as those that convert into 
a period. The error is only raised for changes that affect how urlsplit() locates 
the netloc and the very common next step of removing credentials from the 
netloc.

This vulnerability was reported by Jonathan Birch of Microsoft Corporation and 
Panayiotis Panayiotou (p.panayiot...@gmail.com) via the Python Security 
Response Team. A CVE number has been requested.

[1]: https://unicode.org/reports/tr46/

----------
assignee: steve.dower
components: Unicode
keywords: security_issue
messages: 337336
nosy: benjamin.peterson, ezio.melotti, larry, ned.deily, steve.dower, vstinner
priority: normal
severity: normal
stage: needs patch
status: open
title: urlsplit does not handle NFKC normalization
type: security
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36216>
_______________________________________