ready-research <[email protected]> added the comment:
Some other examples to test this behaviour:
urlparse('https:/\/\/\www.attacker.com/a/b')
urlparse('https:/\www.attacker.com/a/b')
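A minimal sketch for reproducing the behaviour with the inputs used in this report. The helper name `summarize` and the list name `MALFORMED_URLS` are only illustrative; raw strings are used so the backslashes reach `urlparse()` unchanged:

```python
from urllib.parse import urlparse

# Raw strings keep the backslashes literal and avoid invalid-escape warnings.
MALFORMED_URLS = [
    r"https:/www.attacker.com/a/b",
    r"https:/\www.attacker.com/a/b",
    r"https:/\/\/\www.attacker.com/a/b",
]


def summarize(url):
    """Return the fields a caller would typically inspect (illustrative helper)."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.netloc, parsed.hostname, parsed.path


for url in MALFORMED_URLS:
    # Each result has an empty netloc; the attacker host ends up in the path.
    print(summarize(url))
```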
## Comparing it to other languages/runtimes
How do other languages and their runtimes work with URL parsing functions?
Here's Node.js, which also returns an empty `host` and `hostname`, similar to
the currently reported "buggy" behavior of Python's `urlparse()`:
```
$ node
> require("url").parse("https:/\/\/\www.attacker.com/a/b");
// Will return:
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: '',
port: null,
hostname: '',
hash: null,
search: null,
query: null,
pathname: '/www.attacker.com/a/b',
path: '/www.attacker.com/a/b',
href: 'https:///www.attacker.com/a/b'
}
```
However, the Node.js documentation already notes that using `url.parse()` can
lead to security issues:
https://nodejs.org/dist/latest-v16.x/docs/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost
`Use of the legacy url.parse() method is discouraged. Users should use the
WHATWG URL API. Because the url.parse() method uses a lenient, non-standard
algorithm for parsing URL strings, security issues can be introduced.
Specifically, issues with host name spoofing and incorrect handling of
usernames and passwords have been identified.`
Here's Ruby, which also returns no `host` or `hostname`, similar to the
currently reported "buggy" behavior of Python's `urlparse()`:
```sh
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> uri = URI.parse('https:/www.attacker.com/a/b')
=> #<URI::HTTPS https:/www.attacker.com/a/b>
irb(main):003:0> uri.host
=> nil
irb(main):004:0> uri.hostname
=> nil
irb(main):005:0> uri.scheme
=> "https"
irb(main):006:0> uri.path
=> "/www.attacker.com/a/b"
```
That said, Ruby raises an exception on other permutations of the malformed
URL, whereas Python does not. For example:
```
irb(main):011:0> other_uri = URI.parse('https:/\/\/\www.attacker.com/a/b')
Traceback (most recent call last):
8: from /usr/bin/irb:23:in `<main>'
7: from /usr/bin/irb:23:in `load'
6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
5: from (irb):11
4: from (irb):11:in `rescue in irb_binding'
3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\/\\/\\www.attacker.com/a/b")
```
The same applies to this other URI, which Ruby does not accept (unlike Python,
which accepts it and returns a result with no host or hostname, as shown
earlier in this report):
```
irb(main):012:0> other_uri = URI.parse('https:/\www.attacker.com/a/b')
Traceback (most recent call last):
8: from /usr/bin/irb:23:in `<main>'
7: from /usr/bin/irb:23:in `load'
6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
5: from (irb):12
4: from (irb):12:in `rescue in irb_binding'
3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\www.attacker.com/a/b")
```
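For comparison, a caller who wants Ruby-like rejection in Python could wrap `urlparse()` so that it raises instead of returning a host-less result. This is only a sketch: `strict_urlparse` is a hypothetical helper, not part of `urllib.parse`, and it is stricter than Ruby in that it also rejects `https:/www.attacker.com/a/b`:

```python
from urllib.parse import ParseResult, urlparse


def strict_urlparse(url: str) -> ParseResult:
    """Hypothetical helper: raise ValueError rather than return a host-less result."""
    if "\\" in url:
        # Backslashes are what make the examples in this report ambiguous.
        raise ValueError(f"backslash in URL: {url!r}")
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https") and not parsed.hostname:
        raise ValueError(f"missing host in URL: {url!r}")
    return parsed
```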
Let's look at PHP. PHP's parse_url() function behaves much like Python's: it
fails to identify the host for all 3 malformed examples provided in this
report:
```
❯ php -a
Interactive shell
php > var_dump(parse_url('https:/\www.attacker.com/a/b'));
array(2) {
["scheme"]=>
string(5) "https"
["path"]=>
string(22) "/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/www.attacker.com/a/b'));
array(2) {
["scheme"]=>
string(5) "https"
["path"]=>
string(21) "/www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/\/\/\www.attacker.com/a/b'));
array(2) {
["scheme"]=>
string(5) "https"
["path"]=>
string(26) "/\/\/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https://www.attacker.com/a/b'));
array(3) {
["scheme"]=>
string(5) "https"
["host"]=>
string(16) "www.attacker.com"
["path"]=>
string(4) "/a/b"
}
```
## The applicability of this vulnerability
It seems that there is no direct way to cause a severe impact on a Python
runtime simply by sending it a malformed URL.
However, userland logic that bases its decisions on Python's urlparse()
function may introduce a security vulnerability because of the function's
unexpected return values. Such vulnerabilities may manifest as SSRF, open
redirects, and other issues related to incorrectly trusting a URL.
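To make the risk concrete, here is a hedged sketch of the kind of userland check described above. Both function names and the `ALLOWED_HOSTS` allow-list are hypothetical. The naive version treats an empty netloc as "local", so it accepts the malformed URLs from this report; the stricter version refuses ambiguous input. The concern assumes the client that eventually follows the redirect resolves such a URL to the attacker's host:

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"www.example.com"}  # hypothetical allow-list


def naive_is_safe_redirect(target: str) -> bool:
    """Flawed check: assumes a URL without a netloc must be a local path."""
    parsed = urlparse(target)
    return parsed.netloc == "" or parsed.hostname in ALLOWED_HOSTS


def stricter_is_safe_redirect(target: str) -> bool:
    """Reject ambiguous input instead of trusting an empty netloc."""
    if "\\" in target:
        return False
    try:
        parsed = urlparse(target)
    except ValueError:
        return False
    if parsed.scheme == "" and parsed.netloc == "":
        # Plain relative path: exactly one leading slash.
        return target.startswith("/") and not target.startswith("//")
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


print(naive_is_safe_redirect(r"https:/\www.attacker.com/a/b"))
print(stricter_is_safe_redirect(r"https:/\www.attacker.com/a/b"))
```

The stricter variant is deliberately conservative: it may reject some inputs that are harmless, which is usually the preferable failure mode for redirect and SSRF checks.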
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue44744>
_______________________________________