[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-08-17 Thread Ilya Konstantinov

Ilya Konstantinov  added the comment:

From RFC-1738:

hostname   = *[ domainlabel "." ] toplabel
domainlabel= alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel   = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit = alpha | digit


py> urlparse('https://foo\\bar/baz')
ParseResult(scheme='https', netloc='foo\\bar', path='/baz', params='', 
query='', fragment='')

The hostname's BNF doesn't allow for a backslash ('\\') character, so I'd 
expect urlparse to raise a ValueError for this "URL".
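
A minimal sketch of the stricter behaviour suggested above, assuming the check is applied on top of urlsplit() rather than inside it. The helper name strict_hostname is hypothetical and not part of urllib.parse. It translates only the hostname rule quoted above, so IP address literals (a separate rule in RFC 1738) are rejected as well.

```python
import re
from urllib.parse import urlsplit

# The RFC 1738 rules quoted above, written as regular expressions.
_DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_HOSTNAME_RE = re.compile(rf"(?:{_DOMAINLABEL}\.)*{_TOPLABEL}")


def strict_hostname(url):
    """Return the hostname of url, or raise ValueError if it breaks the BNF.

    Hypothetical helper for illustration; urllib.parse has no such function.
    """
    host = urlsplit(url).hostname
    if host is None or not _HOSTNAME_RE.fullmatch(host):
        raise ValueError(f"invalid hostname in URL: {url!r}")
    return host
```

With this check, 'https://foo.example/baz' yields 'foo.example', while 'https://foo\\bar/baz' raises ValueError because of the backslash.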

nosy: +Ilya Konstantinov

Python tracker 

Python-bugs-list mailing list

[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-19 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

> The “urllib.parse” module generally follows RFC 3986, which does not 
> allow a literal backslash in the “userinfo” part:

And yet the parse() function seems to allow arbitrary unescaped 
characters. This is from 3.8.0a0:

py> from urllib.parse import urlparse
py> urlparse(r'http://spam\eggs!cheese&aardv...@evil.com').netloc
py> urlparse(r'http://spam\eggs!cheese&aardv...@evil.com').hostname

If that's a bug, it is a separate bug to this issue.

Backslash doesn't seem relevant to the security issue of userinfo being 
used to mislead:

py> urlparse('http://www.google@evil.com').netloc
'www.google@evil.com'
py> urlparse('http://www.google@evil.com').hostname
'evil.com'

If it is relevant, can somebody explain to me how?



[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-18 Thread Martin Panter

Martin Panter  added the comment:

The “urllib.parse” module generally follows RFC 3986, which does not allow a 
literal backslash in the “userinfo” part:

userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The RFC does not allow a backslash in the host name, path, query or fragment 
either. That is why I said the URL is not valid.
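
As a rough illustration of the grammar above, the userinfo rule can be turned into a regular expression and applied to whatever precedes the last "@" in the netloc. This is only a sketch: userinfo_is_valid is a hypothetical helper, and splitting on the last "@" is an assumption made here to mirror how the hostname is derived.

```python
import re
from urllib.parse import urlsplit

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
_USERINFO_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~]"   # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|:)*"
)


def userinfo_is_valid(url):
    """Hypothetical helper: True if url has no userinfo or a valid one."""
    netloc = urlsplit(url).netloc
    userinfo, at, _host = netloc.rpartition("@")
    if not at:
        return True  # no userinfo component at all
    return _USERINFO_RE.fullmatch(userinfo) is not None
```

Under this check the URL from the original report fails because of the literal backslash, while the same URL with the backslash written as %5C passes.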



[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-18 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

I believe that Python's behaviour here is correct. You are supplying a netloc 
which includes a username "www.google.com\" with no password. That might be 
what you intend to do, or it might be malicious data. That depends on context, 
and the urlparse module can't tell what the context is and has no reason to 
assume malice.

If I am reading this correctly:


the colon after the username can be omitted, so the URL is legal and Python has 
returned the correct value for the netloc.

As Christian says, Python is not an end-user application like a browser. It is 
right and proper for a browser to expect that the user is non-technical and may 
not have noticed the @ sign, and to expect malicious behaviour, or to assume 
that backslash \ is a typo for forward slash / but Python programmers by 
definition are technical users and it is their responsibility to validate their 

There are legitimate uses for the userinfo component (user:password@hostname) 
and it is not the library's responsibility to assume that backslashes are typos 
for forward slashes.

So I think that the behaviour here is correct, and this should be closed. But 
if you disagree, please explain what you think the library should do, and why. 
When you do, remember that:

* there are legitimate uses for user:password@hostname;
* either the user name or the password can contain backslashes.
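
To make the "validate in the application" point concrete, here is one possible sketch of a redirect check written on top of urlsplit(). The allowlist, the function name is_safe_redirect, and the policy of refusing any userinfo or backslash are all assumptions for illustration. An application that legitimately needs user:password@hostname would pick a different policy.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real application would supply its own hosts.
ALLOWED_HOSTS = {"www.google.com"}


def is_safe_redirect(url):
    """Return True only for http(s) URLs on an allowlisted host.

    Application-level policy (an assumption of this sketch): reject any
    backslash and any userinfo instead of trying to interpret them.
    """
    if "\\" in url:
        return False
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            return False
        if parts.username is not None or parts.password is not None:
            return False
        return parts.hostname in ALLOWED_HOSTS
    except ValueError:
        # urlsplit() raises ValueError for some malformed netlocs.
        return False
```

The library keeps returning what the URL actually says, and the caller decides which of those values it is willing to trust.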

nosy: +steven.daprano


[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Karthikeyan Singaravelan

Karthikeyan Singaravelan  added the comment:

There are also some notes at https://tools.ietf.org/html/rfc3986#section-7.6

Because the userinfo subcomponent is rarely used and appears before
the host in the authority component, it can be used to construct a
URI intended to mislead a human user by appearing to identify one
(trusted) naming authority while actually identifying a different
authority hidden behind the noise.  For example

   ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm

might lead a human user to assume that the host is 'cnn.example.com',
whereas it is actually '10.0.0.1'.  Note that a misleading userinfo
subcomponent could be much longer than the example above.

A misleading URI, such as that above, is an attack on the user's
preconceived notions about the meaning of a URI rather than an attack
on the software itself.  User agents may be able to reduce the impact
of such attacks by distinguishing the various components of the URI
when they are rendered, such as by using a different color or tone to
render userinfo if any is present, though there is no panacea.  More
information on URI-based semantic attacks can be found in [Siedzik].

In Firefox nightly and latest chrome pasting the above URL makes a request to where in Chrome the URL in the address bar is changed to and Firefox has the same URL in the address bar. Python 
also returns '' as the hostname for the above example using urlparse.



[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Karthikeyan Singaravelan

Karthikeyan Singaravelan  added the comment:

I just tested other implementations in Ruby and Go and they too return host as 
"evil.com" for "http://www.google@evil.com", along with the user info:

$ ruby -e 'require "uri"; puts URI("http://www.google@evil.com").hostname'
evil.com
$ cat /tmp/foo.go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, _ := url.Parse(`http://www.google@evil.com`)
	fmt.Println(u.Hostname())
}
$ go run /tmp/foo.go
evil.com

nosy: +xtreak


[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Christian Heimes

Christian Heimes  added the comment:

You cannot compare a low level library like Python's urllib module with a user 
interface like a modern browser. Browsers do a lot of extra work to make sense 
of user input. For example Firefox and Chrome mangle your example URL and 
replace \ with /. Firefox even shows a warning when the URL contains user and 

You are about to log in to the site “python.org” with the username “user”, but 
the website does not require authentication. This may be an attempt to trick 
you.

Is “python.org” the site you want to visit?

nosy: +christian.heimes


[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Neeraj Sonaniya

Neeraj Sonaniya  added the comment:


I know that a backslash (\) should be percent-encoded (%5c), but if the same 
URL is typed unencoded into the URL bar of a browser, the hostname we get is 
'https://www.google.com'
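
The difference can be reproduced with a rough approximation of the browser behaviour. The sketch below assumes, as reported elsewhere in this thread, that browsers replace \ with / before parsing. The helper browser_like_hostname is hypothetical and ignores every other browser-specific rule.

```python
from urllib.parse import urlsplit


def browser_like_hostname(url):
    """Hypothetical helper: parse url after turning each backslash into "/".

    Only a rough approximation of what browsers are reported to do.
    """
    return urlsplit(url.replace("\\", "/")).hostname


url = r"http://www.google.com\@xxx.com"
print(urlsplit(url).hostname)      # hostname as urllib.parse sees it
print(browser_like_hostname(url))  # hostname after the backslash rewrite
```

Once the backslash becomes a slash, everything from "/@xxx.com" onwards is path, which is why the two results disagree.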



[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Martin Panter

Martin Panter  added the comment:

FWIW I understand the backslash should be percent-encoded in URLs, otherwise 
the URL is not valid.

This reminds me of a few other bugs:

* Issue 30500: Made the behaviour of fragment (#. . .) versus userinfo (. . .@) 
consistent, e.g. in //www.google.com#@xxx.com
* Issue 18140: Also about the ambiguity of fragment (#. . .) and query (?. . .) 
versus userinfo (. . .@)
* Issue 23328: Precedence of path segment (/. . .) versus userinfo (. . .@); 
e.g. //user/name:pass/w...@www.google.com

I think people sometimes come up with these invalid URLs because they are 
trying to make a URL that includes a password with unusual characters (e.g. for 
the “http_proxy” environment variable). So raising an exception or otherwise 
changing the parsing behaviour could break those cases.
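
For the password case, one way to stay within the RFC is to percent-encode the credentials before building the URL and decode them after parsing. A minimal sketch, in which build_proxy_url, the host proxy.example, the port and the password are all made up for illustration:

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, password, host, port):
    """Hypothetical helper: build an http proxy URL with encoded credentials."""
    # safe='' makes quote() encode "/" as well, along with "\" and "@".
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


password = r"p\ss/w@rd"  # made-up password with awkward characters
proxy_url = build_proxy_url("user", password, "proxy.example", 3128)
parts = urlsplit(proxy_url)

# urlsplit() leaves the userinfo percent-encoded, so decode it explicitly.
assert parts.hostname == "proxy.example"
assert unquote(parts.password) == password
```

Written this way, the URL contains no literal backslash, slash or @ inside the userinfo, so the ambiguities listed above do not arise.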



[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-16 Thread Karthikeyan Singaravelan

Change by Karthikeyan Singaravelan :

nosy: +martin.panter


[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

2019-01-15 Thread Neeraj Sonaniya

New submission from Neeraj Sonaniya :

It has been identified that `urlparse` in the `urllib.parse` module detects 
the wrong hostname, which could lead to a security issue known as an open 
redirect vulnerability.

Steps to reproduce the issue:

The following code reproduces the issue:

from urllib.parse import urlparse
# Raw string, so the backslash stays a literal character.
x = r'http://www.google.com\@xxx.com'
y = urlparse(x)
print(y.hostname)  # hostname as parsed by urlparse


Browsers render a different hostname for the above URL. In the following 
browsers tested, the hostname was detected as https://www.google.com:

1. Chromium - Version 72.0.3626.7  - Developer Build
2. Firefox - 60.4.0esr (64-bit)
3. Internet Explorer - 11.0.9600.17843
4. Safari - Version 12.0.2 (14606.3.4)

components: Library (Lib)
files: Screenshot from 2019-01-16 12-47-22.png
messages: 333750
nosy: nsonaniya2010, orsenthil
priority: normal
severity: normal
status: open
title: urlparse library detecting wrong hostname leads to open redirect vulnerability
type: security
versions: Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48058/Screenshot from 2019-01-16 
