Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-10 Thread Gregory P. Smith
On Wed, Apr 10, 2019 at 11:00 AM Ivan Pozdeev via Python-Dev <
python-dev@python.org> wrote:

>
> On 10.04.2019 7:30, Karthikeyan wrote:
>
> Thanks Gregory. I think it's a good tradeoff to ensure this validation
> only for URLs of http scheme.
>
> I also agree handling newline is little problematic over the years and the
> discussion over the level at which validation should occur also prolongs
> some of the patches. https://bugs.python.org/issue35906 is another
> similar case where splitlines is used but it's better to raise an error and
> the proposed fix could be used there too. Victor seemed to wrote a similar
> PR like linked one for other urllib functions only to fix similar attack in
> ftplib to reject newlines that was eventually fixed only in ftplib
>
> * https://bugs.python.org/issue30713
> * https://bugs.python.org/issue29606
>
> Search also brings multiple issues with one duplicate over another that
> makes these attacks scattered over the tracker and some edge case missing.
> Slightly off topic, the last time I reported a cookie related issue where
> the policy can be overriden by third party library I was asked to fix it in
> stdlib itself since adding fixes to libraries causes maintenance burden to
> downstream libraries to keep up upstream. With urllib being a heavily used
> module across ecosystem it's good to have a fix landing in stdlib that
> secures downstream libraries encouraging users to upgrade Python too.
>
> Validation should occur whenever user data crosses a trust boundary --
> i.e. when the library starts to assume that an extracted chunk now contains
> something valid.
>
> https://tools.ietf.org/html/rfc3986 defines valid syntax (incl. valid
> characters) for every part of a URL -- _of any scheme_ (FYI, \r\n are
> invalid everywhere and the test code for `data:' that Karthikeyan
> referred to is raw data to compare to rather than a part of a URL). It also
> obsoletes all the RFCs that the current code is written against.
>
> AFAICS, urllib.split* fns (obsoleted as public in 3.8) are used by both
> urllib and urllib2 to parse URLs. They can be made to each validate the
> chunk that they split off. urlparse can validate the entire URL altogether.
>
> Also, all modules ought to use the same code (urlparse looks like the best
> candidate) to parse URLs -- this will minimize the attack surface.
>
> I think I can look into this later this week.
>
My PR as of last night cites that RFC and does validation in http.client
while constructing the protocol request payload.  Doing it within split
functions was an initial hack that looked like it might work but didn't
feel right as that isn't what people expect of those functions and that
turned out to be the case as I tested things due to our mess of codepaths
for opening URLs, but they all end with http.client so yay!

I did *not* look at any of the async http client code paths.  (legacy
asyncore or new asyncio).  If there is an issue there, those deserve to
have their own bugs filed.

As for third party PyPI libraries such as urllib3, they are on their own to
fix bugs.  If they happen to use a code path that a stdlib fix helps, good
for them, but honestly they are much better off making and shipping their
own update to avoid the bug.  Users can get it much sooner as it's a mere
pip install -U away rather than a python runtime upgrade.

> Fixing this is going to break code that relies on the current code
> accepting invalid URLs. But the docs have never said that e.g. in urlopen,
> anything apart from a (valid) URL is accepted (in particular, this implies
> that the user is responsible for escaping stuff properly before passing
> it). So I would say that we are within our right here and whoever is
> relying on those quirks is and has always been on unsupported territory.
>
yep.  even http.client.HTTPConnection.request names the function parameter
"url" so anyone embedding whitespace newlines and http protocol strings
within that is well outside of supported territory (as one example in our
own test_xmlrpc was taking advantage of to test a malformed request).

I suggest following up on https://bugs.python.org/issue30458 rather than in
this thread.  the thread did its job, it directed our eyeballs at the
problems. :)

-gps

> Determining which of those quirks are exploitable and which are not to fix
> just the former is an incomparably larger, more error-prone and avoidable
> work. If anything, the history of the issue referenced to by previous
> posters clearly shows that this is too much to ask from the Python team.
>
> I also see other undocumented behavior like accepting '>' (also
> obsoleted as public in 3.8) which I would like to but it's of no harm.
>
> --
>
> Regards,
> Ivan
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>

Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-10 Thread Ivan Pozdeev via Python-Dev


On 10.04.2019 7:30, Karthikeyan wrote:

Thanks Gregory. I think it's a good tradeoff to ensure this validation only for 
URLs of http scheme.

I also agree handling newline is little problematic over the years and the discussion over the level at which validation should occur also 
prolongs some of the patches. https://bugs.python.org/issue35906 is another similar case where splitlines is used but it's better to raise 
an error and the proposed fix could be used there too. Victor seemed to wrote a similar PR like linked one for other urllib functions only 
to fix similar attack in ftplib to reject newlines that was eventually fixed only in ftplib


* https://bugs.python.org/issue30713
* https://bugs.python.org/issue29606

Search also brings multiple issues with one duplicate over another that makes these attacks scattered over the tracker and some edge case 
missing. Slightly off topic, the last time I reported a cookie related issue where the policy can be overriden by third party library I 
was asked to fix it in stdlib itself since adding fixes to libraries causes maintenance burden to downstream libraries to keep up 
upstream. With urllib being a heavily used module across ecosystem it's good to have a fix landing in stdlib that secures downstream 
libraries encouraging users to upgrade Python too.


Validation should occur whenever user data crosses a trust boundary -- i.e. when the library starts to assume that an extracted chunk now 
contains something valid.


https://tools.ietf.org/html/rfc3986 defines valid syntax (incl. valid characters) for every part of a URL -- _of any scheme_ (FYI, \r\n are 
invalid everywhere and the test code for     `data:' that Karthikeyan referred to is raw data to compare to rather than a part of a URL). It 
also obsoletes all the RFCs that the current code is written against.


AFAICS, urllib.split* fns (obsoleted as public in 3.8) are used by both urllib and urllib2 to parse URLs. They can be made to each validate 
the chunk that they split off. urlparse can validate the entire URL altogether.


Also, all modules ought to use the same code (urlparse looks like the best 
candidate) to parse URLs -- this will minimize the attack surface.

I think I can look into this later this week.

Fixing this is going to break code that relies on the current code accepting invalid URLs. But the docs have never said that e.g. in 
urlopen, anything apart from a (valid) URL is accepted (in particular, this implies that the user is responsible for escaping stuff properly 
before passing it). So I would say that we are within our right here and whoever is relying on those quirks is and has always been on 
unsupported territory.
Determining which of those quirks are exploitable and which are not to fix just the former is an incomparably larger, more error-prone and 
avoidable work. If anything, the history of the issue referenced to by previous posters clearly shows that this is too much to ask from the 
Python team.


I also see other undocumented behavior like accepting '>' (also 
obsoleted as public in 3.8) which I would like to but it's of no harm.

--

Regards,
Ivan

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-10 Thread Karthikeyan
> 1. Is there a library of URL / Header injection tests e.g. for fuzzing
> that we could generate additional test cases with or from?


https://github.com/swisskyrepo/PayloadsAllTheThings seems to contain
payload related stuff but not sure how useful it is for URL parsing.

>
> 2. Are requests.get() and requests.post() also vulnerable?
>

urllib3 seems to be vulnerable as noted in
https://bugs.python.org/issue36276#msg337837 . requests uses urllib3 under
the hood. The last time I checked requests passed encoded URL to urllib3
where this doesn't seem to be exploitable but I could be wrong.

-- 
Regards,
Karthikeyan S
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-10 Thread Victor Stinner
Hi,

I dig into Python code history and the bug tracker. I would like to
say that this issue is a work-in-progress since 2004. Different fixes
have been pushed, but there are *A LOT* of open issues:
https://bugs.python.org/issue30458#msg339846

I would suggest to discuss on https://bugs.python.org/issue30458
rather than here, just to avoid to duplicate discussions ;-)

Note: the whole class of issue (HTTP Header Injection) got at least 3
CVE: CVE-2016-5699, CVE-2019-9740, CVE-2019-9947. I changed bpo-30458
title to "[security][CVE-2019-9740][CVE-2019-9947] HTTP Header
Injection (follow-up of CVE-2016-5699)".

Victor

Le mer. 10 avr. 2019 à 12:20, Wes Turner  a écrit :
>
> 1. Is there a library of URL / Header injection tests e.g. for fuzzing that 
> we could generate additional test cases with or from?
>
> 2. Are requests.get() and requests.post() also vulnerable?
>
> 3. Despite the much-heralded UNIX pipe protocols' utility, filenames 
> containing newlines (the de-facto line record delimiter) are possible: 
> "file"$'\n'"name"
>
> Should filenames containing newlines and control characters require a kwarg 
> to be non-None in order to be passed through unescaped to the HTTP request?
>
> On Wednesday, April 10, 2019, Karthikeyan  wrote:
>>
>> Thanks Gregory. I think it's a good tradeoff to ensure this validation only 
>> for URLs of http scheme.
>>
>> I also agree handling newline is little problematic over the years and the 
>> discussion over the level at which validation should occur also prolongs 
>> some of the patches. https://bugs.python.org/issue35906 is another similar 
>> case where splitlines is used but it's better to raise an error and the 
>> proposed fix could be used there too. Victor seemed to wrote a similar PR 
>> like linked one for other urllib functions only to fix similar attack in 
>> ftplib to reject newlines that was eventually fixed only in ftplib
>>
>> * https://bugs.python.org/issue30713
>> * https://bugs.python.org/issue29606
>>
>> Search also brings multiple issues with one duplicate over another that 
>> makes these attacks scattered over the tracker and some edge case missing. 
>> Slightly off topic, the last time I reported a cookie related issue where 
>> the policy can be overriden by third party library I was asked to fix it in 
>> stdlib itself since adding fixes to libraries causes maintenance burden to 
>> downstream libraries to keep up upstream. With urllib being a heavily used 
>> module across ecosystem it's good to have a fix landing in stdlib that 
>> secures downstream libraries encouraging users to upgrade Python too.
>>
>> Regards,
>> Karthikeyan S
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com



-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-10 Thread Wes Turner
1. Is there a library of URL / Header injection tests e.g. for fuzzing that
we could generate additional test cases with or from?

2. Are requests.get() and requests.post() also vulnerable?

3. Despite the much-heralded UNIX pipe protocols' utility, filenames
containing newlines (the de-facto line record delimiter) are possible:
"file"$'\n'"name"

Should filenames containing newlines and control characters require a kwarg
to be non-None in order to be passed through unescaped to the HTTP request?

On Wednesday, April 10, 2019, Karthikeyan  wrote:

> Thanks Gregory. I think it's a good tradeoff to ensure this validation
> only for URLs of http scheme.
>
> I also agree handling newline is little problematic over the years and the
> discussion over the level at which validation should occur also prolongs
> some of the patches. https://bugs.python.org/issue35906 is another
> similar case where splitlines is used but it's better to raise an error and
> the proposed fix could be used there too. Victor seemed to wrote a similar
> PR like linked one for other urllib functions only to fix similar attack in
> ftplib to reject newlines that was eventually fixed only in ftplib
>
> * https://bugs.python.org/issue30713
> * https://bugs.python.org/issue29606
>
> Search also brings multiple issues with one duplicate over another that
> makes these attacks scattered over the tracker and some edge case missing.
> Slightly off topic, the last time I reported a cookie related issue where
> the policy can be overriden by third party library I was asked to fix it in
> stdlib itself since adding fixes to libraries causes maintenance burden to
> downstream libraries to keep up upstream. With urllib being a heavily used
> module across ecosystem it's good to have a fix landing in stdlib that
> secures downstream libraries encouraging users to upgrade Python too.
>
> Regards,
> Karthikeyan S
>
>>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-09 Thread Karthikeyan
Thanks Gregory. I think it's a good tradeoff to ensure this validation only
for URLs of http scheme.

I also agree handling newline is little problematic over the years and the
discussion over the level at which validation should occur also prolongs
some of the patches. https://bugs.python.org/issue35906 is another similar
case where splitlines is used but it's better to raise an error and the
proposed fix could be used there too. Victor seemed to wrote a similar PR
like linked one for other urllib functions only to fix similar attack in
ftplib to reject newlines that was eventually fixed only in ftplib

* https://bugs.python.org/issue30713
* https://bugs.python.org/issue29606

Search also brings multiple issues with one duplicate over another that
makes these attacks scattered over the tracker and some edge case missing.
Slightly off topic, the last time I reported a cookie related issue where
the policy can be overriden by third party library I was asked to fix it in
stdlib itself since adding fixes to libraries causes maintenance burden to
downstream libraries to keep up upstream. With urllib being a heavily used
module across ecosystem it's good to have a fix landing in stdlib that
secures downstream libraries encouraging users to upgrade Python too.

Regards,
Karthikeyan S

>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-09 Thread Gregory P. Smith
On Tue, Apr 9, 2019 at 4:45 PM Karthikeyan  wrote:

> I would recommend fixing it since it's potentially remote code execution
> on systems like Redis (latest versions of Redis have this mitigated) though
> I must admit I don't fully understand the complexity since there are
> multiple issues linked. Go was also assigned a CVE for linked issue and it
> seemed to be the same reporter by username : CVE-2019-9741 . I tried using
> go's approach in the commit but urlopen accepts more URLs like data URLs
> [0] that seemed to accept \n as a valid case and the patch broke some
> tests. Looking at the issue discussion complexity also involves backwards
> compatibility. golang also pushed an initial fix that seemed to broke their
> internal tests [0] to arrive at a more simpler fix.
>
> [0]
> https://github.com/python/cpython/blob/a40681dd5db8deaf05a635eecb91498dac882aa4/Lib/test/test_urllib.py#L482
> [1]
> https://go-review.googlesource.com/c/go/+/159157/2#message-39c6be13a192bf760f6318ac641b432a6ab8fdc8
>
> --
> Regards,
> Karthikeyan S
>

useful references, thanks!  limiting the checks to only http and https as
those are the text based protocols with urls transmitted in text form makes
sense and avoids the data: test failures.

proposed simple fix in https://github.com/python/cpython/pull/12755

but tests are needed as is an audit of the code to see where else we may
potentially need to do such things.

-gps
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability

2019-04-09 Thread Karthikeyan
I would recommend fixing it since it's potentially remote code execution on
systems like Redis (latest versions of Redis have this mitigated) though I
must admit I don't fully understand the complexity since there are multiple
issues linked. Go was also assigned a CVE for linked issue and it seemed to
be the same reporter by username : CVE-2019-9741 . I tried using go's
approach in the commit but urlopen accepts more URLs like data URLs [0]
that seemed to accept \n as a valid case and the patch broke some tests.
Looking at the issue discussion complexity also involves backwards
compatibility. golang also pushed an initial fix that seemed to broke their
internal tests [0] to arrive at a more simpler fix.

[0]
https://github.com/python/cpython/blob/a40681dd5db8deaf05a635eecb91498dac882aa4/Lib/test/test_urllib.py#L482
[1]
https://go-review.googlesource.com/c/go/+/159157/2#message-39c6be13a192bf760f6318ac641b432a6ab8fdc8

-- 
Regards,
Karthikeyan S
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com