[issue37461] email.parser.Parser hang
Guido Vranken added the comment:

I used fuzzing to find this bug. After applying your patch, the infinite loop is gone and the fuzzer cannot find any other bugs of this nature.

--
___
Python tracker <https://bugs.python.org/issue37461>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Guido Vranken added the comment:

Hi,

I've built a generic Python fuzzer and submitted it to OSS-Fuzz. It works by implementing a "def FuzzerRunOne(FuzzerInput):" function in Python, in which some arbitrary code is run based on FuzzerInput, which is a bytes object. This is a more versatile solution than the current re, json and csv fuzzers, as it requires no custom C code, and adding more fuzzing targets is as easy as writing a new harness in Python and adding a build rule.

Code coverage is measured at both the CPython level (*.c) and the Python level (*.py). CPython is compiled with AddressSanitizer. This means that both CPython memory bugs and Python library bugs (excessive memory consumption, hangs, slowdowns, unexpected exceptions) are expected to surface.

You can see my current set of fuzzers here: https://github.com/guidovranken/python-library-fuzzers
The PR to OSS-Fuzz is https://github.com/google/oss-fuzz/pull/2567

Currently, the only Python maintainer who will be receiving automated bug reports is gpshead. Are there any other developers who normally process Python security bug reports and would like to receive notifications? Feel free to respond directly in the OSS-Fuzz PR thread.

--
nosy: +Guido
___
Python tracker <https://bugs.python.org/issue29505>
___
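A minimal sketch of a harness in the shape described above. Only the "def FuzzerRunOne(FuzzerInput):" entry point comes from the message; the json target and the list of swallowed exceptions are illustrative assumptions, not code from the actual repository.

```python
import json

def FuzzerRunOne(FuzzerInput):
    # FuzzerInput is a bytes object supplied by the fuzzing engine.
    # "Boring" expected exceptions are swallowed here; anything else
    # (crash, unexpected exception type, hang) is what the fuzzer
    # would flag as a bug.
    try:
        json.loads(FuzzerInput.decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        pass
```

Each new target is then just another such function plus a build rule, which is what makes this more versatile than per-module C fuzzers.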
[issue37461] email.parser.Parser hang
New submission from Guido Vranken:

The following will hang, and consume a large amount of memory:

    from email.parser import BytesParser, Parser
    from email.policy import default

    payload = "".join(chr(c) for c in [
        0x43, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54, 0x79,
        0x70, 0x65, 0x3a, 0x78, 0x3b, 0x61, 0x72, 0x1b, 0x2a, 0x3d,
        0x22, 0x73, 0x4f, 0x27, 0x23, 0x61, 0xff, 0xff, 0x27, 0x5c,
        0x22])
    Parser(policy=default).parsestr(payload)

--
components: email
messages: 346953
nosy: Guido, barry, r.david.murray
priority: normal
severity: normal
status: open
title: email.parser.Parser hang
type: crash
versions: Python 3.9
___
Python tracker <https://bugs.python.org/issue37461>
___
[issue23165] Heap overwrite in Python/fileutils.c:_Py_char2wchar() on 32 bit systems due to malloc parameter overflow
New submission from Guido Vranken:

The vulnerability described here is exceedingly difficult to exploit, since there is no straightforward way for an attacker (someone who controls a Python script's contents but not other values such as system environment variables) to control a relevant parameter to the vulnerable function (_Py_char2wchar in Python/fileutils.c). It is, however, important that it is remediated, since unawareness of this vulnerability may cause an unsuspecting author to establish a link between user input and the function parameter in future versions of Python.

Like I said, the vulnerability is caused by code in the _Py_char2wchar function. Indirectly this function is reached through Objects/unicodeobject.c:PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeFSDefaultAndSize(), PyUnicode_DecodeLocale() and some other functions. As far as I know this can only be exploited on 32-bit architectures (whose registers overflow at 2**32). The following description sets out from the latest Python 3.4 code retrieved from https://hg.python.org/cpython .

The problem lies in the computation of the size of the buffer that will hold the wide char version of the input string:

-- Python/fileutils.c --
296 #ifdef HAVE_BROKEN_MBSTOWCS
297     /* Some platforms have a broken implementation of
298      * mbstowcs which does not count the characters that
299      * would result from conversion.  Use an upper bound.
300      */
301     argsize = strlen(arg);
302 #else
303     argsize = mbstowcs(NULL, arg, 0);
304 #endif
...
306     res = (wchar_t *)PyMem_RawMalloc((argsize+1)*sizeof(wchar_t));

and:

331     argsize = strlen(arg) + 1;
332     res = (wchar_t*)PyMem_RawMalloc(argsize*sizeof(wchar_t));

Neither invocation of PyMem_RawMalloc is preceded by code that asserts that no overflow will occur as a result of the multiplication of the length of 'arg' by sizeof(wchar_t), which is typically 4 bytes.
It follows that on a 32-bit architecture, it is possible to cause an internal overflow through the supplication of a string whose size is (2**32)/4 - 1 bytes, i.e. 1 gigabyte minus one byte. The supplication of such a string will therefore result in a value of 0 being passed to PyMem_RawMalloc, because:

    argsize = 1024*1024*1024 - 1
    malloc_argument = (argsize + 1) * 4   # wraps to 0 in 32-bit arithmetic
    # prints '0'

Effectively this will result in an allocation of exactly 1 byte, since a parameter of 0 is automatically adjusted to 1 by the underlying _PyMem_RawMalloc():

-- Objects/obmalloc.c --
51 static void *
52 _PyMem_RawMalloc(void *ctx, size_t size)
53 {
54     /* PyMem_Malloc(0) means malloc(1). Some systems would return NULL
55        for malloc(0), which would be treated as an error. Some platforms would
56        return a pointer with no memory behind it, which would break pymalloc.
57        To solve these problems, allocate an extra byte. */
58     if (size == 0)
59         size = 1;
60     return malloc(size);
61 }

Once the memory has been allocated, mbstowcs() is invoked:

-- Python/fileutils.c --
306     res = (wchar_t *)PyMem_RawMalloc((argsize+1)*sizeof(wchar_t));
307     if (!res)
308         goto oom;
309     count = mbstowcs(res, arg, argsize+1);

In my test setup (latest 32 bit Debian), mbstowcs returns '0', meaning no bytes were written to 'res'. Then, 'res' is iterated over, and the iteration is halted as soon as either a null wchar or a surrogate wchar is encountered:

-- Python/fileutils.c --
310     if (count != (size_t)-1) {
311         wchar_t *tmp;
312         /* Only use the result if it contains no
313            surrogate characters. */
314         for (tmp = res; *tmp != 0 &&
315              !Py_UNICODE_IS_SURROGATE(*tmp); tmp++)
316             ;
317         if (*tmp == 0) {
318             if (size != NULL)
319                 *size = count;
320             return res;
321         }
322     }
323     PyMem_RawFree(res);

Py_UNICODE_IS_SURROGATE is defined as follows:

-- Include/unicodeobject.h --
183 #define Py_UNICODE_IS_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDFFF)

In the iteration over 'res', control is transferred back to the invoker of _Py_char2wchar() if a null wchar is encountered first. If, however, a wchar that satisfies the expression in Py_UNICODE_IS_SURROGATE() is encountered first, *tmp is not null and thus the conditional code on lines 318-320 is skipped. The space that 'res' points to is uninitialized. Uninitialized, however, does not entail randomness in this case. If an attacker has sufficient freedom to manipulate the contents of the process memory prior to calling _Py_char2wchar(), in order to scatter it with values that satisfy Py_UNICODE_IS_SURROGATE(), this could increase their odds of having _Py_char2wchar() encounter such a value.
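The wraparound can be checked from Python itself by reducing modulo 2**32, mimicking 32-bit size_t arithmetic; this is a sketch of the arithmetic only, not of the C code, and it assumes sizeof(wchar_t) == 4 as on the affected platforms.

```python
# argsize is the length of a 1 GB minus one byte string.
argsize = 1024 * 1024 * 1024 - 1

# (argsize + 1) * 4 == 2**32 exactly, which a 32-bit size_t wraps to 0.
malloc_argument = ((argsize + 1) * 4) % 2**32

print(malloc_argument)  # prints 0: PyMem_RawMalloc then allocates just 1 byte
```

The subsequent mbstowcs() call, by contrast, is told the buffer holds argsize+1 wide characters, which is the mismatch the report describes.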
[issue23130] Tools/scripts/ftpmirror.py allows overwriting arbitrary files on filesystem
New submission from Guido Vranken:

Tools/scripts/ftpmirror.py does not guard against arbitrary path constructions, and, given a connection to a malicious FTP server (or a man-in-the-middle attack), it is possible that any file on the client's filesystem gets overwritten. I.e., if we suppose that ftpmirror.py is run from a base directory /home/xxx/yyy, file creations can occur outside this base directory, such as in /tmp, /etc, /var, to give some examples.

I've constructed a partial proof-of-concept FTP server that demonstrates directory and file creation outside the base directory (the directory the client script was launched from).

I understand that most of the files in Tools/scripts/ are legacy applications that have long been deprecated. However, if the maintainers think these applications should be safe nonetheless, I'll be happy to construct and submit a patch that remediates this issue.

Guido Vranken
Intelworks

--
components: Demos and Tools
messages: 233189
nosy: Guido
priority: normal
severity: normal
status: open
title: Tools/scripts/ftpmirror.py allows overwriting arbitrary files on filesystem
type: security
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23130>
___
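A guard of the kind such a patch would need could look roughly like the following; safe_join is a hypothetical helper name for illustration, not existing ftpmirror.py code.

```python
import os

def safe_join(base, untrusted):
    # Resolve the server-supplied path relative to the base directory
    # and refuse any result that escapes it (e.g. via ".." components
    # or an absolute path smuggled in by a malicious server).
    base = os.path.realpath(base)
    candidate = os.path.realpath(os.path.join(base, untrusted))
    if candidate != base and not candidate.startswith(base + os.sep):
        raise ValueError("path escapes base directory: %r" % (untrusted,))
    return candidate
```

With this, safe_join('/home/xxx/yyy', 'sub/file.txt') is accepted, while safe_join('/home/xxx/yyy', '../../../etc/passwd') raises before any file is created.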
[issue23055] PyUnicode_FromFormatV crasher
Guido Vranken added the comment:

Serhiy Storchaka: good call on changing my 'n += (width + precision) > 20 ? 20 : (width + precision);' into 'if (width < precision) width = precision;'. I didn't realize that sprintf's space requirement entails using the largest of the two instead of adding the two together. I noticed the apparently pointless width calculation in 'step 1' but decided not to touch it -- good that it's removed now though. I will now start doing more debugging based on this new patch to ensure that the bug is gone.

On a more design-related note, for the sake of readability and stability, I'd personally opt for implementing a toned-down custom sprintf-like function that does exactly what it needs to do and nothing more, since a function like this one requires perfect alignment with the underlying sprintf() in terms of functionality, at the possible expense of stability and integrity issues like the ones we see here. For instance, width and precision are currently overflowable, resulting in either a minus sign appearing in the resultant format string given to sprintf() (width and precision are signed integers), or completely overflowing it (i.e. (uint64_t)18446744073709551617 == 1). Considering the latter example, how do we know sprintf uses the same logic?

Guido

--
nosy: +Guido
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23055>
___
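The "largest of the two" point can be illustrated from Python, whose %-formatting mirrors sprintf for integer conversions: precision zero-pads the digits and width then pads the field, so the output length is governed by the maximum of the two, not their sum.

```python
# Field length is max(width, precision) for %d, not width + precision.
assert len('%20.10d' % 1) == 20   # width 20 dominates precision 10
assert len('%10.20d' % 1) == 20   # precision 20 dominates width 10
print('ok')
```

This is why sizing the buffer as width + precision was unnecessarily large, while failing to bound either value at all is what made the overflow possible.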
[issue23055] PyUnicode_FromFormatV crasher
Guido Vranken added the comment:

I'd also like to add that, although I agree with Guido van Rossum that the likelihood of even triggering this bug in a general programming context is low, there are two buffer overflows at play here (one stack-based and one heap-based), and given an adversary's control over the format and vargs parameters, I'd say there is a reasonable likelihood of exploiting it to execute arbitrary code, since whoever controls the parameters has some control over which bytes end up where outside the buffer boundaries.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23055>
___
[issue22928] HTTP header injection in urrlib2/urllib/httplib/http.client
New submission from Guido Vranken:

Proof of concept:

    # Script for Python 2
    import urllib2
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent',
                          'Mozilla/5.0' + chr(0x0A) + 'Location: header injection')]
    response = opener.open('http://localhost:')
    # Data sent is:
    #   GET / HTTP/1.1
    #   Accept-Encoding: identity
    #   Host: localhost:
    #   Connection: close
    #   User-Agent: Mozilla/5.0
    #   Location: header injection
    # End of script

    # Python 3
    from urllib.request import urlopen, build_opener
    opener = build_opener()
    opener.addheaders = [('User-agent',
                          'Mozilla/5.0' + chr(0x0A) + 'Location: header injection')]
    opener.open('http://localhost:')
    # Data sent is:
    #   GET / HTTP/1.1
    #   Accept-Encoding: identity
    #   Host: localhost:
    #   Connection: close
    #   User-Agent: Mozilla/5.0
    #   Location: header injection
    # End of script

It is the responsibility of the developer leveraging Python and its HTTP client libraries to ensure that their (web) application acts in accordance with official HTTP specifications and that no threats to security will arise from their code. However, newlines inside headers are arguably a special case of breaking conformity with the RFCs with regard to the allowed character set. No illegal character used inside an HTTP header is likely to have a compromising side effect on back-end clients and servers and the integrity of their communication, as a result of the leniency of most web servers. However, a newline character (0x0A) embedded in an HTTP header invariably has the semantic consequence of denoting the start of an additional header line. To put it differently, not sanitizing headers in complete accordance with the RFCs could be seen as a virtue, in that it gives the programmer a maximum amount of freedom without having to trade it for any likely or severe security ramifications, so that they may use illegal characters in testing environments and environments that are governed by an explicitly less strict interpretation of the HTTP protocol.
Newlines are special in that they enable anyone who is able to influence the header content to, in effect, perform additional invocations of add_header(). In issue 17322 ( http://bugs.python.org/issue17322 ) there is some discussion of the general compliance of the HTTP client libraries with the RFCs. I'd like to start with prohibiting newline characters in HTTP headers.

Although this issue is not a hard vulnerability such as a buffer overflow, it translates to a potentially equal level of severity when considered from the perspective of a web-enabled application, which is what the HTTP libraries are typically used for. Lack of input validation on the application developer's end will facilitate header injections, for example if user-supplied data ends up as cookie content verbatim. Adding this proposed additional layer of validation inside Python minimizes the likelihood of a successful header injection, while functionality is not notably affected.

I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will secure all invocations of 'http' regardless of intermediate libraries such as urllib. Included is a patch for the latest checkout of the default branch that will cause CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Aside from detecting \n, it also breaks on \r, as their respective implications can be similar. Feel free to adjust, rewrite and transpose this to other branches where you feel this is appropriate.
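The check described amounts to something like the following sketch; the function name and the use of ValueError here are illustrative, whereas the actual patch raises CannotSendHeader inside putheader().

```python
def validate_header_part(part):
    # Reject any header name or value containing CR or LF, either of
    # which would start a new header line on the wire.
    if '\r' in part or '\n' in part:
        raise ValueError('CR/LF not allowed in HTTP header: %r' % (part,))
    return part

validate_header_part('Mozilla/5.0')  # accepted unchanged
try:
    validate_header_part('Mozilla/5.0\nLocation: header injection')
except ValueError as exc:
    print('rejected:', exc)
```

Placing the check at the putheader() level means every caller, including urllib, is covered without each intermediate library needing its own sanitization.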
Guido Vranken
Intelworks

--
components: Library (Lib)
files: disable_http_header_injection.patch
keywords: patch
messages: 231590
nosy: Guido
priority: normal
severity: normal
status: open
title: HTTP header injection in urrlib2/urllib/httplib/http.client
type: security
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file37264/disable_http_header_injection.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22928>
___