Hi,

I have been fuzzing[1] various parts of the Python 3.7 standard library with 
python-afl[2] to find internal implementation issues in the library. I have 
mainly been looking for the following:

* Exceptions other than the documented ones. These usually indicate an 
internal implementation issue. For example, one would not expect a 
UnicodeDecodeError from the netrc.netrc() function when the documentation[3] 
promises netrc.NetrcParseError and there is no way to pass a properly 
sanitized file object to netrc.netrc().
* Differences between the values returned by the C and Python versions of 
some functions. The quopri module may have these.
* Unexpected performance and memory allocation issues. Whether and how to fix 
these can be somewhat controversial, but at least in some cases they can be 
really nasty from an end-user perspective. For example, 
fractions.Fraction("1.64E6646466664") allocates hundreds of megabytes of 
memory and takes a very long time to calculate. I gave up waiting for that 
function call to finish after 5 minutes.
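
To illustrate the Fraction case: as far as fractions.py reads, a non-negative 
exponent is applied by multiplying the numerator by 10**exp as an exact 
integer, so the memory needed grows linearly with the exponent. The sketch 
below only estimates that size arithmetically and never builds the large 
value. The helper name estimated_power_of_ten_bytes is made up for this 
example.

```python
import math
from fractions import Fraction


def estimated_power_of_ten_bytes(exponent: int) -> int:
    """Approximate number of bytes needed to store the integer 10**exponent."""
    bits = exponent * math.log2(10)
    return math.ceil(bits / 8)


# A small exponent is harmless and shows the exact scaling by a power of ten.
small = Fraction("1.64E5")
assert small == Fraction(164000, 1)

# The exponent from the report. Only the size is estimated here; the
# Fraction itself is deliberately NOT constructed.
huge_estimate = estimated_power_of_ten_bytes(6646466664)
print(f"10**6646466664 would need about {huge_estimate / 1e9:.1f} GB")
```

Under that assumption the full computation would need gigabytes for the 
numerator alone, which is consistent with the call not finishing in minutes.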

As this is going to result in a fair number of bug reports (so far I have 
filed only one[4], although the audio processing area has many more issues to 
file), I would like to ask your opinion on how to file them. Should I report 
all issues in a given module in one bug report, or split them into more 
fine-grained reports that may be related? The variety of error types is 
especially noticeable in the zipfile module, which shows many different 
exceptions and behaviors on invalid data 
<https://github.com/Barro/python-stdlib-fuzzers/tree/master/zipfile/crashes>. 
In the case of the sndhdr module, several modules have issues (aifc, sunau, 
wave) that also show up in sndhdr when it uses them. Or would some of you be 
willing to go through the crashes that pop up and help with filing the 
reports?

The code and a more detailed description are available at 
<https://github.com/Barro/python-stdlib-fuzzers>. By default it works only on 
some GNU/Linux systems (I use Debian testing), as it relies on /dev/shm/ being 
available and uses shell script wrappers that depend on various tools that 
may not be installed on all systems by default.

As a bonus, because this uses coverage-based fuzzing, it also opens up the 
possibility of automatically creating a regression test suite for each of the 
fuzzed modules. Such a suite would ensure that the existing functionality 
(input files under the <fuzz-target>/corpus/ directory) does not suddenly 
start raising additional exceptions, and would make it easier to test 
potential bug fixes (crash-inducing files under the <fuzz-target>/crashes/ 
directory).
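
A minimal sketch of what such a regression check could look like, assuming 
each fuzz target is a callable that takes the raw bytes of one input file. 
The names run_corpus, parse_record and RecordError are hypothetical and are 
not taken from the repository. The toy target stands in for a module like 
netrc, where only one exception type is documented.

```python
import tempfile
from pathlib import Path


class RecordError(Exception):
    """Stand-in for a module's documented error (like netrc.NetrcParseError)."""


def parse_record(data: bytes):
    """Toy target: parses b"key=value" and documents only RecordError."""
    text = data.decode("utf-8")  # an undocumented UnicodeDecodeError can escape
    key, sep, value = text.partition("=")
    if not sep:
        raise RecordError("missing '='")
    return key, value


def run_corpus(target, corpus_dir, documented=()):
    """Feed every file in corpus_dir to target; return undocumented failures."""
    failures = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            target(path.read_bytes())
        except documented:
            pass  # documented exceptions are acceptable behavior
        except Exception as exc:
            failures[path.name] = type(exc).__name__
    return failures


# Build a throwaway corpus directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "ok.txt").write_bytes(b"key=value")
    (corpus / "rejected.txt").write_bytes(b"no separator")
    (corpus / "crash.bin").write_bytes(b"\xff=1")
    failures = run_corpus(parse_record, corpus, documented=(RecordError,))

print(failures)
```

Run against a corpus/ directory, an empty result would mean no regression. 
Run against crashes/, each remaining entry would be a bug that is still open.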

As a downside, this relies on two quite specific tools (afl, python-afl) that 
have further dependencies (Cython), so I doubt the viability of integrating 
this type of testing into the normal Python verification process. Unlike the 
libFuzzer-based fuzzing that is already integrated in Python[5], this 
instruments the actual Python code, and only that, not the actions that the 
interpreter performs in the background. This should result in better fuzzer 
coverage for the Python code that is used, with the downside that C functions 
are complete black boxes to the fuzzer when they are called.
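
For readers unfamiliar with python-afl, a harness roughly follows the shape 
below. This is a sketch and not the harness from the repository: fuzz_one and 
the "--afl" command line flag are conventions invented for this example, and 
zipfile is used only because it is one of the modules mentioned above. The 
afl.init() call and the os._exit(0) shortcut are the parts that come from 
python-afl's documented usage.

```python
import io
import os
import sys
import zipfile


def fuzz_one(data: bytes):
    """Run one input through the target; return members read, or None."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                zf.read(info)
            return len(zf.infolist())
    except zipfile.BadZipFile:
        # Documented rejection of invalid input, not a finding.
        return None
    # Any other exception propagates, and the fuzzer records it as a crash.


def main():
    import afl  # python-afl; imported lazily so the rest runs without it

    afl.init()  # start the fork server as late as possible
    fuzz_one(sys.stdin.buffer.read())
    os._exit(0)  # skip interpreter teardown to speed up each run


# "--afl" is a made-up switch so that importing or running this file
# without the fuzzer does not block on stdin.
if __name__ == "__main__" and "--afl" in sys.argv[1:]:
    main()
```

Only the Python-level code in zipfile is instrumented in this setup. Calls 
into C code such as zlib contribute no coverage feedback.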

I have run these fuzzers for at most several hours per module with 4 
instances, and stopped fuzzing modules with no issues once no new coverage 
had been discovered for more than 10 minutes. Also, I have not really created 
high-quality initial input files, so I wouldn't be surprised if more issues 
are lurking that could be found by throwing more CPU time and higher-quality 
fuzzers at the problem.

[1]: https://en.wikipedia.org/wiki/Fuzzing
[2]: https://github.com/jwilk/python-afl
[3]: https://docs.python.org/3/library/netrc.html
[4]: https://bugs.python.org/issue34088
[5]: https://github.com/python/cpython/tree/3.7/Modules/_xxtestfuzz

-- 
Jussi Judin
https://jjudin.iki.fi/