Zac Hatfield-Dodds <[email protected]> added the comment:
(I think structuring this as a high-level explanation is clearer than
point-by-point replies - it covers most of them, and the logic is hopefully
easier to follow.)
The core idea of Hypothesis is that making random choices is equivalent to
parsing a random bitstring:
- wrap your test function in a decorator which calls it many times with
random inputs
- describe those inputs with strategies (which are parser combinators)
- the bytes parsed by strategies can be provided by a Random() instance,
saved to and loaded from a database, edited and (maybe) re-parsed into related
or simpler inputs, etc.
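For concreteness, a minimal test in that style looks like this (the property
is one I've made up for illustration):

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        # Hypothesis calls this many times; each list of integers is
        # parsed out of a random (or replayed) bytestring.
        assert sorted(sorted(xs)) == sorted(xs)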
Minithesis [1] is a complete implementation of this core, including strategies,
shrinking, and the database, in a few hundred lines of Python. I think reading
this will answer most of your implementation questions.
Hypothesis is not the *only* property-based testing library for Python (there's
pytest-quickcheck), but only in the same sense that `six` is not the only py2/py3
compatibility layer. I attribute this to a combination of a better underlying
model [2], and brute implementation effort to provide great error messages,
integration with popular libraries, lots of strategies, related tooling, and so
on.
[1] https://github.com/DRMacIver/minithesis
See also https://hypothesis.works/articles/how-hypothesis-works/ for an
older writeup; the implementation details are a little outdated but it explains
the motivations better.
[2] https://hypothesis.works/articles/types-and-properties/
So a "strategy" is an object (implementation detail: a parser combinator,
instance of SearchStrategy), which tells Hypothesis how to convert a random
bytestring into whatever object you've asked for. This is a little unusual,
but I haven't found a better way which still makes it easy to mix the strategy
primitives with arbitrary user code such as "pick a username, ensure it's in
the database, and then pass it to my test method".
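Here's a sketch of that username case using st.composite, with an in-memory
set as a hypothetical stand-in for the database:

    from hypothesis import given, strategies as st

    USER_DB = set()  # hypothetical stand-in for a real database

    @st.composite
    def usernames(draw):
        # strategy primitives and arbitrary user code, freely mixed
        name = draw(st.text(min_size=1))
        USER_DB.add(name)  # "ensure it's in the database"
        return name

    @given(usernames())
    def test_lookup_finds_existing_user(name):
        assert name in USER_DB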
I share your distaste for "Haskell in Python"; I don't think Hypothesis
introduces any more of that flavor than Numpy introduces Fortran idioms -
there's occasionally a resemblance, but they're both Python-first libraries and
I'd attribute it mostly to the problem being solved. FWIW Hypothesis
deliberately rejected most of the design choices of QuickCheck, and
implementation-wise has more in common with unixy fuzzing tools like AFL.
The strategies in our API can generally be divided into useful primitives (e.g.
integers, floats, lists, tuples, sampled_from, one_of) and pre-composed
strategies (booleans = sampled_from, dictionaries = map+lists-of-kv-tuples,
builds = tuples+fixed_dictionaries+map).
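To illustrate, booleans() and dictionaries() behave roughly like the
compositions below (the real implementations have more options and tuning):

    import hypothesis.strategies as st

    def booleans():
        return st.sampled_from([False, True])

    def dictionaries(keys, values):
        # lists of key/value tuples, deduplicated by key, mapped to dict
        return st.lists(
            st.tuples(keys, values), unique_by=lambda kv: kv[0]
        ).map(dict)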
Very specific strategies like emails() are included because they're a common
need, and a naive implementation usually omits exactly the weirder and more
error-prone features that might reveal bugs.
We _are_ deliberately vague about the order in which we generate test data,
because the distribution is full of heuristics and adapts itself to the
behaviour of the test - including over multiple runs, thanks to the database.
The exact details are also considered an implementation detail, and subject to
change as we discover heuristics and sampling techniques which find bugs faster.
The upside is that writing effective strategies is straightforward:
- expressing more possible (valid) inputs is good, because if the bug only
triggers on NaN we'd have to generate some NaNs to detect it; and
- limiting use of rejection sampling (hypothesis.assume(), the .filter()
method) is an obvious performance improvement - if only 20% of the inputs we
try to generate are valid, the test will take five times longer (see the
sketch below).
Hypothesis' backend (the "conjecture" engine) then uses a large helping of
heuristics and fancy code to ensure that we get a nice diverse sample from the
input space.  Unfortunately there's a direct tension between high performance
and explainable behaviour; Minithesis is readable mostly by virtue of not
handling the hard edge cases.
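As a sketch of that rejection-sampling point, with two made-up strategies for
even integers:

    import hypothesis.strategies as st

    # Rejection sampling: draw an integer, discard about half of the draws.
    evens_by_filter = st.integers().filter(lambda n: n % 2 == 0)

    # Constructive: every draw is already valid, so no work is wasted.
    evens_by_map = st.integers().map(lambda n: n * 2)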
I'll take my Hypothesmith [3] project for random Python programs as an example
of a complex strategy. It was inspired by CSmith [4] - but is considerably
less sophisticated.  CSmith constructs semantically valid C programs;
Hypothesmith (for now!) constructs a subset of syntactically valid Python.  The
basic idea is to "invert" the grammar: start with the root node, recursively
construct a valid parse tree, and then serialise it out to a parsable input
string. Doing a better job would require more time than I had for a
proof-of-concept, but no new insights; and the same applies to defining a
strategy for semantically-valid code.  Being based on the grammar, I'd guess
it's unlikely to find corner cases in the grammar, but it might do so in the
parser, or via e.g. `(tree := ast.parse(code)) == ast.parse(ast.unparse(tree))`.
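The same trick in miniature, for a toy arithmetic grammar rather than
Python's (`expr` and the test are names I've made up; ast.unparse needs
Python 3.9+):

    import ast
    from hypothesis import given, strategies as st

    # Invert the grammar: build a parse tree top-down, then serialise it.
    expr = st.recursive(
        st.integers(0, 9).map(str),  # leaf node: a single digit literal
        lambda inner: st.tuples(inner, st.sampled_from("+-*"), inner).map("".join),
    )

    @given(expr)
    def test_roundtrips_through_the_parser(code):
        tree = ast.parse(code, mode="eval")
        assert ast.dump(tree) == ast.dump(ast.parse(ast.unparse(tree), mode="eval"))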
Your distrust of the python_programs() strategy is entirely reasonable, and I'd
also want to understand it. I'd just add that it doesn't need to be elegant,
or general, or express all inputs... so long as it can find valid inputs that
reveal a bug, more easily or cheaply than users or other testing techniques :-)
To test compiler optimizations: yes, once you had the strategy, that exact test
body would work.  An even more effective option is to use "equivalence modulo
inputs" [5] to derive `func2` and also assert that `optimizer(func)(arg) ==
optimizer(func2)(arg)`.
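The shape of that test, with python_programs(), make_emi_variant(), and
optimizer() all hypothetical stand-ins - make_emi_variant would delete code
that `func` never executes on this particular input, so both versions must
agree on it:

    from hypothesis import given, strategies as st

    @given(python_programs(), st.integers())  # python_programs() is hypothetical
    def test_optimizer_emi(func, arg):
        func2 = make_emi_variant(func, arg)  # hypothetical EMI transform
        assert optimizer(func)(arg) == optimizer(func2)(arg)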
[3] https://pypi.org/project/hypothesmith/ ; in action at
https://github.com/psf/black/blob/main/fuzz.py
[4] https://embed.cs.utah.edu/csmith/
[5] http://vuminhle.com/pdf/pldi14-emi.pdf
I hope that helps; in particular Minithesis should take most of the magic out
of it.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue42109>
_______________________________________