On Jun 24, 2012, at 1:33 PM, "Ondřej Čertík" <[email protected]> wrote:
> On Sun, Jun 24, 2012 at 10:24 AM, [email protected]
> <[email protected]> wrote:
>>> I noticed that sometimes the Travis CI buildbot succeeds, and Stefan's
>>> buildbot fails on the same code.
>>> I also understand that sometimes the 32bit and 64bit Python versions
>>> behave differently in tests
>>> and sometimes also major Python versions like 3.2 vs 2.7, also I think
>>> 2.4 and 2.5 (or maybe it was 2.3 and 2.4).
>>
>> I do not think that anybody is testing 2.3 or 2.4
>
> Not anymore, but I vaguely remember that by moving to one of these major
> versions of Python (can't remember if it was 2.4 or 2.5) the hash has changed.
>
>>
>>> All these are caused by the different behavior of the hash() function.
>>> In particular, 32bit and 64bit
>>> and also sometimes Python versions have different hash implementation.
>>> Also in SymPy,
>>> sometimes we just store instances in a dictionary, but instances have
>>> essentially random hash
>>> (depending on where they sit in the memory, right?).
>> `id` depends on where they sit in the memory. Most of the hashes in
>> sympy are overloaded, however they are still dependent on many
>> environmental factors.
>
> That's right, in Python you can't compute a hash() of a dict.
> In SymPy, we overload __hash__(), but by looking at it,
> it doesn't seem to depend on "id", but rather on the arguments,
> which depend on the implicit ordering of dictionaries, thus
> on the hashing algorithm in Python. Ok.
>
>>
>>> So rather than trying to make the hash() uniform, we need to make sure
>>> that SymPy tests pass
>>> with any hash() implementation. I think the way to do it is to use the
>>> new "-R" option (http://bugs.python.org/issue13703) and specify
>>> the PYTHONHASHSEED env variable. We will use let's say 3 or 4
>>> different tests with different (but definite) value.
>>> Besides that, we should also run with "-R" and keep the seed random
>>> (to help discover seeds, that break sympy),
>>> but we need to be able to print the seed, so that we can add it to the
>>> test suite. My hope is that by having
>>> 3 or 4 different seeds will catch pretty much all such bugs in sympy
>>> (if things behave randomly,
>>> it shouldn't even matter what seeds we use, as long as we use 3 or 4
>>> different seeds). And we can just use one platform
>>> for testing. This should take care of the hash() differences for good
>>> hopefully.
>> Aaron has just pushed a PR that enables hash randomization by default
>> in the tests, so probably most of the errors you see are coming
>> exactly from this. Some time ago there was a mailing list discussion
>
> So actually, just by choosing the seed randomly (and printing it, for
> reproducibility),
> and by testing for 2.5, 2.6, 2.7, 3.2, those are 4 randomly different
> hash seeds, so that
> should catch pretty much all such errors.

Python 2.5 does not support hash randomization. Also, you have to have the latest point release of 2.6-3.2 for it to work. And remember that the hash is based on the seed *and* the architecture (32-bit or 64-bit), because that determines the word size of the hash.

>
>> about making the sorting of args in Add and Mul hash agnostic as an
>> immediate solution. Also it was discussed that for a better solution
>> we should change the architecture of sympy in a way that never depends
>> on sorting of args, however this will be hard (it seems that it was
>> left as a very distant goal).
>
> I don't think it's a distant goal -- the actual results of sympy are
> pretty much hash independent,
> printing should be completely hash independent, and some algorithms in
> sympy might depend
> on hash, but we just need to make sure that the actual tests (and
> especially doctests) are hash independent.

I'm starting to think that the better way is to make sure that tests work with all orderings.
If an algorithm is correct, you should get a mathematically correct result no matter what, but the exact form of that result will depend on the order in which things were processed. cse is a good example of this (cf. the test_expand failure). From what I remember, it depends both on hash values and on the order of iterating through a dictionary.

>
>>
>>> Besides the hash, is there any other difference between platforms?
>>
>> Not any that we should care about, I think.
>
> So in this case I think all we need is to setup automatic pull request testing
> for the 2.5, 2.6, 2.7, 3.2 Python versions with hash randomization on
> a single computer (e.g. my linode server).
> If they all pass, then we can be reasonably sure that things pass in
> all Python versions
> as well as all platforms.
>
> My goal is to have a single simple red/green light after running
> tests, and if this light is green, it means
> that everything works in all Python versions and all platforms. For
> master, the Travis CI will do it
> *after* they upgrade all their Python's to use hash randomization. In
> the meantime and for pull requests
> we have to use sympy-bot.
>
> Ondrej

I suppose we should open a feature request in their issues to upgrade their Pythons.

Aaron Meurer
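
A minimal sketch of the fixed-seed idea discussed above, assuming a Python 3 interpreter (where string hashing is randomized by default and PYTHONHASHSEED pins the seed). The helper name run_with_seed and the seed values 1, 2, 3 are made up for illustration; this is not SymPy's test runner. It runs a snippet in a fresh interpreter per seed, so one can compare output that depends on set ordering with output that has been sorted first.

```python
import os
import subprocess
import sys


def run_with_seed(code, seed):
    """Run `code` in a fresh interpreter with PYTHONHASHSEED set to `seed`.

    `seed` may be an integer or the string "random". Hypothetical helper,
    not part of SymPy.
    """
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode().strip()


if __name__ == "__main__":
    # Three arbitrary but fixed seeds, as suggested in the thread.
    for seed in (1, 2, 3):
        # Iteration order of a set of strings may change with the seed...
        raw = run_with_seed("print(list(set('abcdef')))", seed)
        # ...while sorting first gives a seed-independent result.
        stable = run_with_seed("print(sorted(set('abcdef')))", seed)
        print(seed, raw, stable)
```

A test that compares against the sorted form should pass under every seed, whereas one that compares against the raw list may pass or fail depending on the seed, which is the kind of failure the fixed seeds are meant to expose.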
