Re: [sympy] Testing: differences between platforms

Aaron Meurer Sun, 24 Jun 2012 16:12:48 -0700

On Jun 24, 2012, at 1:33 PM, "Ondřej Čertík" <[email protected]> wrote:


> On Sun, Jun 24, 2012 at 10:24 AM, [email protected]
> <[email protected]> wrote:
>>> I noticed that sometimes the Travis CI buildbot succeeds, and Stefan's
>>> buildbot fails on the same code.
>>> I also understand that sometimes the 32bit and 64bit Python versions
>>> behave differently in tests
>>> and sometimes also major Python versions like 3.2 vs 2.7, also I think
>>> 2.4 and 2.5 (or maybe it was 2.3 and 2.4).
>>
>> I do not think that anybody is testing 2.3 or 2.4
>
> Not anymore, but I vaguely remember that by moving to one of these major
> versions of Python (can't remember if it was 2.4 or 2.5) the hash has changed.
>
>>
>>> All these are caused by the different behavior of the hash() function.
>>> In particular, 32bit and 64bit
>>> and also sometimes Python versions have different hash implementation.
>>> Also in SymPy,
>>> sometimes we just store instances in a dictionary, but instances have
>>> essentially random hash
>>> (depending on where they sit in the memory, right?).
>> `id` depends on where they sit in the memory. Most of the hashes in
>> sympy are overloaded, however they are still dependent on many
>> environmental factors.
>
> That's right, in Python you can't compute a hash() of a dict.
> In SymPy, we overload __hash__(), but by looking at it,
> it doesn't seem to depend on "id", but rather on the arguments,
> which depend on the implicit ordering of dictionaries, thus
> on the hashing algorithm in Python. Ok.
>
>>
>>> So rather than trying to make the hash() uniform, we need to make sure
>>> that SymPy tests pass
>>> with any hash() implementation. I think the way to do it is to use the
>>> new "-R" option (http://bugs.python.org/issue13703) and specify
>>> the PYTHONHASHSEED env variable. We will use, let's say, 3 or 4
>>> different tests with different (but definite) values.
>>> Besides that, we should also run with "-R" and keep the seed random
>>> (to help discover seeds, that break sympy),
>>> but we need to be able to print the seed, so that we can add it to the
>>> test suite. My hope is that having
>>> 3 or 4 different seeds will catch pretty much all such bugs in sympy
>>> (if things behave randomly,
>>> it shouldn't even matter what seeds we use, as long as we use 3 or 4
>>> different seeds). And we can just use one platform
>>> for testing. This should take care of the hash() differences for good 
>>> hopefully.
>> Aaron has just pushed a PR that enables hash randomization by default
>> in the tests, so probably most of the errors you see are coming
>> exactly from this. Some time ago there was a mailing list discussion
>
> So actually, just by choosing the seed randomly (and printing it, for
> reproducibility),
> and by testing for 2.5, 2.6, 2.7, 3.2, those are 4 randomly different
> hash seeds, so that
> should catch pretty much all such errors.

Python 2.5 does not support hash randomization. Also, you need the
latest point release of 2.6 through 3.2 (2.6.8, 2.7.3, 3.2.3) for it
to work. And remember that the hash is based on the seed *and* the
architecture (32-bit or 64-bit) because that determines the word size
of the hash.
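
To make the seed dependence concrete, here is a minimal sketch that
computes the hash of a string in fresh interpreters started with a
fixed PYTHONHASHSEED. The helper name hash_under_seed is made up for
illustration and is not part of SymPy or its test runner. It assumes an
interpreter that supports hash randomization; on the 2.6 through 3.2
point releases an integer PYTHONHASHSEED should be enough to enable it,
but I have not checked every version.

```python
import os
import random
import subprocess
import sys


def hash_under_seed(text, seed):
    """Return hash(text) as seen by a fresh interpreter run with the given seed.

    hash_under_seed is a hypothetical helper, not a SymPy function.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


# Choose the seed at random, but print it so a failing run can be repeated.
seed = random.randint(1, 4294967295)
print("hash randomization seed: %d" % seed)

# The word size also feeds into the hash, so report it next to the seed.
print("word size: %d-bit" % (64 if sys.maxsize > 2**32 else 32))

# The same seed reproduces the same hash on the same interpreter and platform.
print(hash_under_seed("x + y", seed) == hash_under_seed("x + y", seed))
```

Printing the seed together with the word size is what makes a random
run reproducible later: a failure found on one machine can be replayed
with the same seed on the same architecture.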

>
>> about making the sorting of args in Add and Mul hash agnostic as an
>> immediate solution. Also it was discussed that for a better solution
>> we should change the architecture of sympy in a way that never depends
>> on sorting of args, however this will be hard (it seems that it was
>> left as a very distant goal).
>
> I don't think it's a distant goal -- the actual results of sympy are
> pretty much hash independent,
> printing should be completely hash independent, and some algorithms in
> sympy might depend
> on hash, but we just need to make sure that the actual tests (and
> especially doctests) are hash independent.

I'm starting to think that a better way is to make sure that tests
work with all orderings. If an algorithm is correct, you should get a
mathematically correct result no matter what, but the exact form of
that result will depend on the order in which things were processed.
cse is a good example of this (cf. the test_expand failure). From
what I remember, it depends both on hash values and the order of
iterating through a dictionary.
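
As a rough illustration of a test that works with all orderings, here
is a toy sketch in plain Python. It does not use SymPy; the function
collect_like_terms and the (coefficient, name) term representation are
invented for this example. The routine returns its result in set
iteration order, which for string keys can change with the hash seed,
so the checks compare a canonical form and the evaluated value rather
than the exact list.

```python
def collect_like_terms(terms):
    """Sum the coefficients of terms that share a name.

    Hypothetical toy routine: the output order follows set iteration
    order, which may differ between hash seeds and platforms.
    """
    names = {name for _, name in terms}
    return [(sum(c for c, n in terms if n == name), name) for name in names]


def evaluate(pairs, values):
    """Evaluate a sum of coefficient * variable for the given values."""
    return sum(c * values[name] for c, name in pairs)


terms = [(1, "x"), (2, "y"), (3, "x"), (-2, "z")]
result = collect_like_terms(terms)

# Fragile: comparing `result` to one fixed list depends on iteration order.
# Robust check 1: compare after sorting into a canonical order.
assert sorted(result, key=lambda t: t[1]) == [(4, "x"), (2, "y"), (-2, "z")]

# Robust check 2: the result must agree with the input when evaluated.
values = {"x": 2, "y": 5, "z": 7}
assert evaluate(result, values) == evaluate(terms, values)
```

The second check is closer to what I mean above: it only asks that the
answer be mathematically correct, not that it have one particular form.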

>
>>
>>> Besides the hash, is there any other difference between platforms?
>>
>> Not any that we should care about, I think.
>
> So in this case I think all we need is to set up automatic pull request testing
> for the 2.5, 2.6, 2.7, 3.2 Python versions with hash randomization on
> a single computer (e.g. my linode server).
> If they all pass, then we can be reasonably sure that things pass in
> all Python versions
> as well as all platforms.
>
> My goal is to have a single simple red/green light after running
> tests, and if this light is green, it means
> that everything works in all Python versions and all platforms. For
> master, the Travis CI will do it
> *after* they upgrade all their Pythons to use hash randomization. In
> the meantime and for pull requests
> we have to use sympy-bot.
>
> Ondrej

I suppose we should open a feature request in their issues to upgrade
their Pythons.

Aaron Meurer

