Le dimanche 22 juillet 2012 à 17:55 +0300, Sergiu Ivanov a écrit : > Hello, > > Sorry for the large delays between my messages in this thread. > > I will remind the gist of the problem. I currently have the need to > have a collection of FiniteSet's. ~1 month ago, FiniteSet stopped > sorting its elements at creation, which means that the ordering of > FiniteSet.args is hash-dependent. The structure of the sort key is > dependent on the order of .args, and is thus hash-dependent. This > means that, if one has a container with FiniteSet's, their order in > the container is _hash-dependent_, even if one tries to explicitly > sort the container according to FiniteSet.sort_key. > > On Tue, Jul 10, 2012 at 6:39 PM, Sergiu Ivanov > <[email protected]> wrote: > > On Tue, Jul 10, 2012 at 6:10 PM, Chris Smith <[email protected]> wrote: > >> > >> I modified pre and post order traversals to use a sort key (so the > >> incoming expression doesn't have to have args presorted). Would this > >> help? I no longer have a pull request with it because I haven't solved > >> the solve and cse randomization failures, but if you want to pull off > >> that single commit it is in my rand branch. > > > > I must confess I don't know what you refer to when you say "pre and > > post order traversals" (I mean, I'm not sure what exactly it's about > > in SymPy), but I'll check out your modifications later today and see > > whether that helps. > > Yes, your change would help, but as I can see in > sympy.core.basic.preorder_traversal, the independence of the order of > args is assured by sorting them. Using preorder_traversal to make the > sort key hash independent is therefore worse than just sorting the > .args of the FiniteSet, because it involves sorting _and_ additional > effort. > > Therefore, there are still only two solutions on the table to have > FiniteSet produce consistent sort keys: > > 1. have FiniteSet sort its .args at creation; > > 2. sort the FiniteSet every time a sort_key is needed. > > I believe that, compared to the other drawbacks of having FiniteSet > not sort its .args at creation, this one is decisive. Apparently, not > sorting .args with performance in mind impacts performance even worse. > > Are there any counter-arguments to making FiniteSet sort its arguments > at creation again?
Most objects are never printed, and sort keys are only really needed for printing. It seems wrong to optimise for the uncommon case. To be efficient, FiniteSet should use sets internally, but sorting would prevent that. E.g. membership testing is O(1) with a set but O(n) with a list. Sorting also turns set creation from an O(n) operation into O(n*log(n)), which in itself is probably only significant with large sets. However, there is a more subtle, but far worse, effect of sorting: it forces the computation and (especially) comparison of sort keys, which is very expensive, making set creation far slower than it can be: >>> s = set(map(S, range(100))) >>> %timeit tuple(sorted(s, key=default_sort_key)) 100 loops, best of 3: 7.92 ms per loop >>> %timeit frozenset(s) 100000 loops, best of 3: 6.72 us per loop For comparison, the current implementation, which basically does tuple(frozenset(tuple(s))) when given a set: >>> %timeit FiniteSet(*s) 1000 loops, best of 3: 676 us per loop >>> %timeit tuple(frozenset(tuple(s))) 1000 loops, best of 3: 575 us per loop So it's clear that sorting args causes a huge, and IMO unreasonable, performance penalty. -- You received this message because you are subscribed to the Google Groups "sympy" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
