I am +1 to sorting args on creation. This is a simple and exhaustive fix.
I think that FiniteSet is usually used for small sets and that sorting isn't that expensive even for large ones. I think that this discussion is premature optimization. If a use case arises that requires very large FiniteSets then I think that we should revisit this issue. Until then I think that we should revert the change and fix the issues in master. On Sun, Jul 22, 2012 at 2:45 PM, Ronan Lamy <[email protected]> wrote: > Le dimanche 22 juillet 2012 à 17:55 +0300, Sergiu Ivanov a écrit : > > Hello, > > > > Sorry for the large delays between my messages in this thread. > > > > I will remind the gist of the problem. I currently have the need to > > have a collection of FiniteSet's. ~1 month ago, FiniteSet stopped > > sorting its elements at creation, which means that the ordering of > > FiniteSet.args is hash-dependent. The structure of the sort key is > > dependent on the order of .args, and is thus hash-dependent. This > > means that, if one has a container with FiniteSet's, their order in > > the container is _hash-dependent_, even if one tries to explicitly > > sort the container according to FiniteSet.sort_key. > > > > On Tue, Jul 10, 2012 at 6:39 PM, Sergiu Ivanov > > <[email protected]> wrote: > > > On Tue, Jul 10, 2012 at 6:10 PM, Chris Smith <[email protected]> wrote: > > >> > > >> I modified pre and post order traversals to use a sort key (so the > > >> incoming expression doesn't have to have args presorted). Would this > > >> help? I no longer have a pull request with it because I haven't solved > > >> the solve and cse randomization failures, but if you want to pull off > > >> that single commit it is in my rand branch. > > > > > > I must confess I don't know what you refer to when you say "pre and > > > post order traversals" (I mean, I'm not sure what exactly it's about > > > in SymPy), but I'll check out your modifications later today and see > > > whether that helps. > > > > Yes, your change would help, but as I can see in > > sympy.core.basic.preorder_traversal, the independence of the order of > > args is assured by sorting them. Using preorder_traversal to make the > > sort key hash independent is therefore worse than just sorting the > > .args of the FiniteSet, because it involves sorting _and_ additional > > effort. > > > > Therefore, there are still only two solutions on the table to have > > FiniteSet produce consistent sort keys: > > > > 1. have FiniteSet sort its .args at creation; > > > > 2. sort the FiniteSet every time a sort_key is needed. > > > > I believe that, compared to the other drawbacks of having FiniteSet > > not sort its .args at creation, this one is decisive. Apparently, not > > sorting .args with performance in mind impacts performance even worse. > > > > Are there any counter-arguments to making FiniteSet sort its arguments > > at creation again? > > Most objects are never printed, and sort keys are only really needed for > printing. It seems wrong to optimise for the uncommon case. > > To be efficient, FiniteSet should use sets internally, but sorting would > prevent that. E.g. membership testing is O(1) with a set but O(n) with a > list. Sorting also turns set creation from an O(n) operation into > O(n*log(n)), which in itself is probably only significant with large > sets. However, there is a more subtle, but far worse, effect of sorting: > it forces the computation and (especially) comparison of sort keys, > which is very expensive, making set creation far slower than it can be: > > >>> s = set(map(S, range(100))) > >>> %timeit tuple(sorted(s, key=default_sort_key)) > 100 loops, best of 3: 7.92 ms per loop > >>> %timeit frozenset(s) > 100000 loops, best of 3: 6.72 us per loop > > For comparison, the current implementation, which basically does > tuple(frozenset(tuple(s))) when given a set: > > >>> %timeit FiniteSet(*s) > 1000 loops, best of 3: 676 us per loop > >>> %timeit tuple(frozenset(tuple(s))) > 1000 loops, best of 3: 575 us per loop > > So it's clear that sorting args causes a huge, and IMO unreasonable, > performance penalty. > > -- > You received this message because you are subscribed to the Google Groups > "sympy" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/sympy?hl=en. > > -- You received this message because you are subscribed to the Google Groups "sympy" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
