I am +1 to sorting args on creation.

This is a simple and exhaustive fix.

I think that FiniteSet is usually used for small sets and that sorting
isn't that expensive even for large ones. I think that this discussion is
premature optimization. If a use case arises that requires very large
FiniteSets then I think that we should revisit this issue. Until then I
think that we should revert the change and fix the issues in master.

On Sun, Jul 22, 2012 at 2:45 PM, Ronan Lamy <[email protected]> wrote:

> Le dimanche 22 juillet 2012 à 17:55 +0300, Sergiu Ivanov a écrit :
> > Hello,
> >
> > Sorry for the large delays between my messages in this thread.
> >
> > I will remind the gist of the problem.  I currently have the need to
> > have a collection of FiniteSet's.  ~1 month ago, FiniteSet stopped
> > sorting its elements at creation, which means that the ordering of
> > FiniteSet.args is hash-dependent.  The structure of the sort key is
> > dependent on the order of .args, and is thus hash-dependent.  This
> > means that, if one has a container with FiniteSet's, their order in
> > the container is _hash-dependent_, even if one tries to explicitly
> > sort the container according to FiniteSet.sort_key.
> >
> > On Tue, Jul 10, 2012 at 6:39 PM, Sergiu Ivanov
> > <[email protected]> wrote:
> > > On Tue, Jul 10, 2012 at 6:10 PM, Chris Smith <[email protected]> wrote:
> > >>
> > >> I modified pre and post order traversals to use a sort key (so the
> > >> incoming expression doesn't have to have args presorted). Would this
> > >> help? I no longer have a pull request with it because I haven't solved
> > >> the solve and cse randomization failures, but if you want to pull off
> > >> that single commit it is in my rand branch.
> > >
> > > I must confess I don't know what you refer to when you say "pre and
> > > post order traversals" (I mean, I'm not sure what exactly it's about
> > > in SymPy), but I'll check out your modifications later today and see
> > > whether that helps.
> >
> > Yes, your change would help, but as I can see in
> > sympy.core.basic.preorder_traversal, the independence of the order of
> > args is assured by sorting them.  Using preorder_traversal to make the
> > sort key hash independent is therefore worse than just sorting the
> > .args of the FiniteSet, because it involves sorting _and_ additional
> > effort.
> >
> > Therefore, there are still only two solutions on the table to have
> > FiniteSet produce consistent sort keys:
> >
> > 1. have FiniteSet sort its .args at creation;
> >
> > 2. sort the FiniteSet every time a sort_key is needed.
> >
> > I believe that, compared to the other drawbacks of having FiniteSet
> > not sort its .args at creation, this one is decisive.  Apparently, not
> > sorting .args with performance in mind impacts performance even worse.
> >
> > Are there any counter-arguments to making FiniteSet sort its arguments
> > at creation again?
>
> Most objects are never printed, and sort keys are only really needed for
> printing. It seems wrong to optimise for the uncommon case.
>
> To be efficient, FiniteSet should use sets internally, but sorting would
> prevent that. E.g. membership testing is O(1) with a set but O(n) with a
> list. Sorting also turns set creation from an O(n) operation into
> O(n*log(n)), which in itself is probably only significant with large
> sets. However, there is a more subtle, but far worse, effect of sorting:
> it forces the computation and (especially) comparison of sort keys,
> which is very expensive, making set creation far slower than it can be:
>
> >>> s = set(map(S, range(100)))
> >>> %timeit tuple(sorted(s, key=default_sort_key))
> 100 loops, best of 3: 7.92 ms per loop
> >>> %timeit frozenset(s)
> 100000 loops, best of 3: 6.72 us per loop
>
> For comparison, the current implementation, which basically does
> tuple(frozenset(tuple(s))) when given a set:
>
> >>> %timeit FiniteSet(*s)
> 1000 loops, best of 3: 676 us per loop
> >>> %timeit tuple(frozenset(tuple(s)))
> 1000 loops, best of 3: 575 us per loop
>
> So it's clear that sorting args causes a huge, and IMO unreasonable,
> performance penalty.
>
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/sympy?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sympy?hl=en.

Reply via email to