Re: [sympy] FiniteSet.sort_key and the ordering of .args

Ronan Lamy Mon, 23 Jul 2012 08:36:18 -0700

Le dimanche 22 juillet 2012 à 23:07 +0300, Sergiu Ivanov a écrit :
> On Sun, Jul 22, 2012 at 10:45 PM, Ronan Lamy <[email protected]> wrote:
> > Le dimanche 22 juillet 2012 à 17:55 +0300, Sergiu Ivanov a écrit :
> >>
> >> Therefore, there are still only two solutions on the table to have
> >> FiniteSet produce consistent sort keys:
> >>
> >> 1. have FiniteSet sort its .args at creation;
> >>
> >> 2. sort the FiniteSet every time a sort_key is needed.
> >>
> >> I believe that, compared to the other drawbacks of having FiniteSet
> >> not sort its .args at creation, this one is decisive.  Apparently, not
> >> sorting .args with performance in mind impacts performance even worse.
> >>
> >> Are there any counter-arguments to making FiniteSet sort its arguments
> >> at creation again?
> >
> > Most objects are never printed, and sort keys are only really needed for
> > printing. It seems wrong to optimise for the uncommon case.
> 
> As I have stated in the previous mails, if you create a container with
> a couple FiniteSet's in it, you will _not_ get hash-independent
> ordering of these FiniteSet's in the container because of essentially
> arbitrary ordering of FiniteSet.args.  This can be plumbed by
> explicitly sorting FiniteSet.args _each time_ a sort key is needed,
> but that, well, sounds slightly horrible to me, especially given that
> all those sorts could easily be substituted with one.
> 
> (But read on, there's a suggestion which I consider better down the
> E-mail.)
> 
> > To be efficient, FiniteSet should use sets internally, but sorting would
> > prevent that. E.g. membership testing is O(1) with a set but O(n) with a
> > list.
> 
> Currently, FiniteSet stores a non-.args frozenset attribute which is
> used exactly for this purposes.  Thus, no matter whether one sorts
> .args or not, membership testing is O(1).


OK, I should have checked. However having two copies of the arguments is
inefficient and complicates the code.
> 
> > Sorting also turns set creation from an O(n) operation into
> > O(n*log(n)), which in itself is probably only significant with large
> > sets.
> 
> Well, if we do patch FiniteSet.sort_key to sort .args _every time_ a
> FiniteSet sort key is needed, you will have that slowdown _every time_
> you sort a container with a FiniteSet.

Well, if we don't sort sets, we don't sort sets of sets either, so we
don't the sort key when we put a set in another set. 
> 
> Now, I do admit that not every use case of FiniteSet implies storing
> it in a container.  Therefore, perhaps, we could go with a lazy
> approach: don't sort .args at FiniteSet creation; however, whenever a
> certain order of .args is required, sort it and leave it like that
> afterwards.  On the one hand, this will bring in sorting strictly when
> needed and, on the other hand, it will avoid the sorting in use cases
> when it is not required.
> 
> Do you think that's a better approach?

Changing the args during the life of the object seems risky and adds
complications. For instance, computing the hash will require that the
args are sorted. And what if something keeps a reference to the original
args after they are sorted?

> 
> > However, there is a more subtle, but far worse, effect of sorting:
> > it forces the computation and (especially) comparison of sort keys,
> > which is very expensive, making set creation far slower than it can be:
> >
> >>>> s = set(map(S, range(100)))
> >>>> %timeit tuple(sorted(s, key=default_sort_key))
> > 100 loops, best of 3: 7.92 ms per loop
> >>>> %timeit frozenset(s)
> > 100000 loops, best of 3: 6.72 us per loop
> >
> > For comparison, the current implementation, which basically does
> > tuple(frozenset(tuple(s))) when given a set:
> >
> >>>> %timeit FiniteSet(*s)
> > 1000 loops, best of 3: 676 us per loop
> >>>> %timeit tuple(frozenset(tuple(s)))
> > 1000 loops, best of 3: 575 us per loop
> 
> You can do FiniteSet(s) directly, though:

Ah yes, I had forgotten about the Swiss Army Knife interface, but the
difference between FiniteSet(s), FiniteSet(*s) and FiniteSet.fromiter(s)
isn't significant.
> 
> In [21]: %timeit frozenset(s)
> 100000 loops, best of 3: 2.4 us per loop
> 
> In [23]: %timeit FiniteSet(s)
> 1000 loops, best of 3: 191 us per loop
> 
> Which puts the two more or less on par, if I am reading the %timeit
> stats correctly (I've never used it before though, so I probably am
> not).

Er, no, if you're comparing frozenset(s) with FiniteSet(s), then the
former is about 80 times faster than the latter. This isn't surprising,
given that set to frozenset is O(1) while set to list to frozenset to
tuple is O(n). 

> 
> > So it's clear that sorting args causes a huge, and IMO unreasonable,
> > performance penalty.
> 
> That's right about the creation of FiniteSet.
> 
> However, there are situations in which sorting of the elements of the
> FiniteSet is required by the algorithm; that's mainly about printing,
> computing complement with respect to real numbers, and, of course,
> about getting hash-independent sort key.
> 
> So, do you think the lazy approach I have suggested above is OK?
> 
> Sergiu
> 


-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sympy?hl=en.

Re: [sympy] FiniteSet.sort_key and the ordering of .args

Reply via email to