On Sun, Jul 22, 2012 at 10:45 PM, Ronan Lamy <[email protected]> wrote:
> Le dimanche 22 juillet 2012 à 17:55 +0300, Sergiu Ivanov a écrit :
>>
>> Therefore, there are still only two solutions on the table to have
>> FiniteSet produce consistent sort keys:
>>
>> 1. have FiniteSet sort its .args at creation;
>>
>> 2. sort the FiniteSet every time a sort_key is needed.
>>
>> I believe that, compared to the other drawbacks of having FiniteSet
>> not sort its .args at creation, this one is decisive.  Apparently, not
>> sorting .args with performance in mind impacts performance even worse.
>>
>> Are there any counter-arguments to making FiniteSet sort its arguments
>> at creation again?
>
> Most objects are never printed, and sort keys are only really needed for
> printing. It seems wrong to optimise for the uncommon case.

As I have stated in the previous mails, if you create a container with
a couple FiniteSet's in it, you will _not_ get hash-independent
ordering of these FiniteSet's in the container because of essentially
arbitrary ordering of FiniteSet.args.  This can be plumbed by
explicitly sorting FiniteSet.args _each time_ a sort key is needed,
but that, well, sounds slightly horrible to me, especially given that
all those sorts could easily be substituted with one.

(But read on, there's a suggestion which I consider better down the
E-mail.)

> To be efficient, FiniteSet should use sets internally, but sorting would
> prevent that. E.g. membership testing is O(1) with a set but O(n) with a
> list.

Currently, FiniteSet stores a non-.args frozenset attribute which is
used exactly for this purposes.  Thus, no matter whether one sorts
.args or not, membership testing is O(1).

> Sorting also turns set creation from an O(n) operation into
> O(n*log(n)), which in itself is probably only significant with large
> sets.

Well, if we do patch FiniteSet.sort_key to sort .args _every time_ a
FiniteSet sort key is needed, you will have that slowdown _every time_
you sort a container with a FiniteSet.

Now, I do admit that not every use case of FiniteSet implies storing
it in a container.  Therefore, perhaps, we could go with a lazy
approach: don't sort .args at FiniteSet creation; however, whenever a
certain order of .args is required, sort it and leave it like that
afterwards.  On the one hand, this will bring in sorting strictly when
needed and, on the other hand, it will avoid the sorting in use cases
when it is not required.

Do you think that's a better approach?

> However, there is a more subtle, but far worse, effect of sorting:
> it forces the computation and (especially) comparison of sort keys,
> which is very expensive, making set creation far slower than it can be:
>
>>>> s = set(map(S, range(100)))
>>>> %timeit tuple(sorted(s, key=default_sort_key))
> 100 loops, best of 3: 7.92 ms per loop
>>>> %timeit frozenset(s)
> 100000 loops, best of 3: 6.72 us per loop
>
> For comparison, the current implementation, which basically does
> tuple(frozenset(tuple(s))) when given a set:
>
>>>> %timeit FiniteSet(*s)
> 1000 loops, best of 3: 676 us per loop
>>>> %timeit tuple(frozenset(tuple(s)))
> 1000 loops, best of 3: 575 us per loop

You can do FiniteSet(s) directly, though:

In [21]: %timeit frozenset(s)
100000 loops, best of 3: 2.4 us per loop

In [23]: %timeit FiniteSet(s)
1000 loops, best of 3: 191 us per loop

Which puts the two more or less on par, if I am reading the %timeit
stats correctly (I've never used it before though, so I probably am
not).

> So it's clear that sorting args causes a huge, and IMO unreasonable,
> performance penalty.

That's right about the creation of FiniteSet.

However, there are situations in which sorting of the elements of the
FiniteSet is required by the algorithm; that's mainly about printing,
computing complement with respect to real numbers, and, of course,
about getting hash-independent sort key.

So, do you think the lazy approach I have suggested above is OK?

Sergiu

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sympy?hl=en.

Reply via email to