Re: [sage-combinat-devel] Python, Sage, categories

William Stein Mon, 26 Jul 2010 11:54:40 -0700

On Mon, Jul 26, 2010 at 11:31 AM, Nicolas M. Thiery
<nicolas.thi...@u-psud.fr> wrote:
> On Fri, Jul 23, 2010 at 01:05:00PM -0700, William Stein wrote:
>> During the last few weeks I've spent time optimizing code, among
>> other things.
>
> Great!
>
>> A surprising number of random things I look at in Sage are too slow
>> by a factor of 10-100, usually due to programming style.  One way to
>> help to remedy that would be to write a book or article about how to
>> program Cython + Python *efficiently*, and avoid the numerous traps.
>> Basically, one false move in Python when implementing mathematics
>> algorithms can knock you down by a factor of 10-100 in speed.
>
> Definitely.
>
>> As an example of this, try multiplying polynomials in ZZ['x'] in
>> Sage right now:
>>
>> sage: R.<x> = ZZ[]
>> sage: timeit('x*x',number=10^6)
>> 1000000 loops, best of 3: 1.96 µs per loop
>>
>> FLINT could do that multiply in a few hundred nanoseconds, so why does
>> it take 2 microseconds (2000 nanoseconds)?  Because the code is
>> complicated and general, but was written without any thought about
>> efficiency.  By making a few changes (really making the code less
>> clever), I get:
>>
>> sage: timeit('x*x', number=10^6)
>> 1000000 loops, best of 3: 687 ns per loop
>>
>> The categories implementation as it is in Sage must be changed or
>> removed.  There, I got your attention.
>
> Let's see about this.
>
> That's before any category code:
>
>    zephyr> /opt/sage-4.2.1/sage
>    sage: R.<x> = ZZ[]
>    sage: timeit('x*x',number=10^6)
>    1000000 loops, best of 3: 1.14 µs per loop
>
> That's after the integration of the main category patches:
>
> zephyr-~monoid>/opt/sage-4.3/sage -br main
>
>    sage: R.<x> = ZZ[]
>    sage: timeit('x*x',number=10^6)
>    1000000 loops, best of 3: 1.12 µs per loop
>
> That's after #7921: Categories for extension types via __getattr___:
>
>    zephyr-~monoid>/opt/sage-4.3.2/sage -br main
>    sage: R.<x> = ZZ[]
>    sage: timeit('x*x',number=10^6)
>    1000000 loops, best of 3: 1.13 µs per loop
>
> That's now:
>
>    zephyr-~monoid>/opt/sage-4.4.4/sage -br main
>    sage: R.<x> = ZZ[]
>    sage: timeit('x*x',number=10^6)
>    1000000 loops, best of 3: 1.19 µs per loop
>
> Please don't blame this one on categories. The overhead is in the
> coercion code which was written by our favorite Cython expert.


I'm talking about the __getattr__ that you added.   Simply commenting
out the __getattr__ function
leads to a significant speedup.  This has been replicated by several
other people, and this
thread contains a deep investigation into why this is happening
(though things still aren't clear).
(But you realized all this below.)

> Now, should the coercion code be changed or removed because it is too
> generic and horribly slow? First note that we are speaking about a
> speed factor of 2 or 3, not 10 or 100. And this is for very low level
> arithmetic; the overhead would be negligible on anything more
> complicated (if not just larger polynomials). So the right thing to do
> is simply to override the generic code by optimized code for those
> element classes which have very cheap arithmetic.  That's what has
> been done for ZZ, ... and as far as I know everybody is happy about
> this approach.

The point -- which you miss -- is that the way the category idea is
implemented in Sage surprisingly causes *all* attribute
lookups to slowdown substantially.  It can easily be the case that a
whole program is just a sequence of attribute lookups,
so the performance penalty of slowing this down is significant.

> speed factor of 2 or 3, not 10 or 100

A factor of 2 or 3 is very significant when it happens uniformly
across the board.

>> A few months ago some code was added to Sage by the combinat folks
>> called "categories".  I recently learned ObjectiveC, and found out
>> that this idea of "categories" goes back to smalltalk at least, and
>> was fully implemented in ObjectiveC in the mid 1980s.  It has little
>> to do with category theory in the sense of mathematics, though is
>> related.  Basically, it is a way to attach a bunch of generic
>> methods to a class without using inheritance.
>
> That's a very reductive view. And actually quite inaccurate: most of
> Sage's parents and elements *do* get their generic methods from the
> categories, through class inheritance.  There is indeed one exception:
> extension types (i.e. Cython classes), which currently use the getattr
> hack (because Cython does not support yet any form of multiple
> inheritance).

Do you know anything about Objective C or smalltalk or "categories" in
those languages?

> Anyway that's just an implementation detail. Of course, categories in
> Sage are strongly rooted in object oriented ideas which date back from
> smalltalk. But what the categories really bring to Sage is about
> interactions between parents, elements, and morphisms, as well as
> generic tests, functorial constructions, etc, all of which have much
> to do with mathematics.
>
>
>> The combinat folks implemented this programming idea in Sage not by
>> changing the Python languge, but by defining a custom __getattr__
>> method at sticking it in the base classes for Sage parents and
>> elements.
>
> Two notes:
>
>  - About the getattr hack, any blame should go to me personally and
>   the reviewer (Robert), not to the "combinat folks".

I don't really care about blame; I just want to find a way to resolve
this problem.
I've slowed down/messed up etc., so many things in Sage over the years,
it isn't even funny.

>
>  - I never liked this hack. Actually the original plan for the
>   implementation of categories before I jumped in was to
>   systematically use this hack. And I had to argue a lot to advance
>   instead the dynamic class approach. I would *love* to be able to
>   get rid of this hack for extension classes as well.
>
>> The problem: this means that every single method lookup on every
>> single Sage object seem to now have about 1500nanoseconds added to
>> it in completely unnecessary overhead.  Most of what Sage does is
>> call functions, so this is a huge, huge problem.
>
> Wow, I confirm this:
>
> Without:
>
>    sage: sage: sage: sage: QQ.testit = 'foobar'
>    timeit('QQ.testit',number=10^6)
>    timeit('QQ._list',number=10^6)
>    timeit('QQ.gens',number=10^6)
>    timeit('QQ.an_element',number=10^6)
>
>    1000000 loops, best of 3: 1.7 µs per loop
>    1000000 loops, best of 3: 422 ns per loop
>    1000000 loops, best of 3: 236 ns per loop
>    1000000 loops, best of 3: 586 ns per loop
>
> With:
>
>    sage: QQ.testit = 'foobar'
>    sage: timeit('QQ.testit',number=10^6)
>    sage: timeit('QQ._list',number=10^6)
>    sage: timeit('QQ.gens',number=10^6)
>    sage: timeit('QQ.an_element',number=10^6)
>    1000000 loops, best of 3: 2.68 µs per loop
>    1000000 loops, best of 3: 1.45 µs per loop
>    1000000 loops, best of 3: 1.2 µs per loop
>    1000000 loops, best of 3: 1.56 µs per loop
>
> Which is totally unacceptable indeed.

I'm very happy you can replicate this.

> Ironically, extension types are far less affected, if at all:
>
> Without:
>
>    sage: timeit('ZZ._list',number=10^6)
>    sage: timeit('ZZ.gens',number=10^6)
>    sage: timeit('ZZ.an_element',number=10^6)
>    1000000 loops, best of 3: 312 ns per loop
>    1000000 loops, best of 3: 170 ns per loop
>    1000000 loops, best of 3: 391 ns per loop
>
> With:
>
>    sage: timeit('ZZ._list',number=10^6)
>    sage: timeit('ZZ.gens',number=10^6)
>    sage: timeit('ZZ.an_element',number=10^6)
>    1000000 loops, best of 3: 318 ns per loop
>    1000000 loops, best of 3: 180 ns per loop
>    1000000 loops, best of 3: 391 ns per loop

I think the impact depends a lot on how many python def'd methods the
class has.    E.g., I was testing with the huge number field classes,
and the slowdown is much greater than with testing with a naked class
that just derives from Element.   I did not notice any real difference
between otherwise identical Cython and Python classes.

>
>
> We had done quite some benchmarking with Robert (see e.g. the IRC log
> around mid-January). But somehow we let this slip through: Python's
> doc for __getattr__ states:
>
>    «Note that if the attribute is found through the normal mechanism,
>    __getattr__() is not called.»
>
> We checked that indeed __getattr__ was only called when the attribute
> is not found. And we ran benchmark to make sure that there was no
> overhead for extension types. Our mistake was to assume that there
> would be no overhead either for Python classes. Ouch! Mea culpa.
>
>
>> This is a major problem.  The best solution I can think of at
>> present is to remove these methods from the top level of Sage, then
>> add objects like Parent_cat and Element_cat, and put them there.
>> Then change all the combinat objects to derive from those classes.
>> I'm not sure this would work though.
>
> I see three options:
>
>  1 Simply revert #7921. Most of the combinat code is plain Python, so
>   won't be affected. What will break are access to generic methods
>   and generic TestSuite's for extension types. This will break some
>   of the recently added tests, but hopefully not too many
>   (e.g. TestSuite calls, or Rob's tests for multiplication tables,
>   things around permutation groups, ...)
>
>  2 Since only extension types need the getattr hack, make sure that
>   __getattr__ is only defined for those. I would love it. Anyone a
>   good idea on how to achieve this? The problem is that all parents
>   and elements classes derive from Parent and Element, which are the
>   natural places to put the getattr hack.

I'm dubious that this is not an issue for Cython classes.

>  3 Implement partial multiple inheritance in Cython
>
> We discussed Option 3 quite some with Robert during the categories
> brainstorm in Nancy (October 2008). As far as I remember from the
> discussions, full multiple inheritance (including merging of data
> structure and method lookup table) was impossible; on the other hand,
> there was no technical obstructions for allowing a Cython class to
> inherit from multiple purely abstract classes besides the main base
> class branch, if one accepts the price of a virtual lookup for the
> methods defined there. Essentially, this would boil down to
> implementing a variant of the getattr trick, but within the Cython
> compiler, and with full control so as to make sure that standard
> attribute and method lookups would not be affected.

That sounds like implementing what is called "categories" in ObjectiveC.
ObjectiveC is an extension to C that was done in the late 1980s to add features
like one has in Smalltalk.   It's built on C structs, like Cython is,
and does *not*
have multiple inheritance.  Categories make up for this.

> Robert: can you confirm? Are you still thinking about it? If yes, do
> you have an idea of a possible timeline?
>
> Best,
>                                Nicolas
>
>
> PS: currently, the "combinat folks" are focusing on features rather
> than pure speed. We can screw up here and there, but we are taking
> great care in the design to make sure to leave full room for later
> optimizations. In particular, a lot of the "generic approach", is
> designed to concentrate all the time critical code in very few spots,
> which will be easy to locate and optimize when someone will have time
> for it (e.g. CombinatorialFreeModule).
>
>
> --
> Nicolas M. Thiéry "Isil" <nthi...@users.sf.net>
> http://Nicolas.Thiery.name/
>
> --
> You received this message because you are subscribed to the Google Groups 
> "sage-combinat-devel" group.
> To post to this group, send email to sage-combinat-de...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sage-combinat-devel+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sage-combinat-devel?hl=en.
>
>



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
You received this message because you are subscribed to the Google Groups 
"sage-combinat-devel" group.
To post to this group, send email to sage-combinat-de...@googlegroups.com.
To unsubscribe from this group, send email to 
sage-combinat-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sage-combinat-devel?hl=en.

Re: [sage-combinat-devel] Python, Sage, categories

Reply via email to