Janzert wrote: > Christoph Zwerschke wrote: >> Janzert schrieb: >>> My understanding and testing seem to show that an inner join doesn't >>> return duplicates. Wrapping the result in a set doesn't seem necessary. >> The query creates duplicates, and they are pulled from the database, but >> they are removed by SQLAlchemy because you are querying object instances >> and it makes no sense for SQLAlchemy to deliver duplicates in this case. > > Ah yes, by running the generated sql directly I see that now. I'll try > and see if there is a way to generate the exists clause that is > compatible with 0.3 on up. > > Janzert >
Before working on getting a SA 0.3 compatible version I decided to take a look at the speed differences. I initialized the database with a user that belonged to 20 groups with each group having 11 out of 60 permissions. The timing below were done with SA 0.5 and postgresql 8.3.1. I also tried sqlite and the timings were basically the same. python -m timeit -s"import perm_test;perm_test.iter()" "perm_test.iter()" 10000 loops, best of 3: 94.9 usec per loop python -m timeit -s"import perm_test;perm_test.join()" "perm_test.join()" 100 loops, best of 3: 13.1 msec per loop python -m timeit -s"import perm_test;perm_test.exist()" "perm_test.exist()" 100 loops, best of 3: 14.7 msec per loop As you can see the current method of iterating through the groups is actually 130 to 150 times faster than single query methods. I've attached my test project and scripts to the trac issue* if someone wants to double check my methods. Just now I also tried a test with 500 groups, 800 permissions and 250 permissions per group, for what should be a torture test to the current method. python -m timeit -s"import perm_test;perm_test.iter()" "perm_test.iter()" 100 loops, best of 3: 16.9 msec per loop python -m timeit -s"import perm_test;perm_test.join()" "perm_test.join()" 10 loops, best of 3: 808 msec per loop python -m timeit -s"import perm_test;perm_test.exist()" "perm_test.exist()" 10 loops, best of 3: 256 msec per loop While a little closer still the current method is much better. As a side note while watching the process usage during these test, the iterative methods has most of the cpu usage in python, the join query mostly in postgres and the exist query is split about evenly. Janzert * http://trac.turbogears.org/ticket/1879 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "TurboGears" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/turbogears?hl=en -~----------~----~----~----~------~----~------~--~---

