Janzert wrote:
> Christoph Zwerschke wrote:
>> Janzert schrieb:
>>> My understanding and testing seem to show that an inner join doesn't
>>> return duplicates. Wrapping the result in a set doesn't seem necessary.
>> The query creates duplicates, and they are pulled from the database, but 
>> they are removed by SQLAlchemy because you are querying object instances 
>> and it makes no sense for SQLAlchemy to deliver duplicates in this case.
> 
> Ah yes, by running the generated sql directly I see that now. I'll try
> and see if there is a way to generate the exists clause that is
> compatible with 0.3 on up.
> 
> Janzert
> 

Before working on getting a SA 0.3 compatible version I decided to take 
a look at the speed differences. I initialized the database with a user 
that belonged to 20 groups with each group having 11 out of 60 
permissions. The timing below were done with SA 0.5 and postgresql 
8.3.1. I also tried sqlite and the timings were basically the same.

python -m timeit -s"import perm_test;perm_test.iter()" 
"perm_test.iter()" 10000 loops, best of 3: 94.9 usec per loop

python -m timeit -s"import perm_test;perm_test.join()" 
"perm_test.join()" 100 loops, best of 3: 13.1 msec per loop

python -m timeit -s"import perm_test;perm_test.exist()" 
"perm_test.exist()" 100 loops, best of 3: 14.7 msec per loop

As you can see the current method of iterating through the groups is 
actually 130 to 150 times faster than single query methods.

I've attached my test project and scripts to the trac issue* if someone 
wants to double check my methods.

Just now I also tried a test with 500 groups, 800 permissions and 250 
permissions per group, for what should be a torture test to the current 
method.

python -m timeit -s"import perm_test;perm_test.iter()" "perm_test.iter()"
100 loops, best of 3: 16.9 msec per loop

python -m timeit -s"import perm_test;perm_test.join()" "perm_test.join()"
10 loops, best of 3: 808 msec per loop

python -m timeit -s"import perm_test;perm_test.exist()" "perm_test.exist()"
10 loops, best of 3: 256 msec per loop

While a little closer still the current method is much better. As a side 
note while watching the process usage during these test, the iterative 
methods has most of the cpu usage in python, the join query mostly in 
postgres and the exist query is split about evenly.

Janzert

* http://trac.turbogears.org/ticket/1879

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"TurboGears" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/turbogears?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to