[sqlalchemy] Re: ORM mapping with Elixir compared to raw cursor query

SinJax Fri, 21 Nov 2008 02:42:35 -0800

I had a feeling the bulk nature of the query would confuse the matter
so i have performed some more tests with a slightly more restricted
query with similarly poor results.


I have not posted much on mail lists much so if it is inappropriate to
post code i apologise.
Here is my code:

----------------------------------------------------------------------------------------
import time

from elixir import *
from sqlalchemy import *

metadata.bind = 'mysql://some:[EMAIL PROTECTED]/SomeTable'
metadata.echo = True
users_groups = Table('usergroup', metadata, autoload=True)
class Group(Entity):
        using_options(tablename='groups', autoload=True)
        users = ManyToMany('User',tablename="usergroup")

class User(Entity):
        using_options(tablename='users', autoload=True)
        using_mapper_options(column_prefix="py_")
        groups = ManyToMany('Group',tablename="usergroup",lazy=False)

class Video(Entity):
        using_options(tablename='videos', autoload=True)
        #subject = ManyToMany('Subject',tablename="subjectvideo",lazy=False)

class Subject(Entity):
        using_options(tablename='subjects', autoload=True)
        #videos = ManyToMany('Video',tablename="subjectvideo",lazy=False)

class Field(Entity):
        using_options(tablename='fields', autoload=True)

class Value(Entity):
        using_options(tablename='values', autoload=True)

class Annotation(Entity):
        using_options(tablename='annotations', autoload=True)
        user = ManyToOne('User',lazy=False)
        subject = ManyToOne('Subject',lazy=False)
        field = ManyToOne('Field',lazy=False)
        value = ManyToOne('Value',lazy=False)

setup_all()
allAnn = Annotation.query().filter(User.py_user == "msn").filter
(Subject.id == 2 )
print allAnn
import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(
        passwd="login",
        user="some",
        db="someTable",
        host="someServer",
        cursorclass=MySQLdb.cursors.DictCursor
)

c = db.cursor()
t1 = time.time()
sql = """ SELECT fields.name,values.name,users.user,subjects.leg_id
FROM annotations
        LEFT JOIN `fields` ON annotations.field = fields.id
        LEFT JOIN `values` ON annotations.value = `values`.id
        LEFT JOIN `users` ON annotations.user = `users`.id
        LEFT JOIN `subjects` ON annotations.subject = `subjects`.id
WHERE users.user = "msn"
AND subjects.id = 2
"""
c.execute(sql)
results = c.fetchall()
holder = []

for res in results:
        holder.append(res)
t2 = time.time()
print (t2-t1)

sql = """
SELECT annotations.id AS annotations_id, annotations.user AS
annotations_user, annotations.subject AS annotations_subject,
annotations.field AS annotations_field, annotations.value AS
annotations_value, users_1.id AS users_1_id, users_1.user AS
users_1_user, users_1.pass AS users_1_pass, users_1.lab_id AS
users_1_lab_id, users_1.email AS users_1_email, groups_1.id AS
groups_1_id, groups_1.name AS groups_1_name, subjects_1.id AS
subjects_1_id, subjects_1.leg_id AS subjects_1_leg_id,
subjects_1.`displayOrder` AS `subjects_1_displayOrder`, fields_1.id AS
fields_1_id, fields_1.name AS fields_1_name, fields_1.help AS
fields_1_help, values_1.id AS values_1_id, values_1.`parentId` AS
`values_1_parentId`, values_1.field AS values_1_field, values_1.name
AS values_1_name, values_1.image AS values_1_image, values_1.`order`
AS values_1_order
FROM users, subjects, annotations
        LEFT OUTER JOIN users AS users_1 ON users_1.id = annotations.user
        LEFT OUTER JOIN usergroup AS usergroup_1 ON users_1.id =
usergroup_1.`userId`
        LEFT OUTER JOIN groups AS groups_1 ON groups_1.id =
usergroup_1.`groupId`
        LEFT OUTER JOIN subjects AS subjects_1 ON subjects_1.id =
annotations.subject
        LEFT OUTER JOIN `fields` AS fields_1 ON fields_1.id =
annotations.field
        LEFT OUTER JOIN `values` AS values_1 ON values_1.id =
annotations.value
WHERE users.user = '%s' AND subjects.id = %s
ORDER BY annotations.id, users_1.id, usergroup_1.`userId`,
subjects_1.id, fields_1.id, values_1.id
"""%("msn",2)
t1 = time.time()
c.execute(sql)
results = c.fetchall()
holder = []

for res in results:
        holder.append(res)
t2 = time.time()
print (t2-t1)
t2 = time.time()
print allAnn
# This prints:
"""

"""
t1 = time.time()
print len(allAnn.all())
t2 = time.time()
print (t2-t1)
----------------------------------------------------------------------------------------

Right! So what this code does is map a few classes to existing tables
in a database. The content of the tables should be made obvious by the
huge query SQLAlchemy generates.

The result of this query is 23 rows from a 24,000 row table.

I show a query i constructed by hand (Get every annotation of user
"msn" on subject "2") as i would run using MysqlDb and cursor. I time
this by measuring the time taken between executing the query and
putting each result in a tuple. The running time for this query is
0.05s

This is compared to running Annotation.query().filter
(User.user=="msn").filter(Subject.id=2). The query generated by SQL
alchemy can be seen in the second declaration of the sql string. For
purposes of interest i run this generated query (with fields filled in
manually) through the same MysqlDb cursor. This takes 16 seconds! A
query which returns 23 rows should NOT take 16 seconds to return.

Finally, this is compared to running Annotation.query().filter
(User.user=="msn").filter(Subject.id=2).all() which returns a bunch of
lovely SQLAlchemy mapped objects. However this process takes
approximately 26 seconds!

It should be noted that the process outlined in the code above runs on
the same 24,000 row dataset as the previous test i mentioned (Code for
which can be found here: http://www.sinjax.net/wordpress/?p=1652 ...
it was down over night my apologies). But the difference is literally
the difference between commenting out the ManyToOne definitions in the
Annotation class and removing the filter commands.

What i test here is one of the final queries im currently running in
my system quite regularly so i need it to be relatively fast.  And now
that i put it in such a bare bones version as this, i see I'm CLEARLY
doing something wrong that results in SQLAlchemy generating this query
which takes far too long to run, but i'm not sure what i have to do to
make SQLAlchemy generate something that looks more like the first
query i run which i wrote by hand, and yet gets all the information i
need.

FYI i have also tested the query i've written by hand returning ALL
the fields the SQLAlchemy query requests and also running the ORDER BY
at the end....this takes 0.3seconds. I have no idea why, but the slow
down seems to come from the join section of the generated query?

But i can't quite see why, im investigating that... just thought i'd
post more information while i did

Cheers
----------
Sina Samangooei
PhD Student
University  of Southampton
United Kingdom
[EMAIL PROTECTED]
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: ORM mapping with Elixir compared to raw cursor query

Reply via email to