On Apr 28, 2011, at 2:24 PM, Jennifer Rodriguez-Mueller wrote:
Here is the core SQL (lots of distracting content trimmed):
SELECT s.*, a.*
FROM aliquot_table a
JOIN ( -- Biologically (p)lausible aliquot/specimen type logic
SELECT 3 AS a_type, 12 AS s_type UNION -- Serum/Serum
SELECT 15 AS a_type, 7 AS s_type UNION -- Gut Biopsy/Gut Biopsy
SELECT 7 AS a_type, 5 AS s_type UNION -- Saliva/Saliva
SELECT 23 AS a_type, 13 AS s_type-- NO_CODE/Swab
) p ON a.a_type = p.a_type
JOIN specimen_table s ON (s.random_id = a.random_id AND s.s_type =
p.s_type)
WHERE a.test_id = 12345
Here is the (assumed) equivalent python:
def getSpecimenAliquots(fromDB):
from sqlalchemy.ext.sqlsoup import SqlSoup
plausibleTypes = set([( 3,12) #Serum/Serum
,(15, 7) #Gut Biopsy/Gut Biopsy
,( 7, 5) #Saliva/Saliva
,(23,13) #NO_CODE/Swab
])
soupDB = SqlSoup(fromDB)
aliq = soupDB.aliquot_table
spec = soupDB.specimen_table
join1 = soupDB.join(aliq, spec,
aliq.random_id == spec.random_id and (aliq.a_type, spec.s_type) in
plausibleTypes,
isouter=False)
join1 = join1.filter_by(aliq.test_id=12345)
return join1.all()
The code works in SQL, but when I try to translate it into sqlalchemy's
SqlSoup the number of rows is different (specifically the plausibility logic
is being silently ignored for some reason because there are 3 MappedJoin
objects instead of 1 row from the SQL and the two of the MappedObjects have
incompatible specimen/aliquot types).
I'm wondering if I expected too much integration between python and SQL when
I used a python set object to drive the join? When I imagine what sqlalchemy
should ideally do with something like this, it would be to understand how
python types and comparisons over them correspond to database table/index
idioms and (perhaps?) create a temporary table in the DB with optimized
indexes based on the content of the set, write logically equivalent SQL to
use that temp table, get the results, and then clean up after itself. This
would be much better than my SQL, which is logically sound but not
efficiently implemented, because (after all) it is only exploratory SQL, not
a real programming language.
Well there's a leap here, which is that the original SQL uses UNION. The
assumed Python doesn't make any mention of unions.There are also several
elements here of Python that go above what the language can offer us in terms
of operator overloading (that is, SQLAlchemy overloads things like ==, !=, but
can only do this so much) - the Python and operator can't be intercepted (use
and_(), or alternatively ), the (x, y) in set() can't be intercepted
('tuple' and 'set' are both Python builtins that aren't modified).
As a rule, when building things up you can always test individual expressions
by calling str() on them.Such as, if you were to say str(aliq.a_type == 5)
you'd get a SQL expression. If you call it on aliq.a_type == 5 and
spec.s_type==7 you'll get a string like true or false since the and
operator isn't overridden, and the boundary of SQLAlchemy expression generation
is crossed into plain Python.Turning on SQL logging is also a good habit
when building up queries to see what is being emitted as well as what's being
returned (the docs describe how to do this).
The (x, y) in set() is not something that has a universally supported SQL
representation. Postgresql and MySQL for example could state an expression
like that as (x, y, z) IN ((1, 2, 3), (4, 5, 6), ...), but SQLite and others
cannot. The approach you have uses UNIONs which is more verbose but probably
better supported. It doesn't seem like it would present a major inefficiency,
if you wanted to note the difference between using temp tables and the inline
expression you'd work with the databases EXPLAIN services, i.e. features that
illustrate the query plan.
The best approach when translating from literal SQL is to build up as much of
that exact statement as possible (which is typically the whole thing).Your
table of unions can be achieved as:
u = union(*[
select([
literal_column(str(tup[0])).label(a_type),
literal_column(str(tup[1])).label(s_type)
])
for tup in plausibleTypes
]).alias()
'u' is then a selectable that represents the thing you have as the second
element of your join - its columns are available as u.c.name. The join
can then be constructed exactly as the SQL version is, substituting prefix
conjunctions instead of infix (i.e. and_(x, y) instead of x and y):
soupDB.join(aliq, u, u.c.a_type==aliq.a_type).join(spec,
and_(spec.random_id==aliq.random_id, spec.s_type==u.c.s_type))
How am I confused given that I'm surprised by the different number of
rows/objects?
well you should