Re: [sqlalchemy] Can python objects be used in join logic?

2011-04-28 Thread Michael Bayer

On Apr 28, 2011, at 2:24 PM, Jennifer Rodriguez-Mueller wrote:

 
 Here is the core SQL (lots of distracting content trimmed):
 
 SELECT s.*, a.*
   FROM aliquot_table a
 JOIN ( -- Biologically (p)lausible aliquot/specimen type logic
   SELECT  3 AS a_type, 12 AS s_type UNION  -- Serum/Serum
   SELECT 15 AS a_type,  7 AS s_type UNION  -- Gut Biopsy/Gut Biopsy
   SELECT  7 AS a_type,  5 AS s_type UNION  -- Saliva/Saliva
   SELECT 23 AS a_type, 13 AS s_type-- NO_CODE/Swab
   ) p ON a.a_type = p.a_type
 JOIN specimen_table s ON (s.random_id = a.random_id AND s.s_type = 
 p.s_type)
   WHERE a.test_id = 12345
 
 Here is the (assumed) equivalent python:
 
 def getSpecimenAliquots(fromDB):
 from sqlalchemy.ext.sqlsoup import SqlSoup
 plausibleTypes = set([( 3,12) #Serum/Serum
  ,(15, 7) #Gut Biopsy/Gut Biopsy
  ,( 7, 5) #Saliva/Saliva
  ,(23,13) #NO_CODE/Swab
  ])
 soupDB = SqlSoup(fromDB)
 aliq = soupDB.aliquot_table
 spec = soupDB.specimen_table
 join1 = soupDB.join(aliq, spec,
 aliq.random_id == spec.random_id and (aliq.a_type, spec.s_type) in 
 plausibleTypes,
 isouter=False)
 join1 = join1.filter_by(aliq.test_id=12345)
 return join1.all()
 
 The code works in SQL, but when I try to translate it into sqlalchemy's 
 SqlSoup the number of rows is different (specifically the plausibility logic 
 is being silently ignored for some reason because there are 3 MappedJoin 
 objects instead of 1 row from the SQL and the two of the MappedObjects have 
 incompatible specimen/aliquot types).  
 I'm wondering if I expected too much integration between python and SQL when 
 I used a python set object to drive the join?  When I imagine what sqlalchemy 
 should ideally do with something like this, it would be to understand how 
 python types and comparisons over them correspond to database table/index 
 idioms and (perhaps?) create a temporary table in the DB with optimized 
 indexes based on the content of the set, write logically equivalent SQL to 
 use that temp table, get the results, and then clean up after itself.   This 
 would be much better than my SQL, which is logically sound but not 
 efficiently implemented, because (after all) it is only exploratory SQL, not 
 a real programming language.  

Well there's a leap here, which is that the original SQL uses UNION.   The 
assumed Python doesn't make any mention of unions.There are also several 
elements here of Python that go above what the language can offer us in terms 
of operator overloading (that is, SQLAlchemy overloads things like ==, !=, but 
can only do this so much) - the Python and operator can't be intercepted (use 
and_(), or alternatively ), the (x, y) in set() can't be intercepted 
('tuple' and 'set' are both Python builtins that aren't modified).

As a rule, when building things up you can always test individual expressions 
by calling str() on them.Such as, if you were to say str(aliq.a_type == 5) 
you'd get a SQL expression.   If you call it on aliq.a_type == 5 and 
spec.s_type==7 you'll get a string like true or false since the and 
operator isn't overridden, and the boundary of SQLAlchemy expression generation 
is crossed into plain Python.Turning on SQL logging is also a good habit 
when building up queries to see what is being emitted as well as what's being 
returned (the docs describe how to do this).

The (x, y) in set() is not something that has a universally supported SQL 
representation.   Postgresql and MySQL for example could state an expression 
like that as (x, y, z) IN ((1, 2, 3), (4, 5, 6), ...), but SQLite and others 
cannot.   The approach you have uses UNIONs which is more verbose but probably 
better supported.   It doesn't seem like it would present a major inefficiency, 
if you wanted to note the difference between using temp tables and the inline 
expression you'd work with the databases EXPLAIN services, i.e. features that 
illustrate the query plan.

The best approach when translating from literal SQL is to build up as much of 
that exact statement as possible (which is typically the whole thing).Your 
table of unions can be achieved as:

u = union(*[
select([
literal_column(str(tup[0])).label(a_type),
literal_column(str(tup[1])).label(s_type)
])
for tup in plausibleTypes
]).alias()

'u' is then a selectable that represents the thing you have as the second 
element of your join - its columns are available as u.c.name.  The join 
can then be constructed exactly as the SQL version is, substituting prefix 
conjunctions instead of infix (i.e. and_(x, y) instead of x and y):

soupDB.join(aliq, u, u.c.a_type==aliq.a_type).join(spec, 
and_(spec.random_id==aliq.random_id, spec.s_type==u.c.s_type))

 
 How am I confused given that I'm surprised by the different number of 
 rows/objects? 

well you should 

Re: [sqlalchemy] Can python objects be used in join logic?

2011-04-28 Thread Jennifer Rodriguez-Mueller
Thank you Michael, that makes things *much* clearer!

I had been thinking of sqlalchemy as a database manipulation system that
took inspiration from my source code, rather than thinking of a library that
uses compiled but introspectively accessible bytecode with variable names
preserved, so that python's introspective awareness of attempted runtime
attribute access can be directed to drive the logic of database access.

If I understand correctly:

1. I saw no errors because sqlalchemy never even saw the code that I
thought I was passing it (which python fully evaluated before sqlalchemy
ever touched it).

2. This is why functions like sqlalchemy.and_(), sqlalchemy.or_(), and
sqlalchemy.not_() are necessary... they are fundamentally for delaying
evaluation of statements until everything can be evaluated at the same time.

3. This is also why there are no *python* methods or attributes
corresponding to columns when you try to inspect a table object with
dir(selectable_table_instance).  If there were python attributes for columns
that dir() could see, then references to those column/attributes would be
resolved by python before sqlalchemy could do anything clever.

In any case, I re-wrote my UNION logic with or_()'s connecting subclauses
made of and_()'s and everything worked properly as far as join logic goes.
I'll probably come back to this email in the future to make sure I remember
all the rest of your advice about str(), union(), and so on.

Also, for anyone dealing with a similar task in the future, I had a related
problem with SqlSoup table instances that had identical column names.  I was
seeing warnings like this:

/Library/Python/2.6/site-packages/SQLAlchemy-0.6.7-py2.6.egg/sqlalchemy/orm/mapper.py:585:
SAWarning: Implicitly combining column aliquot_table.id with column
specimen_table.id under attribute 'id'.  This usage will be prohibited in
0.7.  Please configure one or more attributes for these same-named columns
explicitly.

To make the warnings go away I ended up spelling out how to resolve column
ambiguities (possibly a violation of DRY?) by repeating the content of one
of the tables I was working with, with slight variations in the names.  The
final working code looked like this:

def getSpecimenAliquots(dbStr):
from sqlalchemy.ext.sqlsoup import SqlSoup
from sqlalchemy import select, and_, or_
soupDB = SqlSoup(dbStr)
aliq = soupDB.aliquot_table
spec = soupDB.specimen_table
spec = select([
# Fix labels that clash with aliquot column names...
 spec.c.id.label(s_id)
,spec.c.random_id.label(s_random_id)
# Things to pass through with their own names...
,spec.c.study_code
,spec.c.patient_id
,spec.c.s_type
]).correlate(None).alias()
join1 = soupDB.join(aliq,spec,
and_(aliq.random_id == spec.c.s_random_id,
or_(and_(aliq.a_type ==   3, spec.c.s_type ==  12), #Serum/Serum
or_(and_(aliq.a_type ==  15, spec.c.s_type ==   7), #Gut/Gut
or_(and_(aliq.a_type ==   7, spec.c.s_type ==   5),
#Saliva/Saliva
   (and_(aliq.a_type ==  23, spec.c.s_type ==  13)
#NO_CODE/Swab
   ),
isouter=False)
join1 = join1.filter_by(test_id=12345)
return join1

Hopefully that helps someone in the future who is dealing with SqlSoup for
the first time and finds this email.  Thank you again for the help, Michael
:-)

-Jennifer

On Thu, Apr 28, 2011 at 1:15 PM, Michael Bayer mike...@zzzcomputing.comwrote:


 On Apr 28, 2011, at 2:24 PM, Jennifer Rodriguez-Mueller wrote:


 Here is the core SQL (lots of distracting content trimmed):

 SELECT s.*, a.*
   FROM aliquot_table a
 JOIN ( -- Biologically (p)lausible aliquot/specimen type logic
   SELECT  3 AS a_type, 12 AS s_type UNION  -- Serum/Serum
   SELECT 15 AS a_type,  7 AS s_type UNION  -- Gut Biopsy/Gut Biopsy
   SELECT  7 AS a_type,  5 AS s_type UNION  -- Saliva/Saliva
   SELECT 23 AS a_type, 13 AS s_type-- NO_CODE/Swab
   ) p ON a.a_type = p.a_type
 JOIN specimen_table s ON (s.random_id = a.random_id AND s.s_type =
 p.s_type)
   WHERE a.test_id = 12345

 Here is the (assumed) equivalent python:

 def getSpecimenAliquots(fromDB):
 from sqlalchemy.ext.sqlsoup import SqlSoup
 plausibleTypes = set([( 3,12) #Serum/Serum
  ,(15, 7) #Gut Biopsy/Gut Biopsy
  ,( 7, 5) #Saliva/Saliva
  ,(23,13) #NO_CODE/Swab
  ])
 soupDB = SqlSoup(fromDB)
 aliq = soupDB.aliquot_table
 spec = soupDB.specimen_table
 join1 = soupDB.join(aliq, spec,
 aliq.random_id == spec.random_id and (aliq.a_type, spec.s_type) in
 plausibleTypes,
 isouter=False)
 join1 = join1.filter_by(aliq.test_id=12345)
 return join1.all()

 The code works in SQL, but when I try to translate it into sqlalchemy's
 SqlSoup the number of rows