Re: [sqlalchemy] Adding additional filters to non-declarative entities in passed around query

Mike Bayer Mon, 06 Nov 2017 15:32:28 -0800

On Mon, Nov 6, 2017 at 5:19 PM, Jacob Magnusson <[email protected]> wrote:


> Hi,
>
> I have this function which returns a list of product variants and the
> first "seasons" they were introduced in (might be multiple on the same
> date, so I'm ranking by from_date in a CTE):
>
> *In [190]: def variants_first_seasons_query():*
> *     ...:     ranking_query = (*
> *     ...:         db.session.query(*
> *     ...:             db.ProductVariant.id.label('productvariant_id'),*
> *     ...:             db.ProductSeason.id.label('productseason_id'),*
> *     ...:             db.sqla.func.rank()*
> *     ...:             .over(*
> *     ...:                 partition_by=db.ProductVariant.id
> <http://db.ProductVariant.id>,*
> *     ...:                 order_by=db.ProductSeason.from_date*
> *     ...:             )*
> *     ...:             .label('ranking')*
> *     ...:         )*
> *     ...:         .select_from(db.ProductSeason)*
> *     ...:         .join(db.ProductSeason.skus)*
> *     ...:         .join(db.ProductSKU.variant)*
> *     ...:         .order_by(db.ProductVariant.id
> <http://db.ProductVariant.id>)*
> *     ...:         .distinct()*
> *     ...:     )*
> *     ...:     ranking_query_cte = db.sqla.aliased(ranking_query.cte(),
> name='ranking_query')*
> *     ...:     return (*
> *     ...:         db.session.query(*
> *     ...:             ranking_query_cte.c.productvariant_id,*
> *     ...:             ranking_query_cte.c.productseason_id*
> *     ...:         )*
> *     ...:         .select_from(ranking_query_cte)*
> *     ...:         .filter(ranking_query_cte.c.ranking == 1)*
> *     ...:     )*
> *     ...:*
>
> *In [191]: q = variants_first_seasons_query()*
>
> *In [192]: print(q)*
> *WITH anon_1 AS*
> *(SELECT DISTINCT productvariant.id <http://productvariant.id> AS
> productvariant_id, productseason.id <http://productseason.id> AS
> productseason_id, rank() OVER (PARTITION BY productvariant.id
> <http://productvariant.id> ORDER BY productseason.from_date) AS ranking*
> *FROM productseason JOIN productseason_productsku AS
> productseason_productsku_1 ON productseason.id <http://productseason.id> =
> productseason_productsku_1.productseason_id JOIN productsku ON
> productsku.id <http://productsku.id> =
> productseason_productsku_1.productsku_id JOIN productvariant ON
> productvariant.id <http://productvariant.id> = productsku.variant_id ORDER
> BY productvariant.id <http://productvariant.id>)*
> * SELECT ranking_query.productvariant_id AS
> ranking_query_productvariant_id, ranking_query.productseason_id AS
> ranking_query_productseason_id*
> *FROM anon_1 AS ranking_query*
> *WHERE ranking_query.ranking = %(ranking_1)s*
>
> Later in my code I want to further filter on the seasons that I care about
> in the outer query, but as there's no standard declarative model in the
> selectable I don't think I can pass the model's attribute in the filter,
> but instead I figure I should go through *.statement.c* of the query:
>
>
> *In [193]: SEASON_IDS = [45]*
>
> *In [194]: q = q.filter(q.statement.c.productseason_id.in_(SEASON_IDS))*
>


calling upon q.statement in the first place is an expensive process,
relatively.  A lot goes on to convert ORM-level things into a select()
statement, and there might be lots of things that make it confusing to
introspect, like eager-loaded columns, though this query() is against
individual columns in the first place.    but still, pulling out
q.statement isn't going to get you what you want.

The ".c." collection of select() are the *exported* columns of the
statement:

stmt = "SELECT a, b, c FROM table"

If I were to say, select([stmt.c.a]), that means:

SELECT a FROM (SELECT a, b, c FROM table)

similarly, stmt.where(stmt.c.a == 5) would mean:

SELECT a, b, c FROM table, (SELECT a, b, c FROM TABLE) WHERE a = 5

Because "stmt.c.a" means "SELECT from the SELECT".

What you want here is the *original* "a":

stmt = stmt.where(table.c.a == 5)

The select() has these columns inside of an iterator called
".inner_columns" but it isn't name-addressable, it's an iterator.   You
need to find another way to keep track of the thing that you're SELECTing
from in order to add limiting criteria to it.

Neither select() nor query() have a namespace of things in the columns
clause, because you can do WHERE, ORDER BY, GROUP BY etc. on any number of
columns that are present in any of the FROM clauses, not just things in the
columns clause.  There's no way to make a generic namespace of everything
that is in the FROM clause except to use the original object that generated
the namespace.    If I had say:

t1 = table('t1', column('a'), column('b'))
t2 = table('t2', column('a'), column('b'))

then a statement:

stmt = select([t1.c.a, t2.c.a])

how would I add "WHERE t1.b = 5 AND t2.c = 10" to that, without referring
to t1 and t2 directly?

*** below is hypothetical feature, does not exist ***

There would need to be some namespace like this:

   stmt.inner.t2.c.b

which OK, we used the table name "t2" as the namespace, but then what if I
had:

  a1 = t1.alias()
  a2 = t1.alias()
  stmt = select([a1.c.a, a2.c.a]).where(a1.c.a > a2.c.a)

those are anonymously-named aliases.   I suppose if we said, well suppose
that we have to give them names:

  a1 = t1.alias()
  a2 = t1.alias('a2')
  stmt = select([a1.c.a, a2.c.a]).where(a1.c.a > a2.c.a)

so above we can get to stmt.inner.a2.c.b but not "a1" unless we added a
name.  Something to think about, I guess.

*** end hypothetical feature ***





>
> *In [195]: print(q)*
> *WITH anon_1 AS*
> *(SELECT DISTINCT productvariant.id <http://productvariant.id> AS
> productvariant_id, productseason.id <http://productseason.id> AS
> productseason_id, rank() OVER (PARTITION BY productvariant.id
> <http://productvariant.id> ORDER BY productseason.from_date) AS ranking*
> *FROM productseason JOIN productseason_productsku AS
> productseason_productsku_1 ON productseason.id <http://productseason.id> =
> productseason_productsku_1.productseason_id JOIN productsku ON
> productsku.id <http://productsku.id> =
> productseason_productsku_1.productsku_id JOIN productvariant ON
> productvariant.id <http://productvariant.id> = productsku.variant_id ORDER
> BY productvariant.id <http://productvariant.id>)*
> * SELECT ranking_query.productvariant_id AS
> ranking_query_productvariant_id, ranking_query.productseason_id AS
> ranking_query_productseason_id*
> *FROM anon_1 AS ranking_query, (SELECT ranking_query.productvariant_id AS
> productvariant_id, ranking_query.productseason_id AS productseason_id*
> *FROM ranking_query*
> *WHERE ranking_query.ranking = %(ranking_1)s)*
> *WHERE ranking_query.ranking = %(ranking_1)s AND productseason_id IN
> (%(productseason_id_1)s)*
>
> *In [196]: q.count()*
> *...*
> *ProgrammingError: (psycopg2.ProgrammingError) subquery in FROM must have
> an alias*
> *LINE 5: FROM anon_1 AS ranking_query, (SELECT ranking_query.productv...*
> *...*
>
> No success (as you can see the red part above is the reason).
>
> However I have found a workaround which produces exactly the query I'm
> looking for:
>
> *In [197]: db.session.rollback()*
>
> *In [198]: q = variants_first_seasons_query()*
>
> *In [199]: col = [c['expr'] for c in q.column_descriptions if
> c['expr'].name == 'productseason_id'][0*
> *     ...: ]*
>
> *In [200]: q = q.filter(col.in_(SEASON_IDS))*
>



In the example as given it looks like you'd want to use
ranking_query_cte.c.productseason_id as the thing you are filter()ing on.
So that's one way to get at that column, sure.    But I would just break
the function up into two:

ranking = get_ranking_first_seasons_query()
q =
variants_first_seasons_query(ranking).filter(ranking.c.productseason_id.in_(SEASON_IDS))

or just return both:

q, ranking = variants_first_seasons_query()
q = q.filter(ranking.c.productseason_id.in_(SEASON_IDS)

or build that use case into the function (this is better encapsulation):

q = variants_first_seasons_query(limit_productseason_id = SEASON_IDS)

or build it in with a lambda:

q = variants_first_seasons_query(lambda q, ranking:
q.filter(ranking.c.productseason_id.in_(SEASON_IDS))

all of the above ways assume less about the structure of "q" internally and
allow it to be less coupled to the filter you're calling after the fact.
Adding keyword arguments with known modifier use cases is the easiest to
work with because knowledge about that structure is kept local.





>
> *In [201]: print(q)*
> *WITH anon_1 AS*
> *(SELECT DISTINCT productvariant.id <http://productvariant.id> AS
> productvariant_id, productseason.id <http://productseason.id> AS
> productseason_id, rank() OVER (PARTITION BY productvariant.id
> <http://productvariant.id> ORDER BY productseason.from_date) AS ranking*
> *FROM productseason JOIN productseason_productsku AS
> productseason_productsku_1 ON productseason.id <http://productseason.id> =
> productseason_productsku_1.productseason_id JOIN productsku ON
> productsku.id <http://productsku.id> =
> productseason_productsku_1.productsku_id JOIN productvariant ON
> productvariant.id <http://productvariant.id> = productsku.variant_id ORDER
> BY productvariant.id <http://productvariant.id>)*
> * SELECT ranking_query.productvariant_id AS
> ranking_query_productvariant_id, ranking_query.productseason_id AS
> ranking_query_productseason_id*
> *FROM anon_1 AS ranking_query*
> *WHERE ranking_query.ranking = %(ranking_1)s AND
> ranking_query.productseason_id IN (%(productseason_id_1)s)*
>
> *In [202]: q.count()*
> *Out[202]: 165*
>
> Now my question is, how can accomplish this cleanly, without going through
> *.column_descriptions*?
>
> Any suggestions would be greatly appreciated ❤
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Adding additional filters to non-declarative entities in passed around query

Reply via email to