On Fri, Nov 29, 2019, at 8:02 AM, Zsolt Ero wrote:
> Thanks for the detailed explanation. I think I should explain why I
> run into this problem, as it's in line with emitting a warning on this
> use case.
> 
> 1. I wrote the query without any kind of alias in SQLAlchemy.
> 2. Received an error from PostgreSQL "subquery in FROM must have an
> alias", so I added it in both places "to be safe". One for the
> subquery, one for the join.
> 3. After this the query worked - so I confusingly thought I did the
> right thing - but became unusably slow.


right because the FROM list thing really tends towards creating cartesian 
products because of the implicit FROM thing, there is also a recipe + upcoming 
extension that will warn when this situation occurs, that's in review at 
https://gerrit.sqlalchemy.org/#/c/sqlalchemy/sqlalchemy/+/1461/ .


> 
> Then I tried to debug step by step and arrived that the adding of
> alias to join() made it into a very different query which happened to
> be slow.
> 
> I think I'm not the only user who adds aliases "everywhere" once
> receiving that alias is needed error. I thought that the j = ... is a
> subquery as it's in a select_from() line.

I'm not sure that's necessarily true as usually you'd want to be looking at the 
SQL and matching up...the ORM is less prone to this kind of thing because you 
don't deal with join() as a separate object (having this work for Core select 
is also a thing I am pursuing).


> 
> So yes, in my case adding those warnings would be very helpful,
> directing me in the right direction.

the only thing that worries me lately is that deprecationwarnings in Python 
don't emit by default now, so just curious is this in the context of a test 
suite that will enable deprecationwarning ?



> 
> 
> Zsolt
> 
> On Fri, 29 Nov 2019 at 04:50, Mike Bayer <[email protected]> wrote:
> >
> >
> >
> > On Thu, Nov 28, 2019, at 4:31 PM, Zsolt Ero wrote:
> >
> > I have the following query:
> >
> > min_values = (
> > sa.select(
> > [
> > table.c.region_id,
> > table.c.operator,
> > table.c.day_str,
> > sa.func.min(table.c.processed_time).label('min_processed_time'),
> > ]
> > )
> > .group_by(table.c.region_id, table.c.operator, table.c.day_str)
> > .order_by(table.c.region_id, table.c.operator, table.c.day_str)
> > ).alias('mv')
> >
> > j = min_values.join(
> > config_mds,
> > sa.and_(
> > min_values.c.region_id == config_mds.c.region_id,
> > min_values.c.operator == config_mds.c.operator,
> > ),
> > ).alias('j')
> >
> > stmt = (
> > sa.select([min_values.c.region_id, min_values.c.operator, 
> > min_values.c.day_str])
> > .select_from(j)
> > .where(min_values.c.min_processed_time < config_mds.c.last_reprocess_time)
> > )
> >
> >
> > Which works well, as long as I do not add the .alias('j') part. If I add 
> > it, it generates a wildly different SQL code, which doesn't finish in 
> > reasonable time (I terminated it after a few minutes whereas the first one 
> > finishes in a few seconds.
> >
> >
> > sure, the "j" is shorthand for calling a SELECT of the join, then aliasing 
> > it. you can no longer refer to 'min_values" in the subsequent query because 
> > that means you want to add it to the FROM clause separately from the "j" 
> > and it will go poorly.
> >
> >
> >
> > Here is the generated SQL without alias('j'):
> >
> > SELECT
> > mv.region_id,
> > mv.operator,
> > mv.day_str
> > FROM
> > (
> > SELECT
> > status_changes.region_id AS region_id,
> > status_changes.operator AS OPERATOR,
> > status_changes.day_str AS day_str,
> > min(status_changes.processed_time) AS min_processed_time
> > FROM
> > status_changes
> > GROUP BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ORDER BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ) AS mv
> > JOIN config_mds ON mv.region_id = config_mds.region_id
> > AND mv.operator = config_mds.operator
> > WHERE
> > config_mds.last_reprocess_time > mv.min_processed_time
> >
> >
> > And here is with the added alias('j'), which doesn't terminates:
> >
> > SELECT
> > mv.region_id,
> > mv.operator,
> > mv.day_str
> > FROM
> > (
> > SELECT
> > status_changes.region_id AS region_id,
> > status_changes.operator AS OPERATOR,
> > status_changes.day_str AS day_str,
> > min(status_changes.processed_time) AS min_processed_time
> > FROM
> > status_changes
> > GROUP BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ORDER BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ) AS mv,
> > config_mds,
> > (
> > SELECT
> > mv.region_id AS mv_region_id,
> > mv.operator AS mv_operator,
> > mv.day_str AS mv_day_str,
> > mv.min_processed_time AS mv_min_processed_time,
> > config_mds.region_id AS config_mds_region_id,
> > config_mds.operator AS config_mds_operator,
> > config_mds.disable_download AS config_mds_disable_download,
> > config_mds.trips_url AS config_mds_trips_url,
> > config_mds.status_changes_url AS config_mds_status_changes_url,
> > config_mds.url_headers AS config_mds_url_headers,
> > config_mds.login_data AS config_mds_login_data,
> > config_mds.use_different_region AS config_mds_use_different_region,
> > config_mds.comment AS config_mds_comment,
> > config_mds.last_reprocess_time AS config_mds_last_reprocess_time,
> > config_mds.token_cache AS config_mds_token_cache
> > FROM
> > (
> > SELECT
> > status_changes.region_id AS region_id,
> > status_changes.operator AS OPERATOR,
> > status_changes.day_str AS day_str,
> > min(status_changes.processed_time) AS min_processed_time
> > FROM
> > status_changes
> > GROUP BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ORDER BY
> > status_changes.region_id,
> > status_changes.operator,
> > status_changes.day_str
> > ) AS mv
> > JOIN config_mds ON mv.region_id = config_mds.region_id
> > AND mv.operator = config_mds.operator
> > ) AS j
> > WHERE
> > config_mds.last_reprocess_time > mv.min_processed_time;
> >
> >
> >
> > This behaviour confuses me a bit. Why would adding an alias make a 
> > difference to the generated query?
> >
> >
> >
> > you have:
> >
> > min_values = "(SELECT .... FROM ... ) AS mv"
> >
> > select([min_values.c.x]) = "SELECT mv.x FROM (SELECT ... FROM ..) AS mv"
> >
> >
> > then you say, "j = min_values.join(config_mds, ...).alias("j")"
> >
> > this is the same as saying:
> >
> > j = select([min_values.join(config_mds, ...)]).alias("j")
> >
> > an "alias" is only against a SELECT or a table; there is no "SELECT * FROM 
> > (x JOIN y) AS j" syntax. Did a quick test and apparently Postgresql allows 
> > that syntax but it fails on MySQL. SQLite seems to support it but that's a 
> > newer thing as SQLite didn't support nested JOINs at all some years ago. 
> > Adding the "AS <name>" to the plain parenthesies like that It's not 
> > something I've seen in SQL before and SQLAlchemy has no concept of that 
> > right now.
> >
> > so given that join().alias() is shorthand for select([join()]).alias(), 
> > this means:
> >
> > select([j.c.x]) = "SELECT j.x FROM (SELECT mv.*, config_mds.* FROM (SELECT 
> > ... FROM ..) AS mv JOIN config_mds ON ...) AS j"
> >
> > In SQLAlchemy 1.4 there's been a big scale-back of how things like alias() 
> > and join() can be used in order to decrease this kind of confusion, and I 
> > was sort of hoping this particular use was also deprecated but apparently 
> > it isn't. so https://github.com/sqlalchemy/sqlalchemy/issues/5010 is added 
> > and the join().alias() usage will emit a deprecation warning in 1.4 because 
> > it does a thing that by today's standards is surprising.
> >
> > anyway, if you mix the join() and the join().alias() together, like:
> >
> > select([min_values.c.x]).select_from(j)...
> >
> > this is like:
> >
> > "SELECT mv.x FROM (SELECT ... FROM ..) AS mv, (SELECT mv.*, config_mds.* 
> > FROM (SELECT ... FROM ..) AS mv JOIN config_mds ON ...) AS j"
> >
> > that is, saying select_from(j) doesn't mean that's the only thing you are 
> > selecting from, it also has to select from min_values directly.
> >
> > TLDR join().alias() is doing a large implicit thing based on older 
> > assumptions, when parenthesized "(a JOIN b)" clauses were not accepted by 
> > SQLite and possibly others and SQLAlchemy always used SELECT... in order to 
> > group joins. you can group joins in parentheses now but you still can't 
> > give them an alias() without a SELECT implied.
> >
> >
> >
> >
> >
> > --
> > SQLAlchemy -
> > The Python SQL Toolkit and Object Relational Mapper
> >
> > http://www.sqlalchemy.org/
> >
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description.
> > ---
> > You received this message because you are subscribed to the Google Groups 
> > "sqlalchemy" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to [email protected].
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/sqlalchemy/bbf11796-b9ed-4b6f-9188-c35585dfc01d%40googlegroups.com.
> >
> >
> > --
> > SQLAlchemy -
> > The Python SQL Toolkit and Object Relational Mapper
> >
> > http://www.sqlalchemy.org/
> >
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description.
> > ---
> > You received this message because you are subscribed to a topic in the 
> > Google Groups "sqlalchemy" group.
> > To unsubscribe from this topic, visit 
> > https://groups.google.com/d/topic/sqlalchemy/nPKcBGWWY3s/unsubscribe.
> > To unsubscribe from this group and all its topics, send an email to 
> > [email protected].
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/sqlalchemy/f26cf2af-cb16-4488-b75b-baca3c457e00%40www.fastmail.com.
> 
> -- 
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CAKw-smA-O2CPJwTys6r2KE0fFV%2BdnNL4jEgnB66VcGbJKO%2BObw%40mail.gmail.com.
> 

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/bdaf9ba3-8421-43e9-a86c-01f5a248bc72%40www.fastmail.com.

Reply via email to