On Thu, Nov 28, 2019, at 4:31 PM, Zsolt Ero wrote:
> I have the following query:
> 
> min_values = (
>  sa.select(
>  [
>  table.c.region_id,
>  table.c.operator,
>  table.c.day_str,
>  sa.func.min(table.c.processed_time).label('min_processed_time'),
>  ]
>  )
>  .group_by(table.c.region_id, table.c.operator, table.c.day_str)
>  .order_by(table.c.region_id, table.c.operator, table.c.day_str)
>  ).alias('mv')
> 
>  j = min_values.join(
>  config_mds,
>  sa.and_(
>  min_values.c.region_id == config_mds.c.region_id,
>  min_values.c.operator == config_mds.c.operator,
>  ),
>  ).alias('j')
> 
>  stmt = (
>  sa.select([min_values.c.region_id, min_values.c.operator, 
> min_values.c.day_str])
>  .select_from(j)
>  .where(min_values.c.min_processed_time < config_mds.c.last_reprocess_time)
>  )
> 
> 
> Which works well, as long as I do not add the .alias('j') part. If I add it, 
> it generates a wildly different SQL code, which doesn't finish in reasonable 
> time (I terminated it after a few minutes whereas the first one finishes in a 
> few seconds.

sure, the "j" is shorthand for calling a SELECT of the join, then aliasing it. 
you can no longer refer to 'min_values" in the subsequent query because that 
means you want to add it to the FROM clause separately from the "j" and it will 
go poorly.


> 
> Here is the generated SQL without alias('j'):
> 
> SELECT
>  mv.region_id,
>  mv.operator,
>  mv.day_str
> FROM
>  (
>  SELECT
>  status_changes.region_id AS region_id,
>  status_changes.operator AS OPERATOR,
>  status_changes.day_str AS day_str,
>  min(status_changes.processed_time) AS min_processed_time
>  FROM
>  status_changes
>  GROUP BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ORDER BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ) AS mv
>  JOIN config_mds ON mv.region_id = config_mds.region_id
>  AND mv.operator = config_mds.operator
> WHERE
>  config_mds.last_reprocess_time > mv.min_processed_time
> 
> 
> And here is with the added alias('j'), which doesn't terminates:
> 
> SELECT
>  mv.region_id,
>  mv.operator,
>  mv.day_str
> FROM
>  (
>  SELECT
>  status_changes.region_id AS region_id,
>  status_changes.operator AS OPERATOR,
>  status_changes.day_str AS day_str,
>  min(status_changes.processed_time) AS min_processed_time
>  FROM
>  status_changes
>  GROUP BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ORDER BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ) AS mv,
>  config_mds,
>  (
>  SELECT
>  mv.region_id AS mv_region_id,
>  mv.operator AS mv_operator,
>  mv.day_str AS mv_day_str,
>  mv.min_processed_time AS mv_min_processed_time,
>  config_mds.region_id AS config_mds_region_id,
>  config_mds.operator AS config_mds_operator,
>  config_mds.disable_download AS config_mds_disable_download,
>  config_mds.trips_url AS config_mds_trips_url,
>  config_mds.status_changes_url AS config_mds_status_changes_url,
>  config_mds.url_headers AS config_mds_url_headers,
>  config_mds.login_data AS config_mds_login_data,
>  config_mds.use_different_region AS config_mds_use_different_region,
>  config_mds.comment AS config_mds_comment,
>  config_mds.last_reprocess_time AS config_mds_last_reprocess_time,
>  config_mds.token_cache AS config_mds_token_cache
>  FROM
>  (
>  SELECT
>  status_changes.region_id AS region_id,
>  status_changes.operator AS OPERATOR,
>  status_changes.day_str AS day_str,
>  min(status_changes.processed_time) AS min_processed_time
>  FROM
>  status_changes
>  GROUP BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ORDER BY
>  status_changes.region_id,
>  status_changes.operator,
>  status_changes.day_str
>  ) AS mv
>  JOIN config_mds ON mv.region_id = config_mds.region_id
>  AND mv.operator = config_mds.operator
>  ) AS j
> WHERE
>  config_mds.last_reprocess_time > mv.min_processed_time;
> 
> 
> 
> This behaviour confuses me a bit. Why would adding an alias make a difference 
> to the generated query?


you have:

min_values = "(SELECT .... FROM ... ) AS mv"

select([min_values.c.x]) = "SELECT mv.x FROM (SELECT ... FROM ..) AS mv"


then you say, "j = min_values.join(config_mds, ...).alias("j")"

this is the same as saying:

j = select([min_values.join(config_mds, ...)]).alias("j")

an "alias" is only against a SELECT or a table; there is no "SELECT * FROM (x 
JOIN y) AS j" syntax. Did a quick test and apparently Postgresql allows that 
syntax but it fails on MySQL. SQLite seems to support it but that's a newer 
thing as SQLite didn't support nested JOINs at all some years ago. Adding the 
"AS <name>" to the plain parenthesies like that It's not something I've seen in 
SQL before and SQLAlchemy has no concept of that right now.

so given that join().alias() is shorthand for select([join()]).alias(), this 
means:

select([j.c.x]) = "SELECT j.x FROM (SELECT mv.*, config_mds.* FROM (SELECT ... 
FROM ..) AS mv JOIN config_mds ON ...) AS j"

In SQLAlchemy 1.4 there's been a big scale-back of how things like alias() and 
join() can be used in order to decrease this kind of confusion, and I was sort 
of hoping this particular use was also deprecated but apparently it isn't. so 
https://github.com/sqlalchemy/sqlalchemy/issues/5010 is added and the 
join().alias() usage will emit a deprecation warning in 1.4 because it does a 
thing that by today's standards is surprising.

anyway, if you mix the join() and the join().alias() together, like:

select([min_values.c.x]).select_from(j)...

this is like:

"SELECT mv.x FROM (SELECT ... FROM ..) AS mv, (SELECT mv.*, config_mds.* FROM 
(SELECT ... FROM ..) AS mv JOIN config_mds ON ...) AS j"

that is, saying select_from(j) doesn't mean that's the only thing you are 
selecting from, it also has to select from min_values directly.

TLDR join().alias() is doing a large implicit thing based on older assumptions, 
when parenthesized "(a JOIN b)" clauses were not accepted by SQLite and 
possibly others and SQLAlchemy always used SELECT... in order to group joins. 
you can group joins in parentheses now but you still can't give them an alias() 
without a SELECT implied.

> 
> 
> 
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/bbf11796-b9ed-4b6f-9188-c35585dfc01d%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/bbf11796-b9ed-4b6f-9188-c35585dfc01d%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/f26cf2af-cb16-4488-b75b-baca3c457e00%40www.fastmail.com.

Reply via email to