peter-toth opened a new pull request, #40856:
URL: https://github.com/apache/spark/pull/40856
### What changes were proposed in this pull request?
This PR fixes `InlineCTE` idempotence. E.g. the following query:
```
WITH
x(r) AS (SELECT random()),
y(r) AS (SELECT * FROM x),
z(r) AS (SELECT * FROM x)
SELECT * FROM z
```
currently breaks it. In the first round, the reference to `x` from `y` is
counted when deciding whether to inline `x`, so the non-deterministic `x` has
two references and is not inlined:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE ===
 WithCTE                                                                     WithCTE
 :- CTERelationDef 0, false                                                  :- CTERelationDef 0, false
 :  +- Project [1#218 AS id#220, rand()#219 AS r#221]                        :  +- Project [1#218 AS id#220, rand()#219 AS r#221]
 :     +- Project [1 AS 1#218, random(-2837267160392907379) AS rand()#219]   :     +- Project [1 AS 1#218, random(-2837267160392907379) AS rand()#219]
 :        +- OneRowRelation                                                  :        +- OneRowRelation
!:- CTERelationDef 1, false                                                  +- Project [id#226, r#227]
!:  +- Project [id#220 AS id#224, r#221 AS r#225]                               +- Project [id#222 AS id#226, r#223 AS r#227]
!:     +- Project [id#220, r#221]                                                  +- Project [id#222, r#223]
!:        +- CTERelationRef 0, true, [id#220, r#221]                                  +- CTERelationRef 0, true, [id#222, r#223]
!:- CTERelationDef 2, false
!:  +- Project [id#222 AS id#226, r#223 AS r#227]
!:     +- Project [id#222, r#223]
!:        +- CTERelationRef 0, true, [id#222, r#223]
!+- Project [id#226, r#227]
!   +- CTERelationRef 2, true, [id#226, r#227]
```
But in the next round we inline `x`: `y` was removed due to lack of
references, so `x` now has only one reference left:
```
Once strategy's idempotence is broken for batch Inline CTE
!WithCTE                                                                     Project [id#226, r#227]
!:- CTERelationDef 0, false                                                  +- Project [id#222 AS id#226, r#223 AS r#227]
!:  +- Project [1#218 AS id#220, rand()#219 AS r#221]                           +- Project [id#222, r#223]
!:     +- Project [1 AS 1#218, random(-2837267160392907379) AS rand()#219]         +- Project [id#232 AS id#222, r#233 AS r#223]
!:        +- OneRowRelation                                                           +- Project [1#218 AS id#232, rand()#219 AS r#233]
!+- Project [id#226, r#227]                                                              +- Project [1 AS 1#218, random(-2837267160392907379) AS rand()#219]
!   +- Project [id#222 AS id#226, r#223 AS r#227]                                           +- OneRowRelation
!      +- Project [id#222, r#223]
!         +- CTERelationRef 0, true, [id#222, r#223]
```
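The two rounds above can be reproduced in a small toy model. The sketch below is not Spark code: `CTE`, `Plan`, `naive_counts`, `reachable_counts` and `inline_once` are hypothetical names, and the inlining rule is simplified to "inline a CTE if it is deterministic or referenced exactly once". Counting only references reachable from the main query is shown as one plausible way to restore idempotence; it is not claimed to be the exact change made in this PR.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CTE:
    deterministic: bool
    refs: tuple  # names of the CTEs this definition references


@dataclass
class Plan:
    ctes: dict        # CTE name -> CTE definition
    main_refs: tuple  # names of the CTEs the main query references


def naive_counts(plan):
    """Count every reference, including those from unreferenced CTEs."""
    counts = Counter(plan.main_refs)
    for cte in plan.ctes.values():
        counts.update(cte.refs)
    return counts


def reachable_counts(plan):
    """Count only references reachable from the main query (assumes no cycles)."""
    counts, visited, stack = Counter(), set(), list(plan.main_refs)
    while stack:
        name = stack.pop()
        counts[name] += 1
        if name not in visited:
            visited.add(name)
            stack.extend(plan.ctes[name].refs)
    return counts


def inline_once(plan, count_refs):
    """One round of the simplified inlining rule."""
    counts = count_refs(plan)

    def should_inline(name):
        return plan.ctes[name].deterministic or counts[name] == 1

    def expand(refs):
        # Replace each inlined reference by the references of its definition.
        out = []
        for name in refs:
            if should_inline(name):
                out.extend(expand(plan.ctes[name].refs))
            else:
                out.append(name)
        return tuple(out)

    # Keep only definitions that are still referenced and were not inlined.
    kept = {
        name: CTE(cte.deterministic, expand(cte.refs))
        for name, cte in plan.ctes.items()
        if counts[name] > 0 and not should_inline(name)
    }
    return Plan(kept, expand(plan.main_refs))


# x is non-deterministic (random()); y and z both select from x; only z is used.
plan = Plan(
    ctes={"x": CTE(False, ()), "y": CTE(True, ("x",)), "z": CTE(True, ("x",))},
    main_refs=("z",),
)

first = inline_once(plan, naive_counts)    # x kept: y's reference was counted
second = inline_once(first, naive_counts)  # x inlined: y is gone by now
fixed = inline_once(plan, reachable_counts)
```

With `naive_counts`, `first` still contains the definition of `x` while `second` does not, which mirrors the two traces. With `reachable_counts`, a second application of `inline_once` leaves `fixed` unchanged.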
### Why are the changes needed?
We use `InlineCTE` as an idempotent rule in the `Optimizer`, `CheckAnalysis`
and `ProgressReporter`.
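As an illustration of what "idempotent" means here, the following sketch applies a rule twice and compares the results, similar in spirit to the check that produced the "idempotence is broken" message above. All names (`is_idempotent`, `unwrap_one`, `unwrap_all`) are hypothetical, and plans are modelled as nested tuples rather than Catalyst trees.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def is_idempotent(rule: Callable[[T], T], plan: T) -> bool:
    """A rule is idempotent on `plan` if a second application changes nothing."""
    once = rule(plan)
    return rule(once) == once


def unwrap_one(plan):
    # Removes a single redundant one-element wrapper per application.
    return plan[0] if isinstance(plan, tuple) and len(plan) == 1 else plan


def unwrap_all(plan):
    # Removes all redundant one-element wrappers in one application.
    while isinstance(plan, tuple) and len(plan) == 1:
        plan = plan[0]
    return plan


nested = ((("scan",),),)
one_ok = is_idempotent(unwrap_one, nested)  # second pass still changes the plan
all_ok = is_idempotent(unwrap_all, nested)  # second pass is a no-op
```

A rule that needs more than one pass to reach a fixed point, like `unwrap_one` here, fails the check in the same way `InlineCTE` does on the query above.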
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added new UT.