[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

GitBox Fri, 30 Sep 2022 06:12:40 -0700


peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984342208



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   Hi @dtenedor, thanks for reporting this bug and pinging me.
   
   I think the test is a valid use case, so I'm wondering whether we could come 
up with a better fix. I think the root cause of the issue is in 
`tryMergePlans()`: when we try merging `Project`s or `Aggregate`s, we don't 
take into account that `np` can contain `ScalarSubqueryReference`s. So 
filtering out `ScalarSubqueryReference`s there should probably fix the bug while 
keeping this use case working. Also, I think `EXISTS_SUBQUERY` and `IN_SUBQUERY` 
should not matter.
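
A minimal sketch of one possible reading of that filtering idea, using toy stand-in classes rather than the real Catalyst types. Everything here is an assumption made for illustration: the `Expr` hierarchy, the single-field `ScalarSubqueryReference`, and the `containsReference` and `mergeable` helpers are hypothetical and are not part of `MergeScalarSubqueries`. The sketch only shows how a project list could be checked for nested references before a merge is attempted.

```scala
// Toy stand-ins for Catalyst expressions; these are NOT the real Spark classes.
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Lit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
case class Alias(child: Expr, name: String) extends Expr {
  val children: Seq[Expr] = Seq(child)
}
// Hypothetical placeholder for a reference to an already merged subquery.
case class ScalarSubqueryReference(subqueryIndex: Int) extends Expr {
  val children: Seq[Expr] = Nil
}

object MergeSketch {
  // True if the expression tree contains a ScalarSubqueryReference anywhere.
  def containsReference(e: Expr): Boolean = e match {
    case _: ScalarSubqueryReference => true
    case other => other.children.exists(containsReference)
  }

  // Hypothetical guard: a project list is a merge candidate only if
  // none of its expressions contains a subquery reference.
  def mergeable(projectList: Seq[Expr]): Boolean =
    !projectList.exists(containsReference)
}
```

With this guard, a plain projection such as `a + 1 AS a_plus1` stays a merge candidate, while a projection that wraps a `ScalarSubqueryReference` is skipped.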



