abellina commented on PR #36505: URL: https://github.com/apache/spark/pull/36505#issuecomment-1129153161
> All other queries in the test are passing, except for the negative case for the multi-column support. It is commented out in my last patch (obviously that's not the solution): https://github.com/apache/spark/commit/baac1e4119f755b3a906f27c4f7324022fc27e85#diff-4279d0074cd860be3eab329b3716ac502a14ae4512c3e371a5f0e68636cec07dR1163-R1166. I believe this also has to do with nullability, in this case of the second column `b`. I am looking into it, but I could use some help.

OK, the reason for this is that the condition:

```
(((key = a) OR isnull((key = a))) AND ((key + 1) = b))
```

is split by `ExtractEquiJoinKeys` into the equi-join part (`(key + 1) = b`), while the rest is turned into a conditional expression (`(key = a) OR isnull((key = a))`). If `b` were nullable, we would wind up with a different condition:

```
(((key = a) OR isnull((key = a))) AND (((key + 1) = b) OR isnull(((key + 1) = b))))
```

This can't be split by `ExtractEquiJoinKeys`, so at that point the join isn't an equi join. It then goes on to `ExtractSingleColumnNullAwareAntiJoin` and passes through, since it can't be matched with the single-column expression that rule expects => this is the negative case for the test. I'll change the table used in the test to have nullable `key`, `value` columns.
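To make the splitting behavior concrete, here is a minimal toy model (in Python, not Spark's actual Scala internals) of what an `ExtractEquiJoinKeys`-style pattern does: walk the top-level AND-ed conjuncts, keep only plain equalities as join keys, and push everything else into a residual condition. All names here (`Expr`, `Eq`, `Or`, `split_equi_join`) are illustrative inventions, not Spark's API.

```python
# Toy sketch: partition a join condition's top-level conjuncts into
# equi-join keys vs. a residual (non-equi) condition, mirroring the
# idea behind ExtractEquiJoinKeys. Names are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Opaque leaf expression, e.g. a column or arithmetic term."""
    text: str
    def __str__(self):
        return self.text


@dataclass(frozen=True)
class Eq:
    """A plain equality predicate: left = right."""
    left: Expr
    right: Expr
    def __str__(self):
        return f"({self.left} = {self.right})"


@dataclass(frozen=True)
class Or:
    """Disjunction; cannot serve as an equi-join key."""
    left: object
    right: object
    def __str__(self):
        return f"({self.left} OR {self.right})"


def split_equi_join(conjuncts):
    """Split top-level conjuncts into (equi-join keys, residual predicates)."""
    keys, residual = [], []
    for c in conjuncts:
        (keys if isinstance(c, Eq) else residual).append(c)
    return keys, residual


# Non-nullable b: conjuncts are
#   [(key = a) OR isnull((key = a)),  (key + 1) = b]
# so one equality survives and the plan can stay an equi join.
cond_not_nullable = [
    Or(Eq(Expr("key"), Expr("a")), Expr("isnull((key = a))")),
    Eq(Expr("(key + 1)"), Expr("b")),
]
keys, residual = split_equi_join(cond_not_nullable)
print([str(k) for k in keys])  # one extracted key: ((key + 1) = b)

# Nullable b: the second conjunct is also wrapped in OR isnull(...),
# so no bare equality remains and nothing is extracted as a join key.
cond_nullable = [
    Or(Eq(Expr("key"), Expr("a")), Expr("isnull((key = a))")),
    Or(Eq(Expr("(key + 1)"), Expr("b")), Expr("isnull(((key + 1) = b))")),
]
keys2, residual2 = split_equi_join(cond_nullable)
print(keys2)  # [] -> no equi-join keys, not an equi join
```

Under this toy model, the nullable-`b` variant yields an empty key list, which is exactly the situation where the real planner falls through to rules like `ExtractSingleColumnNullAwareAntiJoin`.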
