[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

mgaido91 Tue, 31 Jul 2018 04:45:02 -0700

Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @cloud-fan the problem is that the change is not needed only when IN is 
followed by a `ListQuery`; it is also needed when IN is followed by a list of 
values. The reason for the change is to detect the difference between these 2 
kinds of queries:
     1. `select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 
'a')` or equivalently `select 1 from (select (1, 'a') as col1) tab1 where col1 
in ((1, 'a'))`
     2. `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) 
in (select 1, 'a')` or equivalently `select 1 from (select 1 as col1, 'a' as 
col2) tab1 where (col1, col2) in ((1, 'a'))`
    
    In particular, the queries in 1 are invalid because they compare a single 
value (one struct column) with 2 columns in the inner query/list of constants, 
while the queries in 2 are valid because they compare 2 columns on both sides. 
I hope this clarifies why introducing a specific `InListQuery` would not solve 
the problem.
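    
    To make the distinction concrete, here is a minimal sketch in plain Python. 
It is not Spark's Catalyst code: `Column`, `in_is_valid` and `rhs_width` are 
hypothetical names used only for illustration. It assumes the left-hand side of 
IN is kept as a sequence of values rather than as one struct expression, so the 
number of values can be compared with the number of columns on the right.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one value expression.

    `fields` > 1 means the value is struct-typed, e.g. (1, 'a') as col1.
    """
    name: str
    fields: int = 1


def in_is_valid(lhs_values: Tuple[Column, ...], rhs_width: int) -> bool:
    """IN is accepted only when the number of left-hand values matches
    the number of columns produced by the subquery or literal list."""
    return len(lhs_values) == rhs_width


# Query 1: `col1 in (select 1, 'a')` where col1 is a single struct value.
# One value on the left, two columns on the right.
query1_valid = in_is_valid((Column("col1", fields=2),), rhs_width=2)

# Query 2: `(col1, col2) in (select 1, 'a')`.
# Two values on the left, two columns on the right.
query2_valid = in_is_valid((Column("col1"), Column("col2")), rhs_width=2)

print(query1_valid, query2_valid)
```

    If the left-hand side were stored as a single struct expression in both 
cases, the two calls above would receive the same input, which is the ambiguity 
described here.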
    
    > It's not public so we can change it, but I believe some advanced users 
use these internal classes and we should keep these classes unchanged as 
possible as we can.
    
    I agree with you on this point; that is why I initially changed my proposal 
from `Seq[Expression]` to introducing the new `InValues` expression. However, 
this might also break existing user code, since there is an extra expression 
users wouldn't expect. So I think both solutions are equivalent in this 
respect. The only thing we can do about this point is to wait for 3.0 to get 
this in, if we consider it a breaking change.


