Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @juliuszsompolski I see your point, and I think it is an acceptable 
solution, though it has some problems. If we follow this path, we are 
saying that `(a, b) IN (select c, d from ...)` has a different result from 
`(a, b) IN (select (c, d) from ..)` and `(a, b) IN ((1, 2))`. We can probably 
argue that they are different things and so can lead to different results, 
but this is not very intuitive for a user.
    
    In this case I'd prefer to define one rule for how we behave and follow 
it, throwing an `AnalysisException` otherwise. This is also the behavior of 
other RDBMSs (I checked Oracle and Postgres):
    
     - `(a, b) IN (select c, d from ...)` unpacks them;
     - `(a, b) IN (select (c, d) from ..)` throws an `AnalysisException`.
    
    So I would suggest going with this approach, which could also resolve 
other issues such as SPARK-24395, since those cases would be considered invalid.
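    
    As a rough illustration of the rule, here is a toy Python model. It is 
not Spark's analyzer: `AnalysisError` and `resolve_in_subquery` are 
hypothetical names, and a struct column is simply modeled as a tuple. It only 
shows the arity check I have in mind:

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def resolve_in_subquery(lhs, subquery_output):
    """Pair each left-hand value with one subquery output column.

    lhs: column names on the left of IN, e.g. ["a", "b"].
    subquery_output: output columns of the subquery; a plain column is a
    str, a struct column such as (c, d) is modeled as a tuple (assumption).
    """
    # The rule: the number of left-hand values must equal the number of
    # columns the subquery returns; anything else is an analysis error.
    if len(lhs) != len(subquery_output):
        raise AnalysisError(
            f"IN subquery returns {len(subquery_output)} column(s), "
            f"but the left-hand side has {len(lhs)} value(s)"
        )
    return list(zip(lhs, subquery_output))


# (a, b) IN (select c, d from ...): two output columns, compared pairwise.
pairs = resolve_in_subquery(["a", "b"], ["c", "d"])

# (a, b) IN (select (c, d) from ..): one struct column, so it is rejected.
try:
    resolve_in_subquery(["a", "b"], [("c", "d")])
    rejected = False
except AnalysisError:
    rejected = True
```

    With a single rule like this, the struct form is never silently 
unpacked, so the two queries cannot return different results.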
    
    cc @hvanhovell what do you think?


