GitHub user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21403
@juliuszsompolski I see your point, and I think it is an acceptable
solution, though it has some problems. If we follow this path, we are
saying that `(a, b) IN (select c, d from ...)` has a different result from
`(a, b) IN (select (c, d) from ...)` and `(a, b) IN ((1, 2))`. We can probably
argue that they are different things and so can lead to different results,
but this is not very intuitive for a user.
In this case, I'd prefer to define a single rule for how we behave and
follow it, throwing an `AnalysisException` otherwise. This is also the
behavior of other RDBMSs (I checked Oracle and Postgres):
- `(a, b) IN (select c, d from ...)` unpacks the columns;
- `(a, b) IN (select (c, d) from ...)` throws an `AnalysisException`.
So I would suggest going with this approach, which could also resolve
other issues like SPARK-24395, since those cases would be considered invalid.
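
To make the proposed rule concrete, here is a minimal sketch in plain Python. It is not Spark code: `AnalysisError` and `resolve_in_subquery` are hypothetical names used only for illustration, and a struct column such as `(c, d)` is modelled as a tuple that counts as a single column.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def resolve_in_subquery(lhs_columns, subquery_columns):
    """Pair each left-hand value with exactly one subquery output column.

    lhs_columns: names on the left of IN, e.g. ["a", "b"].
    subquery_columns: output columns of the subquery; a struct column
    such as (c, d) is modelled as a tuple and counts as ONE column.
    """
    if len(lhs_columns) != len(subquery_columns):
        raise AnalysisError(
            f"left side has {len(lhs_columns)} value(s) but the subquery "
            f"returns {len(subquery_columns)} column(s)"
        )
    return list(zip(lhs_columns, subquery_columns))


# (a, b) IN (select c, d from ...): two columns, compared pairwise.
pairs = resolve_in_subquery(["a", "b"], ["c", "d"])

# (a, b) IN (select (c, d) from ...): one struct column, so it is rejected.
try:
    resolve_in_subquery(["a", "b"], [("c", "d")])
    rejected = False
except AnalysisError:
    rejected = True
```

Under this sketch the only accepted form is the one where the number of left-hand values matches the number of subquery output columns; everything else fails at analysis time instead of silently producing a different result.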
cc @hvanhovell what do you think?