GitHub user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21403
@cloud-fan the problem is that this change is not needed only when IN is
followed by a `ListQuery`: it is needed in the other case as well. The reason
it is needed is to tell apart these two queries (each shown with an equivalent
form):

1. `select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 'a')`
   or, equivalently, `select 1 from (select (1, 'a') as col1) tab1 where col1 in ((1, 'a'))`
2. `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in (select 1, 'a')`
   or, equivalently, `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in ((1, 'a'))`

In particular, the queries in 1 are invalid, because they compare a single
(struct-typed) column with two columns in the subquery or list of constants,
while the queries in 2 are valid, because they compare two columns on both
sides. I hope this clarifies why introducing a specific `InListQuery` could
not solve the problem.
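To make the ambiguity concrete, here is a minimal Python model. It assumes, as a simplification, that the parser folds an explicit tuple such as `(col1, col2)` into one struct-valued expression; `Expr`, `fold_into_struct`, `check_folded` and `check_seq` are hypothetical names for illustration only, not Spark's actual classes or rules.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical type model: "int" for a scalar, ("int", "string") for a struct.
DataType = Union[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Expr:
    sql: str
    data_type: DataType


def fold_into_struct(values: List[Expr]) -> Expr:
    """Assumed parser behavior: turn `(a, b)` into ONE struct-valued expression."""
    if len(values) == 1:
        return values[0]
    return Expr(
        "struct(" + ", ".join(v.sql for v in values) + ")",
        tuple(v.data_type for v in values),
    )


def width_of(expr: Expr) -> int:
    """Columns a single expression yields if its struct fields are unpacked."""
    return len(expr.data_type) if isinstance(expr.data_type, tuple) else 1


def check_folded(values: List[Expr], list_width: int) -> bool:
    """Arity check that only sees the folded value: it cannot tell query 1 from 2."""
    return width_of(fold_into_struct(values)) == list_width


def check_seq(values: List[Expr], list_width: int) -> bool:
    """Arity check that keeps the left-hand side as a sequence of values."""
    return len(values) == list_width


# Query 1: col1 is a single struct column holding (1, 'a').
q1 = [Expr("col1", ("int", "string"))]
# Query 2: two scalar columns written explicitly as (col1, col2).
q2 = [Expr("col1", "int"), Expr("col2", "string")]
# (select 1, 'a') and ((1, 'a')) both produce two columns per row.
LIST_WIDTH = 2

print(fold_into_struct(q1).data_type == fold_into_struct(q2).data_type)  # True
print(check_folded(q1, LIST_WIDTH), check_folded(q2, LIST_WIDTH))  # True True
print(check_seq(q1, LIST_WIDTH), check_seq(q2, LIST_WIDTH))  # False True
```

Once the tuple is folded, both left-hand sides carry the same type, so any check on the folded value has to accept or reject the two queries together. Keeping the left-hand side as a sequence restores the distinction, whether the right-hand side is a subquery or a list of literals.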

> It's not public so we can change it, but I believe some advanced users
> use these internal classes and we should keep these classes unchanged as
> possible as we can.

I agree with you on this point; that is why I initially changed my proposal
from `Seq[Expression]` to introducing the new `InValues` expression. However,
this might also break existing user code, since there is an extra expression
users wouldn't expect. So I think both solutions are equivalent in this
respect. The only thing we can do about it is to wait for 3.0 to merge this,
if we consider it a breaking change.