GitHub user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/21403#discussion_r206304130
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1422,11 +1422,26 @@ class Analyzer(
resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
case e @ Exists(sub, _, exprId) if !sub.resolved =>
resolveSubQuery(e, plans)(Exists(_, _, exprId))
- case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
+ case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
+ if values.forall(_.resolved) && !l.resolved =>
val expr = resolveSubQuery(l, plans)((plan, exprs) => {
ListQuery(plan, exprs, exprId, plan.output)
})
- In(value, Seq(expr))
+ val subqueryOutput = expr.plan.output
+ val resolvedIn = In(values, Seq(expr))
+ if (values.length != subqueryOutput.length) {
+ throw new AnalysisException(
--- End diff --
@mgaido91 I quickly tried the error case to check the message:
```
spark-sql> select * from ut1 where (c1, c2) in (select (c1, c2) from ut2);
Error in query: Cannot analyze (named_struct('c1', ut1.`c1`, 'c2', ut1.`c2`) IN (listquery())).
The number of columns in the left hand side of an IN subquery does not match the
number of columns in the output of subquery.
#columns in left hand side: 2.
#columns in right hand side: 1.
Left side columns:
[ut1.`c1`, ut1.`c2`].
Right side columns:
[`named_struct(c1, c1, c2, c2)`].;
```
The right-hand side columns look confusing. Should we display only the
value exprs or only the name exprs, instead of both?
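
To make the question concrete, here is a minimal sketch of the two display styles. It uses simplified stand-in types (`Col`, `NamedStruct`) and hypothetical helper names (`describeBoth`, `describeValues`), not Spark's actual Catalyst classes or the code in this PR. The `struct(...)` rendering in the second style is only one possible format.

```scala
object InSubqueryMessageSketch {
  // Stand-in expression types for illustration only (assumption: not Catalyst).
  sealed trait Expr
  final case class Col(name: String) extends Expr
  // A struct built from (field name, value expression) pairs.
  final case class NamedStruct(fields: Seq[(String, Expr)]) extends Expr

  // Unquoted rendering that interleaves field names and values.
  private def plain(e: Expr): String = e match {
    case Col(n) => n
    case NamedStruct(fs) =>
      fs.map { case (name, value) => s"$name, ${plain(value)}" }
        .mkString("named_struct(", ", ", ")")
  }

  // Style seen in the message above: names and values both shown.
  def describeBoth(e: Expr): String = "`" + plain(e) + "`"

  // Alternative style: show only the value expressions of a struct.
  def describeValues(e: Expr): String = e match {
    case Col(n) => s"`$n`"
    case NamedStruct(fs) =>
      fs.map { case (_, value) => describeValues(value) }
        .mkString("struct(", ", ", ")")
  }

  def main(args: Array[String]): Unit = {
    // Mirrors `select (c1, c2) from ut2`, which yields a single struct column.
    val rhs = NamedStruct(Seq("c1" -> Col("c1"), "c2" -> Col("c2")))
    println(describeBoth(rhs))   // `named_struct(c1, c1, c2, c2)`
    println(describeValues(rhs)) // struct(`c1`, `c2`)
  }
}
```

With only the value exprs shown, it is easier to see that the subquery returns one struct column rather than two separate columns.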