GitHub user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206304130
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
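The diff above is cut off right after the column-count comparison begins. The check itself can be modeled outside Spark as the sketch below. This is a simplified illustration under stated assumptions: `InSubqueryArity`, `Expr`, and `checkInSubqueryArity` are hypothetical names, expressions are represented only by their SQL text, and the error is returned as a `Left` instead of being thrown as an `AnalysisException`. The message layout is modeled on the error output quoted in the comment that follows.

```scala
// Hypothetical, simplified model of the IN-subquery column-count check.
// Not Spark's API: in Spark the check lives in Analyzer and throws AnalysisException.
object InSubqueryArity {
  // Stand-in for a resolved expression, carrying only its SQL text.
  final case class Expr(sql: String)

  // Formats a column list as "[a, b]".
  private def render(cols: Seq[Expr]): String =
    cols.map(_.sql).mkString("[", ", ", "]")

  // Right(()) when both sides have the same number of columns,
  // otherwise Left with a message laid out like the error quoted in this thread.
  def checkInSubqueryArity(values: Seq[Expr], subqueryOutput: Seq[Expr]): Either[String, Unit] =
    if (values.length == subqueryOutput.length) Right(())
    else
      Left(
        s"""The number of columns in the left hand side of an IN subquery does not match the
           |number of columns in the output of subquery.
           |#columns in left hand side: ${values.length}.
           |#columns in right hand side: ${subqueryOutput.length}.
           |Left side columns:
           |${render(values)}.
           |Right side columns:
           |${render(subqueryOutput)}.""".stripMargin)
}
```

With the two left-hand columns `ut1.c1` and `ut1.c2` and a subquery whose output is a single struct column, the sketch returns a `Left` reporting 2 columns on the left and 1 on the right, which is the mismatch the query below triggers.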
    
    @mgaido91 I quickly tried the error case to check out the message:
    ```
    spark-sql> select * from ut1 where (c1, c2) in (select (c1, c2) from ut2);
    Error in query: Cannot analyze (named_struct('c1', ut1.`c1`, 'c2', ut1.`c2`) IN (listquery())).
    The number of columns in the left hand side of an IN subquery does not match the
    number of columns in the output of subquery.
    #columns in left hand side: 2.
    #columns in right hand side: 1.
    Left side columns:
    [ut1.`c1`, ut1.`c2`].
    Right side columns:
    [`named_struct(c1, c1, c2, c2)`].;
    ```
    The right hand side columns look confusing. Should we display only the value exprs or only the name exprs instead of both?
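To make the two alternatives concrete, here is a hypothetical sketch that renders one struct column three ways: names and values interleaved (the shape of the current `named_struct(c1, c1, c2, c2)` output), names only, and values only. `StructDisplay` and `NamedStruct` are illustrative names, not Spark classes; a `named_struct` is modeled as ordered (name, value SQL) pairs, and the qualified value text `ut2.c1` is an assumption chosen so that the names-only and values-only options look different.

```scala
// Hypothetical helpers illustrating the display options raised above.
// NamedStruct is an illustrative stand-in, not Spark's CreateNamedStruct.
object StructDisplay {
  // named_struct('c1', c1, 'c2', c2) modeled as ordered (name, value SQL) pairs.
  final case class NamedStruct(fields: Seq[(String, String)])

  // Current shape: names and values interleaved.
  def both(s: NamedStruct): String =
    s.fields.flatMap { case (name, value) => Seq(name, value) }.mkString("named_struct(", ", ", ")")

  // Alternative 1: only the field names.
  def namesOnly(s: NamedStruct): String =
    s.fields.map(_._1).mkString("(", ", ", ")")

  // Alternative 2: only the value expressions.
  def valuesOnly(s: NamedStruct): String =
    s.fields.map(_._2).mkString("(", ", ", ")")
}
```

For the struct in this example, `both` yields `named_struct(c1, ut2.`c1`, c2, ut2.`c2`)`, `namesOnly` yields `(c1, c2)`, and `valuesOnly` yields `(ut2.`c1`, ut2.`c2`)`.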

