GitHub user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60023217
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala ---
    @@ -77,3 +79,81 @@ case class ScalarSubquery(
     
       override def toString: String = s"subquery#${exprId.id}"
     }
    +
    +/**
    + * A predicate subquery checks the existence of a value in a sub-query. We currently only allow
    + * [[PredicateSubquery]] expressions within a Filter plan (i.e. WHERE or a HAVING clause). This will
    + * be rewritten into a left semi/anti join during analysis.
    + */
    +abstract class PredicateSubquery extends SubqueryExpression with Unevaluable with Predicate {
    +  override def nullable: Boolean = false
    +  override def plan: LogicalPlan = SubqueryAlias(prettyName, query)
    +}
    +
    +object PredicateSubquery {
    +  def hasPredicateSubquery(e: Expression): Boolean = {
    +    e.find(_.isInstanceOf[PredicateSubquery]).isDefined
    +  }
    +}
    +
    +/**
    + * The [[InSubQuery]] predicate checks the existence of a value in a sub-query. For example (SQL):
    + * {{{
    + *   SELECT  *
    + *   FROM    a
    + *   WHERE   a.id IN (SELECT  id
    + *                    FROM    b)
    + * }}}
    + */
    +case class InSubQuery(value: Expression, query: LogicalPlan) extends PredicateSubquery {
    +  override def children: Seq[Expression] = value :: Nil
    +  override lazy val resolved: Boolean = value.resolved && query.resolved
    +  override def withNewPlan(plan: LogicalPlan): InSubQuery = InSubQuery(value, plan)
    +
    +  /**
    +   * The unwrapped value side expressions.
    +   */
    +  lazy val expressions: Seq[Expression] = value match {
    +    case CreateStruct(cols) => cols
    +    case col => Seq(col)
    +  }
    +
    +  /**
    +   * Check if the number of columns and the data types on both sides match.
    +   */
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    // Check the number of arguments.
    +    if (expressions.length != query.output.length) {
    +      TypeCheckResult.TypeCheckFailure(
    +        s"The number of fields in the value (${expressions.length}) does not match with " +
    +          s"the number of columns in the subquery (${query.output.length})")
    +    } else {
    +      // Check the argument types; report the first mismatching column, if any.
    +      val mismatch = expressions.zip(query.output).zipWithIndex.collectFirst {
    +        case ((e, a), i) if e.dataType != a.dataType =>
    +          TypeCheckResult.TypeCheckFailure(
    +            s"The data type of value[$i](${e.dataType}) does not match " +
    +              s"subquery column '${a.name}' (${a.dataType}).")
    +      }
    +      mismatch.getOrElse(TypeCheckResult.TypeCheckSuccess)
    +    }
    +  }
    +}
    +
    +/**
    + * The [[Exists]] expression checks if a row exists in a subquery given some correlated condition.
    + * For example (SQL):
    + * {{{
    + *   SELECT  *
    + *   FROM    a
    + *   WHERE   EXISTS (SELECT  *
    + *                   FROM    b
    + *                   WHERE   b.id = a.id)
    + * }}}
    + */
    +case class Exists(query: LogicalPlan) extends PredicateSubquery {
    +  override def children: Seq[Expression] = Nil
    +  override def withNewPlan(plan: LogicalPlan): Exists = Exists(plan)
    --- End diff --
    
    We could also push down entire left semi/anti joins (assuming we keep the rewrite in the Optimizer).
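A minimal sketch of that kind of pushdown, written against a hypothetical toy plan ADT: `Relation`, `Filter`, `Join`, `LeftSemi` and `LeftAnti` below are illustrative stand-ins, not Catalyst's classes, and predicates are plain strings. It relies only on the fact that a left semi/anti join drops rows from its left input without changing that input's schema, which is what lets it move below a `Filter` on the left side:

```scala
// Hypothetical toy plan ADT for illustration only; these are NOT Spark's Catalyst classes.
object SemiJoinPushdown {
  sealed trait JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType

  // Predicates are modelled as opaque strings to keep the sketch small.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Join(left: Plan, right: Plan, joinType: JoinType, condition: String) extends Plan

  // Every join in this toy ADT is a left semi/anti join: it only removes rows from its
  // left input and keeps the left schema, so it commutes with a Filter on that side.
  // The rule moves the join below such Filters, recursing through stacked Filters.
  def pushDown(plan: Plan): Plan = plan match {
    case Join(Filter(cond, grandChild), right, jt, jc) =>
      Filter(cond, pushDown(Join(grandChild, right, jt, jc)))
    case Filter(cond, child) => Filter(cond, pushDown(child))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val plan = Join(Filter("a.x > 1", Relation("a")), Relation("b"), LeftAnti, "a.id = b.id")
    println(pushDown(plan))
  }
}
```

Pushing through other operators, such as a `Project`, would additionally require checking that the join condition only refers to attributes that are available below them.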
    
    I would pull out the correlated conditions in their entirety (getting the references from them is trivial). I was actually already working on this. It involves rewriting the plan, and the (small) downside is that generating SQL will be more complicated after this step.
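One way the extraction could look, again on a hypothetical toy model rather than Catalyst: the conjuncts of the subquery's top-level filter are split by their references, the ones that touch outer attributes become the semi join condition, and the rest stay inside the subquery. Names such as `rewriteExists`, `LeftSemiJoin` and the small `Expr` hierarchy are assumptions made for this sketch:

```scala
// Hypothetical toy model for illustration only; none of these types are Spark's Catalyst API.
object PullOutCorrelated {
  case class Attr(relation: String, name: String)

  sealed trait Expr { def references: Set[Attr] }
  case class Ref(attr: Attr) extends Expr { def references: Set[Attr] = Set(attr) }
  case class Lit(value: Int) extends Expr { def references: Set[Attr] = Set.empty }
  case class Eq(left: Expr, right: Expr) extends Expr {
    def references: Set[Attr] = left.references ++ right.references
  }
  case class Gt(left: Expr, right: Expr) extends Expr {
    def references: Set[Attr] = left.references ++ right.references
  }

  sealed trait Plan { def output: Set[Attr] }
  case class Relation(name: String, columns: Seq[String]) extends Plan {
    def output: Set[Attr] = columns.map(Attr(name, _)).toSet
  }
  // The condition is a list of conjuncts that are implicitly AND-ed together.
  case class Filter(conjuncts: Seq[Expr], child: Plan) extends Plan {
    def output: Set[Attr] = child.output
  }
  case class LeftSemiJoin(left: Plan, right: Plan, condition: Seq[Expr]) extends Plan {
    def output: Set[Attr] = left.output
  }

  // Rewrites `outer WHERE EXISTS (sub)` into a left semi join. Only a single Filter on top
  // of the subquery is handled; a real rule would also have to look deeper into the plan.
  def rewriteExists(outer: Plan, sub: Plan): Plan = sub match {
    case Filter(conjuncts, child) =>
      // A conjunct is correlated when it references an attribute the subquery cannot produce.
      val (correlated, local) = conjuncts.partition(c => !c.references.subsetOf(child.output))
      val right = if (local.isEmpty) child else Filter(local, child)
      LeftSemiJoin(outer, right, correlated)
    case other => LeftSemiJoin(outer, other, Nil)
  }

  def main(args: Array[String]): Unit = {
    val a = Relation("a", Seq("id"))
    val b = Relation("b", Seq("id", "x"))
    val correlated = Eq(Ref(Attr("b", "id")), Ref(Attr("a", "id")))
    val local = Gt(Ref(Attr("b", "x")), Lit(1))
    println(rewriteExists(a, Filter(Seq(correlated, local), b)))
  }
}
```

After this rewrite the correlated predicate lives in the join condition instead of the subquery, which is why turning the plan back into SQL becomes harder.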

