Github user concretevitamin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1055#discussion_r13723141
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ---
    @@ -202,3 +203,139 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
     
       override def toString = s"if ($predicate) $trueValue else $falseValue"
     }
    +
    +// scalastyle:off
    +/**
    + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE 
e] END".
    + * Refer to this link for the corresponding semantics:
    + * 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions
    + *
    + * Note that branches are considered in consecutive pairs (cond, val), and 
the optional last element
    + * is the val for the default catch-all case (if provided). Hence, 
`branches` consists of at least
    + * two elements, and can have an odd or even length.
    + */
    +// scalastyle:on
    +case class CaseWhen(branches: Seq[Expression]) extends Expression {
    +  type EvaluatedType = Any
    +  def children = branches
    +  def references = children.flatMap(_.references).toSet
    +  def dataType = {
    +    if (!resolved) {
    +      throw new UnresolvedException(this, "cannot resolve due to differing 
types in some branches")
    +    }
    +    branches(1).dataType
    +  }
    +
    +  override def nullable = branches.sliding(2, 2).map {
    +      case Seq(cond, value) => value.nullable
    +      case Seq(elseValue) => elseValue.nullable
    +    }.reduce(_ || _)
    +
    +  override lazy val resolved = {
    +    lazy val allCondBooleans = branches.sliding(2, 2).map {
    +      case Seq(cond, value) => cond.dataType == BooleanType
    +      case _ => true
    +    }.reduce(_ && _)
    +    lazy val dataTypes = branches.sliding(2, 2).map {
    +      case Seq(cond, value) => value.dataType
    +      case Seq(elseValue) => elseValue.dataType
    +    }.toSeq
    +    lazy val dataTypesEqual =
    +      if (dataTypes.size <= 1) true else dataTypes.drop(1).map(_ == 
dataTypes(0)).reduce(_ && _)
    +    if (!childrenResolved) false else allCondBooleans && dataTypesEqual
    +  }
    +
    +  /** Written in imperative fashion for performance considerations.  Same 
for CaseKeyWhen. */
    +  override def eval(input: Row): Any = {
    +    val branchesArr = branches.toArray
    +    val len = branchesArr.length
    +    var i = 0
    +    // If all branches fail and an elseVal is not provided, the whole 
statement
    +    // defaults to null, according to Hive's semantics.
    +    var res: Any = null
    +    while (i < len - 1) {
    +      if (branches(i).eval(input) == true) {
    +        res = branches(i + 1).eval(input)
    +        return res
    +      }
    +      i += 2
    +    }
    +    if (i == len - 1) {
    +      res = branches(i).eval(input)
    +    }
    +    res
    +  }
    +
    +  override def toString = {
    +    val firstBranch = s"if (${branches(0)} == true) { ${branches(1)} }"
    --- End diff --
    
    There isn't a special reason for this. However, the previous 
`s"CASE($branches)"` would not group the conditions and values together and 
would just present them in a flattened fashion. So either way we would need 
some logic to group two adjacent elements. Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to