GitHub user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15047#discussion_r80848863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -559,3 +607,219 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
       override def foldable: Boolean = true
       override def nullable: Boolean = false
     }
    +
    +/**
    + * Simulates Hive's hashing function at
    + * org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils#hashcode() in Hive
    + *
    + * We should use this hash function for both shuffle and bucket of Hive tables, so that
    + * we can guarantee shuffle and bucketing have same data distribution
    + *
    + * TODO: Support Decimal and date related types
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(a1, a2, ...) - Returns a hash value of the arguments.")
    +case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] {
    +  override val seed = 0
    +
    +  override def dataType: DataType = IntegerType
    +
    +  override def prettyName: String = "hive-hash"
    +
    +  override protected def hasherClassName: String = classOf[HiveHasher].getName
    +
    +  override protected def computeHash(value: Any, dataType: DataType, seed: Int): Int = {
    +    HiveHashFunction.hash(value, dataType, seed).toInt
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    ev.isNull = "false"
    +    val childHash = ctx.freshName("childHash")
    +    val childrenHash = children.map { child =>
    +      val childGen = child.genCode(ctx)
    +      childGen.code + ctx.nullSafeExec(child.nullable, childGen.isNull) {
    +        computeHash(childGen.value, child.dataType, childHash, ctx)
    +      } + s"${ev.value} = (31 * ${ev.value}) + $childHash;"
    +    }.mkString(s"int $childHash = 0;", s"\n$childHash = 0;\n", "")
    +
    +    ev.copy(code = s"""
    +      ${ctx.javaType(dataType)} ${ev.value} = $seed;
    +      $childrenHash""")
    +  }
    +
    +  @tailrec
    +  private def computeHash(
    --- End diff --
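A small sketch of the arithmetic that the code emitted by `doGenCode` above performs, written as plain Scala instead of generated Java. `HiveHashCombineSketch` and `combine` are hypothetical names, and each child's own hash is taken as an input (`Option[Int]`, with `None` standing for a null child) rather than computed, since the per-type Hive hashing is not part of this excerpt.

```scala
object HiveHashCombineSketch {
  // Mirrors the statements emitted by doGenCode in the diff:
  //   int value = seed;                   // seed is 0 for HiveHash
  //   childHash = 0; <hash child only if it is non-null>
  //   value = (31 * value) + childHash;   // runs for every child, null or not
  // A None entry stands for a null child: its childHash stays 0, but it still
  // takes part in the fold, so the running result is still multiplied by 31.
  // Int arithmetic wraps on overflow, just like the generated Java code.
  def combine(childHashes: Seq[Option[Int]], seed: Int = 0): Int =
    childHashes.foldLeft(seed) { (acc, childHash) =>
      31 * acc + childHash.getOrElse(0)
    }
}
```

For two children whose hashes are 1 and 2, this gives `31 * 1 + 2 = 33`; replacing the second child with a null gives `31`.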
    
    Yes. `@tailrec` only works on methods that cannot be overridden (`private` or `final`), so I was unable to make the parent class's version accessible to (and overridable by) child classes.
    
    I am introducing a wrapper method to avoid code duplication while still keeping the benefits of `@tailrec`. The wrapper is only called while generating the codegen string, so it should have a negligible impact on overall query performance.
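A minimal sketch of the wrapper pattern described above, in a deliberately simplified setting: `DType`, `IntT`, `StrT`, `UdtT`, `HashBase`, `Murmur`, `HiveLike` and the generated strings such as `murmur3Int(x)` are all hypothetical stand-ins, not Spark's actual classes or codegen output. The `private` method carries `@tailrec`, while the `protected`, non-recursive `computeHash` wrapper is what subclasses call, so the type-dispatch logic is written only once and subclasses customize behavior through per-type hooks.

```scala
import scala.annotation.tailrec

// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class UdtT(sqlType: DType) extends DType // wraps an underlying type

abstract class HashBase {
  // Per-type hooks that subclasses may override (hypothetical output strings).
  protected def hashInt(v: String): String = s"murmur3Int($v)"
  protected def hashStr(v: String): String = s"murmur3Str($v)"

  // Protected, non-recursive wrapper: the entry point subclasses use.
  protected def computeHash(v: String, dt: DType): String =
    computeHashTailRec(v, dt)

  // @tailrec needs a method that cannot be overridden, so this stays private.
  @tailrec
  private def computeHashTailRec(v: String, dt: DType): String = dt match {
    case IntT        => hashInt(v)
    case StrT        => hashStr(v)
    case UdtT(inner) => computeHashTailRec(v, inner) // tail call, compiled to a loop
  }
}

class Murmur extends HashBase {
  def gen(v: String, dt: DType): String = computeHash(v, dt)
}

class HiveLike extends HashBase {
  // Hypothetical overrides standing in for Hive-specific hashing.
  override protected def hashInt(v: String): String = v
  override protected def hashStr(v: String): String = s"hiveStr($v)"
  def gen(v: String, dt: DType): String = computeHash(v, dt)
}
```

Here `new HiveLike().gen("x", UdtT(UdtT(StrT)))` unwraps both user-defined layers inside the tail-recursive loop and returns `hiveStr(x)`; only the per-type hooks differ between the two subclasses.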
    
    If you have a better solution, let me know.


