GitHub user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22066#discussion_r209267827
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
    @@ -778,21 +783,22 @@ case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] {
           input: String,
           result: String,
           fields: Array[StructField]): String = {
    +    val tmpInput = ctx.freshName("input")
    --- End diff --
    
    It seems `HiveHash` cannot be triggered in the normal way, because Spark uses `Murmur3Hash`.
    This function does have this issue, though. You can hack Spark to test it as follows.
    In `HashPartitioning`, change:
    ```
      def partitionIdExpression: Expression = Pmod(new Murmur3Hash(expressions), Literal(numPartitions))
    ```
    to
    ```
      def partitionIdExpression: Expression = Pmod(new HiveHash(expressions), Literal(numPartitions))
    ```
    Then run this test:
    ```
      val df = spark.range(1000)
      val columns = (0 until 400).map(i => s"id as id$i")
      val distributeExprs = (0 until 100).map(c => s"id$c").mkString(",")
      df.selectExpr(columns: _*).createTempView("test")
      spark.sql(s"select * from test distribute by ($distributeExprs)").count()
    ```
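
    To see what the test actually sends to Spark, here is a minimal standalone sketch that only builds the strings (no Spark session needed). The object and method names (`DistributeByQuery`, `columns`, `distributeExprs`, `query`) are hypothetical helpers for illustration, not part of Spark or of this PR:
    ```scala
    // Illustration only: rebuilds the strings used by the test above.
    object DistributeByQuery {
      // Projections passed to selectExpr: "id as id0", "id as id1", ...
      def columns(n: Int): Seq[String] =
        (0 until n).map(i => s"id as id$i")

      // Comma-separated column names that end up as children of the hash expression.
      def distributeExprs(n: Int): String =
        (0 until n).map(c => s"id$c").mkString(",")

      // The SQL text handed to spark.sql in the test.
      def query(n: Int): String =
        s"select * from test distribute by (${distributeExprs(n)})"

      def main(args: Array[String]): Unit = {
        println(columns(3).mkString(", ")) // id as id0, id as id1, id as id2
        println(query(3))                  // select * from test distribute by (id0,id1,id2)
      }
    }
    ```
    With `n = 100`, as in the test, the partitioning hash is computed over 100 columns in a single expression.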

