Github user yucai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209267827
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
---
@@ -778,21 +783,22 @@ case class HiveHash(children: Seq[Expression])
extends HashExpression[Int] {
input: String,
result: String,
fields: Array[StructField]): String = {
+ val tmpInput = ctx.freshName("input")
--- End diff --
It seems `HiveHash` cannot be triggered through a normal query, because Spark
uses `Murmur3Hash` for partitioning.
But this function does have the same issue. You can hack Spark to test it as follows.
In `HashPartitioning`, change:
```
def partitionIdExpression: Expression = Pmod(new Murmur3Hash(expressions), Literal(numPartitions))
```
to
```
def partitionIdExpression: Expression = Pmod(new HiveHash(expressions), Literal(numPartitions))
```
Then run the following test:
```
val df = spark.range(1000)
val columns = (0 until 400).map { i => s"id as id$i" }
val distributeExprs = (0 until 100).map(c => s"id$c").mkString(",")
df.selectExpr(columns: _*).createTempView("test")
spark.sql(s"select * from test distribute by ($distributeExprs)").count()
```
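If patching `HashPartitioning` is inconvenient, it might also be possible to exercise `HiveHash` directly by wrapping it in a `Column`. This is only a sketch and I have not verified that it reproduces the failure. It assumes a spark-shell session on Spark 2.x (where the `Column(expr)` constructor is public), and it assumes that the parenthesized `distribute by (...)` list is hashed as a single struct, which is why the struct is built by hand here. The names `wide` and `key` are made up for illustration.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.HiveHash
import org.apache.spark.sql.functions.{col, struct}

// Assumption: `spark` is the SparkSession provided by spark-shell.
// Same wide input as the test above: 400 aliases of `id`.
val wide = spark.range(1000).selectExpr((0 until 400).map(i => s"id as id$i"): _*)

// Assumption: hashing one struct of 100 fields mirrors `distribute by (id0, ..., id99)`.
val key = struct((0 until 100).map(i => col(s"id$i")): _*)

// Wrap the Catalyst expression in a Column and force evaluation.
wide.select(new Column(HiveHash(Seq(key.expr)))).collect()
```

This goes through a projection rather than a shuffle, so it may not hit exactly the same code path as the `HashPartitioning` hack.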