[GitHub] spark pull request #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggrega...

kiszk Fri, 03 Aug 2018 04:43:13 -0700

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21931#discussion_r207519280
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
    @@ -83,7 +84,7 @@ class VectorizedHashMapGenerator(
            |  private ${classOf[ColumnarBatch].getName} batch;
            |  private ${classOf[MutableColumnarRow].getName} aggBufferRow;
            |  private int[] buckets;
    -       |  private int capacity = 1 << 16;
    +       |  private int capacity = $maxCapacity;
    --- End diff --
    
    We can see the following code at L226. If a user specifies a `2^n` value
    (e.g. 1024), it works correctly. What happens if a user specifies a
    non-`2^n` value (e.g. 127)?
    ```
    idx = (idx + 1) & (numBuckets - 1);
    ```
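
    To make the concern concrete, here is a minimal standalone sketch (not
    Spark code) that applies the same probe step to different bucket counts.
    The class `ProbeMaskDemo` and its methods are hypothetical names used only
    for illustration, and passing the configured value directly as
    `numBuckets` is an assumption; the generated code may derive `numBuckets`
    from `capacity` differently.

    ```java
    import java.util.BitSet;

    // Hypothetical demo class; it only mimics the probe step quoted above.
    final class ProbeMaskDemo {
        // The probe step from the generated code: advance and wrap with a mask.
        static int nextIdx(int idx, int numBuckets) {
            return (idx + 1) & (numBuckets - 1);
        }

        // Counts the distinct slots that probing can land on, starting from
        // every slot in [0, numBuckets) and taking numBuckets steps each time.
        static int reachableSlots(int numBuckets) {
            BitSet seen = new BitSet(numBuckets);
            for (int start = 0; start < numBuckets; start++) {
                int idx = start;
                for (int step = 0; step < numBuckets; step++) {
                    idx = nextIdx(idx, numBuckets);
                    seen.set(idx);
                }
            }
            return seen.cardinality();
        }

        // True when n is a positive power of two, i.e. (n - 1) is an all-ones mask.
        static boolean isPowerOfTwo(int n) {
            return n > 0 && (n & (n - 1)) == 0;
        }

        public static void main(String[] args) {
            for (int n : new int[] {1024, 127}) {
                System.out.println(n + " buckets: power of two = " + isPowerOfTwo(n)
                    + ", slots reachable by probing = " + reachableSlots(n));
            }
        }
    }
    ```

    With 127 the mask is `126` (binary `1111110`), so bit 0 is always cleared:
    an even index maps back to itself and the probe never advances, while odd
    slots are never reached. A power-of-two check such as `isPowerOfTwo` at
    configuration time would be one way to reject such values.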



---

