Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19847#discussion_r153823326
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
@@ -141,29 +136,35 @@ class VectorizedHashMapGenerator(
}
/**
- * Generates a method that returns a mutable
- * [[org.apache.spark.sql.execution.vectorized.ColumnarRow]] which keeps track of the
+ * Generates a method that returns a
+ * [[org.apache.spark.sql.execution.vectorized.MutableColumnarRow]] which keeps track of the
* aggregate value(s) for a given set of keys. If the corresponding row doesn't exist, the
* generated method adds the corresponding row in the associated
- * [[org.apache.spark.sql.execution.vectorized.ColumnarBatch]]. For instance, if we
+ * [[org.apache.spark.sql.execution.vectorized.OnHeapColumnVector]]. For instance, if we
* have 2 long group-by keys, the generated function would be of the form:
*
* {{{
- * public org.apache.spark.sql.execution.vectorized.ColumnarRow findOrInsert(
- * long agg_key, long agg_key1) {
+ * public MutableColumnarRow findOrInsert(long agg_key, long agg_key1) {
* long h = hash(agg_key, agg_key1);
* int step = 0;
* int idx = (int) h & (numBuckets - 1);
* while (step < maxSteps) {
* // Return bucket index if it's either an empty slot or already contains the key
* if (buckets[idx] == -1) {
- * batchVectors[0].putLong(numRows, agg_key);
- * batchVectors[1].putLong(numRows, agg_key1);
- * batchVectors[2].putLong(numRows, 0);
- * buckets[idx] = numRows++;
- * return batch.getRow(buckets[idx]);
+ * if (numRows < capacity) {
--- End diff --
Update the comment to match the real code.
---
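For context, the probing logic shown in the diff above can be sketched as a standalone, simplified analogue. This is a hedged illustration only: the class and field names (`SimpleLongMap`, `keys`, `values`) are hypothetical and this is not Spark's actual generated code, which operates on column vectors and multiple keys.

```java
import java.util.Arrays;

// Simplified sketch of the open-addressing findOrInsert scheme discussed
// in the diff: hash the key, then linear-probe up to maxSteps buckets.
public class SimpleLongMap {
    private final int numBuckets = 1 << 4;             // must be a power of two
    private final int maxSteps = numBuckets;           // give up after a full sweep
    private final int capacity = numBuckets;           // max rows we can store
    private final int[] buckets = new int[numBuckets]; // -1 marks an empty slot
    private final long[] keys = new long[capacity];    // row storage for keys
    private final long[] values = new long[capacity];  // row storage for aggregates
    private int numRows = 0;

    public SimpleLongMap() { Arrays.fill(buckets, -1); }

    private long hash(long key) { return key * 0x9E3779B97F4A7C15L; }

    /**
     * Returns the row index for the key, inserting a fresh row if absent.
     * Returns -1 when the map is full or probing exceeds maxSteps, in which
     * case a real implementation would fall back to a regular hash map.
     */
    public int findOrInsert(long key) {
        long h = hash(key);
        int step = 0;
        int idx = (int) h & (numBuckets - 1);
        while (step < maxSteps) {
            if (buckets[idx] == -1) {
                // Empty slot: insert only if there is still row capacity,
                // mirroring the `if (numRows < capacity)` check in the diff.
                if (numRows < capacity) {
                    keys[numRows] = key;
                    values[numRows] = 0;       // initial aggregate value
                    buckets[idx] = numRows++;
                    return buckets[idx];
                }
                return -1;                     // batch is full
            } else if (keys[buckets[idx]] == key) {
                return buckets[idx];           // key already present
            }
            idx = (idx + 1) & (numBuckets - 1); // linear probing: next bucket
            step++;
        }
        return -1;
    }

    public static void main(String[] args) {
        SimpleLongMap m = new SimpleLongMap();
        int r1 = m.findOrInsert(42L);
        int r2 = m.findOrInsert(42L);  // same key resolves to the same row
        int r3 = m.findOrInsert(7L);   // distinct key gets a distinct row
        System.out.println(r1 == r2 && r1 != r3);
    }
}
```

The capacity check before inserting is the point of the review thread: the new generated code guards the insert with `numRows < capacity`, which the surrounding doc comment should reflect.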