amaliujia commented on code in PR #39057:
URL: https://github.com/apache/spark/pull/39057#discussion_r1048850181
##########
core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:
##########
@@ -513,3 +513,81 @@ class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] {
getOrCreate.addAll(newValue)
}
}
+
+
+/**
+ * An [[AccumulatorV2 counter]] for collecting a list of (mapper id, row count).
+ *
+ * @since 3.4.0
+ */
+class MapperRowCounter extends AccumulatorV2[jl.Long, java.util.List[java.util.List[jl.Long]]] {
+
+ private var _agg: java.util.List[java.util.List[jl.Long]] = _
Review Comment:
Which of the following do you have in mind?
1. A list of integers in which every two consecutive elements form a
(partition id, row count) pair?
2. A list of integers in which the index is the mapper/partition id and the
value is the row count?
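
To make the two options concrete, here is a minimal sketch of each layout. The object and method names (`MapperRowLayouts`, `flatPairs`, `indexed`) and the `numPartitions` parameter are hypothetical, chosen only for illustration; neither layout is confirmed as what the PR intends.

```scala
import java.{lang => jl}
import java.util.{ArrayList => JArrayList, List => JList}

// Hypothetical helper, not part of the PR: it only illustrates the two layouts.
object MapperRowLayouts {

  // Option 1: a flat list in which every two consecutive elements
  // form a (partition id, row count) pair.
  def flatPairs(counts: Seq[(Long, Long)]): JList[jl.Long] = {
    val out = new JArrayList[jl.Long]()
    counts.foreach { case (partitionId, rowCount) =>
      out.add(jl.Long.valueOf(partitionId))
      out.add(jl.Long.valueOf(rowCount))
    }
    out
  }

  // Option 2: the list index is the mapper/partition id and the value
  // is the row count. Partitions with no reported count stay at 0.
  def indexed(counts: Seq[(Int, Long)], numPartitions: Int): JList[jl.Long] = {
    val out = new JArrayList[jl.Long]()
    (0 until numPartitions).foreach(_ => out.add(jl.Long.valueOf(0L)))
    counts.foreach { case (partitionId, rowCount) =>
      out.set(partitionId, jl.Long.valueOf(rowCount))
    }
    out
  }
}
```

Option 1 needs no knowledge of the partition count up front but requires callers to read elements in pairs. Option 2 allows direct lookup by id but assumes ids are dense and that the number of partitions is known when the list is built.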