Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190396813
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
---
@@ -42,16 +47,24 @@ case class ContinuousShuffleReadPartition(index: Int,
queueSize: Int) extends Pa
* RDD at the map side of each continuous processing shuffle task.
Upstream tasks send their
* shuffle output to the wrapped receivers in partitions of this RDD; each
of the RDD's tasks
* poll from their receiver until an epoch marker is sent.
+ *
+ * @param sc the RDD context
+ * @param numPartitions the number of read partitions for this RDD
+ * @param queueSize the size of the row buffers to use
+ * @param numShuffleWriters the number of continuous shuffle writers
feeding into this RDD
+ * @param checkpointIntervalMs the checkpoint interval of the streaming
query
*/
class ContinuousShuffleReadRDD(
sc: SparkContext,
numPartitions: Int,
- queueSize: Int = 1024)
+ queueSize: Int = 1024,
+ numShuffleWriters: Int = 1,
+ checkpointIntervalMs: Long = 1000)
--- End diff --
Isnt this the same as epochInterval? the term "epoch" is more well known in
the code than "checkpoint"
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]