squito commented on a change in pull request #25007:
[SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
URL: https://github.com/apache/spark/pull/25007#discussion_r305384481
##########
File path: core/src/test/scala/org/apache/spark/ShuffleSuite.scala
##########
@@ -383,13 +383,19 @@ abstract class ShuffleSuite extends SparkFunSuite with Matchers with LocalSparkC
// simultaneously, and everything is still OK
def writeAndClose(
- writer: ShuffleWriter[Int, Int])(
+ writer: ShuffleWriter[Int, Int],
+ taskContext: TaskContext)(
iter: Iterator[(Int, Int)]): Option[MapStatus] = {
- val files = writer.write(iter)
- writer.stop(true)
+ TaskContext.setTaskContext(taskContext)
Review comment:
Though I left this comment on the test, the real reason is that it made me
worry about the general API design.
Another shuffle implementation may have its own metrics system for monitoring
itself, but the actual end user of Spark will still want to see shuffle
metrics in the Spark UI. We don't have a way for an alternative shuffle
implementation to plug its own metrics into the UI (nor do I think we want
one). I guess the most important metrics, the number of records and bytes,
are recorded outside of the plugin, but the plugin should still be updating
the write time metric, regardless of what storage it's using.
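To make the concern concrete, here is a rough sketch of what "the plugin updates the write time metric" could look like. Every name below (`WriteTimeReporter`, `InMemoryReporter`, `TimedPartitionWriter`) is hypothetical and only stands in for whatever the final API exposes; none of it is taken from this PR or from Spark itself.

```scala
// Hypothetical stand-in for the Spark-side metrics sink; NOT the real Spark API.
trait WriteTimeReporter {
  def incWriteTime(nanos: Long): Unit
}

// Simple in-memory reporter, present only to make the sketch runnable.
final class InMemoryReporter extends WriteTimeReporter {
  private var total = 0L
  override def incWriteTime(nanos: Long): Unit = total += nanos
  def writeTimeNanos: Long = total
}

// Hypothetical plugin-side writer: whatever storage it targets, it times
// each write and reports the elapsed nanoseconds back to the reporter.
final class TimedPartitionWriter(
    store: Array[Byte] => Unit,
    reporter: WriteTimeReporter) {

  def write(bytes: Array[Byte]): Unit = {
    val start = System.nanoTime()
    try store(bytes)
    // Report even if the write throws, so failed writes are still accounted for.
    finally reporter.incWriteTime(System.nanoTime() - start)
  }
}
```

The point of the sketch is that the timing has to wrap the storage call inside the plugin, since only the plugin knows when the write actually happens. That in turn means the API needs to hand the plugin some reporter like this one.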