Github user squito commented on the pull request:
https://github.com/apache/spark/pull/2851#issuecomment-74293360
Hi @CodingCat ,
thanks for working on this, sorry the review has been dragging out. This
will be a nice addition, but I have some concerns similar to the other
reviewers -- but lemme explain in a little more detail. I don't necessarily
see anything wrong with your approach for getting the Broadcast block info by
itself, but it seems like it ought to share a similar code path to tracking the
RDDBlocks. RDDBlocks get updated in the `SparkListenerTaskEnd` event, through
`event.taskMetrics.updatedBlocks`. this makes its way to the current
`StorageTab` in
[`StorageListener`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L59),
which then [filters down to the
RDDBlocks](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L50)
via `BlockId.asRDDId`.
So, this makes me wonder about two possible alternatives for unifying the
code paths:
1) Can you get the Broadcast block ids into the same `SparkListenerTaskEnd`
event, which would eliminate the need to create a new event? (In fact, maybe
those block ids are already there? I am still looking into the code for how
those events get created ...)
2) If we can't get Broadcast block ids into the same
`SparkListenerTaskEnd`, does it make sense to change the RDD block tracking
logic to do the same thing? Yes this will be a bigger change, but it seems
strange to have these two blocks tracked by such different mechanisms so it
will lead to simpler code overall. (But really that pushes me to option #1)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]