mridulm commented on a change in pull request #31876:
URL: https://github.com/apache/spark/pull/31876#discussion_r617752756
##########
File path: core/src/main/scala/org/apache/spark/TaskEndReason.scala
##########
@@ -81,7 +81,7 @@ case object Resubmitted extends TaskFailedReason {
*/
@DeveloperApi
case class FetchFailed(
- bmAddress: BlockManagerId, // Note that bmAddress can be null
+ bmAddress: Location, // Note that bmAddress can be null
Review comment:
> And I have a new idea that we can introduce a new fetch failed class
for the custom location and leave this one unchanged. For example, we can have
CustomStorageFetchFailed. Thus, when the location is a BlockManagerId we use
FetchFailed; otherwise, we use CustomStorageFetchFailed. WDYT?
`CustomStorageFetchFailed` looks like a promising approach. We will need to
think through what its implications would be, but on the face of it, it
should address the immediate concerns IMO.
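To make the proposal concrete, here is a rough sketch of how the split could look. It is illustrative only: the types below are stand-ins so the snippet compiles on its own, the field lists are abbreviated, and `RemoteStoreLocation` and `FetchFailures.reasonFor` are hypothetical names that are not part of this PR.

```scala
// Stand-ins for the real Spark types (assumption: simplified for this sketch).
trait Location
final case class BlockManagerId(executorId: String, host: String, port: Int) extends Location
// Hypothetical custom location, for illustration only.
final case class RemoteStoreLocation(uri: String) extends Location

sealed trait TaskFailedReason

// Existing class, left unchanged: still typed to BlockManagerId (fields abbreviated).
final case class FetchFailed(
    bmAddress: BlockManagerId, // note that bmAddress can be null
    shuffleId: Int,
    reduceId: Int,
    message: String) extends TaskFailedReason

// Proposed new class for any location that is not a BlockManagerId.
final case class CustomStorageFetchFailed(
    location: Location,
    shuffleId: Int,
    reduceId: Int,
    message: String) extends TaskFailedReason

object FetchFailures {
  // Picks the failure class based on the runtime type of the location.
  def reasonFor(
      loc: Location,
      shuffleId: Int,
      reduceId: Int,
      message: String): TaskFailedReason = loc match {
    // Preserve today's behavior for a null address.
    case null => FetchFailed(null, shuffleId, reduceId, message)
    case bm: BlockManagerId => FetchFailed(bm, shuffleId, reduceId, message)
    case other => CustomStorageFetchFailed(other, shuffleId, reduceId, message)
  }
}
```

With this shape, existing consumers that pattern match on `FetchFailed` keep seeing a `BlockManagerId`, and only code that cares about custom storage needs to handle the new case.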
Thoughts @attilapiros, @tgravescs ?
> The only problem is the custom location. It's new data, e.g.,
("XXXLocation" -> XXXLocationJson). So it can be a problem if users use an old
version of Spark to load event files. Although, I think this is really an
unexpected usage.
There are a couple of issues here:
* A simpler question of how to handle a custom location, from both a
programmatic and a data point of view.
* How to handle different shuffle impls being in play for the same event
directory.
  * For example, deployments may have multiple shuffle infra in use over
the course of time (or different clusters with different configs and a shared
history event dir), each with its own `Location` type.
* How will the SHS, REST API, etc. understand which location class is being
used and how to parse it?
I actually don't have a good solution for this, other than adding some
metadata per location record to indicate the 'type'.
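As a minimal sketch of what per-record 'type' metadata could look like: each serialized location carries a type tag, and the reader looks up a decoder by that tag, falling back to an opaque placeholder when the tag is unknown. Everything here is an assumption for illustration (`LocationRecord`, `LocationCodec`, `UnknownLocation`, and the toy `exec:host:port` payload encoding); none of it is existing Spark API.

```scala
// Stand-ins for the real Spark types (assumption: simplified for this sketch).
trait Location
final case class BlockManagerId(executorId: String, host: String, port: Int) extends Location
// Placeholder returned when no decoder is registered for a type tag.
final case class UnknownLocation(locationType: String, rawPayload: String) extends Location

// One serialized location: the type tag plus an opaque payload.
final case class LocationRecord(locationType: String, payload: String)

object LocationCodec {
  type Decoder = String => Location

  // Toy encoding "executorId:host:port"; a real one would presumably be JSON.
  private val blockManagerDecoder: Decoder = { payload =>
    payload.split(":", 3) match {
      case Array(execId, host, port) => BlockManagerId(execId, host, port.toInt)
      case _ =>
        throw new IllegalArgumentException(s"Malformed BlockManagerId payload: $payload")
    }
  }

  val builtIn: Map[String, Decoder] = Map("BlockManagerId" -> blockManagerDecoder)

  // Decodes by type tag; unknown tags are kept as raw data instead of failing.
  def decode(
      record: LocationRecord,
      decoders: Map[String, Decoder] = builtIn): Location =
    decoders.get(record.locationType) match {
      case Some(decoder) => decoder(record.payload)
      case None => UnknownLocation(record.locationType, record.payload)
    }
}
```

The fallback is the part that matters for a shared event directory: a reader that has never seen "XXXLocation" can still load the event and show the raw payload, rather than failing to parse the whole file.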
Any other thoughts ?