attilapiros edited a comment on pull request #30763:
URL: https://github.com/apache/spark/pull/30763#issuecomment-792821433


   > Then, `BlockManagerId` would be a native implementation for Spark and 
users could implement `Location` to support custom storage.
   
   To test the idea I am trying to come up with hard situations, but this 
does not mean I am against it.
   
   So if I understand correctly, `BlockManagerId` would extend the `Location` 
class, right? 
   And `MapStatus#location` here would be a generic `Location`?
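   
   Just to make sure we mean the same shape, here is a minimal sketch of how 
I read the proposal (simplified stand-ins only, not the real 
`org.apache.spark.storage.BlockManagerId` / `org.apache.spark.scheduler.MapStatus`):
   
   ```scala
   // Minimal sketch of the proposed hierarchy as I understand it.
   // Simplified stand-ins, NOT the actual Spark classes.
   trait Location extends Serializable
   
   case class BlockManagerId(executorId: String, host: String, port: Int)
     extends Location
   
   trait MapStatus {
     def location: Location                   // today: always a BlockManagerId
     def getSizeForBlock(reduceId: Int): Long
   }
   ```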
   
   In this case we should check all the references of `MapStatus#location` 
and, based on that, decide where it is safe to cast the `Location` to a 
`BlockManagerId` and where the location should be passed on as a generic 
`Location` (or at least what the generic `Location` must contain to keep the 
existing code working...).
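   
   For illustration, these are the two kinds of reference sites I mean 
(hypothetical code, again using the simplified types from the sketch above):
   
   ```scala
   object ReadSideSketch {
     // A reference that only needs the generic Location can pass it through.
     def logMapOutput(status: MapStatus): Unit =
       println(s"map output available at ${status.location}")
   
     // A reference owned by the built-in shuffle must downcast and has to
     // decide what happens when the location is not a BlockManagerId.
     def resolveFetchAddress(status: MapStatus): BlockManagerId =
       status.location match {
         case bmId: BlockManagerId => bmId
         case other => throw new IllegalStateException(
           s"Built-in shuffle reader cannot handle location: $other")
       }
   }
   ```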
   
   As the current reader uses `MapOutputTracker#getMapSizesByExecutorId`, 
would you keep that method and throw an exception at runtime when it is 
called and the location is not a `BlockManagerId`? This is a central method 
for getting the `blocksByAddress` used for fetching in the Spark shuffle.
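   
   A very rough sketch of what that guard could look like (the real 
`MapOutputTracker#getMapSizesByExecutorId` has a different signature and 
bookkeeping; this only shows where the runtime check would sit, with the 
simplified types from above):
   
   ```scala
   object MapOutputTrackerSketch {
     def getMapSizesByExecutorId(
         statuses: Seq[MapStatus],
         reduceId: Int): Map[BlockManagerId, Seq[Long]] = {
       val sizesByAddress = statuses.map { status =>
         status.location match {
           case bmId: BlockManagerId =>
             bmId -> status.getSizeForBlock(reduceId)
           case other =>
             throw new UnsupportedOperationException(
               "Only BlockManagerId locations are supported here, got " +
                 other.getClass.getName)
         }
       }
       // Group into the blocksByAddress shape the fetcher works with.
       sizesByAddress.groupBy(_._1).map { case (bmId, pairs) =>
         bmId -> pairs.map(_._2)
       }
     }
   }
   ```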
   
   For example, as far as I can see, `MapOutputTracker` is tailored to the 
current shuffle solution, so it should be checked against this idea.
   
   On the other hand, the write side might be easier, as there the 
`MapStatus` is filled with the id of the current block manager, so a new 
writer implementation would just use its own location.
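   
   Something along these lines, with the same simplified types 
(`RemoteStorageLocation` and `SimpleMapStatus` are made-up names just for 
illustration):
   
   ```scala
   // Write-side sketch: the built-in writer reports the local block manager's
   // id, while a hypothetical custom writer reports its own Location.
   case class RemoteStorageLocation(uri: String) extends Location
   
   case class SimpleMapStatus(location: Location, sizes: Array[Long])
     extends MapStatus {
     override def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
   }
   
   object WriteSideSketch {
     // Built-in writer: the location is the id of the current block manager.
     val builtIn =
       SimpleMapStatus(BlockManagerId("exec-1", "host-a", 7337), Array(64L, 128L))
   
     // A custom writer would simply report wherever it actually wrote the data.
     val custom =
       SimpleMapStatus(RemoteStorageLocation("s3://bucket/app-1/shuffle_0"), Array(64L, 128L))
   }
   ```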
   
   But for the read side my worry is that we would need runtime 
checks/asserts/guards to enforce what is allowed to be used where.
   
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
