Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1486#issuecomment-54901482
  
    Okay the thing I said before won't work because we can't return a rich type 
from `getPreferredLocations`.
    
    So after looking some more, how about this:
    
    1. We extend `TaskLocation` to have a boolean field called `cached`.
    2. We add a simple scheme to tag in-memory locations in 
`getPreferredLocations`. For now let's just keep it simple and introduce a 
single type of tag. We modify the `apply` function in `TaskLocation` to parse 
this correctly. This would be similar to the logic you have right now in 
`PartitionLocation`.
    3. In the `TaskSetManager`, in `addPendingTasks`, we check whether `cached` 
is set to true. If it is, we look up whether we have any executors on the host 
(via `sched.executorsByHost`). If we have an executor there, we add this to the 
list of pending executors.
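
    As a rough sketch of steps 1 to 3 (not the actual Spark code): the 
`TaskLocation` below is a simplified stand-in, the `cached_` tag prefix is 
made up since no format has been picked yet, `parse` stands in for the 
modified `apply`, and a plain immutable `Map` stands in for 
`sched.executorsByHost`.

```scala
object CachedLocationSketch {
  // Simplified stand-in for Spark's TaskLocation, extended with a `cached` flag (step 1).
  case class TaskLocation(host: String, cached: Boolean = false)

  object TaskLocation {
    // Hypothetical tag prefix; the real format is still to be decided.
    val CachedTag = "cached_"

    // Step 2: parse a preferred-location string, treating a tagged string
    // as an in-memory location. In the proposal this logic would live in `apply`.
    def parse(str: String): TaskLocation =
      if (str.startsWith(CachedTag)) TaskLocation(str.stripPrefix(CachedTag), cached = true)
      else TaskLocation(str, cached = false)
  }

  // Step 3: for a cached location, look up the executors running on that host.
  // `executorsByHost` is a stand-in for `sched.executorsByHost`.
  def executorsForCached(
      loc: TaskLocation,
      executorsByHost: Map[String, Set[String]]): Set[String] =
    if (loc.cached) executorsByHost.getOrElse(loc.host, Set.empty[String])
    else Set.empty[String]
}
```

    The `TaskSetManager` would then bind the task to whatever executors 
`executorsForCached` returns; an empty set means there is nothing to bind to 
and only the usual host-level handling applies.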
    
    This defers handling other types of hierarchical storage since the `cached` 
thing is hard coded in `TaskLocation`, but getting that working throughout all 
of Spark requires IMO a much larger design discussion. There are many open 
questions like whether we need to provide a richer type signature for 
`getPreferredLocations`, how delay scheduling will work, etc.
    
    Overall this proposal would be similar to what is there now, except you 
wouldn't add a new class called `PartitionLocation`; you'd just use the 
existing `TaskLocation` instead. Also, you'd do a binding in the 
`TaskSetManager` to specific executors.

