[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

cmccabe Wed, 27 Aug 2014 15:09:13 -0700

Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/1486#issuecomment-53648074
  
    Hi all.  I'm uploading a new revision that addresses Sandy's comments.  I also
took a stab at implementing delay scheduling for HDFS-cached data, but the patch
got bigger than I would like, since it required many changes to
TaskSetManager.  I think the best approach is to get this change in now and
then handle the delay scheduling part in a follow-up JIRA.
    
    I renamed references to "cached" to "inmemory" to avoid confusion.  We
already call a lot of things "cached" because they're in memory in the
executors.  I also think that eventually we may want to extend
PartitionLocation to account for other properties, such as whether the location
is on SSD (faster) or archival storage (slower), so I made PartitionPriority an
enum.
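The comment doesn't show the patch's actual PartitionPriority enum, so the following is only a hypothetical Scala sketch of the idea it describes: rank replica locations by storage tier (in-memory first, archival last) and prefer hosts on faster tiers. All names here (`ReplicaTier`, `ReplicaLocation`, `ReplicaPreference`, `preferredHosts`) are illustrative assumptions, not Spark APIs.

```scala
// Hypothetical sketch only: these names are NOT from the Spark patch.
// Models storage tiers as a closed set of values with an explicit rank,
// so new tiers (e.g. SSD, archival) can be added without changing callers.
sealed abstract class ReplicaTier(val rank: Int)

object ReplicaTier {
  case object InMemory extends ReplicaTier(0) // HDFS-cached replica
  case object Ssd      extends ReplicaTier(1) // faster than spinning disk
  case object Disk     extends ReplicaTier(2) // ordinary on-disk replica
  case object Archival extends ReplicaTier(3) // slowest storage

  // Lower rank means more preferred.
  implicit val ordering: Ordering[ReplicaTier] =
    Ordering.by[ReplicaTier, Int](_.rank)
}

// A candidate host for a partition, tagged with the tier of its replica.
final case class ReplicaLocation(host: String, tier: ReplicaTier)

object ReplicaPreference {
  // Orders hosts so faster tiers come first. sortBy is stable, so hosts
  // on the same tier keep their original relative order.
  def preferredHosts(locs: Seq[ReplicaLocation]): Seq[String] =
    locs.sortBy(_.tier).map(_.host)
}
```

Using a ranked enum rather than a boolean "is cached" flag is what makes the extension mentioned above cheap: adding a tier means adding one case object with a rank.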


