GitHub user cmccabe commented on the pull request:
https://github.com/apache/spark/pull/1486#issuecomment-53648074
Hi all. I'm uploading a new revision that addresses Sandy's comments. I also
took a stab at implementing delay scheduling for HDFS-cached data, but the
patch got bigger than I would like, since it involved many changes to
TaskSetManager. I think the best thing to do is to get this change in now and
then work on the delay scheduling part in a follow-up JIRA.
I changed references from "cached" to "inmemory" to avoid confusion, since we
already call a lot of things "cached" because they're in memory in the
executors. I also think that eventually we may want to extend
PartitionLocation to take into account other things, like whether the location
is on SSD (faster) or archival storage (slower), so I made PartitionPriority an
enum.
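
To illustrate why an enum leaves room for more storage tiers, here is a rough sketch. It is not the code in the patch: the value names, the `Disk` tier, the fields of `PartitionLocation`, and the `preferred` helper are all assumptions made up for this example.

```scala
// Hypothetical sketch only: the names and values below are illustrative
// and are not taken from the patch.
object PartitionPriority extends Enumeration {
  // Declared from most to least preferred, so a lower id means faster storage.
  // New tiers can be added here without changing the code that compares them.
  val InMemory, Ssd, Disk, Archival = Value
}

// A candidate host for a partition, tagged with how quickly it can serve the data.
case class PartitionLocation(host: String, priority: PartitionPriority.Value)

object PartitionLocationExample {
  // Order candidate locations so that the fastest storage tier comes first.
  def preferred(locations: Seq[PartitionLocation]): Seq[PartitionLocation] =
    locations.sortBy(_.priority.id)
}
```

With a boolean "is cached" flag, each new tier would need another flag; with an enum, the ordering lives in one place and callers only compare priorities.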