[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1212 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49200501 Thanks @lirui-intel, merged finally :-)
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49211502 Hey @mridulm I usually won't merge any code in the scheduler unless @markhamstra or @kayhousterhout has looked at it and signed off, since they are the most active maintainers of this code. That might be a good practice to follow in the future. We can try to come up with a list of maintainers to make it more clear who should be consulted for code in various parts of Spark.
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49214821 Thanks, but that is fine: I merged it in after I resolved my local hardware issues today, so I did not need to impose on you to merge after all!
On 17-Jul-2014 12:33 am, Patrick Wendell notificati...@github.com wrote:
> Hey @mridulm I usually won't merge any code in the scheduler unless @markhamstra or @kayhousterhout has looked at it and signed off, since they are the most active maintainers of this code. That might be a good practice to follow in the future. We can try to come up with a list of maintainers to make it more clear who should be consulted for code in various parts of Spark.
> https://github.com/apache/spark/pull/1212#issuecomment-49211502
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49218616 @mridulm what I meant was that it would be good in the future if you try to have Mark or Kay look at patches in the scheduler code before you merge them.
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49248835 Thanks everybody :)
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49006034 Hi @lirui-intel, looks good to me! I will merge when I get my laptop working again; unfortunate state of affairs :-) In the meantime, if @pwendell or someone else could merge this, that would be great too!
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49125959 Thanks @mridulm, and sorry about your laptop :)
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48873878 Hi @mridulm, I've added a test case to capture the scheduling behavior of RACK_LOCAL tasks. Let me know if I got anything wrong.
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48776662 s/not/now/ :-) That is, to test the expected change in behavior for a specific host: before a rack-local host is added, tasks should be scheduled at ANY; after a rack-local host is added, they should be scheduled at RACK_LOCAL (with the allowed locality levels reflecting this). This is to ensure that we do not regress on this behavior in the future. The modified test case checks this in part and is equivalent given the current state of the code, but it would be good to have something that verifies this concretely to future-proof it.
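The behavior change described above can be expressed as a small, self-contained check. This is a sketch under a simplified locality model, not Spark's `TaskSetManager` logic: it assumes a task set's allowed levels always include ANY, and include RACK_LOCAL only when some rack holding pending tasks has at least one live host. `Level`, `LocalitySketch`, `validLevels`, and the host/rack names in the usage are all hypothetical (Spark's real locality levels also include PROCESS_LOCAL).

```scala
// Hypothetical, simplified locality levels, ordered from most to least local.
object Level extends Enumeration {
  val NODE_LOCAL, RACK_LOCAL, ANY = Value
}

object LocalitySketch {
  // Allowed levels for a task set, most local first. A level is allowed only
  // when some pending task could actually be launched at it right now.
  def validLevels(
      pendingHosts: Set[String],          // preferred hosts of pending tasks
      pendingRacks: Set[String],          // racks of those preferred hosts
      hostAlive: String => Boolean,       // does this host have a live executor?
      rackHasLiveHost: String => Boolean  // does this rack have any live host?
  ): List[Level.Value] = {
    val nodeLocal =
      if (pendingHosts.exists(hostAlive)) List(Level.NODE_LOCAL) else Nil
    val rackLocal =
      if (pendingRacks.exists(rackHasLiveHost)) List(Level.RACK_LOCAL) else Nil
    // ANY is always allowed as the fallback level.
    (nodeLocal ++ rackLocal) :+ Level.ANY
  }
}
```

In a regression test, a task preferring a dead host on `rack1` should see only ANY while no host on `rack1` is alive, and RACK_LOCAL followed by ANY once another `rack1` host comes up. Checking both states in the same test is what pins down the before/after change rather than just the end state.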
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1212#discussion_r14552579
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl(
       executorsByHost.contains(host)
     }
+  def hasHostOnRack(rack: String): Boolean = synchronized {
--- End diff --
Make this protected?
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1212#discussion_r14552666
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl(
       executorsByHost.contains(host)
     }
+  def hasHostOnRack(rack: String): Boolean = synchronized {
--- End diff --
Maybe rename it to hasHostAliveOnRack to make clear what it does.
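To illustrate why the "alive" in the suggested name matters, here is a minimal sketch of per-rack host bookkeeping. It is not the code merged in this PR: it assumes the cluster manager supplies a host-to-rack lookup (`rackOf`, returning `None` for unknown racks), and the class `RackAwareHostTracker` and its methods `executorAdded`, `executorLost`, and `hasExecutorsAliveOnHost` are hypothetical names. Only `hasHostAliveOnRack` follows the name proposed in the review.

```scala
import scala.collection.mutable

// Hypothetical, simplified rack bookkeeping (not Spark's TaskSchedulerImpl).
// `rackOf` resolves a host to its rack; None means the rack is unknown.
class RackAwareHostTracker(rackOf: String => Option[String]) {
  // host -> IDs of live executors on that host
  private val executorsByHost = mutable.HashMap.empty[String, mutable.HashSet[String]]
  // rack -> hosts on that rack that currently have at least one live executor
  private val hostsByRack = mutable.HashMap.empty[String, mutable.HashSet[String]]

  def executorAdded(execId: String, host: String): Unit = synchronized {
    executorsByHost.getOrElseUpdate(host, mutable.HashSet.empty[String]) += execId
    for (rack <- rackOf(host)) {
      hostsByRack.getOrElseUpdate(rack, mutable.HashSet.empty[String]) += host
    }
  }

  def executorLost(execId: String, host: String): Unit = synchronized {
    for (execs <- executorsByHost.get(host)) {
      execs -= execId
      if (execs.isEmpty) {
        executorsByHost -= host
        // The host has no live executors left, so it no longer counts for its rack.
        for (rack <- rackOf(host); hosts <- hostsByRack.get(rack)) {
          hosts -= host
          if (hosts.isEmpty) hostsByRack -= rack
        }
      }
    }
  }

  def hasExecutorsAliveOnHost(host: String): Boolean = synchronized {
    executorsByHost.contains(host)
  }

  // True only while at least one host on `rack` still has a live executor.
  def hasHostAliveOnRack(rack: String): Boolean = synchronized {
    hostsByRack.contains(rack)
  }
}
```

Because racks are only kept while they have a live host, the method answers "is anything alive on this rack right now?", which is what the rename makes explicit. Hosts whose rack cannot be resolved are tracked per host but never contribute to any rack.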
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48024156 Looks good, thanks! Please add a specific test case that tests the change in behavior: namely, what used to be scheduled at ANY earlier is not RACK_LOCAL.
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48025283 Thanks @mridulm for the review! I don't quite get your point about the test case, though; could you please be more specific about what test case should be added?
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/1212
SPARK-2277: make TaskScheduler track hosts on rack
You can merge this pull request into a Git repository by running:
  $ git pull https://github.com/lirui-intel/spark trackHostOnRack
Alternatively you can review and apply these changes as the patch at:
  https://github.com/apache/spark/pull/1212.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
  This closes #1212
commit 79ac750154eb37e36fcb733559a35d66f043e31d
Author: Rui Li rui...@intel.com
Date: 2014-06-25T14:33:22Z
  SPARK-2277: make TaskScheduler track hosts on rack
commit 5e4ef62b7a31ff2c3207a53959079b1acfe3d6fb
Author: Rui Li rui...@intel.com
Date: 2014-06-25T14:39:43Z
  SPARK-2277: remove unnecessary import
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-47111959 Can one of the admins verify this patch?