[jira] [Created] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend
Zhihui created SPARK-2872: - Summary: Fix conflict between code and doc in YarnClientSchedulerBackend Key: SPARK-2872 URL: https://issues.apache.org/jira/browse/SPARK-2872 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Zhihui The doc says that system properties override environment variables: https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71 But the code conflicts with it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
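The documented precedence can be sketched as follows. This is a hypothetical illustration of the intended behavior (system property first, then environment variable, then a default), not the actual YarnClientSchedulerBackend code; the object and key names are invented.

```scala
// Hedged sketch of the documented precedence: a system property, when set,
// should win over the corresponding environment variable. Names here are
// illustrative, not taken from YarnClientSchedulerBackend.
object OptionPrecedence {
  def resolve(sysPropKey: String, envKey: String, default: String): String =
    sys.props.get(sysPropKey)        // highest precedence: -D system property
      .orElse(sys.env.get(envKey))   // fallback: environment variable
      .getOrElse(default)            // last resort: built-in default
}
```

The bug report is that the linked code applies these two sources in the opposite order from what the documentation promises.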
[jira] [Commented] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087120#comment-14087120 ] Zhihui commented on SPARK-2872: --- PR https://github.com/apache/spark/pull/1684 Fix conflict between code and doc in YarnClientSchedulerBackend --- Key: SPARK-2872 URL: https://issues.apache.org/jira/browse/SPARK-2872 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Zhihui The doc says that system properties override environment variables: https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71 But the code conflicts with it.
[jira] [Created] (SPARK-2713) Executors of same application in same host should only download files jars once
Zhihui created SPARK-2713: - Summary: Executors of same application in same host should only download files jars once Key: SPARK-2713 URL: https://issues.apache.org/jira/browse/SPARK-2713 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui If Spark launches multiple executors on one host for one application, every executor downloads its dependent files and jars (if not using a local: URL) independently. This may result in huge latency. In my case, it resulted in 20 seconds of latency to download the dependent jars (about 17M) when I launched 32 executors on one host (4 hosts in total). This patch caches downloaded files and jars for executors to reduce network throughput and download latency. In my case, the latency was reduced from 20 seconds to less than 1 second.
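The caching idea can be sketched as below: the first executor on a host fetches a dependency into a shared cache directory while later executors on the same host wait on a per-URL lock and then reuse the cached copy. This is a hedged toy model, not the code from the patch; `DependencyCache` and the `download` callback are invented names standing in for Spark's real fetch path.

```scala
import java.io.File
import java.util.concurrent.ConcurrentHashMap

// Hedged sketch of per-host dependency caching. The first caller for a given
// URL performs the download; concurrent callers block on a per-URL lock and
// then find the file already present in the shared cache directory.
object DependencyCache {
  private val locks = new ConcurrentHashMap[String, AnyRef]()

  def fetchCached(url: String, cacheDir: File)(download: (String, File) => Unit): File = {
    val cached = new File(cacheDir, new File(url).getName)
    val lock = locks.computeIfAbsent(url, _ => new AnyRef)
    lock.synchronized {
      if (!cached.exists()) download(url, cached) // only the first executor pays the cost
    }
    cached
  }
}
```

With 32 executors on a host, this turns 32 downloads of the same 17M jar into one download plus 31 local cache hits, which matches the reported 20s-to-under-1s improvement in spirit.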
[jira] [Updated] (SPARK-2713) Executors of same application in same host should only download files jars once
[ https://issues.apache.org/jira/browse/SPARK-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2713: -- Description: If Spark launches multiple executors on one host for one application, every executor downloads its dependent files and jars (if not using a local: URL) independently. This may result in huge latency. In my case, it resulted in 20 seconds of latency to download the dependent jars (about 17M) when I launched 32 executors on one host (4 hosts in total). This patch caches downloaded files and jars for executors to reduce network throughput and download latency. In my case, the latency was reduced from 20 seconds to less than 1 second. was: If spark lunched multiple executors in one host for one application, every executor will download it dependent files and jars (if not using local: url) independently. It maybe result to huge latency. In my case, it result to 20 seconds latency to download dependent jars(about 17M) when I lunch 32 executors in one host(total 4 hosts). This patch will cache downloaded files and jars for executors to reduce network throughput and download latency. I my case, the latency was reduced from 20 seconds to less than 1 second. Executors of same application in same host should only download files jars once - Key: SPARK-2713 URL: https://issues.apache.org/jira/browse/SPARK-2713 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui
[jira] [Commented] (SPARK-2713) Executors of same application in same host should only download files jars once
[ https://issues.apache.org/jira/browse/SPARK-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075914#comment-14075914 ] Zhihui commented on SPARK-2713: --- PR https://github.com/apache/spark/pull/1616 Executors of same application in same host should only download files jars once - Key: SPARK-2713 URL: https://issues.apache.org/jira/browse/SPARK-2713 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui If Spark launches multiple executors on one host for one application, every executor downloads its dependent files and jars (if not using a local: URL) independently. This may result in huge latency. In my case, it resulted in 20 seconds of latency to download the dependent jars (about 17M) when I launched 32 executors on one host (4 hosts in total). This patch caches downloaded files and jars for executors to reduce network throughput and download latency. In my case, the latency was reduced from 20 seconds to less than 1 second.
[jira] [Created] (SPARK-2635) Fix race condition at SchedulerBackend.isReady in standalone mode
Zhihui created SPARK-2635: - Summary: Fix race condition at SchedulerBackend.isReady in standalone mode Key: SPARK-2635 URL: https://issues.apache.org/jira/browse/SPARK-2635 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui In SPARK-1946 (PR #900), the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set. Because the number of expected executors is uncertain in standalone mode, the PR tries to use CPU cores (--total-executor-cores) as the expected resource to judge whether the SchedulerBackend is ready.
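The cores-based check described above can be sketched as follows. This is a hedged illustration of the idea (registered cores measured against the requested --total-executor-cores, with a waiting-time escape hatch), with invented names, not the code from the actual fix.

```scala
// Hedged sketch: in standalone mode the executor count is not known up front,
// but the requested cores are, so readiness is judged on registered cores.
// Class and field names are illustrative.
class CoreReadiness(totalExpectedCores: Int,
                    minRegisteredRatio: Double,
                    maxRegisteredWaitingTimeMs: Long) {
  private val createTimeMs = System.currentTimeMillis()
  @volatile var registeredCores = 0 // updated as executors register

  def isReady(): Boolean =
    registeredCores >= totalExpectedCores * minRegisteredRatio ||
      System.currentTimeMillis() - createTimeMs >= maxRegisteredWaitingTimeMs
}
```

The point of the fix is that `totalExpectedCores` is known immediately from the submit arguments, so there is no window where the expected total is still unset and `isReady()` trivially returns true.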
[jira] [Commented] (SPARK-2635) Fix race condition at SchedulerBackend.isReady in standalone mode
[ https://issues.apache.org/jira/browse/SPARK-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071302#comment-14071302 ] Zhihui commented on SPARK-2635: --- PR https://github.com/apache/spark/pull/1525 Fix race condition at SchedulerBackend.isReady in standalone mode - Key: SPARK-2635 URL: https://issues.apache.org/jira/browse/SPARK-2635 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui In SPARK-1946 (PR #900), the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set. Because the number of expected executors is uncertain in standalone mode, the PR tries to use CPU cores (--total-executor-cores) as the expected resource to judge whether the SchedulerBackend is ready.
[jira] [Created] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
Zhihui created SPARK-2555: - Summary: Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark Issue Type: Improvement Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Zhihui
[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark Issue Type: Improvement Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Zhihui
[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it only supports Standalone and YARN modes. This JIRA ticket tries to introduce the configuration to Mesos mode. was: In SPARK-1946, Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark Issue Type: Improvement Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Zhihui
[jira] [Commented] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064743#comment-14064743 ] Zhihui commented on SPARK-2555: --- I submitted a PR: https://github.com/apache/spark/pull/1462 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark Issue Type: Improvement Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Zhihui In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it only supports Standalone and YARN modes. This JIRA ticket tries to introduce the configuration to Mesos mode.
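How a scheduler backend might read such a ratio can be sketched as below. The property name comes from the ticket, but the helper itself, the default of 0, and the clamping to [0, 1] are assumptions for illustration, not the Mesos backend code from the PR.

```scala
// Hedged sketch: read spark.scheduler.minRegisteredExecutorsRatio from a plain
// key/value conf. Defaulting to 0 (no waiting) and clamping to [0, 1] are
// assumptions made for this illustration.
def minRegisteredExecutorsRatio(conf: Map[String, String]): Double = {
  val raw = conf.getOrElse("spark.scheduler.minRegisteredExecutorsRatio", "0").toDouble
  math.min(1.0, math.max(0.0, raw)) // a ratio outside [0, 1] makes no sense
}
```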
[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it only supports Standalone and YARN modes. This tries to introduce the configuration to Mesos mode. was: In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it only support Standalone and Yarn mode. This jira ticket try to introduce the configuration to Mesos mode. Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark Issue Type: Improvement Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Zhihui
[jira] [Updated] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering
[ https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2193: -- Description: Currently, the last executor(s) may not get their preferred task(s), even though those tasks have been built into the pendingTasksForHosts map: because executors pick up tasks sequentially, their preferred task(s) may be picked up by other executors. This can be eliminated by sorting tasks into a partial ordering. An executor picks up tasks by the host order of each task's preferredLocations; that is, an executor first picks up all tasks for which task.preferredLocations.1 = executor.hostName, then second… Improve tasks' preferred locality by sorting tasks partial ordering --- Key: SPARK-2193 URL: https://issues.apache.org/jira/browse/SPARK-2193 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Zhihui
[jira] [Updated] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering
[ https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2193: -- Attachment: Improve Tasks Preferred Locality.pptx Improve tasks' preferred locality by sorting tasks partial ordering --- Key: SPARK-2193 URL: https://issues.apache.org/jira/browse/SPARK-2193 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Zhihui Attachments: Improve Tasks Preferred Locality.pptx Currently, the last executor(s) may not get their preferred task(s), even though those tasks have been built into the pendingTasksForHosts map: because executors pick up tasks sequentially, their preferred task(s) may be picked up by other executors. This can be eliminated by sorting tasks into a partial ordering. An executor picks up tasks by the host order of each task's preferredLocations; that is, an executor first picks up all tasks for which task.preferredLocations.1 = executor.hostName, then second…
[jira] [Commented] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering
[ https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037121#comment-14037121 ] Zhihui commented on SPARK-2193: --- PR 1131 https://github.com/apache/spark/pull/1131 Improve tasks' preferred locality by sorting tasks partial ordering --- Key: SPARK-2193 URL: https://issues.apache.org/jira/browse/SPARK-2193 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Zhihui Attachments: Improve Tasks Preferred Locality.pptx Currently, the last executor(s) may not get their preferred task(s), even though those tasks have been built into the pendingTasksForHosts map: because executors pick up tasks sequentially, their preferred task(s) may be picked up by other executors. This can be eliminated by sorting tasks into a partial ordering. An executor picks up tasks by the host order of each task's preferredLocations; that is, an executor first picks up all tasks for which task.preferredLocations.1 = executor.hostName, then second…
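The proposed ordering can be sketched as follows: an executor scans pending tasks in rounds, first taking tasks whose first preferred host is its own, then tasks whose second preferred host is its own, and so on. This is a hedged toy model of the idea, not the code from the PR; `Task` and `pickByLocalityRound` are invented names.

```scala
// Hedged toy model of picking tasks by the host order of preferredLocations.
case class Task(id: Int, preferredLocations: Seq[String])

def pickByLocalityRound(pending: Seq[Task], hostName: String): Seq[Task] = {
  val rounds = if (pending.isEmpty) 0 else pending.map(_.preferredLocations.size).max
  (0 until rounds).flatMap { round =>
    // round 0: tasks whose 1st preferred host is this executor's host, etc.
    pending.filter(_.preferredLocations.lift(round).contains(hostName))
  }.distinct
}
```

On host "hostA" with tasks preferring Seq("hostA","hostB"), Seq("hostB","hostA"), Seq("hostA"), this yields the first and third tasks before the second, so tasks that prefer this host most strongly are no longer picked up by other executors first.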
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: -- Description: Because creating the TaskSetManager and registering executors are asynchronous, running a job without enough executors leads to several issues: * early stages' tasks run without preferred locality. * the default parallelism in YARN is based on the number of executors. * the number of intermediate files per node for shuffle (this can bring the node down, btw). * the amount of memory consumed on a node for RDD MEMORY-persisted data (making the job fail if disk is not specified, like some of the MLlib algos?). * and so on ... (thanks to [~mridulm80]'s [comments | https://github.com/apache/spark/pull/900#issuecomment-45780405]) A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. \# submit a stage only after the ratio of successfully registered executors is reached; default value 0 in Standalone mode and 0.9 in Yarn mode spark.scheduler.minRegisteredRatio = 0.8 \# even if the registered ratio has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.scheduler.maxRegisteredWaitingTime = 5000 was: Because creating TaskSetManager and registering executors are asynchronous, if running job without enough executors, it will lead to some issues * early stages' tasks run without preferred locality. * the default parallelism in yarn is based on number of executors, * the number of intermediate files per node for shuffle (this can bring the node down btw) * and amount of memory consumed on a node for rdd MEMORY persisted data (making the job fail if disk is not specified : like some of the mllib algos ?) * and so on ... (thanks [~mridulm80] 's [comments | https://github.com/apache/spark/pull/900#issuecomment-45780405]) A simple solution is sleeping few seconds in application, so that executors have enough time to register. A better way is to make DAGScheduler submit stage after a few of executors have been registered by configuration properties. \# submit stage only after successfully registered executors arrived the number, default value 0 spark.executor.minRegisteredNum = 20 \# whatever registeredRatio is arrived, submit stage after the maxRegisteredWaitingTime(millisecond), default value 1 spark.executor.maxRegisteredWaitingTime = 5000 Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: -- Description: Because creating the TaskSetManager and registering executors are asynchronous, running a job without enough executors leads to several issues: * early stages' tasks run without preferred locality. * the default parallelism in YARN is based on the number of executors. * the number of intermediate files per node for shuffle (this can bring the node down, btw). * the amount of memory consumed on a node for RDD MEMORY-persisted data (making the job fail if disk is not specified, like some of the MLlib algos?). * and so on ... A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. \# submit a stage only after the ratio of successfully registered executors is reached; default value 0 spark.executor.registeredRatio = 0.8 \# even if the registered ratio has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.executor.maxRegisteredWaitingTime = 5000 was: Because creating TaskSetManager and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality. A simple solution is sleeping few seconds in application, so that executors have enough time to register. A better way is to make DAGScheduler submit stage after a few of executors have been registered by configuration properties. \# submit stage only after successfully registered executors arrived the ratio, default value 0 spark.executor.registeredRatio = 0.8 \# whatever registeredRatio is arrived, submit stage after the maxRegisteredWaitingTime(millisecond), default value 1 spark.executor.maxRegisteredWaitingTime = 5000 Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: -- Description: Because creating the TaskSetManager and registering executors are asynchronous, running a job without enough executors leads to several issues: * early stages' tasks run without preferred locality. * the default parallelism in YARN is based on the number of executors. * the number of intermediate files per node for shuffle (this can bring the node down, btw). * the amount of memory consumed on a node for RDD MEMORY-persisted data (making the job fail if disk is not specified, like some of the MLlib algos?). * and so on ... (thanks to [~mridulm80]'s [comments | https://github.com/apache/spark/pull/900#issuecomment-45780405]) A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. \# submit a stage only after the number of successfully registered executors is reached; default value 0 spark.executor.minRegisteredNum = 20 \# even if the registered number has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.executor.maxRegisteredWaitingTime = 5000 was: Because creating TaskSetManager and registering executors are asynchronous, if running job without enough executors, it will lead to some issues * early stages' tasks run without preferred locality. * the default parallelism in yarn is based on number of executors, * the number of intermediate files per node for shuffle (this can bring the node down btw) * and amount of memory consumed on a node for rdd MEMORY persisted data (making the job fail if disk is not specified : like some of the mllib algos ?) * and so on ... A simple solution is sleeping few seconds in application, so that executors have enough time to register. A better way is to make DAGScheduler submit stage after a few of executors have been registered by configuration properties. \# submit stage only after successfully registered executors arrived the ratio, default value 0 spark.executor.registeredRatio = 0.8 \# whatever registeredRatio is arrived, submit stage after the maxRegisteredWaitingTime(millisecond), default value 1 spark.executor.maxRegisteredWaitingTime = 5000 Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx
[jira] [Created] (SPARK-1946) Submit stage after executors have been registered
Zhihui created SPARK-1946: - Summary: Submit stage after executors have been registered Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Because creating the TaskSetManager and registering executors are asynchronous, in most situations early stages' tasks run without preferred locality. A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. # submit a stage only after the ratio of successfully registered executors is reached; default value 0 spark.executor.registeredRatio = 0.8 # even if the registered ratio has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.executor.maxRegisteredWaitingTime = 5000
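The two proposed properties can be sketched as a small gate the scheduler would consult before submitting the first stage. This is a hedged illustration of the proposal (submit once the registered ratio is reached, or once the waiting time expires), with an invented class name, not the code from the actual change.

```scala
// Hedged sketch of the proposed gate. The parameter names mirror the
// properties quoted in the ticket (spark.executor.registeredRatio and
// spark.executor.maxRegisteredWaitingTime); the class itself is illustrative.
class RegistrationGate(expectedExecutors: Int,
                       registeredRatio: Double,
                       maxRegisteredWaitingTimeMs: Long) {
  private val startMs = System.currentTimeMillis()
  @volatile var registeredExecutors = 0 // incremented as executors register

  // The DAGScheduler would hold the first stage until this returns true.
  def canSubmitStage(): Boolean =
    registeredExecutors >= expectedExecutors * registeredRatio ||
      System.currentTimeMillis() - startMs >= maxRegisteredWaitingTimeMs
}
```

The waiting-time clause is the safety valve: even if some executors never register, the job still starts after the configured delay instead of hanging.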
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: -- Attachment: Spark Task Scheduler Optimization Proposal.pptx Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx Because creating the TaskSetManager and registering executors are asynchronous, in most situations early stages' tasks run without preferred locality. A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. # submit a stage only after the ratio of successfully registered executors is reached; default value 0 spark.executor.registeredRatio = 0.8 # even if the registered ratio has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.executor.maxRegisteredWaitingTime = 5000
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: -- Description: Because creating the TaskSetManager and registering executors are asynchronous, in most situations early stages' tasks run without preferred locality. A simple solution is sleeping a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after a number of executors have been registered, controlled by configuration properties. \# submit a stage only after the ratio of successfully registered executors is reached; default value 0 spark.executor.registeredRatio = 0.8 \# even if the registered ratio has not been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1 spark.executor.maxRegisteredWaitingTime = 5000 was: Because creating TaskSetManager and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality. A simple solution is sleeping few seconds in application, so that executors have enough time to register. A better way is to make DAGScheduler submit stage after a few of executors have been registered by configuration properties. # submit stage only after successfully registered executors arrived the ratio, default value 0 spark.executor.registeredRatio = 0.8 # whatever registeredRatio is arrived, submit stage after the maxRegisteredWaitingTime(millisecond), default value 1 spark.executor.maxRegisteredWaitingTime = 5000 Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx