[jira] [Created] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend

2014-08-05 Thread Zhihui (JIRA)
Zhihui created SPARK-2872:
-

 Summary: Fix conflict between code and doc in 
YarnClientSchedulerBackend
 Key: SPARK-2872
 URL: https://issues.apache.org/jira/browse/SPARK-2872
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Zhihui


The doc says that system properties override environment variables:
https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71

But the code conflicts with this.
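
As a hedged illustration of the documented precedence (not the actual YarnClientSchedulerBackend code; the property and environment-variable names below are assumptions), a setting read this way prefers the system property over the environment variable:

  // illustrative Scala sketch only
  def resolveSetting(sysProp: String, envVar: String, default: String): String =
    sys.props.get(sysProp)
      .orElse(sys.env.get(envVar))
      .getOrElse(default)

  // e.g. resolveSetting("spark.driver.memory", "SPARK_DRIVER_MEMORY", "512m")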






[jira] [Commented] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend

2014-08-05 Thread Zhihui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087120#comment-14087120
 ] 

Zhihui commented on SPARK-2872:
---

PR https://github.com/apache/spark/pull/1684

 Fix conflict between code and doc in YarnClientSchedulerBackend
 ---

 Key: SPARK-2872
 URL: https://issues.apache.org/jira/browse/SPARK-2872
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Zhihui

 The doc says that system properties override environment variables:
 https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71
 But the code conflicts with this.






[jira] [Created] (SPARK-2713) Executors of same application in same host should only download files & jars once

2014-07-27 Thread Zhihui (JIRA)
Zhihui created SPARK-2713:
-

 Summary: Executors of same application in same host should only 
download files & jars once
 Key: SPARK-2713
 URL: https://issues.apache.org/jira/browse/SPARK-2713
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui


If Spark launches multiple executors on one host for one application, every 
executor will download its dependent files and jars (if not using a local: URL) 
independently. This can result in huge latency. In my case, it resulted in 20 
seconds of latency to download the dependent jars (about 17 MB) when I launched 
32 executors on one host (4 hosts in total). 

This patch will cache downloaded files and jars for executors, to reduce network 
traffic and download latency. In my case, the latency was reduced from 20 
seconds to less than 1 second.
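
A minimal sketch of the caching idea (illustrative only, not the code in the PR): the first executor on a host downloads a dependency into a shared cache directory under a per-URL lock, and later executors on the same host reuse the cached copy.

  import java.io.File
  import scala.collection.mutable

  object DependencyCache {
    private val locks = mutable.Map.empty[String, AnyRef]

    private def lockFor(url: String): AnyRef =
      locks.synchronized(locks.getOrElseUpdate(url, new AnyRef))

    // `download` stands in for whatever fetch routine the executor already uses (assumption)
    def fetchCached(url: String, cacheDir: File)(download: (String, File) => Unit): File = {
      val target = new File(cacheDir, new File(url).getName)
      lockFor(url).synchronized {
        if (!target.exists()) {
          download(url, target) // only the first executor on this host pays the download cost
        }
      }
      target
    }
  }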





[jira] [Updated] (SPARK-2713) Executors of same application in same host should only download files & jars once

2014-07-27 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2713:
--

Description: 
If Spark launches multiple executors on one host for one application, every 
executor will download its dependent files and jars (if not using a local: URL) 
independently. This can result in huge latency. In my case, it resulted in 20 
seconds of latency to download the dependent jars (about 17 MB) when I launched 
32 executors on one host (4 hosts in total). 

This patch will cache downloaded files and jars for executors, to reduce network 
traffic and download latency. In my case, the latency was reduced from 20 
seconds to less than 1 second.

  was:
If spark lunched multiple executors in one host for one application, every 
executor will download it dependent files and jars (if not using local: url) 
independently. It maybe result to huge latency. In my case, it result to 20 
seconds latency to download dependent jars(about 17M) when I lunch 32 executors 
in one host(total 4 hosts). 

This patch will cache downloaded files and jars for executors to reduce network 
throughput and download latency. I my case, the latency was reduced from 20 
seconds to less than 1 second.


 Executors of same application in same host should only download files & jars 
 once
 -

 Key: SPARK-2713
 URL: https://issues.apache.org/jira/browse/SPARK-2713
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 If Spark launches multiple executors on one host for one application, every 
 executor will download its dependent files and jars (if not using a local: URL) 
 independently. This can result in huge latency. In my case, it resulted in 20 
 seconds of latency to download the dependent jars (about 17 MB) when I launched 
 32 executors on one host (4 hosts in total). 
 This patch will cache downloaded files and jars for executors, to reduce 
 network traffic and download latency. In my case, the latency was reduced 
 from 20 seconds to less than 1 second.





[jira] [Commented] (SPARK-2713) Executors of same application in same host should only download files & jars once

2014-07-27 Thread Zhihui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075914#comment-14075914
 ] 

Zhihui commented on SPARK-2713:
---

PR https://github.com/apache/spark/pull/1616

 Executors of same application in same host should only download files & jars 
 once
 -

 Key: SPARK-2713
 URL: https://issues.apache.org/jira/browse/SPARK-2713
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 If Spark launches multiple executors on one host for one application, every 
 executor will download its dependent files and jars (if not using a local: URL) 
 independently. This can result in huge latency. In my case, it resulted in 20 
 seconds of latency to download the dependent jars (about 17 MB) when I launched 
 32 executors on one host (4 hosts in total). 
 This patch will cache downloaded files and jars for executors, to reduce 
 network traffic and download latency. In my case, the latency was reduced 
 from 20 seconds to less than 1 second.





[jira] [Created] (SPARK-2635) Fix race condition at SchedulerBackend.isReady in standalone mode

2014-07-22 Thread Zhihui (JIRA)
Zhihui created SPARK-2635:
-

 Summary: Fix race condition at SchedulerBackend.isReady in 
standalone mode
 Key: SPARK-2635
 URL: https://issues.apache.org/jira/browse/SPARK-2635
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui


In SPARK-1946 (PR #900), the configuration 
spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in 
standalone mode there is a race condition where isReady() can return true 
because totalExpectedExecutors has not been correctly set.

Because the expected number of executors is uncertain in standalone mode, the PR 
tries to use CPU cores (--total-executor-cores) as the expected resources to 
judge whether the SchedulerBackend is ready.
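
A hedged sketch of that approach (the names are illustrative, not the actual patch): readiness is judged on registered CPU cores against --total-executor-cores, combined with the maxRegisteredWaitingTime fallback from SPARK-1946.

  class CoreBasedReadiness(totalExpectedCores: Int,
                           minRegisteredRatio: Double,
                           maxRegisteredWaitingTimeMs: Long) {
    @volatile var totalRegisteredCores = 0
    private val createTimeMs = System.currentTimeMillis()

    // ready once enough cores have registered, or once the waiting time expires
    def isReady(): Boolean =
      totalRegisteredCores >= totalExpectedCores * minRegisteredRatio ||
        (System.currentTimeMillis() - createTimeMs) >= maxRegisteredWaitingTimeMs
  }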





[jira] [Commented] (SPARK-2635) Fix race condition at SchedulerBackend.isReady in standalone mode

2014-07-22 Thread Zhihui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071302#comment-14071302
 ] 

Zhihui commented on SPARK-2635:
---

PR https://github.com/apache/spark/pull/1525

 Fix race condition at SchedulerBackend.isReady in standalone mode
 -

 Key: SPARK-2635
 URL: https://issues.apache.org/jira/browse/SPARK-2635
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 In SPARK-1946 (PR #900), the configuration 
 spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in 
 standalone mode there is a race condition where isReady() can return true 
 because totalExpectedExecutors has not been correctly set.
 Because the expected number of executors is uncertain in standalone mode, the PR 
 tries to use CPU cores (--total-executor-cores) as the expected resources to 
 judge whether the SchedulerBackend is ready.





[jira] [Created] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
Zhihui created SPARK-2555:
-

 Summary: Support configuration 
spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.
 Key: SPARK-2555
 URL: https://issues.apache.org/jira/browse/SPARK-2555
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui








[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2555:
--

Description: In SPARK-1946, 

 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
 mode.
 

 Key: SPARK-2555
 URL: https://issues.apache.org/jira/browse/SPARK-2555
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 In SPARK-1946, 





[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2555:
--

Description: 
In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was 
introduced, but it only supports Standalone and YARN mode.
This JIRA ticket tries to introduce the configuration to Mesos mode.
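
A hedged usage example (the Mesos master URL and the 0.8 value are illustrative; the property name comes from SPARK-1946, and applying it under a Mesos master is what this ticket proposes):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setMaster("mesos://zk://zk1:2181/mesos") // illustrative Mesos master URL
    .setAppName("example")
    // wait until 80% of the expected executors have registered before submitting stages
    .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")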

  was:In SPARK-1946, 


 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
 mode.
 

 Key: SPARK-2555
 URL: https://issues.apache.org/jira/browse/SPARK-2555
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was 
 introduced, but it only supports Standalone and YARN mode.
 This JIRA ticket tries to introduce the configuration to Mesos mode.





[jira] [Commented] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064743#comment-14064743
 ] 

Zhihui commented on SPARK-2555:
---

I submitted a PR: https://github.com/apache/spark/pull/1462


 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
 mode.
 

 Key: SPARK-2555
 URL: https://issues.apache.org/jira/browse/SPARK-2555
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was 
 introduced, but it only supports Standalone and YARN mode.
 This JIRA ticket tries to introduce the configuration to Mesos mode.





[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2555:
--

Description: 
In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was 
introduced, but it only supports Standalone and YARN mode.
This tries to introduce the configuration to Mesos mode.

  was:
In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was 
introduced, but it only support  Standalone and Yarn mode.
This jira ticket try to introduce the configuration to Mesos mode.


 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
 mode.
 

 Key: SPARK-2555
 URL: https://issues.apache.org/jira/browse/SPARK-2555
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui

 In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio was 
 introduced, but it only supports Standalone and YARN mode.
 This tries to introduce the configuration to Mesos mode.





[jira] [Updated] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering

2014-06-19 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2193:
--

Description: 
Currently, the last executor(s) may not get their preferred task(s), even though 
these tasks have been built into the pendingTasksForHosts map, because executors 
pick up tasks sequentially and their preferred task(s) may be picked up by other 
executors first.

This can be eliminated by sorting tasks with a partial ordering: an executor picks 
up tasks by the host order of each task's preferredLocations. That means the 
executor first picks up all tasks whose first entry in task.preferredLocations 
equals executor.hostName, then those whose second entry matches, and so on.
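
A hedged sketch of the proposed partial ordering (the types are illustrative, not TaskSetManager's real ones): an executor first drains the tasks whose first preferred location is its own host, and only then takes the rest.

  case class PendingTask(id: Int, preferredLocations: Seq[String])

  def tasksInPreferredOrder(executorHost: String,
                            pending: Seq[PendingTask]): Seq[PendingTask] = {
    val (local, others) =
      pending.partition(_.preferredLocations.headOption.contains(executorHost))
    local ++ others // local-preference tasks are offered before the remainder
  }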


 Improve tasks' preferred locality by sorting tasks partial ordering
 ---

 Key: SPARK-2193
 URL: https://issues.apache.org/jira/browse/SPARK-2193
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhihui

 Currently, the last executor(s) may not get their preferred task(s), even though 
 these tasks have been built into the pendingTasksForHosts map, because executors 
 pick up tasks sequentially and their preferred task(s) may be picked up by other 
 executors first.
 This can be eliminated by sorting tasks with a partial ordering: an executor 
 picks up tasks by the host order of each task's preferredLocations. That means 
 the executor first picks up all tasks whose first entry in task.preferredLocations 
 equals executor.hostName, then those whose second entry matches, and so on.





[jira] [Updated] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering

2014-06-19 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-2193:
--

Attachment: Improve Tasks Preferred Locality.pptx

 Improve tasks' preferred locality by sorting tasks partial ordering
 ---

 Key: SPARK-2193
 URL: https://issues.apache.org/jira/browse/SPARK-2193
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhihui
 Attachments: Improve Tasks Preferred Locality.pptx


 Currently, the last executor(s) may not get their preferred task(s), even though 
 these tasks have been built into the pendingTasksForHosts map, because executors 
 pick up tasks sequentially and their preferred task(s) may be picked up by other 
 executors first.
 This can be eliminated by sorting tasks with a partial ordering: an executor 
 picks up tasks by the host order of each task's preferredLocations. That means 
 the executor first picks up all tasks whose first entry in task.preferredLocations 
 equals executor.hostName, then those whose second entry matches, and so on.





[jira] [Commented] (SPARK-2193) Improve tasks' preferred locality by sorting tasks partial ordering

2014-06-19 Thread Zhihui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037121#comment-14037121
 ] 

Zhihui commented on SPARK-2193:
---

PR 1131 https://github.com/apache/spark/pull/1131

 Improve tasks' preferred locality by sorting tasks partial ordering
 ---

 Key: SPARK-2193
 URL: https://issues.apache.org/jira/browse/SPARK-2193
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhihui
 Attachments: Improve Tasks Preferred Locality.pptx


 Currently, the last executor(s) may not get their preferred task(s), even though 
 these tasks have been built into the pendingTasksForHosts map, because executors 
 pick up tasks sequentially and their preferred task(s) may be picked up by other 
 executors first.
 This can be eliminated by sorting tasks with a partial ordering: an executor 
 picks up tasks by the host order of each task's preferredLocations. That means 
 the executor first picks up all tasks whose first entry in task.preferredLocations 
 equals executor.hostName, then those whose second entry matches, and so on.





[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-06-16 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Description: 
Because creating the TaskSetManager and registering executors are asynchronous, 
running a job without enough executors leads to several issues:
* early stages' tasks run without preferred locality.
* the default parallelism in YARN is based on the number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down, btw)
* the amount of memory consumed on a node for RDD MEMORY-persisted data (making 
the job fail if disk is not specified, like some of the MLlib algos?)
* and so on ...
(thanks [~mridulm80] 's [comments | 
https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple solution is to sleep a few seconds in the application, so that executors 
have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

\# submit stages only after the ratio of successfully registered executors has 
been reached; default value 0 in Standalone mode and 0.9 in Yarn mode
spark.scheduler.minRegisteredRatio = 0.8

\# regardless of whether the ratio has been reached, submit stages after 
maxRegisteredWaitingTime (milliseconds), default value 1
spark.scheduler.maxRegisteredWaitingTime = 5000
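
A hedged sketch of the waiting behaviour these two settings describe (illustrative only, not the actual DAGScheduler change):

  // backendReady would report whether the registered ratio has been reached (assumption)
  def waitToSubmitStages(backendReady: () => Boolean,
                         maxRegisteredWaitingTimeMs: Long): Unit = {
    val start = System.currentTimeMillis()
    while (!backendReady() &&
           (System.currentTimeMillis() - start) < maxRegisteredWaitingTimeMs) {
      Thread.sleep(100) // poll until ready or until the waiting time is up
    }
  }

With minRegisteredRatio = 0.8 and maxRegisteredWaitingTime = 5000, stages would be submitted as soon as 80% of the expected executors have registered, or after 5 seconds at the latest.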

  was:
Because creating TaskSetManager and registering executors are asynchronous, if 
running job without enough executors, it will lead to some issues
* early stages' tasks run without preferred locality.
* the default parallelism in yarn is based on number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down btw)
* and amount of memory consumed on a node for rdd MEMORY persisted data (making 
the job fail if disk is not specified : like some of the mllib algos ?)
* and so on ...
(thanks [~mridulm80] 's [comments | 
https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple solution is sleeping few seconds in application, so that executors 
have enough time to register.

A better way is to make DAGScheduler submit stage after a few of executors have 
been registered by configuration properties.

\# submit stage only after successfully registered executors arrived the 
number, default value 0
spark.executor.minRegisteredNum = 20

\# whatever registeredRatio is arrived, submit stage after the 
maxRegisteredWaitingTime(millisecond), default value 1
spark.executor.maxRegisteredWaitingTime = 5000


 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 running a job without enough executors leads to several issues:
 * early stages' tasks run without preferred locality.
 * the default parallelism in YARN is based on the number of executors, 
 * the number of intermediate files per node for shuffle (this can bring the 
 node down, btw)
 * the amount of memory consumed on a node for RDD MEMORY-persisted data 
 (making the job fail if disk is not specified, like some of the MLlib algos?)
 * and so on ...
 (thanks [~mridulm80] 's [comments | 
 https://github.com/apache/spark/pull/900#issuecomment-45780405])
 A simple solution is to sleep a few seconds in the application, so that executors 
 have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough 
 executors have been registered, controlled by configuration properties.
 \# submit stages only after the ratio of successfully registered executors has 
 been reached; default value 0 in Standalone mode and 0.9 in Yarn mode
 spark.scheduler.minRegisteredRatio = 0.8
 \# regardless of whether the ratio has been reached, submit stages after 
 maxRegisteredWaitingTime (milliseconds), default value 1
 spark.scheduler.maxRegisteredWaitingTime = 5000





[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-06-12 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Description: 
Because creating the TaskSetManager and registering executors are asynchronous, 
running a job without enough executors leads to several issues:
* early stages' tasks run without preferred locality.
* the default parallelism in YARN is based on the number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down, btw)
* the amount of memory consumed on a node for RDD MEMORY-persisted data (making 
the job fail if disk is not specified, like some of the MLlib algos?)
* and so on ...


A simple solution is to sleep a few seconds in the application, so that executors 
have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

\# submit stages only after the ratio of successfully registered executors has 
been reached; default value 0
spark.executor.registeredRatio = 0.8

\# regardless of whether registeredRatio has been reached, submit stages after 
maxRegisteredWaitingTime (milliseconds), default value 1
spark.executor.maxRegisteredWaitingTime = 5000

  was:
Because creating TaskSetManager and registering executors are asynchronous, in 
most situation, early stages' tasks run without preferred locality.

A simple solution is sleeping few seconds in application, so that executors 
have enough time to register.

A better way is to make DAGScheduler submit stage after a few of executors have 
been registered by configuration properties.

\# submit stage only after successfully registered executors arrived the ratio, 
default value 0
spark.executor.registeredRatio = 0.8

\# whatever registeredRatio is arrived, submit stage after the 
maxRegisteredWaitingTime(millisecond), default value 1
spark.executor.maxRegisteredWaitingTime = 5000



 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 running a job without enough executors leads to several issues:
 * early stages' tasks run without preferred locality.
 * the default parallelism in YARN is based on the number of executors, 
 * the number of intermediate files per node for shuffle (this can bring the 
 node down, btw)
 * the amount of memory consumed on a node for RDD MEMORY-persisted data 
 (making the job fail if disk is not specified, like some of the MLlib algos?)
 * and so on ...
 A simple solution is to sleep a few seconds in the application, so that executors 
 have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough 
 executors have been registered, controlled by configuration properties.
 \# submit stages only after the ratio of successfully registered executors has 
 been reached; default value 0
 spark.executor.registeredRatio = 0.8
 \# regardless of whether registeredRatio has been reached, submit stages after 
 maxRegisteredWaitingTime (milliseconds), default value 1
 spark.executor.maxRegisteredWaitingTime = 5000





[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-06-12 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Description: 
Because creating the TaskSetManager and registering executors are asynchronous, 
running a job without enough executors leads to several issues:
* early stages' tasks run without preferred locality.
* the default parallelism in YARN is based on the number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down, btw)
* the amount of memory consumed on a node for RDD MEMORY-persisted data (making 
the job fail if disk is not specified, like some of the MLlib algos?)
* and so on ...
(thanks [~mridulm80] 's [comments | 
https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple solution is to sleep a few seconds in the application, so that executors 
have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

\# submit stages only after the number of successfully registered executors has 
been reached; default value 0
spark.executor.minRegisteredNum = 20

\# regardless of whether the number has been reached, submit stages after 
maxRegisteredWaitingTime (milliseconds), default value 1
spark.executor.maxRegisteredWaitingTime = 5000

  was:
Because creating TaskSetManager and registering executors are asynchronous, if 
running job without enough executors, it will lead to some issues
* early stages' tasks run without preferred locality.
* the default parallelism in yarn is based on number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down btw)
* and amount of memory consumed on a node for rdd MEMORY persisted data (making 
the job fail if disk is not specified : like some of the mllib algos ?)
* and so on ...


A simple solution is sleeping few seconds in application, so that executors 
have enough time to register.

A better way is to make DAGScheduler submit stage after a few of executors have 
been registered by configuration properties.

\# submit stage only after successfully registered executors arrived the ratio, 
default value 0
spark.executor.registeredRatio = 0.8

\# whatever registeredRatio is arrived, submit stage after the 
maxRegisteredWaitingTime(millisecond), default value 1
spark.executor.maxRegisteredWaitingTime = 5000


 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 running a job without enough executors leads to several issues:
 * early stages' tasks run without preferred locality.
 * the default parallelism in YARN is based on the number of executors, 
 * the number of intermediate files per node for shuffle (this can bring the 
 node down, btw)
 * the amount of memory consumed on a node for RDD MEMORY-persisted data 
 (making the job fail if disk is not specified, like some of the MLlib algos?)
 * and so on ...
 (thanks [~mridulm80] 's [comments | 
 https://github.com/apache/spark/pull/900#issuecomment-45780405])
 A simple solution is to sleep a few seconds in the application, so that executors 
 have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough 
 executors have been registered, controlled by configuration properties.
 \# submit stages only after the number of successfully registered executors has 
 been reached; default value 0
 spark.executor.minRegisteredNum = 20
 \# regardless of whether the number has been reached, submit stages after 
 maxRegisteredWaitingTime (milliseconds), default value 1
 spark.executor.maxRegisteredWaitingTime = 5000





[jira] [Created] (SPARK-1946) Submit stage after executors have been registered

2014-05-28 Thread Zhihui (JIRA)
Zhihui created SPARK-1946:
-

 Summary: Submit stage after executors have been registered
 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui


Because creating the TaskSetManager and registering executors are asynchronous, in 
most situations early stages' tasks run without preferred locality.

A simple solution is to sleep a few seconds in the application, so that executors 
have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

# submit stages only after the ratio of successfully registered executors has 
been reached; default value 0
spark.executor.registeredRatio = 0.8

# regardless of whether registeredRatio has been reached, submit stages after 
maxRegisteredWaitingTime (milliseconds), default value 1
spark.executor.maxRegisteredWaitingTime = 5000






[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-05-28 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Attachment: Spark Task Scheduler Optimization Proposal.pptx

 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 in most situations early stages' tasks run without preferred locality.
 A simple solution is to sleep a few seconds in the application, so that executors 
 have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough 
 executors have been registered, controlled by configuration properties.
 # submit stages only after the ratio of successfully registered executors has 
 been reached; default value 0
 spark.executor.registeredRatio = 0.8
 # regardless of whether registeredRatio has been reached, submit stages after 
 maxRegisteredWaitingTime (milliseconds), default value 1
 spark.executor.maxRegisteredWaitingTime = 5000





[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-05-28 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Description: 
Because creating the TaskSetManager and registering executors are asynchronous, in 
most situations early stages' tasks run without preferred locality.

A simple solution is to sleep a few seconds in the application, so that executors 
have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

\# submit stages only after the ratio of successfully registered executors has 
been reached; default value 0
spark.executor.registeredRatio = 0.8

\# regardless of whether registeredRatio has been reached, submit stages after 
maxRegisteredWaitingTime (milliseconds), default value 1
spark.executor.maxRegisteredWaitingTime = 5000


  was:
Because creating TaskSetManager and registering executors are asynchronous, in 
most situation, early stages' tasks run without preferred locality.

A simple solution is sleeping few seconds in application, so that executors 
have enough time to register.

A better way is to make DAGScheduler submit stage after a few of executors have 
been registered by configuration properties.

# submit stage only after successfully registered executors arrived the ratio, 
default value 0
spark.executor.registeredRatio = 0.8

# whatever registeredRatio is arrived, submit stage after the 
maxRegisteredWaitingTime(millisecond), default value 1
spark.executor.maxRegisteredWaitingTime = 5000



 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 in most situations early stages' tasks run without preferred locality.
 A simple solution is to sleep a few seconds in the application, so that executors 
 have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough 
 executors have been registered, controlled by configuration properties.
 \# submit stages only after the ratio of successfully registered executors has 
 been reached; default value 0
 spark.executor.registeredRatio = 0.8
 \# regardless of whether registeredRatio has been reached, submit stages after 
 maxRegisteredWaitingTime (milliseconds), default value 1
 spark.executor.maxRegisteredWaitingTime = 5000


