[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-12-21 Thread Nathan M (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255344#comment-14255344
 ] 

Nathan M edited comment on SPARK-3174 at 12/22/14 1:36 AM:
---

This is something we're very interested in, but we aren't using YARN.

Is there a JIRA to add this feature to Mesos?


was (Author: nemccarthy):
Is there a JIRA to add this feature to Mesos?

> Provide elastic scaling within a Spark application
> --
>
> Key: SPARK-3174
> URL: https://issues.apache.org/jira/browse/SPARK-3174
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 1.0.2
>Reporter: Sandy Ryza
>Assignee: Andrew Or
> Fix For: 1.2.0
>
> Attachments: SPARK-3174design.pdf, SparkElasticScalingDesignB.pdf, 
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that 
> applications have a fixed allocation that doesn't grow and shrink with their 
> resource needs.  We're blocked on YARN-1197 for dynamically changing the 
> resources within executors, but we can still allocate and discard whole 
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.






[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-16 Thread Praveen Seluka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173542#comment-14173542
 ] 

Praveen Seluka edited comment on SPARK-3174 at 10/16/14 8:56 AM:
-

[~vanzin] Regarding your question about the scaling heuristic:
 
- It also takes the backlog of tasks into account. Whenever a task completes, it 
recomputes the average runtime of a task within the stage, and then estimates the 
remaining runtime of the stage with the following heuristic (see the sketch after 
this list):
 var estimatedRuntimeForStage = averageRuntime * (remainingTasks + (activeTasks/2))
 The (activeTasks/2) term accounts for the tasks that are currently running.

- I think the proposal I have made is not very different from the exponential 
approach. Let's say we start the Spark application with just 2 executors; it will 
double the number of executors, going to 4, 8, and so on. The difference compared 
to the exponential approach is that we double from the current executor count, 
whereas the exponential approach starts from 1 (and resets to 1 when there are no 
pending tasks). But yes, this could be altered to behave the same way as the 
exponential approach.

- In a way, this proposal also adds an ETA-based heuristic (a threshold on the 
stage completion time).

- This proposal also adds a "memory used" heuristic for scaling decisions, which 
is missing in Andrew's PR (correct me if I am wrong here). This will certainly be 
very useful.

- The main point is that it does all of this without making any changes to the 
TaskSchedulerImpl/TaskSetManager code base, and I think that's a really nice 
thing.
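
A minimal Scala sketch of the estimation heuristic above; the names (StageStats, 
finishedTaskRuntimesMs, thresholdMs, and so on) are illustrative, not the 
identifiers used in the actual patch:

  // Illustrative sketch only: estimate the remaining runtime of a stage from the
  // tasks observed so far, counting each running task as roughly half finished.
  case class StageStats(finishedTaskRuntimesMs: Seq[Long],
                        remainingTasks: Int,
                        activeTasks: Int)

  def estimatedRuntimeForStage(stats: StageStats): Long = {
    val averageRuntime =
      if (stats.finishedTaskRuntimesMs.isEmpty) 0L
      else stats.finishedTaskRuntimesMs.sum / stats.finishedTaskRuntimesMs.size
    averageRuntime * (stats.remainingTasks + stats.activeTasks / 2)
  }

  // Scale up when the estimate exceeds a configured threshold on stage completion time.
  def shouldAddExecutors(stats: StageStats, thresholdMs: Long): Boolean =
    estimatedRuntimeForStage(stats) > thresholdMs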



was (Author: praveenseluka):
[~vanzin] Regarding your question related to scaling heuristic
 
- It does take care of the backlog of tasks also. Once some task completes, it 
calculates the average runtime of a task (within a stage). Then it estimates 
the runtime(remaining) of the stage using the following heuristic
 var estimatedRuntimeForStage = averageRuntime * (remainingTasks + 
(activeTasks/2))
 We also add (activeTasks/2) as we need to take the current running tasks into 
account.

- I think, the proposal I have made is not very different from the exponential 
approach. Lets say we start the Spark application with just 2 executors. It 
will double the number of executors and hence goes to 4, 8 and so on. The 
difference I see here comparing to exponential approach is, we start doubling 
the current count of executors whereas exponential starts from 1 (also resets 
to 1 when there are no pending tasks).  But yeah, this could be altered to do 
the same as exponential approach also.

- In a way, this proposal adds some ETA based heuristic in addition. (threshold 
for stage completion time)

- Also, this proposal adds the "memory used" heuristic too for scaling 
decisions which is missing in Andrew's PR. (Correct me if am wrong here). This 
for sure will be very useful. 

- The main point being, It does all these without making any changes in 
TaskSchedulerImpl/TaskSetManager code base.








[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-16 Thread Praveen Seluka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173542#comment-14173542
 ] 

Praveen Seluka edited comment on SPARK-3174 at 10/16/14 8:55 AM:
-

[~vanzin] Regarding your question about the scaling heuristic:
 
- It also takes the backlog of tasks into account. Whenever a task completes, it 
recomputes the average runtime of a task within the stage, and then estimates the 
remaining runtime of the stage with the following heuristic:
 var estimatedRuntimeForStage = averageRuntime * (remainingTasks + (activeTasks/2))
 The (activeTasks/2) term accounts for the tasks that are currently running.

- I think the proposal I have made is not very different from the exponential 
approach. Let's say we start the Spark application with just 2 executors; it will 
double the number of executors, going to 4, 8, and so on. The difference compared 
to the exponential approach is that we double from the current executor count, 
whereas the exponential approach starts from 1 (and resets to 1 when there are no 
pending tasks). But yes, this could be altered to behave the same way as the 
exponential approach.

- In a way, this proposal also adds an ETA-based heuristic (a threshold on the 
stage completion time).

- This proposal also adds a "memory used" heuristic for scaling decisions, which 
is missing in Andrew's PR (correct me if I am wrong here). This will certainly be 
very useful.

- The main point is that it does all of this without making any changes to the 
TaskSchedulerImpl/TaskSetManager code base.



was (Author: praveenseluka):
[~vanzin] Regarding your question related to scaling heuristic
 
- It does take care of the backlog of tasks also. Once some task completes, it 
calculates the average runtime of a task (within a stage). Then it estimates 
the runtime(remaining) of the stage using the following heuristic
 var estimatedRuntimeForStage = averageRuntime * (remainingTasks + 
(activeTasks/2))
 We also add (activeTasks/2) as we need to take the current running tasks into 
account.

- I think, the proposal I have made is not very different from the exponential 
approach. Lets say we start the Spark application with just 2 executors. It 
will double the number of executors and hence goes to 4, 8 and so on. The 
difference I see here comparing to exponential approach is, we start doubling 
the current count of executors whereas exponential starts from 1 (also resets 
to 1 when there are no pending tasks).  But yeah, this could be altered to do 
the same as exponential approach also.

- In a way, this proposal adds some ETA based heuristic in addition. (threshold 
for stage completion time)

- Also, this proposal adds the "memory used" heuristic too for scaling 
decisions which is missing in Andrew's PR. (Correct me if am wrong here). This 
for sure will be very useful. 








[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024
 ] 

Sandy Ryza edited comment on SPARK-3174 at 10/14/14 3:03 PM:
-

bq. If I understand correctly, your concern with requesting executors in rounds 
is that we will end up making many requests if we have many executors, when we 
could instead just batch them?
My original claim was that a policy that could be aware of the number of tasks 
needed by a stage would be necessary to get enough executors quickly when going 
from idle to running a job.  Your claim was that the exponential policy can be 
tuned to give similar enough behavior to the type of policy I'm describing.  My 
concern was that, even if that is true, tuning the exponential policy in this 
way would be detrimental at other points of the application lifecycle: e.g., 
starting the next stage within a job could lead to over-allocation.
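
A hedged illustration of that concern (the numbers and the simple doubling policy 
below are assumptions for the sake of the example, not taken from either design 
doc):

  // Illustrative only: an exponential policy that doubles the total executor count
  // at every add interval while tasks are still pending.
  def executorCounts(start: Int, rounds: Int): Seq[Int] =
    Iterator.iterate(math.max(start, 1))(_ * 2).take(rounds).toSeq

  executorCounts(1, 8)  // Seq(1, 2, 4, 8, 16, 32, 64, 128): quick ramp-up from idle.
  // The concern: tuned this aggressively, a later stage inside the same job that only
  // briefly has pending tasks can trigger several more doublings before the backlog
  // clears, allocating far more executors than the stage actually needs.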



was (Author: sandyr):
bq. If I understand correctly, your concern with requesting executors in rounds 
is that we will end up making many requests if we have many executors, when we 
could instead just batch them?
My original claim was that a policy that could be aware of the number of tasks 
needed by a stage would be necessary to get enough executors quickly when going 
from idle to running a job.  You claim was that the exponential policy can be 
tuned to give similar enough behavior to the type of policy I'm describing.  My 
concern was that, even if that is true, tuning the exponential policy in this 
way would be detrimental at other points of the application lifecycle: e.g., 
starting the next stage within a job could lead to over-allocation.








[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Praveen Seluka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170998#comment-14170998
 ] 

Praveen Seluka edited comment on SPARK-3174 at 10/14/14 2:42 PM:
-

Posted a detailed autoscaling design doc => 
https://issues.apache.org/jira/secure/attachment/12674773/SparkElasticScalingDesignB.pdf
 - Created a PR for https://issues.apache.org/jira/browse/SPARK-3822 (hooks to 
add/delete executors from SparkContext; see the sketch below)
 - Posted the detailed design of the autoscaling criteria, and also have a patch 
ready for the same.

Just to give some context, I have been looking at elastic autoscaling for quite 
some time. I mailed the spark-users mailing list a few weeks back about the idea 
of having hooks for adding and deleting executors, and was ready to submit a 
patch. Last week, I saw the initial detailed design doc posted in the SPARK-3174 
JIRA here. After looking briefly at the proposed design, a few things were 
evident to me:
- The design largely overlaps with what I have built so far. Also, I have the 
patch in a working state for this now.
- I am sharing this design doc, which explains the idea in slightly more detail. 
It would be great if we could collaborate on this, as I have the basic pieces 
implemented already.
- It contrasts some implementation-level details with the design proposed 
already (hence calling this design B), so we can take the best course of action.

Looking forward to hearing your views and collaborating on this.
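
For context, the SPARK-3822 hooks mentioned above amount to something like the 
following shape on SparkContext. This is a rough sketch; the method names and 
signatures are assumptions, not the exact API in that PR:

  // Rough sketch of the proposed hooks (names and signatures assumed, not taken from
  // the actual PR): an external autoscaling policy could call these to grow or shrink
  // the application without touching the task scheduler.
  trait ExecutorAllocationHooks {
    /** Ask the cluster manager for this many additional executors. */
    def requestExecutors(numAdditionalExecutors: Int): Boolean

    /** Release the given executors back to the cluster manager. */
    def killExecutors(executorIds: Seq[String]): Boolean
  }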



was (Author: praveenseluka):
Posted a detailed autoscaling design doc => 
https://issues.apache.org/jira/secure/attachment/12674773/SparkElasticScalingDesignB.pdf
 - Created PR for https://issues.apache.org/jira/browse/SPARK-3822 (hooks to 
add/delete executors from SparkContext)
 - Posted the detailed design on Autoscaling criteria's and also have a patch 
ready for the same. 

Just to give some context, I have been looking at elastic autoscaling for quite 
sometime. Mailed spark-users mailing list few weeks back on the idea of having 
hooks for adding and deleting executors and also ready to submit a patch. Last 
week, I saw the initial details design doc posted in SPARK-3174 JIRA here. 
After looking briefly at the design proposed, few things were evident for me.
- The design largely overlaps with what I have built so far. Also, I have the 
patch in working state for this now.
- Sharing this design doc which highlights the idea in slightly more detail. 
It will be great if we could collaborate on this as I have the basic pieces 
implemented already.
- Contrasting some implementation level details when compared to the one 
proposed already (hence, calling this design B) and we could take the best 
course of action.

Looking forward to hear your view and collaborate on this.








[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Praveen Seluka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170998#comment-14170998
 ] 

Praveen Seluka edited comment on SPARK-3174 at 10/14/14 2:41 PM:
-

Posted a detailed autoscaling design doc => 
https://issues.apache.org/jira/secure/attachment/12674773/SparkElasticScalingDesignB.pdf
 - Created a PR for https://issues.apache.org/jira/browse/SPARK-3822 (hooks to 
add/delete executors from SparkContext)
 - Posted the detailed design of the autoscaling criteria, and also have a patch 
ready for the same.

Just to give some context, I have been looking at elastic autoscaling for quite 
some time. I mailed the spark-users mailing list a few weeks back about the idea 
of having hooks for adding and deleting executors, and was ready to submit a 
patch. Last week, I saw the initial detailed design doc posted in the SPARK-3174 
JIRA here. After looking briefly at the proposed design, a few things were 
evident to me:
- The design largely overlaps with what I have built so far. Also, I have the 
patch in a working state for this now.
- I am sharing this design doc, which explains the idea in slightly more detail. 
It would be great if we could collaborate on this, as I have the basic pieces 
implemented already.
- It contrasts some implementation-level details with the design proposed 
already (hence calling this design B), so we can take the best course of action.

Looking forward to hearing your views and collaborating on this.



was (Author: praveenseluka):
Posted a detailed autoscaling design doc => 
https://issues.apache.org/jira/secure/attachment/12674773/SparkElasticScalingDesignB.pdf
 - Created PR for https://issues.apache.org/jira/browse/SPARK-3822 (hooks to 
add/delete executors from SparkContext)
 - Posted the detailed design on Autoscaling criteria's and also have a patch 
ready for the same. 

Just to give some context, I have been looking at elastic autoscaling for quite 
sometime. Mailed spark-users mailing list few weeks back on the idea of having 
hooks for adding and deleting executors and also ready to submit a patch. Last 
week, I saw the initial details design doc posted in SPARK-3174 JIRA here. 
After looking briefly at the design proposed, few things were evident for me.
- The design largely overlaps with what I have built so far. Also, I have the 
patch in working state for this now.
- Sharing this design doc which highlights the idea in slightly more detail. 
It will be great if we could collaborate on this as I have the basic pieces 
implemented already.
- Contrasting some implementation level details when compared to the one 
proposed already (hence, calling this design B) and we could take the best 
course of action.








[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169830#comment-14169830
 ] 

Andrew Or edited comment on SPARK-3174 at 10/13/14 7:45 PM:


[~vanzin]

bq. Are you proposing a change to the current semantics, where Yarn will 
request "--num-executors" up front? If you keep that, I think that would cover 
my above concerns. But switching to a slow start with no option to pre-allocate 
a certain numbers seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark 
will continue to request up front however many executors it does today. The 
slow-start comes in when you want to add executors after having removed them. 
Also, you can control how often executors are added with a config, so if the 
application wants the behavior where it requests all executors at once, it can 
still do that.
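
A minimal sketch of the slow-start add behavior being described, assuming a 
round-based policy; the class and parameter names below are placeholders, not the 
final configuration or implementation:

  // Sketch only: executors are added in rounds; the number added per round grows
  // while a backlog of pending tasks persists, and resets once the backlog clears.
  class SlowStartAddPolicy {
    private var toAddThisRound = 1

    /** Called once per (configurable) add interval; returns how many executors to request. */
    def executorsToAdd(pendingTasks: Int): Int = {
      if (pendingTasks == 0) {
        toAddThisRound = 1   // reset: the next ramp-up starts small again
        0
      } else {
        val add = toAddThisRound
        toAddThisRound *= 2  // ramp up while tasks are still pending
        add
      }
    }
  }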

bq. My second is about the shuffle service you're proposing. Have you 
investigated whether it would be possible to make Hadoop's shuffle service more 
generic, so that Spark can benefit from it? It does mean that this feature 
might be constrained to certain versions of Hadoop, but maybe that's not 
necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty in integrating Spark and Yarn cleanly there 
stems from the hard-coded shuffle file paths and the index shuffle file format. 
Currently, both are highly specific to MR, and although we can work around them 
by adapting Spark's shuffle behavior to MR's (non-trivial but certainly 
possible), we'd only be able to use the feature on Yarn. If we decide to extend 
this feature to standalone or Mesos mode, we'll have to do what we're doing 
right now anyway, since we can't rely on the Yarn ShuffleHandler there.


was (Author: andrewor14):
[~vanzin]

bq. My first question I think is similar to Tom's. It was not clear to me how 
the app will behave when it starts up. I'd expect the first job to be the one 
that has to process the largest amount of data, so it would benefit from having 
as many executors as possible available as quickly as possible - something that 
seems to conflict with the idea of a slow start.
bq. Are you proposing a change to the current semantics, where Yarn will 
request "--num-executors" up front? If you keep that, I think that would cover 
my above concerns. But switching to a slow start with no option to pre-allocate 
a certain numbers seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark 
will continue to request however many number of executors it will today 
upfront. The slow-start comes in when you want to add executors after removing 
them. Also, you can control how often you want to add executors with a config, 
so if the application wants the behavior where it requests all executors at 
once, it can still do that.

bq. My second is about the shuffle service you're proposing. Have you 
investigated whether it would be possible to make Hadoop's shuffle service more 
generic, so that Spark can benefit from it? It does mean that this feature 
might be constrained to certain versions of Hadoop, but maybe that's not 
necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty to integrating Spark and Yarn cleanly there 
stems from the hard-coded shuffle file paths and index shuffle file format. 
Currently, both are highly specific to MR, and although we can work around them 
by adapting Spark's shuffle behavior to MR's (non-trivial but certainly 
possible), we'll only be able to use the feature on Yarn. If we decide to 
extend this feature to standalone or mesos mode, we'll have to do what we're 
doing right now anyway since we can't rely on the Yarn ShuffleHandler there.





[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169830#comment-14169830
 ] 

Andrew Or edited comment on SPARK-3174 at 10/13/14 7:45 PM:


[~vanzin]

bq. My first question I think is similar to Tom's. It was not clear to me how 
the app will behave when it starts up. I'd expect the first job to be the one 
that has to process the largest amount of data, so it would benefit from having 
as many executors as possible available as quickly as possible - something that 
seems to conflict with the idea of a slow start.
bq. Are you proposing a change to the current semantics, where Yarn will 
request "--num-executors" up front? If you keep that, I think that would cover 
my above concerns. But switching to a slow start with no option to pre-allocate 
a certain numbers seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark 
will continue to request up front however many executors it does today. The 
slow-start comes in when you want to add executors after having removed them. 
Also, you can control how often executors are added with a config, so if the 
application wants the behavior where it requests all executors at once, it can 
still do that.

bq. My second is about the shuffle service you're proposing. Have you 
investigated whether it would be possible to make Hadoop's shuffle service more 
generic, so that Spark can benefit from it? It does mean that this feature 
might be constrained to certain versions of Hadoop, but maybe that's not 
necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty in integrating Spark and Yarn cleanly there 
stems from the hard-coded shuffle file paths and the index shuffle file format. 
Currently, both are highly specific to MR, and although we can work around them 
by adapting Spark's shuffle behavior to MR's (non-trivial but certainly 
possible), we'd only be able to use the feature on Yarn. If we decide to extend 
this feature to standalone or Mesos mode, we'll have to do what we're doing 
right now anyway, since we can't rely on the Yarn ShuffleHandler there.


was (Author: andrewor14):
[~vanzin]

bq. My first question I think is similar to Tom's. It was not clear to me how 
the app will behave when it starts up. I'd expect the first job to be the one 
that has to process the largest amount of data, so it would benefit from having 
as many executors as possible available as quickly as possible - something that 
seems to conflict with the idea of a slow start.
Are you proposing a change to the current semantics, where Yarn will request 
"--num-executors" up front? If you keep that, I think that would cover my above 
concerns. But switching to a slow start with no option to pre-allocate a 
certain numbers seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark 
will continue to request however many number of executors it will today 
upfront. The slow-start comes in when you want to add executors after removing 
them. Also, you can control how often you want to add executors with a config, 
so if the application wants the behavior where it requests all executors at 
once, it can still do that.

bq. My second is about the shuffle service you're proposing. Have you 
investigated whether it would be possible to make Hadoop's shuffle service more 
generic, so that Spark can benefit from it? It does mean that this feature 
might be constrained to certain versions of Hadoop, but maybe that's not 
necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty to integrating Spark and Yarn cleanly there 
stems from the hard-coded shuffle file paths and index shuffle file format. 
Currently, both are highly specific to MR, and although we can work around them 
by adapting Spark's shuffle behavior to MR's (non-trivial but certainly 
possible), we'll only be able to use the feature on Yarn. If we decide to 
extend this feature to standalone or mesos mode, we'll have to do what we're 
doing right now anyway since we can't rely on the Yarn ShuffleHandler there.



[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161372#comment-14161372
 ] 

Andrew Or edited comment on SPARK-3174 at 10/7/14 2:20 AM:
---

@[~sandyr] Replying inline:

bq. I would expect properties underneath spark.executor.* to pertain to what 
goes on inside of executors. This is really more of a driver/scheduler feature, 
so a different prefix might make more sense.

Ah, I see, that makes sense. Maybe we should just call it 
`spark.dynamicAllocation.*`. Do you have other suggestions?
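
To make that concrete, settings under the proposed prefix might be written like 
this; only the `spark.dynamicAllocation.*` prefix comes from the discussion above, 
and the individual key names and values below are assumptions for the sake of the 
example:

  import org.apache.spark.SparkConf

  // Illustrative only: the key names under the proposed prefix are assumed, not final.
  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.minExecutors", "2")
    .set("spark.dynamicAllocation.maxExecutors", "50")
    // How long (in seconds, say) a backlog must persist before more executors are added.
    .set("spark.dynamicAllocation.addExecutorInterval", "60")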

bq. Because it's a large task and there's still significant value without it, I 
assume we'll hold off on implementing the graceful decommission until we're 
done with the first parts?

I intend for all of these components to be in 1.2. We can actually do the 
decommission part in parallel. Aaron is working on a more general service that 
serves shuffle files independently of the executors in SPARK-3796, and after 
that's ready I will integrate it into Yarn's aux service in SPARK-3797. 
Meanwhile we can still work on the heuristics and mechanisms for scaling. (All 
of this is assuming we only handle the shuffles but not the blocks).

bq. This is probably out of scope for the first cut, but in the future it might 
be useful to include addition/removal policies that use what Spark knows about 
upcoming stages to anticipate the number of executors needed. Can we structure 
the config property names in a way that will make sense if we choose to add 
more advanced functionality like this?

I think in general we should limit the number of things that affect adding and 
removing executors. Otherwise an application might gain or lose many executors 
all of a sudden without a good understanding of why. Also, anticipating what's 
needed in a future stage is usually fairly difficult, because you don't know a 
priori how long each stage will run. I don't see a good metric for deciding how 
far into the future to anticipate.

bq. When cluster resources are constrained, we may find ourselves in situations 
where YARN is unable to allocate the additional resources we requested before 
the next time interval. I haven't thought about it extremely deeply, but it 
seems like there may be some pathological situations in which we request an 
enormous number of additional executors while waiting. It might make sense to 
do something like avoid increasing the number requested until we've actually 
received some?

Yes, there is a config that limits the number of executors you can have. You 
raise a good point that if the resource manager keeps rejecting your requests 
for more executors, your application might want to back off a little before 
trying again so you don't flood the RM, though that adds some complexity.

bq. Last, any thoughts on what reasonable intervals would be? For the add 
interval, I imagine that we want it to be at least the amount of time required 
between invoking requestExecutors and being able to schedule tasks on the 
executors requested.

I think the intervals highly depend on your workload. I don't have a concrete 
number on this, but I think the time between sending a request and being able 
to schedule tasks on the new executor is on the order of seconds (fairly 
quick). Since this feature also targets heavy workloads, the add interval 
should be on the order of minutes.

bq. My opinion is that, for a first cut without the graceful decommission, we 
should go with option 2 and only remove executors that have no cached blocks.

Maybe. I've been back and forth about this one. I suppose approach (b) of 
removing executors only if they have no cached blocks is more conservative. The 
worst that can happen with (b) is that you don't remove executors, but before 
1.2.0 you already can't remove executors, so there's little possibility for 
regression there, whereas the worst that can happen with (a) is that your 
application suddenly has worse performance. Note that if we throw an exception 
when the application attempts to cache blocks then (a) and (b) are basically 
equivalent.
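
A tiny sketch of the conservative option (b) above, with assumed names: only 
executors that are idle and hold no cached blocks become removal candidates.

  // Sketch only (names assumed): option (b) never removes an executor that holds
  // cached blocks, so the worst case is simply that no executor gets removed.
  def removalCandidates(idleExecutors: Set[String],
                        executorsWithCachedBlocks: Set[String]): Set[String] =
    idleExecutors -- executorsWithCachedBlocks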



was (Author: andrewor14):
@[~sandyr] Replying inline:

bq. I would expect properties underneath spark.executor.* to pertain to what 
goes on inside of executors. This is really more of a driver/scheduler feature, 
so a different prefix might make more sense.

Ah I see. Makes sense. Maybe it makes sense to just call it 
`spark.dynamicAllocation.*`. Do you have other suggestions?

bq. Because it's a large task and there's still significant value without it, I 
assume we'll hold off on implementing the graceful decommission until we're 
done with the first parts?

I intend for all of these components to be in 1.2. We can actually do the 
decommission part in parallel. Aaron is working on a more general service that 
serves shuffle files independently of the executors in SPARK-3796, and after 
that's ready