[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256045#comment-14256045 ]

Andrew Or commented on SPARK-3174:
----------------------------------

Hey [~nemccarthy], I filed one at SPARK-4922, which is for coarse-grained mode. For fine-grained mode, there is already one that enables dynamically scaling memory instead of just CPU, at SPARK-1882. I believe there has not been progress on either issue yet.

> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3174design.pdf, SparkElasticScalingDesignB.pdf, dynamic-scaling-executors-10-6-14.pdf
>
> A common complaint with Spark in a multi-tenant environment is that applications have a fixed allocation that doesn't grow and shrink with their resource needs. We're blocked on YARN-1197 for dynamically changing the resources within executors, but we can still allocate and discard whole executors. It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255344#comment-14255344 ]

Nathan M commented on SPARK-3174:
----------------------------------

Is there a JIRA to add this feature to Mesos?
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178554#comment-14178554 ]

Dag Liodden commented on SPARK-3174:
------------------------------------

Hey guys, glad to see some progress on this! I believe a very common use case here is to scale not only based on the internal needs of the job, but also on external factors. For instance, we're using the fair scheduler and want Spark jobs to opportunistically leverage available containers, but it's equally important that the executor count is scaled down if the job goes over its fair share during the job lifetime. I'm not sure that's being covered here? I'm sure there will be classes of jobs that heavily leverage RDD caching where heuristic yielding will be very hard to implement with deterministic performance impacts, but regular MR-style jobs should be pretty straightforward. Pluggable algorithms definitely sound like a viable approach for this, and for regular MR-style jobs it could be as simple as regularly checking the current target container count and creating or draining + killing executors accordingly?

Dag
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173542#comment-14173542 ]

Praveen Seluka commented on SPARK-3174:
---------------------------------------

[~vanzin] Regarding your question about the scaling heuristic: it does take the backlog of tasks into account. Once some task completes, it calculates the average runtime of a task (within a stage). Then it estimates the remaining runtime of the stage using the following heuristic:

var estimatedRuntimeForStage = averageRuntime * (remainingTasks + (activeTasks / 2))

We add (activeTasks / 2) because we need to take the currently running tasks into account.
- I think the proposal I have made is not very different from the exponential approach. Let's say we start the Spark application with just 2 executors. It will double the number of executors and hence go to 4, 8 and so on. The difference I see compared to the exponential approach is that we start doubling from the current count of executors, whereas the exponential approach starts from 1 (and also resets to 1 when there are no pending tasks). But yeah, this could be altered to do the same as the exponential approach.
- In a way, this proposal adds an ETA-based heuristic on top (a threshold for stage completion time).
- Also, this proposal adds a memory-usage heuristic for scaling decisions, which is missing in Andrew's PR (correct me if I'm wrong here). This will surely be very useful.
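The stage-runtime estimate above can be sketched as follows. This is only a minimal illustration of the heuristic as described in the comment, not code from the actual patch; all names are hypothetical:

```scala
// Sketch of the stage-runtime estimate heuristic from the comment above.
// Names are illustrative; this is not the code from the actual patch.
object StageRuntimeEstimate {
  /**
   * Estimate the remaining runtime of a stage from the average runtime of
   * its completed tasks. Active tasks are assumed to be roughly half
   * finished, hence the activeTasks / 2 term.
   */
  def estimatedRuntimeForStage(averageTaskRuntimeMs: Double,
                               remainingTasks: Int,
                               activeTasks: Int): Double =
    averageTaskRuntimeMs * (remainingTasks + activeTasks / 2.0)

  def main(args: Array[String]): Unit = {
    // 200 tasks left, 40 in flight, tasks averaging 1.5 s each:
    println(estimatedRuntimeForStage(1500.0, 200, 40)) // prints 330000.0 (ms)
  }
}
```

The estimate could then be compared against a stage-completion-time threshold to decide whether to request more executors.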
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173951#comment-14173951 ]

Marcelo Vanzin commented on SPARK-3174:
---------------------------------------

bq. Lets say we start the Spark application with just 2 executors. It will double the number of executors and hence goes to 4, 8 and so on.

Well, I'd say it's unusual for applications to start with a low number of executors, especially if the user knows they will be executing things right away. So if I start with 32 executors, your code will right away try to make it 64. Andrew's approach would try to make it 33, then 35, then... But I agree that it might be a good idea to make the auto-scaling backend an interface, so that we can easily play with different approaches. That shouldn't be hard at all.

bq. The main point being, It does all these without making any changes in TaskSchedulerImpl/TaskSetManager

Theoretically, I agree that's a good thing. I haven't gone through the code in detail, though, to know whether all the information Andrew is using from the scheduler is available from SparkListener events. If you can derive that info, great; I think it would be worth it to make the auto-scale code decoupled from the scheduler. If not, then we have the choice of either hooking the auto-scaling backend into the scheduler (like Andrew's change) or exposing more info in the events, which may or may not be a good thing, depending on what that info is. Anyway, as I've said, the two approaches are not irreconcilably different; they're actually more similar than not.
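To make the two ramp-up behaviors concrete, here is a toy sketch (hypothetical names, and a simplification of both designs) of how executor counts would evolve from 32 under a sustained backlog of pending tasks:

```scala
// Toy comparison of the two ramp-up policies discussed above, assuming a
// sustained backlog. Purely illustrative; not code from either proposal.
object RampUpPolicies {
  /** Doubling policy: double the current executor count each round. */
  def doubling(start: Int, rounds: Int): List[Int] =
    Iterator.iterate(start)(_ * 2).take(rounds + 1).toList

  /** Exponential-delta policy: add 1, then 2, then 4, ... executors per round. */
  def exponentialDelta(start: Int, rounds: Int): List[Int] =
    (0 until rounds).scanLeft(start)((count, round) => count + (1 << round)).toList

  def main(args: Array[String]): Unit = {
    println(doubling(32, 3))         // prints List(32, 64, 128, 256)
    println(exponentialDelta(32, 3)) // prints List(32, 33, 35, 39)
  }
}
```

The doubling policy overshoots quickly from a large starting count, while the exponential-delta policy ramps gently at first, which is the trade-off being debated in the comments.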
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172387#comment-14172387 ]

Praveen Seluka commented on SPARK-3174:
---------------------------------------

[~andrewor] Can you also comment on the API exposed to add/delete executors from SparkContext? I believe it will be:

sc.addExecutors(count: Int)
sc.deleteExecutors(List[String])

[~sandyr] [~tgraves] [~andrewor] [~vanzin] Can you please take a look at the design doc I have proposed? I am sure there are some pros in doing it this way; I have indicated them in detail in the doc. Since it does not change Spark core itself, you could easily replace it with another pluggable algorithm for dynamic scaling. I know that Andrew already has a PR based on his design doc, but I would surely love to get some feedback.
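The pluggable-policy idea behind those proposed hooks can be sketched roughly as below. This is hypothetical, not Spark's actual API; the trait, class, and method names are invented for illustration:

```scala
// Hypothetical sketch of the add/delete-executor hooks proposed above and
// how a pluggable scaling policy could sit on top of them, without touching
// scheduler internals. Not Spark's actual API.
object AllocationHookDemo {
  trait ExecutorAllocationHooks {
    def addExecutors(count: Int): Unit
    def deleteExecutors(executorIds: List[String]): Unit
  }

  /** Toy policy: request one executor per `tasksPerExecutor` pending tasks. */
  class SimpleBacklogPolicy(hooks: ExecutorAllocationHooks) {
    def onBacklog(pendingTasks: Int, tasksPerExecutor: Int): Unit =
      if (pendingTasks > 0)
        hooks.addExecutors(math.ceil(pendingTasks.toDouble / tasksPerExecutor).toInt)
  }

  /** In-memory stub standing in for the cluster-manager backend. */
  class RecordingHooks extends ExecutorAllocationHooks {
    var requested = 0
    def addExecutors(count: Int): Unit = requested += count
    def deleteExecutors(executorIds: List[String]): Unit = ()
  }

  def main(args: Array[String]): Unit = {
    val hooks = new RecordingHooks
    new SimpleBacklogPolicy(hooks).onBacklog(pendingTasks = 10, tasksPerExecutor = 4)
    println(hooks.requested) // prints 3
  }
}
```

Because the policy depends only on the two hooks, swapping in a different scaling algorithm would not require changes to the scheduler, which is the decoupling argument made in the thread.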
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172861#comment-14172861 ]

Marcelo Vanzin commented on SPARK-3174:
---------------------------------------

[~praveenseluka] I took a quick look at your design. I have a few comments:
* It ignores the shuffle, which is covered in Andrew's design. You can't scale down without figuring out what to do with the shuffle data of completed stages, since it may be reused. It's not just about cached blocks.
* Overall, I like the idea of having pluggable scaling algorithms, but I think it's too early for a public add/remove executor API. For example, standalone mode doesn't support adding or removing executors, AFAIK.
* The task runtime-based heuristic looks a bit weird. I think it needs to be more dynamic: e.g., actually consider the backlog of tasks vs. the number of tasks that can run concurrently on the current executors. Also, doubling the number of executors seems a bit like a sledgehammer; an exponential approach like Andrew has proposed might be able to reach similar results with lower overall system load.

I haven't looked at Andrew's PR yet (it's on my todo list), but I don't see a lot that is that much different in the two proposals. I agree that, if possible, it would be nice to keep the autoscaling code / interfaces as separate as possible from the current core, if only to make them easier to replace / refactor / disable, but then again, I haven't looked at Andrew's code yet.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170998#comment-14170998 ]

Praveen Seluka commented on SPARK-3174:
---------------------------------------

- Posted a detailed autoscaling design doc: https://issues.apache.org/jira/secure/attachment/12674773/SparkElasticScalingDesignB.pdf
- Created a PR for https://issues.apache.org/jira/browse/SPARK-3822 (hooks to add/delete executors from SparkContext)
- Posted the detailed design of the autoscaling criteria, and also have a patch ready for the same.

Just to give some context, I have been looking at elastic autoscaling for quite some time. I mailed the Spark users list a few weeks back about the idea of having hooks for adding and deleting executors, and was ready to submit a patch. Last week, I saw the initial detailed design doc posted here in SPARK-3174. After looking briefly at the proposed design, a few things were evident to me:
- The design largely overlaps with what I have built so far. I also have the patch in working state now.
- I am sharing this design doc, which highlights the idea in slightly more detail. It would be great if we could collaborate on this, as I have the basic pieces implemented already.
- It contrasts some implementation-level details with the design proposed already (hence I am calling it design B), and we could take the best course of action.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

bq. If I understand correctly, your concern with requesting executors in rounds is that we will end up making many requests if we have many executors, when we could instead just batch them?

My original claim was that a policy that could be aware of the number of tasks needed by a stage would be necessary to get enough executors quickly when going from idle to running a job. Your claim was that the exponential policy can be tuned to give similar enough behavior to the type of policy I'm describing. My concern was that, even if that is true, tuning the exponential policy in this way would be detrimental at other points of the application lifecycle: e.g., starting the next stage within a job could lead to over-allocation.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171580#comment-14171580 ]

Marcelo Vanzin commented on SPARK-3174:
---------------------------------------

[~andrewor14] Still regarding the shuffle service: your plan is not to use the existing ShuffleService in Hadoop, but to still deploy Spark's shuffle service as a node manager aux service (in the case of Yarn), correct? Or is that API not generic enough for Spark's needs?
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169830#comment-14169830 ]

Andrew Or commented on SPARK-3174:
----------------------------------

[~vanzin]

bq. My first question I think is similar to Tom's. It was not clear to me how the app will behave when it starts up. I'd expect the first job to be the one that has to process the largest amount of data, so it would benefit from having as many executors as possible available as quickly as possible - something that seems to conflict with the idea of a slow start. Are you proposing a change to the current semantics, where Yarn will request --num-executors up front? If you keep that, I think that would cover my above concerns. But switching to a slow start with no option to pre-allocate a certain numbers seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark will continue to request up front however many executors it does today. The slow start comes in when you want to add executors after removing them. Also, you can control how often executors are added with a config, so if the application wants the behavior where it requests all executors at once, it can still do that.

bq. My second is about the shuffle service you're proposing. Have you investigated whether it would be possible to make Hadoop's shuffle service more generic, so that Spark can benefit from it? It does mean that this feature might be constrained to certain versions of Hadoop, but maybe that's not necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty in integrating Spark and Yarn cleanly there stems from the hard-coded shuffle file paths and the index shuffle file format. Currently, both are highly specific to MR, and although we can work around them by adapting Spark's shuffle behavior to MR's (non-trivial but certainly possible), we'll only be able to use the feature on Yarn. If we decide to extend this feature to standalone or Mesos mode, we'll have to do what we're doing right now anyway, since we can't rely on the Yarn ShuffleHandler there.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169856#comment-14169856 ]

Andrew Or commented on SPARK-3174:
----------------------------------

[~sandyr]

bq. Consider the (common) case of a user keeping a Hive session open and setting a low number of minimum executors in order to not sit on cluster resources when idle. Goal number 1 should be making queries return as fast as possible. A policy that, upon receiving a job, simply requested executors with enough slots to handle all the tasks required by the first stage would be a vast latency and user experience improvement over the exponential increase policy. Given that resource managers like YARN will mediate fairness between users and that Spark will be able to give executors back, there's not much advantage to being conservative or ramping up slowly in this case. Accurately anticipating resource needs is difficult, but not necessary.

Yes, in this case we may want to get many executors back quickly, but this can be achieved even in the exponential-increase model, because we expose a config that regulates how often executors should be added. The slow start is actually not slow at all if we lower the interval between rounds of adding executors. I just think this model gives the application more control and flexibility than one where you get either all executors or very few of them.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170033#comment-14170033 ]

Timothy St. Clair commented on SPARK-3174:
------------------------------------------

[~pwendell] are you talking about resizing? [~nnielsen] [~tnachen] ^ FYI.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170244#comment-14170244 ]

Andrew Or commented on SPARK-3174:
----------------------------------

[~sandyr] By quick-start, do you mean adding all executors at once? If I understand correctly, your concern with requesting executors in rounds is that we will end up making many requests if we have many executors, when we could instead just batch them?
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162514#comment-14162514 ]

Marcelo Vanzin commented on SPARK-3174:
---------------------------------------

Hi Andrew, thanks for writing this up.

My first question, I think, is similar to Tom's. It was not clear to me how the app will behave when it starts up. I'd expect the first job to be the one that has to process the largest amount of data, so it would benefit from having as many executors as possible available as quickly as possible, something that seems to conflict with the idea of a slow start. Are you proposing a change to the current semantics, where Yarn requests --num-executors up front? If you keep that, I think that would cover my concerns above. But switching to a slow start with no option to pre-allocate a certain number seems like it might harm certain jobs.

My second is about the shuffle service you're proposing. Have you investigated whether it would be possible to make Hadoop's shuffle service more generic, so that Spark can benefit from it? It does mean that this feature might be constrained to certain versions of Hadoop, but maybe that's not necessarily a bad thing if it means more infrastructure is shared.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162685#comment-14162685 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

bq. Maybe it makes sense to just call it `spark.dynamicAllocation.*`

That sounds good to me.

bq. I think in general we should limit the number of things that will affect adding/removing executors. Otherwise an application might get/lose many executors all of a sudden without a good understanding of why. Also, anticipating what's needed in a future stage is usually fairly difficult, because you don't know a priori how long each stage will run. I don't see a good metric to decide how far into the future to anticipate.

Consider the (common) case of a user keeping a Hive session open and setting a low minimum number of executors in order not to sit on cluster resources when idle. Goal number one should be making queries return as fast as possible. A policy that, upon receiving a job, simply requested executors with enough slots to handle all the tasks required by the first stage would be a vast latency and user-experience improvement over the exponential-increase policy. Given that resource managers like YARN will mediate fairness between users, and that Spark will be able to give executors back, there's not much advantage to being conservative or ramping up slowly in this case. Accurately anticipating resource needs is difficult, but not necessary.
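The policy Sandy proposes is simple enough to sketch directly. This is an illustration, not Spark's implementation: on job submission, request enough executors so that every task of the first stage gets a slot at once. The function name and the `cores_per_executor` parameter are assumptions for the example.

```python
import math


def executors_for_stage(pending_tasks: int, cores_per_executor: int) -> int:
    """Executors needed so every pending task of the stage can run
    immediately, one task per core."""
    if cores_per_executor <= 0:
        raise ValueError("cores_per_executor must be positive")
    return math.ceil(pending_tasks / cores_per_executor)
```

For example, a 1000-task first stage with 4-core executors would trigger a single request for 250 executors, instead of ramping up exponentially from a small count - the latency win Sandy describes for interactive sessions.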
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160914#comment-14160914 ]

Andrew Or commented on SPARK-3174:
----------------------------------

Hi all, I have attached an updated design doc that details the various components of the proposed solution. It summarizes the considerations for each component and explains why we have chosen the approaches outlined in the doc. The re-organization of this issue corresponds to the main components of the design as follows:

(1) Heuristics for scaling executors (SPARK-3795)
(2) Mechanism for scaling executors (SPARK-3822)
(3) Graceful decommission of executors (SPARK-3796 / SPARK-3797)

Let me know if you have any questions or feedback.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

Thanks for posting the detailed design, Andrew. A few comments.

I would expect properties under spark.executor.* to pertain to what goes on inside executors. This is really more of a driver/scheduler feature, so a different prefix might make more sense.

Because it's a large task and there's still significant value without it, I assume we'll hold off on implementing the graceful decommission until we're done with the first parts?

This is probably out of scope for the first cut, but in the future it might be useful to include addition/removal policies that use what Spark knows about upcoming stages to anticipate the number of executors needed. Can we structure the config property names in a way that will still make sense if we choose to add more advanced functionality like this?

When cluster resources are constrained, we may find ourselves in situations where YARN is unable to allocate the additional resources we requested before the next time interval. I haven't thought about it extremely deeply, but it seems like there may be some pathological situations in which we request an enormous number of additional executors while waiting. It might make sense to avoid increasing the number requested until we've actually received some.

Last, any thoughts on what reasonable intervals would be? For the add interval, I imagine we want it to be at least the amount of time required between invoking requestExecutors and being able to schedule tasks on the executors requested.
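The safeguard Sandy suggests against pathological request growth can be sketched as a small amount of bookkeeping. This is an illustrative sketch only; the class and field names are assumptions, and the doubling rule stands in for whatever increase policy is adopted.

```python
class ExecutorRequester:
    """Toy model of a driver-side allocator that refuses to grow its
    request while earlier requests are still outstanding."""

    def __init__(self) -> None:
        self.granted = 0    # executors actually delivered by the cluster manager
        self.requested = 0  # total executors asked for so far

    def maybe_request_more(self, pending_tasks: int) -> int:
        """Return how many additional executors to request this interval."""
        if pending_tasks == 0:
            return 0
        if self.requested > self.granted:
            # Previous requests not yet fulfilled: don't pile more on top.
            # This bounds the request count on a constrained cluster.
            return 0
        delta = max(1, self.requested)  # exponential-style increase (assumed)
        self.requested += delta
        return delta
```

With this guard, a backlogged application on a full cluster keeps a fixed outstanding request rather than asking for an ever-larger number of executors at every interval.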
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160939#comment-14160939 ]

Thomas Graves commented on SPARK-3174:
--------------------------------------

Good write-up, Andrew. Just to make sure: the things in blue with the star in the doc are the approaches you are proposing?

How well is the Spark scheduler going to handle the addition of/waiting for executors? Is SPARK-3795 going to address making that better? Meaning, if you start a job with a small number of executors it will be scheduled non-optimally and in many cases will cause failures - hence why we added the configs to wait for executors before starting.

Will the config(s) be allowed to be changed partway through an application? For instance, say I do some ETL work where I want dynamic allocation, but then I need to run an ML algorithm or do some heavy caching where I want to shut it off.

Is it also safe to assume this doesn't handle changing the resource requirements of the containers? I.e., the executors it starts and stops will all be the same size.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160960#comment-14160960 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

bq. for instance, lets say I do some ETL stuff where I want it to do dynamic, but then I need to run an ML algorithm or do some heavy caching where I want to shut it off.

IIUC, the proposal only gets rid of executors once they are idle and empty of cached data. If that's the case, do you still see issues with leaving dynamic allocation on during the ML algorithm phase?
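The removal condition Sandy describes - idle past a timeout and holding no cached data - can be sketched as a single predicate. This is an illustration of the heuristic under discussion, not Spark's actual code; the parameter names and the 60-second default are assumptions.

```python
def removable(idle_since: float, cached_blocks: int,
              now: float, idle_timeout_s: float = 60.0) -> bool:
    """An executor is a removal candidate only if it holds no cached
    blocks AND has been idle for at least the idle timeout."""
    return cached_blocks == 0 and (now - idle_since) >= idle_timeout_s
```

Under this rule, an executor caching RDD blocks for the ML phase is never reclaimed, which is why leaving dynamic allocation on during that phase may be harmless.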
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160988#comment-14160988 ]

Thomas Graves commented on SPARK-3174:
--------------------------------------

Perhaps I misread it then, because the proposal that is starred is in Graceful Decommission. I didn't see anything else mentioned about removing, but perhaps I missed it? If the graceful decommission is going to be handled later that's fine, but perhaps we should clarify that in the doc.

==
Do nothing, and recompute these blocks later if necessary. This is the simplest solution. Note that this may not be optimal if the application relies heavily on caching, however. A relevant question then is what to do if the user attempts to cache blocks. Spark should at least log a warning so the performance does not degrade silently. It may also be appropriate to throw an exception when caching is detected. If we do throw an exception, we could add an extra Spark property that only disables this exception if the user knows what s/he is doing.
==
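The warn-or-throw behavior quoted from the design doc can be sketched as follows. This is hypothetical illustration only: the escape-hatch property name `spark.dynamicAllocation.allowCaching` is invented for the example, mirroring the doc's suggestion of a flag that disables the exception for users who accept recomputation.

```python
import logging


def on_cache_attempt(conf: dict, log: logging.Logger) -> None:
    """Called when the user caches blocks while dynamic allocation is on.
    Always warns; raises unless the (hypothetical) opt-out flag is set."""
    log.warning("Caching with dynamic allocation enabled: blocks on "
                "removed executors may need to be recomputed.")
    if not conf.get("spark.dynamicAllocation.allowCaching", False):
        raise RuntimeError("Caching detected while dynamic allocation is on")
```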
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161048#comment-14161048 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

Ah, misread. My opinion is that, for a first cut without the graceful decommission, we should go with option 2 and only remove executors that have no cached blocks.
[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161378#comment-14161378 ]

Andrew Or commented on SPARK-3174:
----------------------------------

@[~tgraves] Replying inline:

bq. Just to make sure, the things in blue with the star in the doc are the approaches you are proposing?

Yes.

bq. So how well is the spark scheduler going to handle the addition of/waiting for executors? is SPARK-3795 going to address making that better? Meaning if you start a job with a small number of executors it will be scheduled non-optimally and in many cases will cause failures. Hence why we added the configs to wait for executors before starting.

Not sure I understand what you mean by this. I would assume that the Spark application starts with a reasonable number of executors. This scaling feature is mainly concerned with scaling down from the executors you started with, but because we may need them later we also need a way and a heuristic to add them back. I did not intend for this feature to be used, for instance, for bootstrapping from one executor in the beginning.

bq. Will the config(s) be allowed to be changed part of the way through an application? for instance, lets say I do some ETL stuff where I want it to do dynamic, but then I need to run an ML algorithm or do some heavy caching where I want to shut it off.

Yeah, especially for a REPL it would be good to change this across jobs. This proposal is a first-cut design and does not incorporate that. Though I think this is a more general issue: Spark configurations are intended to be set for the entire duration of the application, but many are somewhat specific to each job within that application. I think it is of interest to eventually be able to configure this dynamically, but I don't have a great idea of how to expose that at the moment.

bq. Is it also safe to assume this doesn't handle changing the resource requirements of the containers? Ie the executors it starts and stops will also be the same size.

Yes, we do not resize the executor's JVM or container (I don't think YARN supports this yet). Our unit of execution here is the fixed-size executor.
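For reference, the feature discussed in this thread shipped in Spark 1.2 behind a family of `spark.dynamicAllocation.*` properties, along with the external shuffle service. A minimal configuration sketch follows; property names are from the Spark configuration documentation, but value formats and defaults vary across Spark versions, so check the docs for your release.

```properties
# Enable dynamic executor allocation (requires the external shuffle service)
spark.dynamicAllocation.enabled            true
spark.shuffle.service.enabled              true

# Bounds on the executor count
spark.dynamicAllocation.minExecutors       2
spark.dynamicAllocation.maxExecutors       50

# Add executors after tasks have been backlogged this long (seconds in 1.2)
spark.dynamicAllocation.schedulerBacklogTimeout   5

# Remove executors after they have been idle this long (seconds in 1.2)
spark.dynamicAllocation.executorIdleTimeout       60
```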