Are you using YARN to run Spark jobs only? Are you configuring Spark
properties in spark-submit parameters? If so,
did you try --num-executors x*53 (where x is the number of nodes)
--executor-memory 1g --driver-memory 1g?
You might see yarn allocating
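For concreteness, here is a sketch of that suggestion as a spark-submit invocation. The class name, jar, and node count are placeholders, and I'm assuming --num-executors / --executor-memory are the options meant above:

```shell
# Hypothetical spark-submit based on the suggestion above;
# x is the number of nodes, giving x*53 one-GB executors.
x=4   # example node count
spark-submit \
  --master yarn \
  --num-executors $((x * 53)) \
  --executor-memory 1g \
  --driver-memory 1g \
  --class com.example.MyApp \
  myapp.jar
```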
I mean Jonathan
On Tue, Feb 9, 2016 at 10:41 AM, Alexander Pivovarov
wrote:
> I decided to do YARN over-commit and add 896
> to yarn.nodemanager.resource.memory-mb
> it was 54,272
> now I set it to 54,272+896 = 55,168
>
> Kelly, can I ask you couple questions
> 1. it is
I decided to do YARN over-commit and add 896
to yarn.nodemanager.resource.memory-mb
it was 54,272
now I set it to 54,272+896 = 55,168
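Spelled out in shell arithmetic, the over-commit math above is:

```shell
# Over-commit math from above: bump the NodeManager's advertised memory
# by exactly one AM container's worth.
NM_MB=54272            # original yarn.nodemanager.resource.memory-mb
AM_MB=$((512 + 384))   # Spark AM container: 512 MB + 384 MB overhead = 896 MB
echo $((NM_MB + AM_MB))   # -> 55168
```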
Kelly, can I ask you a couple of questions?
1. Is it possible to add a YARN label to particular instance group boxes on
EMR?
2. in addition to maximizeResourceAllocation
Thanks Jonathan
Actually I'd like to use maximizeResourceAllocation.
The ideal for me would be to add a new instance group with a single small box
labelled AM.
I'm not sure "aws emr create-cluster" supports setting custom labels; the
only settings available are:
Interesting, I was not aware of spark.yarn.am.nodeLabelExpression.
We do use YARN labels on EMR; each node is automatically labeled with its
type (MASTER, CORE, or TASK). And we do
set yarn.app.mapreduce.am.labels=CORE in yarn-site.xml, but we do not set
spark.yarn.am.nodeLabelExpression.
Does
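For reference, pointing the Spark AM at the CORE label via spark.yarn.am.nodeLabelExpression would look something like this. The app class and jar are placeholders, and this requires a YARN version with node-label support:

```shell
# Hypothetical: ask YARN to schedule the Spark AM only on CORE-labelled nodes.
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=CORE \
  --class com.example.MyApp \
  myapp.jar
```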
If it's too small to run an executor, I'd think it would be chosen for
the AM as the only way to satisfy the request.
On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
wrote:
> If I add additional small box to the cluster can I configure yarn to select
> small box to run
If I add additional small box to the cluster can I configure yarn to select
small box to run am container?
On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen wrote:
> Typically YARN is there because you're mediating resource requests
> from things besides Spark, so yeah using every
How about running in client mode, so that the client from which the job is
run becomes the driver?
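As a command line, Praveen's suggestion would be roughly the following (app details are placeholders): with --deploy-mode client the driver stays in the spark-submit process, so YARN only has to place the comparatively small AM.

```shell
# Hypothetical: client mode keeps the driver on the submitting machine;
# YARN only schedules executors and the lightweight AM.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar
```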
Regards,
Praveen
On 9 Feb 2016 16:59, "Steve Loughran" wrote:
>
> > On 9 Feb 2016, at 06:53, Sean Owen wrote:
> >
> >
> > I think you can let YARN
> On 9 Feb 2016, at 06:53, Sean Owen wrote:
>
>
> I think you can let YARN over-commit RAM though, and allocate more
> memory than it actually has. It may be beneficial to let them all
> think they have an extra GB, and let one node running the AM
> technically be
Sean, I'm not sure if that's actually the case, since the AM would be
allocated before the executors are even requested (by the driver through
the AM), right? This must at least be the case with dynamicAllocation
enabled, but I would expect that it's true regardless.
However, Alex, yes, this
Praveen,
You mean cluster mode, right? That would still, in a sense, cause one box to
be "wasted", but at least it would be used closer to its full
potential, especially if you set spark.driver.memory higher than its 1g
default. Also, cluster mode is not an option for some applications, such
On Tue, Feb 9, 2016 at 12:16 PM, Jonathan Kelly wrote:
> And we do set yarn.app.mapreduce.am.labels=CORE
That sounds very mapreduce-specific, so I doubt Spark (or anything
non-MR) would honor it.
--
Marcelo
You can set custom per-instance-group configurations (e.g.,
[{"classification":"yarn-site","properties":{"yarn.nodemanager.labels":"SPARKAM"}}])
using the Configurations parameter of
http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_InstanceGroupConfig.html.
Unfortunately, it's not currently
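As a sketch, an InstanceGroupConfig carrying such a per-group yarn-site override might look like the following. The SPARKAM label, instance type, and count are illustrative, and the yarn.nodemanager.labels property name is copied from the message above:

```shell
# Hypothetical InstanceGroupConfig fragment (API shape), printed for reference.
cat <<'EOF'
{
  "InstanceRole": "TASK",
  "InstanceType": "m3.medium",
  "InstanceCount": 1,
  "Configurations": [
    {
      "Classification": "yarn-site",
      "Properties": { "yarn.nodemanager.labels": "SPARKAM" }
    }
  ]
}
EOF
```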
Oh, sheesh, how silly of me. I copied and pasted that setting name without
even noticing the "mapreduce" in it. Yes, I guess that would mean that
Spark AMs are probably running even on TASK instances currently, which is
OK but not consistent with what we do for MapReduce. I'll make sure we
set
Can you add the ability to set custom YARN labels instead of, or in addition
to, the automatic ones?
On Feb 9, 2016 3:28 PM, "Jonathan Kelly" wrote:
> Oh, sheesh, how silly of me. I copied and pasted that setting name without
> even noticing the "mapreduce" in it. Yes, I guess that would mean that
>
Great! Thank you!
On Tue, Feb 9, 2016 at 4:02 PM, Jonathan Kelly
wrote:
> You can set custom per-instance-group configurations (e.g.,
> ["classification":"yarn-site",properties:{"yarn.nodemanager.labels":"SPARKAM"}])
> using the Configurations parameter of
>
The AM container starts first, and YARN selects a random computer to run it.
Is it possible to configure YARN so that it selects the small computer for the
AM container?
On Feb 9, 2016 12:40 AM, "Sean Owen" wrote:
> If it's too small to run an executor, I'd think it would be chosen for
>
You should be able to use spark.yarn.am.nodeLabelExpression if your
version of YARN supports node labels (and you've added a label to the
node where you want the AM to run).
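On a cluster where node labels are enabled, wiring that up would be roughly as follows. The hostname and label are placeholders, and the rmadmin syntax varies a little across Hadoop versions:

```shell
# Hypothetical: register a label, attach it to the small node, then target it.
yarn rmadmin -addToClusterNodeLabels "SPARKAM"
yarn rmadmin -replaceLabelsOnNode "small-node.example.com=SPARKAM"
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=SPARKAM \
  --class com.example.MyApp \
  myapp.jar
```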
On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
wrote:
> Am container starts first and yarn
Alex,
That's a very good question that I've been trying to answer myself recently
too. Since you've mentioned before that you're using EMR, I assume you're
asking this because you've noticed this behavior on emr-4.3.0.
In this release, we made some changes to the maximizeResourceAllocation
Let's say that YARN has 53 GB of memory available on each slave.
The spark.am container needs 896 MB (512 + 384).
I see two options to configure Spark:
1. Configure Spark executors to use 52 GB and leave 1 GB on each box. So,
some box will also run the AM container, and 1 GB of memory will not be used on all
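Option 1's headroom, checked in shell arithmetic:

```shell
# Option 1 above: size executors to leave 1 GB per box for the AM.
NM_MB=$((53 * 1024))     # 54,272 MB available to YARN per slave
EXEC_MB=$((52 * 1024))   # executors capped at 52 GB
AM_MB=$((512 + 384))     # 896 MB AM container
echo $((NM_MB - EXEC_MB))           # -> 1024 (MB left on each box)
echo $((NM_MB - EXEC_MB >= AM_MB))  # -> 1 (the AM fits in the gap)
```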
Typically YARN is there because you're mediating resource requests
from things besides Spark, so yeah using every bit of the cluster is a
little bit of a corner case. There's not a good answer if all your
nodes are the same size.
I think you can let YARN over-commit RAM though, and allocate more