Are you using YARN to run Spark jobs only? Are you configuring Spark
properties in spark-submit parameters? If so,
did you try with --num-executors x*53 (where x is the number of nodes)
--executor-memory 1g --driver-memory 1g?
You might see YARN allocating
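For reference, the real spark-submit option names are --num-executors, --executor-memory, and --driver-memory. A sketch of assembling such a command; the node count, executors-per-node value, and jar name are placeholders, not from the thread:

```shell
# Sketch: assemble the corrected spark-submit flags.
# NODES and EXECUTORS_PER_NODE are placeholder values.
NODES=3
EXECUTORS_PER_NODE=5
NUM_EXECUTORS=$((NODES * EXECUTORS_PER_NODE))
echo "spark-submit --master yarn --num-executors $NUM_EXECUTORS --executor-memory 1g --driver-memory 1g myapp.jar"
```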
I mean Jonathan
On Tue, Feb 9, 2016 at 10:41 AM, Alexander Pivovarov
wrote:
I decided to do YARN over-commit and add 896
to yarn.nodemanager.resource.memory-mb
it was 54,272
now I set it to 54,272+896 = 55,168
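In yarn-site.xml terms, that over-commit change is just (values taken from above):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- was 54272; raised by 896 MB to leave headroom for the AM -->
  <value>55168</value>
</property>
```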
Kelly, can I ask you a couple of questions?
1. Is it possible to add a YARN label to particular instance group boxes on
EMR?
2. in addition to maximizeResourceAllocation
RDD-level partitioning information is not used to decide when to shuffle
for queries planned using Catalyst (since we have better information about
distribution from the query plan itself). Instead, you should be looking at
the logic in EnsureRequirements.
Thanks Jonathan
Actually I'd like to use maximizeResourceAllocation.
Ideally for me it would be best to add a new instance group with a single
small box labelled AM.
I'm not sure "aws emr create-cluster" supports setting custom labels; the
only settings available are:
Interesting, I was not aware of spark.yarn.am.nodeLabelExpression.
We do use YARN labels on EMR; each node is automatically labeled with its
type (MASTER, CORE, or TASK). And we do
set yarn.app.mapreduce.am.labels=CORE in yarn-site.xml, but we do not set
spark.yarn.am.nodeLabelExpression.
Does
If it's too small to run an executor, I'd think it would be chosen for
the AM as the only way to satisfy the request.
On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
wrote:
On 9 Feb 2016, at 05:55, Prabhu Joseph
> wrote:
+ Spark-Dev
On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph
> wrote:
Hi All,
A long running Spark job on YARN throws
If I add an additional small box to the cluster, can I configure YARN to
select the small box to run the AM container?
On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen wrote:
> Typically YARN is there because you're mediating resource requests
> from things besides Spark, so yeah using every
How about running in client mode, so that the client from which it is run
becomes the driver.
Regards,
Praveen
On 9 Feb 2016 16:59, "Steve Loughran" wrote:
> On 9 Feb 2016, at 06:53, Sean Owen wrote:
>
>
> I think you can let YARN over-commit RAM though, and allocate more
> memory than it actually has. It may be beneficial to let them all
> think they have an extra GB, and let one node running the AM
> technically be
Sean, I'm not sure if that's actually the case, since the AM would be
allocated before the executors are even requested (by the driver through
the AM), right? This must at least be the case with dynamicAllocation
enabled, but I would expect that it's true regardless.
However, Alex, yes, this
Praveen,
You mean cluster mode, right? That would still in a sense cause one box to
be "wasted", but at least it would be used a bit closer to its full
potential, especially if you set spark.driver.memory higher than its 1g
default. Also, cluster mode is not an option for some applications, such
Do you mind pastebin'ning the code snippet and exception one more time? I
couldn't see them in your original email.
Which Spark release are you using?
On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani
wrote:
On Tue, Feb 9, 2016 at 12:16 PM, Jonathan Kelly wrote:
> And we do set yarn.app.mapreduce.am.labels=CORE
That sounds very mapreduce-specific, so I doubt Spark (or anything
non-MR) would honor it.
--
Marcelo
How about changing the last line to:
scala> val df2 = df.select(functions.array(df("a"), df("b")).alias("arrayCol"))
df2: org.apache.spark.sql.DataFrame = [arrayCol: array<int>]

scala> df2.show()
+--------+
|arrayCol|
+--------+
|  [0, 1]|
|  [1, 2]|
|  [2, 3]|
|  [3, 4]|
|  [4, 5]|
|  [5, 6]|
|  [6, 7]|
+--------+
Sorry, didn't realize the mail didn't show the code. Using Spark release
1.6.0
Below is an example to reproduce it.
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sparkContext)
import sqlContext.implicits._
import org.apache.spark.sql.functions
case class Test(a:Int,
Do you mean using "alias" instead of "as"? Unfortunately, that didn't help:
> val arrayCol = functions.array(df("a"), df("b")).alias("arrayCol")
still throws the error.
Surprisingly, doing the same thing inside a select works,
> df.select(functions.array(df("a"), df("b")).as("arrayCol")).show()
What's your plan for using the arrayCol? It would be part of some query,
right?
On Tue, Feb 9, 2016 at 2:27 PM, Rakesh Chalasani
wrote:
That looks like a bug in toString for columns. Can you open a JIRA?
On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani
wrote:
We are trying to dynamically create the query, with columns coming from
different places. We can overcome this with a few more lines of code, but
it would be nice for us to pass the `alias` along (given that we can do so
for all the rest of the frame operations).
Created JIRA here
The credentials file approach (using a keytab for Spark apps) will only
update HDFS tokens. YARN's AMRM tokens should be taken care of by YARN
internally.
Steve, correct me if I am wrong here: if the AMRM tokens are disappearing,
it might be a YARN bug (does the AMRM token have a 7 day limit as
Hi All:
I am getting an "UnsupportedOperationException" when trying to alias an
array column. The issue seems to be in the "CreateArray" expression's
dataType, which checks the nullability of its children, while aliasing
creates a PrettyAttribute that does not implement nullability.
Below is
You can set custom per-instance-group configurations (e.g.,
[{"Classification":"yarn-site","Properties":{"yarn.nodemanager.labels":"SPARKAM"}}])
using the Configurations parameter of
http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_InstanceGroupConfig.html.
Unfortunately, it's not currently
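A per-instance-group entry of the shape described above might look like this in the create-cluster request JSON. This is a sketch: the instance type and count are placeholders, and SPARKAM is the custom label discussed earlier in the thread; only Classification/Properties under Configurations come from the message above.

```json
{
  "InstanceGroups": [
    {
      "InstanceRole": "TASK",
      "InstanceType": "m3.medium",
      "InstanceCount": 1,
      "Configurations": [
        {
          "Classification": "yarn-site",
          "Properties": {
            "yarn.nodemanager.labels": "SPARKAM"
          }
        }
      ]
    }
  ]
}
```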
Oh, sheesh, how silly of me. I copied and pasted that setting name without
even noticing the "mapreduce" in it. Yes, I guess that would mean that
Spark AMs are probably running even on TASK instances currently, which is
OK but not consistent with what we do for MapReduce. I'll make sure we
set
Can you add the ability to set custom YARN labels instead of, or in
addition to, these?
On Feb 9, 2016 3:28 PM, "Jonathan Kelly" wrote:
Great! Thank you!
On Tue, Feb 9, 2016 at 4:02 PM, Jonathan Kelly
wrote:
Can anybody confirm whether ANY operator in Spark SQL uses
map-side combine? If not, is it safe to assume SortShuffleManager will
always use serialized sorting in the case of queries from Spark SQL?
Forwarding to the dev list, hoping someone can chime in.
@mengxr?
From: Li Ming Tsai
Sent: Wednesday, February 10, 2016 12:43 PM
To: u...@spark.apache.org
Subject: Re: Slowness in Kmeans calculating fastSquaredDistance
Hi,
It looks
The AM container starts first, and YARN selects a random computer to run it.
Is it possible to configure YARN so that it selects the small computer for
the AM container?
On Feb 9, 2016 12:40 AM, "Sean Owen" wrote:
You should be able to use spark.yarn.am.nodeLabelExpression if your
version of YARN supports node labels (and you've added a label to the
node where you want the AM to run).
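Concretely, that could be wired up along these lines. This is a sketch, not tested against the thread's cluster: the host name and port are hypothetical, SPARKAM is the custom label from earlier in the thread, and the yarn rmadmin commands assume node labels are enabled in YARN.

```shell
# Register a custom label and attach it to the small node
# (host name "small-node-1" and port are hypothetical).
yarn rmadmin -addToClusterNodeLabels SPARKAM
yarn rmadmin -replaceLabelsOnNode "small-node-1:8041=SPARKAM"

# Then pin the Spark AM to that label, either in spark-defaults.conf
# (spark.yarn.am.nodeLabelExpression SPARKAM) or per job:
spark-submit --conf spark.yarn.am.nodeLabelExpression=SPARKAM ...
```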
On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
wrote: