Re: Spark-on-Yarn ClassNotFound Exception

2022-12-18 Thread Hariharan
Hi scrypso, Sorry for the late reply. Yes, I did mean spark.driver.extraClassPath. I was able to work around this issue by removing the need for an extra class, but I'll investigate along these lines nonetheless. Thanks again for all your help! On Thu, Dec 15, 2022 at 9:56 PM scrypso wrote: >

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-15 Thread scrypso
Hmm, did you mean spark.*driver*.extraClassPath? That is very odd then - if you check the logs directory for the driver (on the cluster) I think there should be a launch container log, where you can see the exact command used to start the JVM (at the very end), and a line starting "export CLASSPATH

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Hi scrypso, Thanks for the help so far, and I think you're definitely on to something here. I tried loading the class as you suggested with the code below: try { Thread.currentThread().getContextClassLoader().loadClass(MyS3ClientFactory.class.getCanonicalName()); logger.info("Loaded cust
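
A minimal, self-contained sketch of that check (an illustration, not the exact code from the thread), assuming it runs in the driver before the S3A filesystem is first used; the class name is the one from the stack trace quoted below in this thread, everything else is a placeholder:

    try {
      val cls = Thread.currentThread().getContextClassLoader
        .loadClass("foo.bar.MyS3ClientFactory")
      // If this succeeds, the thread context classloader can see the custom factory.
      println(s"Loaded ${cls.getName} via ${cls.getClassLoader}")
    } catch {
      case e: ClassNotFoundException =>
        // Mirrors the error from the original report: the class is not visible here.
        println(s"Context classloader cannot load the factory: $e")
    }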

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
I'm on my phone, so can't compare with the Spark source, but that looks to me like it should be well after the ctx loader has been set. You could try printing the classpath of the loader Thread.currentThread().getContextClassLoader(), or try to load your class from that yourself to see if you
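
A minimal sketch of printing that classpath, assuming the context classloader is a URLClassLoader (which it normally is for Spark's user-jar classloader on YARN):

    import java.net.URLClassLoader

    Thread.currentThread().getContextClassLoader match {
      case u: URLClassLoader =>
        // List every classpath entry visible to the thread context classloader.
        u.getURLs.foreach(println)
      case other =>
        println(s"Not a URLClassLoader: ${other.getClass.getName}")
    }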

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Thanks for the response, scrypso! I will try adding the extraClassPath option. Meanwhile, please find the full stack trace below (I have masked/removed references to proprietary code) java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class foo.bar.MyS3Client

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
Two ideas you could try: You can try spark.driver.extraClassPath as well. Spark loads the user's jar in a child classloader, so Spark/Yarn/Hadoop can only see your classes reflectively. Hadoop's Configuration should use the thread ctx classloader, and Spark should set that to the loader that loads

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
I missed mentioning it above, but just to add: the error is coming from the driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no luck. Thanks! On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote: > Hello folks, > > I have a spark app with a custom implementation of > *fs.s3a.s3

Spark-on-Yarn ClassNotFound Exception

2022-12-12 Thread Hariharan
Hello folks, I have a spark app with a custom implementation of *fs.s3a.s3.client.factory.impl* which is packaged into the same jar. Output of *jar tf* *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class* However when I run my spark app with spark-submit in cluster mode, it
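
For context, a minimal sketch of the setup being described, assuming the factory is wired in through Spark's spark.hadoop.* passthrough; the key and class names are the ones quoted in this thread, everything else is a placeholder:

    import org.apache.spark.sql.SparkSession

    // The custom factory class ships inside the application jar; the rest of the
    // thread is about making it visible to the driver's classloader on YARN.
    val spark = SparkSession.builder()
      .appName("custom-s3-client-factory")
      .config("spark.hadoop.fs.s3a.s3.client.factory.impl", "aws.utils.MyS3ClientFactory")
      .getOrCreate()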

Spark on YARN with private Docker repositories/registries

2019-08-16 Thread Tak-Lon (Stephen) Wu
Hi guys, Has anyone been using spark (spark-submit) in yarn mode pulling images from a private Docker repository/registry? How do you pass in the docker config.json which includes the auth tokens? Or is there any environment variable that can be added to the system environment to make it

Spark on Yarn - Dynamically getting a list of archives from --archives in spark-submit

2019-06-13 Thread Tommy Li
Hi Is there any way to get a list of the archives submitted with a spark job from the spark context? I see that spark context has a `.files()` function which returns the files included with `--files`, but I don't see an equivalent for `--archives`. Thanks, Tommy
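
One possible workaround (an assumption, not something confirmed in this thread): on YARN, --archives is recorded in the spark.yarn.dist.archives property, so it can be read back from the conf of an active SparkSession named spark:

    // Comma-separated list of the archives passed with --archives, if any.
    val archives: Seq[String] = spark.sparkContext.getConf
      .getOption("spark.yarn.dist.archives")
      .map(_.split(",").toSeq)
      .getOrElse(Seq.empty)
    archives.foreach(println)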

Re: [spark on yarn] spark on yarn without DFS

2019-05-23 Thread Achilleus 003
doing jobs (I.e. when the in memory need stop spill over to disk) >>>> >>>> For these operations, Spark does need a distributed file system - You >>>> could use something like EMRFS (which is like a HDFS backed by S3) on >>>> Amazon. >>>> >>

Re: [spark on yarn] spark on yarn without DFS

2019-05-22 Thread Gourav Sengupta
else too - so a stacktrace or error message >>> could help in understanding the problem. >>> >>> >>> >>> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: >>> >>>> Hi, >>>> >>>> I wanna to use Spark on Yarn without HDFS.I store my resource in AWS >>>> and using s3a to get them. However, when I use stop-dfs.sh stoped Namenode >>>> and DataNode. I got an error when using yarn cluster mode. Could I using >>>> yarn without start DFS, how could I use this mode? >>>> >>>> Yours, >>>> Jane >>>> >>>

Re: [spark on yarn] spark on yarn without DFS

2019-05-21 Thread Huizhe Wang
issue could be something else too - so a stacktrace or error message >> could help in understanding the problem. >> >> >> >> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: >> >>> Hi, >>> >>> I wanna to use Spark on Yarn without HDFS

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread JB Data31
e something like EMRFS (which is like a HDFS backed by S3) on >> Amazon. >> >> The issue could be something else too - so a stacktrace or error message >> could help in understanding the problem. >> >> >> >> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
system - You > could use something like EMRFS (which is like a HDFS backed by S3) on > Amazon. > > The issue could be something else too - so a stacktrace or error message > could help in understanding the problem. > > > > On Mon, May 20, 2019, 07:20 Huizhe Wang wrote:

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Abdeali Kothari
something like EMRFS (which is like a HDFS backed by S3) on Amazon. The issue could be something else too - so a stacktrace or error message could help in understanding the problem. On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: > Hi, > > I wanna to use Spark on Yarn without HDFS.I

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Jeff Zhang
I am afraid not, because yarn needs dfs. Huizhe Wang wrote on Mon, May 20, 2019 at 9:50 AM: > Hi, > > I wanna to use Spark on Yarn without HDFS.I store my resource in AWS and > using s3a to get them. However, when I use stop-dfs.sh stoped Namenode and > DataNode. I got an error when using ya

[spark on yarn] spark on yarn without DFS

2019-05-19 Thread Huizhe Wang
Hi, I want to use Spark on YARN without HDFS. I store my resources in AWS and use s3a to get them. However, when I used stop-dfs.sh to stop the NameNode and DataNode, I got an error when using yarn cluster mode. Can I use YARN without starting DFS, and how could I use this mode? Yours, Jane
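
A minimal sketch of reading job resources through s3a (bucket and paths are placeholders, credentials come from the default AWS provider chain); note the replies above point out that YARN itself still expects a distributed filesystem for staging:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("resources-from-s3a")
      .config("spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
      .getOrCreate()

    val data = spark.read.text("s3a://my-bucket/resources/input.txt")
    println(data.count())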

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
sure NP. I meant these topics [image: image.png] Have a look at this article of mine https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ under section Understanding the Spark Application Through Visualization See if it helps HTH Dr Mich Taleb

Re: Spark on yarn - application hangs

2019-05-10 Thread Mkal
How can i check what exactly is stagnant? Do you mean on the DAG visualization on Spark UI? Sorry i'm new to spark. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@s

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
Hi, Have you checked the metrics from the Spark UI by any chance? What is stagnant? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebz

Spark on yarn - application hangs

2019-05-10 Thread Mkal
I've built a spark job in which an external program is called through the use of pipe(). The job runs correctly on the cluster when the input is a small sample dataset, but when the input is a really large dataset it stays in the RUNNING state forever. I've tried different ways to tune executor memory, executor
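
For context, a minimal sketch of the pipe() pattern being described, with placeholder paths and command, assuming an active SparkSession named spark:

    // Each partition's records are streamed to the external program's stdin;
    // the program's stdout lines become the records of the resulting RDD.
    val piped = spark.sparkContext
      .textFile("hdfs:///data/large-input")
      .pipe("/usr/local/bin/external-program")
    piped.saveAsTextFile("hdfs:///data/piped-output")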

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Vadim Semenov
Yeah, then the easiest would be to fork spark and run using the forked version, and in case of YARN it should be pretty easy to do. git clone https://github.com/apache/spark.git cd spark export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m" ./build/mvn -DskipTests clean package ./dev/make-

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Serega Sheypak
I tried a similar approach, it works well for user functions, but I need to crash tasks or executors when the spark application runs "repartition". I didn't find any way to inject a "poison pill" into the repartition call :( On Mon, Feb 11, 2019 at 21:19, Vadim Semenov wrote: > something like this > > import org.apac
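
One hedged possibility (an assumption, not verified in this thread): a map placed just before the repartition runs in the same stage as the repartition's shuffle write, so throwing there crashes those tasks, and checking attemptNumber lets the retries succeed so the job still completes:

    import org.apache.spark.TaskContext
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("poison-pill").getOrCreate()
    import spark.implicits._

    val poisoned = spark.range(0, 1000000L).as[Long]
      .map { r =>
        val tc = TaskContext.get()
        // Fail only the first attempt of partition 0; the retried task passes.
        if (tc.partitionId() == 0 && tc.attemptNumber() == 0) {
          throw new RuntimeException("poison pill for the repartition stage")
        }
        r
      }
      .repartition(200)
    println(poisoned.count())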

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Vadim Semenov
something like this import org.apache.spark.TaskContext ds.map(r => { val taskContext = TaskContext.get() if (taskContext.partitionId == 1000) { throw new RuntimeException } r }) On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak wrote: > > I need to crash task which does repartition. >

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
I need to crash a task which does repartition. On Mon, Feb 11, 2019 at 10:37, Gabor Somogyi wrote: > What blocks you to put if conditions inside the mentioned map function? > > On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak > wrote: > >> Yeah, but I don't need to crash entire app, I want to fail severa

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Gabor Somogyi
What blocks you from putting if conditions inside the mentioned map function? On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak wrote: > Yeah, but I don't need to crash entire app, I want to fail several tasks > or executors and then wait for completion. > > On Sun, Feb 10, 2019 at 21:49, Gabor Somogyi wrote:

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
Yeah, but I don't need to crash the entire app, I want to fail several tasks or executors and then wait for completion. On Sun, Feb 10, 2019 at 21:49, Gabor Somogyi wrote: > Another approach is adding artificial exception into the application's > source code like this: > > val query = input.toDS.map(_ / 0)

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Another approach is adding artificial exception into the application's source code like this: val query = input.toDS.map(_ / 0).writeStream.format("console").start() G On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak wrote: > Hi BR, > thanks for your reply. I want to mimic the issue and kill ta

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Serega Sheypak
Hi BR, thanks for your reply. I want to mimic the issue and kill tasks at a certain stage. Killing an executor is also an option for me. I'm curious how core spark contributors test spark's fault tolerance. On Sun, Feb 10, 2019 at 16:57, Gabor Somogyi wrote: > Hi Serega, > > If I understand your problem

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Hi Serega, If I understand your problem correctly, you would like to kill only one executor and leave the rest of the app untouched. If that's true, yarn -kill is not what you want because it stops the whole application. I've done a similar thing when testing Spark's HA features. - jps -vl

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Jörn Franke
yarn application -kill applicationid ? > On 10.02.2019 at 13:30, Serega Sheypak wrote: > > Hi there! > I have weird issue that appears only when tasks fail at specific stage. I > would like to imitate failure on my own. > The plan is to run problematic app and then kill entire executor or som

Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Serega Sheypak
Hi there! I have a weird issue that appears only when tasks fail at a specific stage. I would like to imitate the failure on my own. The plan is to run the problematic app and then kill an entire executor or some tasks when execution reaches a certain stage. Is it doable?

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Serega Sheypak
Hi Imran, here is my use case. There is a 1K-node cluster, and jobs have performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission a node because of "performance degradation". Imagine 10 dev teams chasing a single ops team for a valid reason (node has problems)

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Imran Rashid
Serega, can you explain a bit more why you want this ability? If the node is really bad, wouldn't you want to decommission the NM entirely? If you've got heterogeneous resources, then node labels seem like they would be more appropriate -- and I don't feel great about adding workarounds for the node-la

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Jörn Franke
You can try with Yarn node labels: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Then you can whitelist nodes. > On 19.01.2019 at 00:20, Serega Sheypak wrote: > > Hi, is there any possibility to tell Scheduler to blacklist specific nodes in > advance?
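
A sketch of the Spark side of that approach, assuming a node label named "good_nodes" has already been created and assigned in YARN (in practice these properties are usually passed to spark-submit with --conf rather than set in code):

    import org.apache.spark.sql.SparkSession

    // Only nodes carrying the "good_nodes" label are used for the AM and executors.
    val spark = SparkSession.builder()
      .appName("label-pinned-app")
      .config("spark.yarn.am.nodeLabelExpression", "good_nodes")
      .config("spark.yarn.executor.nodeLabelExpression", "good_nodes")
      .getOrCreate()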

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
The new issue is https://issues.apache.org/jira/browse/SPARK-26688. On Tue, Jan 22, 2019 at 11:30 AM Attila Zsolt Piros wrote: > Hi, > > >> Is it this one: https://github.com/apache/spark/pull/23223 ? > > No. My old development was https://github.com/apache/spark/pull/21068, > which is closed.

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
Hi, >> Is it this one: https://github.com/apache/spark/pull/23223 ? No. My old development was https://github.com/apache/spark/pull/21068, which is closed. This would be a new improvement with a new Apache JIRA issue ( https://issues.apache.org) and with a new Github pull request. >> Can I try

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread Serega Sheypak
Hi Apiros, thanks for your reply. Is it this one: https://github.com/apache/spark/pull/23223 ? Can I try to reach you through the Cloudera Support portal? On Mon, Jan 21, 2019 at 20:06, attilapiros wrote: > Hello, I was working on this area last year (I have developed the > YarnAllocatorBlacklistTracker) a

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread attilapiros
Hello, I was working on this area last year (I have developed the YarnAllocatorBlacklistTracker) and if you haven't found any solution for your problem I can introduce a new config which would contain a sequence of always blacklisted nodes. This way blacklisting would improve a bit again :) -- S

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-20 Thread Serega Sheypak
Thanks, so I'll check YARN. Does anyone know if Spark-on-Yarn plans to expose such functionality? On Sat, Jan 19, 2019 at 18:04, Felix Cheung wrote: > To clarify, yarn actually supports excluding node right when requesting > resources. It’s spark that doesn’t provide a way to popu

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
From: Li Gao Sent: Saturday, January 19, 2019 8:43 AM To: Felix Cheung Cc: Serega Sheypak; user Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? on yarn it is impossible afaik. on kubernetes you can use taints to

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Li Gao
2019 3:21 PM > *To:* user > *Subject:* Spark on Yarn, is it possible to manually blacklist nodes > before running spark job? > > Hi, is there any possibility to tell Scheduler to blacklist specific nodes > in advance? >

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall... From: Serega Sheypak Sent: Friday, January 18, 2019 3:21 PM To: user Subject: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? Hi, is there any possibility to tell Scheduler to blacklist specific

Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Serega Sheypak
Hi, is there any possibility to tell Scheduler to blacklist specific nodes in advance?

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-10 Thread Gourav Sengupta
ere > http://spark.apache.org/docs/latest/running-on-yarn.html about running > spark on YARN. Like I said before you can use either the logs from the > application or the Spark UI to understand how many executors are running at > any given time. I don't think I can help much

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
There is documentation here http://spark.apache.org/docs/latest/running-on-yarn.html about running spark on YARN. Like I said before you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi Dillon, I do think that there is a setting available wherein once YARN sets up the containers you do not deallocate them. I had used it previously in Hive, and it just saves processing time in terms of allocating containers. That said, I am still trying to understand how do we determine on

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
I'm still not sure exactly what you are meaning by saying that you have 6 yarn containers. Yarn should just be aware of the total available resources in your cluster and then be able to launch containers based on the executor requirements you set when you submit your job. If you can, I think it wo

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi, maybe I am not quite clear in my head on this one. But how do we know that 1 yarn container = 1 executor? Regards, Gourav Sengupta On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek wrote: > Can you send how you are launching your streaming process? Also what > environment is this cluster runnin

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
Can you send how you are launching your streaming process? Also what environment is this cluster running in (EMR, GCP, self managed, etc)? On Tue, Oct 9, 2018 at 10:21 AM kant kodali wrote: > Hi All, > > I am using Spark 2.3.1 and using YARN as a cluster manager. > > I currently got > > 1) 6 YAR

Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread kant kodali
Hi All, I am using Spark 2.3.1 and using YARN as a cluster manager. I currently got 1) 6 YARN containers(executors=6) with 4 executor cores for each container. 2) 6 Kafka partitions from one topic. 3) You can assume every other configuration is set to whatever the default values are. Spawned a
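
For reference, a minimal sketch matching the numbers above, assuming one executor per YARN container (in practice these settings are often passed to spark-submit instead of being set in code):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("kafka-streaming")
      .config("spark.executor.instances", "6")  // one executor per YARN container
      .config("spark.executor.cores", "4")      // 4 cores per executor
      .getOrCreate()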

Re: Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-24 Thread Jeff Zhang
I don't think it is possible to have less than 1 core for the AM; this is due to yarn, not spark. The number of AMs compared to the number of executors should be small and acceptable. If you do want to save more resources, I would suggest using yarn cluster mode, where the driver and AM run in the same

Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-18 Thread peay
Hello, I run a Spark cluster on YARN, and we have a bunch of client-mode applications we use for interactive work. Whenever we start one of these, an application master container is started. My understanding is that this is mostly an empty shell, used to request further containers or get status

Hortonworks Spark-Hbase-Connector does not read zookeeper configurations from spark session config ??(Spark on Yarn)

2018-02-22 Thread Dharmin Siddesh J
Hi, I am trying to write spark code that reads data from HBase and stores it in a DataFrame. I am able to run it perfectly with hbase-site.xml in the $spark-home/conf folder. But I am facing a few issues here. Issue 1: Passing the hbase-site.xml location with the --file parameter submitted through client mode (It

How to create security filter for Spark UI in Spark on YARN

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment*: AWS EMR, yarn cluster. *Description*: I am trying to use a java filter to protect access to the spark ui by using the property spark.ui.filters; the problem is that when spark is running in yarn mode, that property is always overridden by hadoop with the filter org.apach

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-04 Thread Marcelo Vanzin
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge wrote: > Something like: > > Note: When running Spark on YARN, environment variables for the executors > need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName] > property in your conf/spark-defaults.conf file or on t
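
A minimal sketch of the property named in that passage, set programmatically as an illustration (for the AM in cluster mode the analogous spark.yarn.appMasterEnv.* prefix must go into spark-defaults.conf or the spark-submit command line, since the AM starts before user code runs):

    import org.apache.spark.sql.SparkSession

    // Sets MY_VARIABLE in the environment of every executor launched on YARN.
    val spark = SparkSession.builder()
      .config("spark.yarn.executorEnv.MY_VARIABLE", "some-value")
      .getOrCreate()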

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Sounds good. Should we add another paragraph after this paragraph in configuration.md to explain executor env as well? I will be happy to upload a simple patch. Note: When running Spark on YARN in cluster mode, environment variables > need to be set using the spark.yarn.appMaster

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
Because spark-env.sh is something that makes sense only on the gateway machine (where the app is being submitted from). On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge wrote: > Thanks Jacek and Marcelo! > > Any reason it is not sourced? Any security consideration? > > > On Wed, Jan 3, 2018 at 9:59 AM,

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Thanks Jacek and Marcelo! Any reason it is not sourced? Any security consideration? On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin wrote: > On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > > spark-env.sh sourced

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > spark-env.sh sourced when starting the Spark AM container or the executor > container? No, it's not. -- Marcelo --

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Jacek Laskowski
ced when starting the Spark AM container or the executor > container? > > Saw this paragraph on https://github.com/apache/spark/blob/master/docs/ > configuration.md: > > Note: When running Spark on YARN in cluster mode, environment variables >> need to be set using the s

Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-02 Thread John Zhuge
Hi, I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container? Saw this paragraph on https://github.com/apache/spark/blob/master/docs/configuration.md: Note: When running Spark on YARN in cluster

[Spark on YARN] Asynchronously launching containers in YARN

2017-10-13 Thread Craig Ingram
I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was only allocating about 3 containers per second. Moreover when starting 3

Re: Port to open for submitting Spark on Yarn application

2017-09-03 Thread Satoshi Yamada
Jerry, Thanks for your comment. On Mon, Sep 4, 2017 at 10:43 AM, Saisai Shao wrote: > I think spark.yarn.am.port is not used any more, so you don't need to > consider this. > > If you're running Spark on YARN, I think some YARN RM port to submit > applications sho

Re: Port to open for submitting Spark on Yarn application

2017-09-03 Thread Saisai Shao
I think spark.yarn.am.port is not used any more, so you don't need to consider this. If you're running Spark on YARN, I think some YARN RM port to submit applications should also be reachable via firewall, as well as HDFS port to upload resources. Also in the Spark side, executo

Port to open for submitting Spark on Yarn application

2017-09-03 Thread Satoshi Yamada
Hi, In case we run Spark on Yarn in client mode, we have firewall for Hadoop cluster, and the client node is outside firewall, I think I have to open some ports that Application Master uses. I think the ports is specified by "spark.yarn.am.port" as document says. https://spark.apach

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
To: user@spark.apache.org Subject: Re: How to configure spark on Yarn cluster Not sure that we are OK on one thing: Yarn limitations are for the sum of all nodes, while you only specify the memory for a single node through Spark. By the way, the memory displayed in the UI is only a part

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
nk you included. Thank you. Yes this is the same problem however it looks like no one has come up with a solution for this problem yet From: yohann jardin Sent: Friday, July 28, 2017 10:47:40 AM To: jeff saremi; user@spark.apache.org Subject: Re: How to configur

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
: Re: How to configure spark on Yarn cluster Check the executor page of the Spark UI, to check if your storage level is limiting. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reached 100 TB. This will validate the workflow and let you

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
From: yohann jardin Sent: Thursday, July 27, 2017 11:15:39 PM To: jeff saremi; user@spark.apache.org Subject: Re: How to configure spark on Yarn cluster Check the executor page of the Spark UI, to check if your storage level is limiting. Also, instead of starting with 100 TB of data

Re: How to configure spark on Yarn cluster

2017-07-27 Thread yohann jardin
Check the executor page of the Spark UI, to check if your storage level is limiting. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reached 100 TB. This will validate the workflow and let you see how much data is shuffled, etc.

How to configure spark on Yarn cluster

2017-07-27 Thread jeff saremi
I have the simplest job which i'm running against 100TB of data. The job keeps failing with ExecutorLostFailure's on containers killed by Yarn for exceeding memory limits I have varied the executor-memory from 32GB to 96GB, the spark.yarn.executor.memoryOverhead from 8192 to 36000 and similar c
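
For reference, a sketch of the knobs being varied above, using the lower of the values mentioned (not a recommendation); the executor heap plus the overhead must fit within YARN's per-container maximum allocation:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hundred-tb-job")
      .config("spark.executor.memory", "32g")
      // Spark 2.x property name; the value is interpreted in MiB.
      .config("spark.yarn.executor.memoryOverhead", "8192")
      .getOrCreate()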

Spark on yarn logging

2017-06-29 Thread John Vines
I followed the instructions for configuring a custom logger per https://spark.apache.org/docs/2.0.2/running-on-yarn.html (because we have long running spark jobs, sometimes occasionally get stuck and without a rolling file appender will fill up disk). This seems to work well for us, but it breaks t

spark on yarn cluster model can't use saveAsTable ?

2017-05-15 Thread lk_spark
Hi all: I have a test under spark 2.1.0 which reads txt files as a DataFrame and saves to Hive. When I submit the app jar in yarn client mode it works well, but if I submit in cluster mode, it will not create the table or write data, and I didn't find any error log ... can anybody gi
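
A minimal sketch of the write path being described, with placeholder paths and table names; one common cause in cluster mode (an assumption, not confirmed in this thread) is that hive-site.xml is not visible to the driver running on the YARN node, so Spark falls back to a local metastore:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("txt-to-hive")
      .enableHiveSupport()  // needs hive-site.xml visible to the driver
      .getOrCreate()

    val df = spark.read.text("hdfs:///data/input.txt")
    df.write.mode("overwrite").saveAsTable("mydb.my_table")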

Re: notebook connecting Spark On Yarn

2017-02-15 Thread Jon Gregg
otebooks connecting to spark on yarn. > After starting few jobs my cluster went out of containers. All new notebook > request are in busy state as Jupyter kernel gateway is not getting any > containers for master to be started. > > Some job are not leaving the containers for approx 10

notebook connecting Spark On Yarn

2017-02-15 Thread Sachin Aggarwal
Hi, I am trying to create multiple notebooks connecting to spark on yarn. After starting a few jobs my cluster ran out of containers. All new notebook requests are in a busy state as the Jupyter kernel gateway is not getting any containers for the master to be started. Some jobs are not leaving the

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread Mich Talebzadeh
sparkStreaming_jar4/sparkStreaming.jar > > > > -- > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-on-yarn-can-t-load-kafka-dependency-jar-tp28216p28220.html > Sent from

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread neil90
Don't the jars need to be comma separated when you pass them? i.e. --jars "hdfs://zzz:8020/jars/kafka_2.10-0.8.2.2.jar", /opt/bigdevProject/sparkStreaming_jar4/sparkStreaming.jar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-on-yarn-c

Re: Can i display message on console when use spark on yarn?

2016-10-20 Thread ayan guha
What exactly do you mean by Yarn Console? We use spark-submit and it generates exactly the same log as you mentioned on the driver console. On Thu, Oct 20, 2016 at 8:21 PM, Jone Zhang wrote: > I submit spark with "spark-submit --master yarn-cluster --deploy-mode > cluster" > How can i display message on

Can i display message on console when use spark on yarn?

2016-10-20 Thread Jone Zhang
I submit spark with "spark-submit --master yarn-cluster --deploy-mode cluster" How can i display message on yarn console. I expect it to be like this: . 16/10/20 17:12:53 main INFO org.apache.spark.deploy.yarn.Client>SPK> Application report for application_1453970859007_481440 (state: RUNNING)

Re: Spark on yarn enviroment var

2016-10-01 Thread Vadim Semenov
The question should be addressed to the oozie community. As far as I remember, a spark action doesn't have support of env variables. On Fri, Sep 30, 2016 at 8:11 PM, Saurabh Malviya (samalviy) < samal...@cisco.com> wrote: > Hi, > > > > I am running spark on yarn using

Spark on yarn enviroment var

2016-09-30 Thread Saurabh Malviya (samalviy)
Hi, I am running spark on yarn using oozie. When submitting through the command line using spark-submit, spark is able to read env variables. But when submitting through oozie it's not able to get env variables and I don't see the driver log. Is there any way we can specify env variables in the oozie spark a

Does Spark on YARN inherit or replace the Hadoop/YARN configs?

2016-08-30 Thread Everett Anderson
Hi, I've had a bit of trouble getting Spark on YARN to work. When executing in this mode and submitting from outside the cluster, one must set HADOOP_CONF_DIR or YARN_CONF_DIR <https://spark.apache.org/docs/latest/running-on-yarn.html>, from which spark-submit can find the params

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
Try to turn yarn.scheduler.capacity.resource-calculator on, then check again. On Wed, Aug 3, 2016 at 4:53 PM, Saisai Shao wrote: > Use dominant resource calculator instead of default resource calculator will > get the expected vcores as you wanted. Basically by default yarn does not > honor cpu c

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
Try to turn "yarn.scheduler.capacity.resource-calculator" on On Wed, Aug 3, 2016 at 4:53 PM, Saisai Shao wrote: > Use dominant resource calculator instead of default resource calculator will > get the expected vcores as you wanted. Basically by default yarn does not > honor cpu cores as resource,

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Saisai Shao
Use dominant resource calculator instead of default resource calculator will get the expected vcores as you wanted. Basically by default yarn does not honor cpu cores as resource, so you will always see vcore is 1 no matter what number of cores you set in spark. On Wed, Aug 3, 2016 at 12:11 PM, sa

Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-02 Thread satyajit vegesna
Hi All, I am trying to run a spark job using yarn, and i specify --executor-cores value as 20. But when i go check the "nodes of the cluster" page in http://hostname:8088/cluster/nodes then i see 4 containers getting created on each of the node in cluster. But can only see 1 vcore getting assigne

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Mich Talebzadeh
Hi Punneet, File does not exist: hdfs://localhost:8020/user/opc/.sparkStaging/application_1466711725829_0033/pipeline-lib-0.1.0-SNAPSHOT.jar indicates a YARN issue. It is trying to get that file from HDFS and copy it across to /tmp directory. 1. Check that the class is actually created at co

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Jeff Zhang
You might have multiple java servlet jars on your classpath. On Fri, Jun 24, 2016 at 3:31 PM, Mich Talebzadeh wrote: > can you please check the yarn log files to see what they say (both the > nodemamager and resourcemanager) > > HTH > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linked

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Mich Talebzadeh
can you please check the yarn log files to see what they say (both the nodemamager and resourcemanager) HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread puneet kumar
I am getting below error thrown when I submit Spark Job using Spark Submit on Yarn. Need a quick help on what's going wrong here. 16/06/24 01:09:25 WARN AbstractLifeCycle: FAILED org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter-791eb5d5: java.lang.IllegalStateException: class org.apache.

Re: oozie and spark on yarn

2016-06-08 Thread vaquar khan
Hi Karthi, Hope following information will help you. Doc: https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html Example : https://developer.ibm.com/hadoop/2015/11/05/run-spark-job-yarn-oozie/ Code : http://3097fca9b1ec8942c4305e550ef1b50a.proxysheep.com/apache/oozie/blob/master/clie

Re: oozie and spark on yarn

2016-06-08 Thread karthi keyan
Hi, Make sure you have oozie 4.2.0 and it is configured with either yarn / mesos mode. Well, you just pass your scala / Jar file in the below syntax, ${jobTracker} ${nameNode} ${master} Wordcount ${Classname} ${nameNode}/WordCo

oozie and spark on yarn

2016-06-08 Thread pseudo oduesp
Hi, I want to ask if someone has used oozie with spark? If you can, give me an example: how can we configure it on yarn? Thanks

Re: spark on yarn

2016-05-26 Thread Steve Loughran
> On 21 May 2016, at 15:14, Shushant Arora wrote: > > And will it allocate rest executors when other containers get freed which > were occupied by other hadoop jobs/spark applications? > requests will go into the queue(s), they'll stay outstanding until things free up *or more machines join

Re: spark on yarn

2016-05-21 Thread Shushant Arora
3.And is the same behavior applied to streaming application also? On Sat, May 21, 2016 at 7:44 PM, Shushant Arora wrote: > And will it allocate rest executors when other containers get freed which > were occupied by other hadoop jobs/spark applications? > > And is there any minimum (% of executo

Re: spark on yarn

2016-05-21 Thread Shushant Arora
And will it allocate the rest of the executors when other containers get freed which were occupied by other hadoop jobs/spark applications? And is there any minimum (% of executors demanded vs available) it waits for to be freed, or does it just start with even 1? Thanks! On Thu, Apr 21, 2016 at 8:39 PM,

Re: spark on yarn

2016-04-21 Thread Steve Loughran
If there isn't enough space in your cluster for all the executors you asked for to be created, Spark will only get the ones which can be allocated. It will start work without waiting for the others to arrive. Make sure you ask for enough memory: YARN is a lot more unforgiving about memory use t

Long(20+ seconds) startup delay for jobs when running Spark on YARN

2016-04-21 Thread Akmal Abbasov
Hi, I'm running Spark(1.6.1) on YARN(2.5.1), cluster mode. It's taking 20+ seconds for application to move from ACCEPTED to RUNNING state, here's logs 16/04/21 09:06:56 INFO impl.YarnClientImpl: Submitted application application_1461229289298_0001 16/04/21 09:06:57 INFO yarn.Client: Application
