Spark Submit through yarn is failing with Default queue.
Hi All, I am trying to submit my application using spark-submit in YARN mode, but it is failing with "unknown queue: default". We specified the queue name in spark-defaults.conf as spark.yarn.queue SecondaryQueue, yet it is still failing for one application but not for another, and I don't know the reason. Please help me with this. Regards, SBM
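A minimal sketch of the two usual ways to pin the YARN queue, for anyone hitting the same "unknown queue" error. The queue name SecondaryQueue is taken from the post above; the class and jar names are placeholders:

    # In spark-defaults.conf (note the trailing "s" in the file name):
    spark.yarn.queue    SecondaryQueue

    # Or per submission, which overrides the conf file:
    spark-submit --master yarn --queue SecondaryQueue \
      --class com.example.Main app.jar

If one application still lands on the default queue, it may be overriding spark.yarn.queue via its own --conf arguments or in code, which is worth ruling out first.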
Re: Help Required - Unable to run spark-submit on YARN client mode
Can you try increasing the number of partitions for the base RDD/DataFrame that you are working on?

On Tue, May 8, 2018 at 5:05 PM, Debabrata Ghosh wrote:
> Hi Everyone, I have been trying to run spark-shell in YARN client mode, but am getting a lot of ClosedChannelException errors; however, the program works fine in local mode. I am using the Spark 2.2.0 build for Hadoop 2.7.3. If you are familiar with this error, please can you help with the possible resolution?
> [stack trace snipped; see the original message below]
>
> Cheers,
> Debu

--
Thanks
Deepak
www.bigdatabig.com www.keosha.net
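A minimal Scala sketch of the suggested fix, assuming a DataFrame read from a hypothetical path; the partition count 200 is a placeholder to tune against your executor count:

    import org.apache.spark.sql.SparkSession

    object RepartitionExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("RepartitionExample").getOrCreate()
        // Hypothetical input; replace with the dataset from the failing job.
        val df = spark.read.parquet("hdfs:///data/input")
        // More, smaller partitions spread the work across executors so no
        // single task result overwhelms one RPC channel.
        val repartitioned = df.repartition(200)
        println(repartitioned.count())
        spark.stop()
      }
    }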
Help Required - Unable to run spark-submit on YARN client mode
Hi Everyone, I have been trying to run spark-shell in YARN client mode, but am getting a lot of ClosedChannelException errors; however, the program works fine in local mode. I am using the Spark 2.2.0 build for Hadoop 2.7.3. If you are familiar with this error, please can you help with the possible resolution? Any help would be greatly appreciated! Here is the error message:

18/05/08 00:01:18 ERROR TransportClient: Failed to send RPC 7905321254854295784 to /9.30.94.43:60220: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/05/08 00:01:18 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(5,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7905321254854295784 to /9.30.94.43:60220: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
        at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException

Cheers,
Debu
Re: Spark submit on yarn does not return with exit code 1 on exception
I may have found my problem. We have a Scala wrapper on top of spark-submit that runs the shell command through Scala, and that wrapper was swallowing the exit code from spark-submit. When I stripped away the wrapper and looked at the actual exit code, I got 1. So spark-submit is behaving okay in my case. Thank you for all the help.

Thanks,
Shashank

On Fri, Feb 3, 2017 at 1:56 PM, Jacek Laskowski wrote:
> Hi,
>
> ➜ spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
> Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
> Run with --help for usage help or --verbose for debug output
> 1
>
> I see 1, and there are other cases for 1 too.
>
> Pozdrawiam,
> Jacek Laskowski
> [earlier messages in the thread snipped; see below]
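A sketch of what a non-swallowing wrapper can look like with scala.sys.process; the spark-submit arguments are placeholders, not the poster's actual wrapper:

    import scala.sys.process._

    object SubmitWrapper {
      def main(args: Array[String]): Unit = {
        // `!` runs the child process, blocks, and returns its exit code.
        val exitCode = Seq(
          "spark-submit",
          "--master", "yarn",
          "--class", "com.example.Main", // placeholder class
          "app.jar"                      // placeholder jar
        ).!
        // Propagate the child's exit code instead of silently returning 0.
        sys.exit(exitCode)
      }
    }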
Re: Spark submit on yarn does not return with exit code 1 on exception
Hi,

➜ spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
Run with --help for usage help or --verbose for debug output
1

I see 1, and there are other cases for 1 too.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Feb 3, 2017 at 10:46 PM, Ali Gouta wrote:
> Hello,
>
> +1, I have exactly the same issue. I need the exit code so Oozie can decide which action to execute next. Spark-submit always returns 0 when the exception is caught. From Spark 1.5 to 1.6.x, I still have the same issue... It would be great to fix it or to know if there is some workaround for it.
>
> Ali Gouta.
> [earlier messages in the thread snipped; see below]
Re: Spark submit on yarn does not return with exit code 1 on exception
Hello,

+1, I have exactly the same issue. I need the exit code so Oozie can decide which action to execute next. Spark-submit always returns 0 when the exception is caught. From Spark 1.5 to 1.6.x, I still have the same issue... It would be great to fix it or to know if there is some workaround for it.

Ali Gouta.

On 3 Feb 2017 at 22:24, "Jacek Laskowski" wrote:
> Hi,
>
> An interesting case. You don't use Spark resources whatsoever. Creating a SparkConf does not use YARN... yet. I think any run mode would have the same effect. So, although spark-submit could have returned exit code 1, the use case touches Spark very little.
>
> What version is that? Do you see "There is an exception in the script exiting with status 1" printed out to stdout?
>
> Pozdrawiam,
> Jacek Laskowski
> [earlier messages in the thread snipped; see below]
Re: Spark submit on yarn does not return with exit code 1 on exception
Hi,

An interesting case. You don't use Spark resources whatsoever. Creating a SparkConf does not use YARN... yet. I think any run mode would have the same effect. So, although spark-submit could have returned exit code 1, the use case touches Spark very little.

What version is that? Do you see "There is an exception in the script exiting with status 1" printed out to stdout?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Feb 3, 2017 at 8:06 PM, Shashank Mandil wrote:
> [original message with test script snipped; see below]
Spark submit on yarn does not return with exit code 1 on exception
Hi All,

I wrote a test script which always throws an exception, as below:

    import org.apache.spark.SparkConf

    object Test {

      def main(args: Array[String]) {
        try {
          val conf = new SparkConf()
            .setAppName("Test")

          throw new RuntimeException("Some Exception")

          println("all done!")
        } catch {
          case e: RuntimeException => {
            println("There is an exception in the script exiting with status 1")
            System.exit(1)
          }
        }
      }
    }

When I run this code using spark-submit I am expecting to get an exit code of 1, however I keep getting exit code 0. Any ideas how I can force spark-submit to return with code 1?

Thanks,
Shashank
need info on Spark submit on yarn-cluster mode
Hi,

I observed that we have only a one-node cluster installed, and when submitting the job as yarn-cluster I get the error below. Is the single-node installation the cause? Please correct me; if this is not the cause, then why am I not able to run in cluster mode? The spark-submit command is:

spark-submit --jars <some dependent jars...> --master yarn --class com.java.jobs.sparkAggregation mytest-1.0.0.jar

2015-04-08 19:16:50 INFO Client - Application report for application_1427895906171_0087 (state: FAILED)
2015-04-08 19:16:50 DEBUG Client -
    client token: N/A
    diagnostics: Application application_1427895906171_0087 failed 2 times due to AM Container for appattempt_1427895906171_0087_02 exited with exitCode: 15 due to: Exception from container-launch.
Container id: container_1427895906171_0087_02_01
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15. Failing this attempt. Failing the application.
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: root.hdfs
    start time: 1428500770818
    final status: FAILED
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:509)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Re: need info on Spark submit on yarn-cluster mode
This means the Spark workers exited with code 15; probably nothing YARN-related itself (unless there are classpath-related problems). Have a look at the logs of the app/container via the resource manager. You can also increase the time that logs are kept on the nodes themselves to something like 10 minutes or longer:

    <property>
      <name>yarn.nodemanager.delete.debug-delay-sec</name>
      <value>600</value>
    </property>

On 8 Apr 2015, at 07:24, sachin Singh sachin.sha...@gmail.com wrote:
> [original message and stack trace snipped; see above]
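Once log aggregation is enabled, the same container logs can also be pulled with the standard YARN CLI; the application ID below is the one from the failing run above:

    yarn logs -applicationId application_1427895906171_0087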
Re: spark-submit on YARN is slow
Hey Tobias,

Can you try using the YARN Fair Scheduler and set yarn.scheduler.fair.continuous-scheduling-enabled to true?

-Sandy

On Sun, Dec 7, 2014 at 5:39 PM, Tobias Pfeiffer t...@preferred.jp wrote:
> Hi, thanks for your responses!
> [quoted message snipped; see Tobias's full reply below]
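A sketch of the yarn-site.xml entries being suggested, using the property names Tobias confirms in his reply below; treat this as an illustrative Fair Scheduler setup rather than a verified fix:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
      <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
      <value>true</value>
    </property>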
Re: spark-submit on YARN is slow
Hi,

On Tue, Dec 9, 2014 at 4:39 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
> Can you try using the YARN Fair Scheduler and set yarn.scheduler.fair.continuous-scheduling-enabled to true?

I'm using Cloudera 5.2.0 and my configuration already says

yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.continuous-scheduling-enabled = true

by default. Changing to a different scheduler doesn't really change anything; going from ACCEPTED to RUNNING always takes about 10 seconds.

Thanks
Tobias
Re: spark-submit on YARN is slow
Hi, thanks for your responses!

On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
> What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side.

I am using Spark 1.1.1. As Andrew mentioned, I guess most of the 10-second wait probably comes from YARN itself. (Other YARN applications also take a while to start up.) I'm just really puzzled about what exactly takes so long there... For a job that runs an hour or so, that is of course negligible, but I am starting up an instance per client to do interactive job processing *for this client*, and it feels like "yeah, thanks for logging in, now please wait a while until you can actually use the program", which is a bit suboptimal.

Tobias
Re: spark-submit on YARN is slow
Great to hear!

-Sandy

On Fri, Dec 5, 2014 at 11:17 PM, Denny Lee denny.g@gmail.com wrote:
> Okay, my bad for not testing out the documented arguments - once I use the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?!
> [earlier messages in the thread snipped; see below]
Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Hi, all:

According to https://github.com/apache/spark/pull/2732, when a Spark job fails or exits nonzero in yarn-cluster mode, spark-submit should get the corresponding return code of the Spark job. But I tried on a spark-1.1.1 YARN cluster, and spark-submit returns zero anyway. Here is my Spark code:

    try {
      val dropTable = s"drop table $DB.$tableName"
      hiveContext.hql(dropTable)
      val createTbl = "do some thing..."
      hiveContext.hql(createTbl)
    } catch {
      case ex: Exception => {
        Util.printLog("ERROR", s"create db error.")
        exit(-1)
      }
    }

Maybe I did something wrong. Is there any hint? Thanks.
RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
In Spark client mode, spark-submit gets the correct return code from the Spark job. But in yarn-cluster mode, it fails.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800
> [original message snipped; see above]
RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
I tried another test code:

    def main(args: Array[String]) {
      if (args.length != 1) {
        Util.printLog("ERROR", "Args error - arg1: BASE_DIR")
        exit(101)
      }
      val currentFile = args(0).toString
      val DB = "test_spark"
      val tableName = "src"

      val sparkConf = new SparkConf().setAppName(s"HiveFromSpark")
      val sc = new SparkContext(sparkConf)
      val hiveContext = new HiveContext(sc)

      // Before exit
      Util.printLog("INFO", "Exit")
      exit(100)
    }

There are two `exit` calls in this code. If the args are wrong, spark-submit gets the return code 101; but if the args are correct, spark-submit cannot get the second return code, 100. What's the difference between these two `exit` calls? I am confused.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Date: Fri, 5 Dec 2014 17:11:39 +0800
> [earlier messages snipped; see above]
Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
What's the status of this application in the YARN web UI?

Best Regards,
Shixiong Zhu

2014-12-05 17:22 GMT+08:00 LinQili lin_q...@outlook.com:
> [quoted test code and earlier messages snipped; see above]
Re: spark-submit on YARN is slow
Hey Tobias,

As you suspect, the reason why it's slow is because the resource manager in YARN takes a while to grant resources. This is because YARN needs to first set up the application master container, and then this AM needs to request more containers for Spark executors. I think this accounts for most of the overhead.

The remaining source probably comes from how our own YARN integration code polls application state (every second) and cluster resource states (every 5 seconds, IIRC). I haven't explored in detail whether there are optimizations there that can speed this up, but I believe most of the overhead comes from YARN itself. In other words, no, I don't know of any quick fix on your end that you can do to speed this up.

-Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp:
> Hi,
>
> I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file and my application jar file in HDFS, and can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING.
>
> I am aware that this is probably not a Spark issue but some YARN configuration setting (or YARN-inherent slowness); I was just wondering if anyone has advice for how to speed this up.
>
> Thanks
> Tobias
Re: spark-submit on YARN is slow
Hi Tobias,

What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side.

-Sandy

On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com wrote:
> [Andrew's explanation snipped; see above]
Re: spark-submit on YARN is slow
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I ran this in standalone cluster mode, the query finished in 55s, but on YARN the query was still running 30min later. Would the hardcoded sleeps potentially be in play here?

On Fri, Dec 5, 2014 at 11:23, Sandy Ryza sandy.r...@cloudera.com wrote:
> [earlier messages snipped; see above]
Re: spark-submit on YARN is slow
Just an FYI - I can submit the SparkPi app to YARN in cluster mode on a 1-node m3.xlarge EC2 instance and the app finishes running successfully in about 40 seconds. I just figured the 30-40 sec run time was normal because of the submission overhead that Andrew mentioned. Denny, you can maybe also run SparkPi against YARN as a speed check:

spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-examples-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar 10

On Fri, Dec 5, 2014 at 2:32 PM, Denny Lee denny.g@gmail.com wrote:
> [earlier messages snipped; see above]
Re: spark-submit on YARN is slow
Hi Denny,

Those sleeps were only at startup, so if jobs are taking significantly longer on YARN, that should be a different problem. When you ran on YARN, did you use the --executor-cores, --executor-memory, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple of tiny executors.

-Sandy

On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote:
> [earlier messages snipped; see above]
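A sketch of a YARN submission that sets the three flags Sandy names; the class, jar, and resource numbers are placeholders to size against your cluster:

    spark-submit --master yarn --deploy-mode cluster \
      --num-executors 10 \
      --executor-cores 4 \
      --executor-memory 4g \
      --class com.example.Main app.jar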
Re: spark-submit on YARN is slow
Hey Sandy,

What are those sleeps for, and do they still exist? We have seen about a 1:00 to 1:30 executor startup time, which is a large chunk for jobs that run in ~10min.

Thanks,
Arun

On Fri, Dec 5, 2014 at 3:20 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
> [earlier messages snipped; see above]
Re: spark-submit on YARN is slow
Likely not the case here, yet one thing to point out with YARN parameters like --num-executors is that they should be specified *before* the app jar and app args on the spark-submit command line; otherwise the app only gets the default number of containers, which is 2 (see the sketch below).

On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
> [earlier messages snipped; see above]
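To illustrate the ordering point with placeholder jar and argument names:

    # Flags before the jar are consumed by spark-submit:
    spark-submit --master yarn --num-executors 10 app.jar input.txt

    # Flags after the jar are passed to the application's main() instead,
    # so YARN falls back to its default of 2 containers:
    spark-submit --master yarn app.jar --num-executors 10 input.txt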
Re: spark-submit on YARN is slow
Hey Arun,

The sleeps would only cause a maximum of about 5 seconds of overhead. The idea was to give executors some time to register. In more recent versions, they were replaced with spark.scheduler.minRegisteredResourcesRatio and spark.scheduler.maxRegisteredResourcesWaitingTime. As of 1.1, by default on YARN Spark will wait until either 30 seconds have passed or 80% of the requested executors have registered.

-Sandy

On Fri, Dec 5, 2014 at 12:46 PM, Ashish Rangole arang...@gmail.com wrote:
> [earlier messages snipped; see above]
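A sketch of setting the two properties Sandy mentions; the values are simply the 1.1 YARN defaults he cites, restated explicitly (note that older 1.x releases expect the waiting time in milliseconds, e.g. 30000, while later versions also accept suffixed strings like 30s):

    spark-submit --master yarn \
      --conf spark.scheduler.minRegisteredResourcesRatio=0.8 \
      --conf spark.scheduler.maxRegisteredResourcesWaitingTime=30s \
      --class com.example.Main app.jar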
Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
> There were two `exit` calls in this code. If the args were wrong, spark-submit got the return code 101; but if the args were correct, spark-submit could not get the second return code, 100. What's the difference between these two `exit` calls? I was so confused.

I'm also confused. When I tried your code, spark-submit returned 1 for both cases. That's expected: in yarn-cluster mode, the driver runs in the ApplicationMaster, and the exit code of the driver is also the exit code of the ApplicationMaster. However, for now, Spark cannot get the exit code of the ApplicationMaster from YARN, because YARN does not send it back to the client. spark-submit will return 1 when YARN reports that the ApplicationMaster failed.

Best Regards,
Shixiong Zhu

2014-12-06 1:59 GMT+08:00 LinQili lin_q...@outlook.com:
> You mean the localhost:4040 or the application master web UI?
>
> Sent from my iPhone
>
> On Dec 5, 2014, at 17:26, Shixiong Zhu zsxw...@gmail.com wrote:
>> What's the status of this application in the yarn web UI?
>> [earlier messages snipped; see above]
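Since the client only learns pass/fail, the final status and diagnostics have to be read from YARN itself; a sketch with the standard YARN CLI and a placeholder application ID:

    yarn application -status application_XXXXXXXXXXXXX_NNNN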
Re: spark-submit on YARN is slow
Sorry for the delay in my response - for my spark calls for stand-alone and YARN, I am using the --executor-memory and --total-executor-cores for the submission. In standalone, my baseline query completes in ~40s while in YARN, it completes in ~1800s. It does not appear from the RM web UI that its asking for more resources than available but by the same token, it appears that its only using a small amount of cores and available memory. Saying this, let me re-try using the --executor-cores, --executor-memory, and --num-executors arguments as suggested (and documented) vs. the --total-executor-cores On Fri Dec 05 2014 at 1:14:53 PM Andrew Or and...@databricks.com wrote: Hey Arun I've seen that behavior before. It happens when the cluster doesn't have enough resources to offer and the RM hasn't given us our containers yet. Can you check the RM Web UI at port 8088 to see whether your application is requesting more resources than the cluster has to offer? 2014-12-05 12:51 GMT-08:00 Sandy Ryza sandy.r...@cloudera.com: Hey Arun, The sleeps would only cause maximum like 5 second overhead. The idea was to give executors some time to register. On more recent versions, they were replaced with the spark.scheduler.minRegisteredResourcesRatio and spark.scheduler.maxRegisteredResourcesWaitingTime. As of 1.1, by default YARN will wait until either 30 seconds have passed or 80% of the requested executors have registered. -Sandy On Fri, Dec 5, 2014 at 12:46 PM, Ashish Rangole arang...@gmail.com wrote: Likely this not the case here yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* app jar and app args on spark-submit command line otherwise the app only gets the default number of containers which is 2. On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Denny, Those sleeps were only at startup, so if jobs are taking significantly longer on YARN, that should be a different problem. When you ran on YARN, did you use the --executor-cores, --executor-memory, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple tiny executors. -Sandy On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote: My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I was running this on standalone cluster mode the query finished in 55s but on YARN, the query was still running 30min later. Would the hard coded sleeps potentially be in play here? On Fri, Dec 5, 2014 at 11:23 Sandy Ryza sandy.r...@cloudera.com wrote: Hi Tobias, What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side. -Sandy On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com wrote: Hey Tobias, As you suspect, the reason why it's slow is because the resource manager in YARN takes a while to grant resources. This is because YARN needs to first set up the application master container, and then this AM needs to request more containers for Spark executors. I think this accounts for most of the overhead. The remaining source probably comes from how our own YARN integration code polls application (every second) and cluster resource states (every 5 seconds IIRC). I haven't explored in detail whether there are optimizations there that can speed this up, but I believe most of the overhead comes from YARN itself. 
In other words, no, I don't know of any quick fix on your end to speed this up. -Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have put both the Spark assembly jar and my application jar in HDFS, and I can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING. I am aware that this is probably not a Spark issue but some YARN configuration setting (or YARN-inherent slowness); I was just wondering if anyone has any advice on how to speed this up. Thanks, Tobias
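To make the advice in this thread concrete, a YARN submission that requests executors explicitly might look like the sketch below. The class name, jar, and resource numbers are illustrative placeholders, not a recommendation; the point, per Ashish's note, is that the resource flags come before the application jar.

# Class, jar, and resource values are illustrative placeholders.
# All flags must come before the application jar and its arguments.
spark-submit \
  --master yarn-client \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.Query \
  app.jar arg1 arg2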
Re: spark-submit on YARN is slow
Okay, my bad for not testing out the documented arguments - once I use the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?!

On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote: Sorry for the delay in my response - for my Spark invocations on standalone and YARN, I am using --executor-memory and --total-executor-cores for the submission. In standalone, my baseline query completes in ~40s, while in YARN it completes in ~1800s. It does not appear from the RM web UI that it's asking for more resources than available, but by the same token, it appears that it's only using a small amount of the cores and available memory. Saying this, let me re-try using the --executor-cores, --executor-memory, and --num-executors arguments as suggested (and documented) vs. --total-executor-cores.

On Fri Dec 05 2014 at 1:14:53 PM Andrew Or and...@databricks.com wrote: Hey Arun, I've seen that behavior before. It happens when the cluster doesn't have enough resources to offer and the RM hasn't given us our containers yet. Can you check the RM Web UI at port 8088 to see whether your application is requesting more resources than the cluster has to offer?

2014-12-05 12:51 GMT-08:00 Sandy Ryza sandy.r...@cloudera.com: Hey Arun, The sleeps would only cause at most about 5 seconds of overhead. The idea was to give executors some time to register. In more recent versions, they were replaced with spark.scheduler.minRegisteredResourcesRatio and spark.scheduler.maxRegisteredResourcesWaitingTime. As of 1.1, by default YARN will wait until either 30 seconds have passed or 80% of the requested executors have registered. -Sandy

On Fri, Dec 5, 2014 at 12:46 PM, Ashish Rangole arang...@gmail.com wrote: Likely this is not the case here, yet one thing to point out with YARN parameters like --num-executors is that they should be specified *before* the app jar and app args on the spark-submit command line; otherwise the app only gets the default number of containers, which is 2.

On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Denny, Those sleeps were only at startup, so if jobs are taking significantly longer on YARN, that should be a different problem. When you ran on YARN, did you use the --executor-cores, --executor-memory, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple of tiny executors. -Sandy

On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote: My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I ran this in standalone cluster mode, the query finished in 55s, but on YARN the query was still running 30 min later. Would the hard-coded sleeps potentially be in play here?

On Fri, Dec 5, 2014 at 11:23 Sandy Ryza sandy.r...@cloudera.com wrote: Hi Tobias, What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side. -Sandy

On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com wrote: Hey Tobias, As you suspect, the reason why it's slow is that the resource manager in YARN takes a while to grant resources. This is because YARN needs to first set up the ApplicationMaster container, and then this AM needs to request more containers for the Spark executors. I think this accounts for most of the overhead.
The remaining source probably comes from how our own YARN integration code polls application state (every second) and cluster resource state (every 5 seconds, IIRC). I haven't explored in detail whether there are optimizations there that can speed this up, but I believe most of the overhead comes from YARN itself. In other words, no, I don't know of any quick fix on your end to speed this up. -Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have put both the Spark assembly jar and my application jar in HDFS, and I can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING. I am aware that this is probably not a Spark issue but some YARN configuration setting (or YARN-inherent slowness); I was just wondering if anyone has any advice on how to speed this up. Thanks, Tobias
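For completeness, the two registration-wait settings Sandy mentions can be passed with --conf at submit time. A sketch with illustrative values; the class name and jar are placeholders, and in the Spark 1.x era the wait time was specified in milliseconds, so check the docs for the version you run before copying this.

# Wait for 80% of requested executors, or at most 30000 ms, before scheduling.
# Values and application details are illustrative placeholders.
spark-submit \
  --conf spark.scheduler.minRegisteredResourcesRatio=0.8 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=30000 \
  --class com.example.Query \
  app.jar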
spark-submit on YARN is slow
Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have put both the Spark assembly jar and my application jar in HDFS, and I can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING. I am aware that this is probably not a Spark issue but some YARN configuration setting (or YARN-inherent slowness); I was just wondering if anyone has any advice on how to speed this up. Thanks, Tobias
Re: prepending jars to the driver class path for spark-submit on YARN
:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)

If I do not mark spark-core and spark-streaming as provided, and also omit the exclusions, I get the same error I get when they are marked as provided and excluded.

From: Xiangrui Meng men...@gmail.com Sent: Sunday, September 07, 2014 11:40 PM To: Victor Tso-Guillen Cc: Penny Espinoza; Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN

There is an undocumented configuration to put user jars in front of the Spark jar. But I'm not very certain that it works as expected (and this is why it is undocumented). Please try turning on spark.yarn.user.classpath.first. -Xiangrui

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen v...@paxata.com wrote: I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution.
The error message is this:

Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332)
at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44)
at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20)
... 17 more

The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
Re: prepending jars to the driver class path for spark-submit on YARN
There is an undocumented configuration to put user jars in front of the Spark jar. But I'm not very certain that it works as expected (and this is why it is undocumented). Please try turning on spark.yarn.user.classpath.first. -Xiangrui

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen v...@paxata.com wrote: I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
RE: prepending jars to the driver class path for spark-submit on YARN
I don't understand what you mean. Can you be more specific?

From: Victor Tso-Guillen v...@paxata.com Sent: Saturday, September 06, 2014 5:13 PM To: Penny Espinoza Cc: Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN

I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
Re: prepending jars to the driver class path for spark-submit on YARN
When you submit the job to YARN with spark-submit, set --conf spark.yarn.user.classpath.first=true.

On Mon, Sep 8, 2014 at 10:46 AM, Penny Espinoza pesp...@societyconsulting.com wrote: I don't understand what you mean. Can you be more specific?

From: Victor Tso-Guillen v...@paxata.com Sent: Saturday, September 06, 2014 5:13 PM To: Penny Espinoza Cc: Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN

I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
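Putting the pieces of this thread together, a submission that turns on the classpath-first flag and ships the newer HttpComponents jars might look like the sketch below. The jar versions, class name, and application jar are illustrative placeholders; only spark.yarn.user.classpath.first comes verbatim from the thread, and note that later Spark releases replaced it with spark.driver.userClassPathFirst and spark.executor.userClassPathFirst.

# Jar versions, class name, and application jar are placeholders.
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.user.classpath.first=true \
  --jars httpcore-4.2.5.jar,httpclient-4.2.5.jar \
  --class com.example.StreamingJob \
  app.jar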
RE: prepending jars to the driver class path for spark-submit on YARN
) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)

If I do not mark spark-core and spark-streaming as provided, and also omit the exclusions, I get the same error I get when they are marked as provided and excluded.

From: Xiangrui Meng men...@gmail.com Sent: Sunday, September 07, 2014 11:40 PM To: Victor Tso-Guillen Cc: Penny Espinoza; Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN

There is an undocumented configuration to put user jars in front of the Spark jar. But I'm not very certain that it works as expected (and this is why it is undocumented). Please try turning on spark.yarn.user.classpath.first. -Xiangrui

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen v...@paxata.com wrote: I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1.
I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
RE: prepending jars to the driver class path for spark-submit on YARN
Victor - Not sure what you mean. Can you provide more detail about what you did?

From: Victor Tso-Guillen v...@paxata.com Sent: Saturday, September 06, 2014 5:13 PM To: Penny Espinoza Cc: Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN

I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
Re: prepending jars to the driver class path for spark-submit on YARN
I ran into the same issue. What I did was use the Maven Shade Plugin to shade my version of the HttpComponents libraries into another package.

On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85) at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118) at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102) at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332) at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46) at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44) at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20) ... 17 more The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
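For reference, the shading Victor describes is typically done with the maven-shade-plugin's relocation feature. A minimal sketch of the pom.xml fragment, assuming Maven is the build tool; the plugin version and the shaded package prefix are illustrative choices, not prescribed by the thread.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite org.apache.http classes (and references to them)
               into a private package so Spark's older copy cannot clash. -->
          <relocation>
            <pattern>org.apache.http</pattern>
            <shadedPattern>shaded.org.apache.http</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>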
prepending jars to the driver class path for spark-submit on YARN
Hey - I'm struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I've seen several posts about this issue, but no resolution. The error message is this:

Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332)
at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46)
at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44)
at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20)
... 17 more

The Apache HttpComponents libraries include the method above as of version 4.2. The Spark 1.0.2 binaries seem to include version 4.1. I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command. How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through? thanks p
spark-submit with Yarn
Is there more documentation on using spark-submit with YARN? Trying to launch a simple job does not seem to work. My run command is as follows:

/opt/cloudera/parcels/CDH/bin/spark-submit \
--master yarn \
--deploy-mode client \
--executor-memory 10g \
--driver-memory 10g \
--num-executors 50 \
--class $MAIN_CLASS \
--verbose \
$JAR \
$@

The verbose logging correctly parses the arguments:

System properties:
spark.executor.memory -> 10g
spark.executor.instances -> 50
SPARK_SUBMIT -> true
spark.master -> yarn-client

But when I view the job's 4040 page (SparkUI), there is a single executor (just the driver node), and I see the following in the environment: spark.master -> local[24]. Also, when I run with yarn-cluster, how can I access the SparkUI page? Thanks, Arun
Re: spark-submit with Yarn
On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \

This should be enough.

But when I view the job's 4040 page (SparkUI), there is a single executor (just the driver node), and I see the following in the environment: spark.master -> local[24]

Hmmm. Are you sure the app itself is not overwriting spark.master before creating the SparkContext? That's the only explanation I can think of.

Also, when I run with yarn-cluster, how can I access the SparkUI page?

You can click on the link in the RM application list. The address is also printed to the AM logs, which are also available through the RM web UI. Finally, the link is printed to the output of the launcher process (look for appTrackingUrl). -- Marcelo
Re: spark-submit with Yarn
Yes, the application is overwriting it - I need to pass the master as an argument to the application, otherwise it will be set to local. Thanks for the quick reply! Also, yes, now the appTrackingUrl is set properly as well; before, it just said unassigned. Thanks! Arun

On Tue, Aug 19, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \

This should be enough.

But when I view the job's 4040 page (SparkUI), there is a single executor (just the driver node), and I see the following in the environment: spark.master -> local[24]

Hmmm. Are you sure the app itself is not overwriting spark.master before creating the SparkContext? That's the only explanation I can think of.

Also, when I run with yarn-cluster, how can I access the SparkUI page?

You can click on the link in the RM application list. The address is also printed to the AM logs, which are also available through the RM web UI. Finally, the link is printed to the output of the launcher process (look for appTrackingUrl). -- Marcelo
Re: spark-submit with Yarn
The --master flag should override any other way of setting the Spark master. Ah yes, actually you can set spark.master directly in your application through SparkConf. Thanks, Marcelo.

2014-08-19 14:47 GMT-07:00 Marcelo Vanzin van...@cloudera.com: On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \

This should be enough.

But when I view the job's 4040 page (SparkUI), there is a single executor (just the driver node), and I see the following in the environment: spark.master -> local[24]

Hmmm. Are you sure the app itself is not overwriting spark.master before creating the SparkContext? That's the only explanation I can think of.

Also, when I run with yarn-cluster, how can I access the SparkUI page?

You can click on the link in the RM application list. The address is also printed to the AM logs, which are also available through the RM web UI. Finally, the link is printed to the output of the launcher process (look for appTrackingUrl). -- Marcelo
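To illustrate Marcelo's diagnosis: if the application hard-codes a master in its SparkConf, that value wins over the --master flag passed to spark-submit, which explains the local[24] Arun saw. A minimal Scala sketch (the object and app names are placeholders) showing the problem and the fix of leaving the master to spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Bad: hard-coding the master here overrides spark-submit's --master flag.
    // val conf = new SparkConf().setMaster("local[24]").setAppName("MyApp")

    // Good: leave the master unset so --master yarn-client (or yarn-cluster)
    // supplied on the spark-submit command line takes effect.
    val conf = new SparkConf().setAppName("MyApp")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}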