I’ve gotten a little further along. The job now submits via YARN, but it exits immediately with the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:646)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

I’ve checked, and the class does live in the Spark assembly. Any thoughts as to what might be wrong?

Best Regards,

David R Robison
Senior Systems Engineer

From: David Robison [mailto:david.robi...@psgglobal.net]
Sent: Wednesday, November 16, 2016 9:04 AM
To: Rohit Verma <rohit.ve...@rokittech.com>
Cc: user@spark.apache.org
Subject: RE: Problem submitting a spark job using yarn-client as master
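A frequent culprit for this particular error (not confirmed from this thread, just a common pattern): org.apache.spark.Logging was public in Spark 1.x but removed from the public API in Spark 2.0, so a NoClassDefFoundError on it usually means two different Spark versions are meeting on the classpath, e.g. a 1.6.x spark-yarn jar running against a 2.x assembly. The log quoted below shows spark-yarn_2.10-1.6.2.jar being shipped from the client, so it is worth comparing that against the cluster's Spark version. A self-contained sketch for spotting mixed versions from a list of classpath entries (the class name and jar paths here are illustrative, not from the thread):

```java
import java.util.*;
import java.util.regex.*;

public class SparkVersionCheck {
    // Extracts Spark version numbers from jar names shaped like
    // "spark-yarn_2.10-1.6.2.jar" (artifact, Scala version, Spark version).
    static Set<String> sparkVersions(List<String> classpathEntries) {
        Pattern p = Pattern.compile(
            "spark-[a-z]+_[0-9.]+-([0-9]+\\.[0-9]+\\.[0-9]+)\\.jar");
        Set<String> versions = new TreeSet<>();
        for (String entry : classpathEntries) {
            Matcher m = p.matcher(entry);
            if (m.find()) versions.add(m.group(1));
        }
        return versions;
    }

    public static void main(String[] args) {
        // Hypothetical classpath: a 1.6.2 client jar next to a 2.x core jar.
        List<String> cp = Arrays.asList(
            "/opt/wildfly/modules/org/apache/hadoop/client/main/spark-yarn_2.10-1.6.2.jar",
            "/usr/hdp/current/spark/lib/spark-core_2.11-2.0.1.jar");
        Set<String> versions = sparkVersions(cp);
        if (versions.size() > 1) {
            System.out.println("Mixed Spark versions on classpath: " + versions);
        } else {
            System.out.println("Spark versions found: " + versions);
        }
    }
}
```

If more than one version shows up, aligning the jars shipped via setJars/spark.yarn.jar with the cluster's assembly is the usual fix.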
Unfortunately, it doesn’t get that far in my code; I never have a SparkContext from which to set the Hadoop config parameters. Here is my Java code:

SparkConf sparkConf = new SparkConf()
    .setJars(new String[] { "file:///opt/wildfly/mapreduce/mysparkjob-5.0.0.jar" })
    .setSparkHome("/usr/hdp/" + getHdpVersion() + "/spark")
    .set("fs.defaultFS", config.get("fs.defaultFS"));
sparkContext = new JavaSparkContext("yarn-client", "SumFramesPerTimeUnit", sparkConf);

The job dies in the constructor of the JavaSparkContext. I have a logging call right after creating the SparkContext, and it is never executed. Any idea what I’m doing wrong?

Best Regards,

David R Robison
Senior Systems Engineer

From: Rohit Verma [mailto:rohit.ve...@rokittech.com]
Sent: Tuesday, November 15, 2016 9:27 PM
To: David Robison <david.robi...@psgglobal.net>
Cc: user@spark.apache.org
Subject: Re: Problem submitting a spark job using yarn-client as master

You can set HDFS as the default:

sparksession.sparkContext().hadoopConfiguration().set("fs.defaultFS", "hdfs://master_node:8020");

Regards,
Rohit

On Nov 16, 2016, at 3:15 AM, David Robison <david.robi...@psgglobal.net> wrote:

I am trying to submit a spark job through the yarn-client master setting. The job gets created and submitted to the cluster but immediately errors out.
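A side note on Rohit's suggestion above: it requires a live context, which never gets created here. A workaround that may apply (a sketch, not confirmed against this setup): Spark copies any SparkConf key prefixed with "spark.hadoop." into the Hadoop Configuration with the prefix stripped, so the default filesystem can be set on the SparkConf before the JavaSparkContext constructor runs. The mechanic is modeled below with plain Maps so it runs without a Spark install; the class name is hypothetical and "master_node:8020" is a placeholder host:

```java
import java.util.*;

public class HadoopPrefixDemo {
    // Mirrors Spark's behavior of forwarding "spark.hadoop.*" entries from
    // the SparkConf into the Hadoop Configuration with the prefix stripped.
    static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith("spark.hadoop.")) {
                hadoopConf.put(
                    e.getKey().substring("spark.hadoop.".length()),
                    e.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // In real code: sparkConf.set("spark.hadoop.fs.defaultFS", "hdfs://master_node:8020");
        conf.put("spark.hadoop.fs.defaultFS", "hdfs://master_node:8020");
        System.out.println(toHadoopConf(conf));
    }
}
```

Note that a bare .set("fs.defaultFS", ...) on the SparkConf, as in the code above, is not forwarded to Hadoop; only the "spark.hadoop."-prefixed form is.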
Here is the relevant portion of the log:

15:39:37,385 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Requesting a new application from cluster with 1 NodeManagers
15:39:37,397 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Verifying our application has not requested more than the maximum memory capability of the cluster (4608 MB per container)
15:39:37,398 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Will allocate AM container, with 896 MB memory including 384 MB overhead
15:39:37,399 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Setting up container launch context for our AM
15:39:37,403 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Setting up the launch environment for our AM container
15:39:37,427 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Preparing resources for our AM container
15:39:37,845 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Source and destination file systems are the same. Not copying file:/opt/wildfly/modules/org/apache/hadoop/client/main/spark-yarn_2.10-1.6.2.jar
15:39:38,050 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Source and destination file systems are the same. Not copying file:/tmp/spark-fa954c4a-a6cd-4675-8610-67ce858b4842/__spark_conf__1435451360463636119.zip
15:39:38,102 INFO [org.apache.spark.SecurityManager] (default task-1) Changing view acls to: wildfly,hdfs
15:39:38,105 INFO [org.apache.spark.SecurityManager] (default task-1) Changing modify acls to: wildfly,hdfs
15:39:38,105 INFO [org.apache.spark.SecurityManager] (default task-1) SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wildfly, hdfs); users with modify permissions: Set(wildfly, hdfs)
15:39:38,138 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Submitting application 5 to ResourceManager
15:39:38,256 INFO [org.apache.hadoop.yarn.client.api.impl.YarnClientImpl] (default task-1) Submitted application application_1479240217825_0005
15:39:39,269 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Application report for application_1479240217825_0005 (state: ACCEPTED)
15:39:39,279 INFO [org.apache.spark.deploy.yarn.Client] (default task-1)
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: default
    start time: 1479242378159
    final status: UNDEFINED
    tracking URL: http://vb1.localdomain:8088/proxy/application_1479240217825_0005/
    user: hdfs
15:39:40,285 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Application report for application_1479240217825_0005 (state: ACCEPTED)
15:39:41,290 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Application report for application_1479240217825_0005 (state: ACCEPTED)
15:39:42,295 INFO [org.apache.spark.deploy.yarn.Client] (default task-1) Application report for application_1479240217825_0005 (state: FAILED)
15:39:42,295 INFO [org.apache.spark.deploy.yarn.Client] (default task-1)
    client token: N/A
    diagnostics: Application application_1479240217825_0005 failed 2 times due to AM Container for appattempt_1479240217825_0005_000002 exited with exitCode: -1000
For more detailed output, check
the application tracking page: http://vb1.localdomain:8088/cluster/app/application_1479240217825_0005, then click on the links to the logs of each attempt.

Diagnostics: File file:/tmp/spark-fa954c4a-a6cd-4675-8610-67ce858b4842/__spark_conf__1435451360463636119.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-fa954c4a-a6cd-4675-8610-67ce858b4842/__spark_conf__1435451360463636119.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)

Notice that the file __spark_conf__1435451360463636119.zip is not copied because Spark believes the source and destination file systems are the same. However, when the container goes to fetch it, it reports that the file does not exist, probably because it is looking under "file:/tmp" on its own local disk rather than in HDFS. Any idea how I can get this to work?

Thanks,
David

David R Robison
Senior Systems Engineer
O. +1 512 247 3700
M. +1 757 286 0022
david.robi...@psgglobal.net
www.psgglobal.net

Prometheus Security Group Global, Inc.
3019 Alvin Devane Boulevard
Building 4, Suite 450
Austin, TX 78741
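The mis-qualified path theory above can be illustrated without a cluster: Hadoop qualifies a scheme-less path against fs.defaultFS, so with a local-filesystem default the staged archive ends up addressed on the submitting host's /tmp, which the NodeManagers cannot see. A small sketch of that qualification using plain java.net.URI (the class name and shortened staging path are placeholders for illustration):

```java
import java.net.URI;

public class DefaultFsDemo {
    // Qualify a scheme-less path against a default filesystem URI, the way
    // Hadoop qualifies paths against fs.defaultFS.
    static URI qualify(URI defaultFs, String path) {
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the staged __spark_conf__ archive path.
        String staged = "/tmp/spark-staging/__spark_conf__.zip";

        // With a local default filesystem, the archive resolves to a file:
        // URI that only exists on the submitting host.
        System.out.println(qualify(URI.create("file:///"), staged));

        // With HDFS as the default, the same path is visible cluster-wide.
        System.out.println(qualify(URI.create("hdfs://master_node:8020/"), staged));
    }
}
```

This is why pointing fs.defaultFS at the HDFS NameNode, as suggested earlier in the thread, is the likely direction for a fix.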