Sorry, here is the node-manager log; application_1463692924309_0002 is my test application. I hope this helps. http://pastebin.com/0BPEcgcW
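(For anyone following the thread: the "yarn logs" command Marcelo mentions below fetches the aggregated YARN application log, which is different from the Spark event log under /spark-events. A minimal sketch, assuming log aggregation is enabled on the cluster and using the application id from this test:)

```shell
# Fetch the aggregated YARN application log for the failed test application.
# Requires yarn.log-aggregation-enable=true in yarn-site.xml; without
# aggregation, per-container logs stay under the NodeManager's local
# yarn.nodemanager.log-dirs instead of being retrievable this way.
yarn logs -applicationId application_1463692924309_0002
```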
On 5/20/16, 2:09 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

> Hi Weifeng,
>
> That's the Spark event log, not the YARN application log. You get the
> latter using the "yarn logs" command.
>
> On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
>> Here is the application log for this Spark job:
>>
>> http://pastebin.com/2UJS9L4e
>>
>> Thanks,
>> Weifeng
>>
>> From: "Aulakh, Sahib" <aula...@a9.com>
>> Date: Friday, May 20, 2016 at 12:43 PM
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng <weife...@a9.com>, user <user@spark.apache.org>, "Zhao, Jun" <junz...@a9.com>
>> Subject: Re: Can not set spark dynamic resource allocation
>>
>> Yes, it is YARN. We have configured the Spark shuffle service with the YARN NodeManager, but something must be off.
>>
>> We will send you the application log on Pastebin.
>>
>> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>>
>> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown <rodr...@orchardplatform.com> wrote:
>>
>> Is this YARN or Mesos? For the latter you need to start an external shuffle service.
>>
>> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" <weife...@a9.com> wrote:
>>
>> Hi guys,
>>
>> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable dynamic resource allocation for Spark, so we followed the instructions at the link below. After the changes, all Spark jobs failed.
>>
>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>
>> The test ran on a test cluster with 1 master machine (running the NameNode, ResourceManager and Hive server), 1 worker machine (running the DataNode and NodeManager) and 1 client machine (running the Spark shell).
>>
>> What I updated in the config:
>>
>> 1. Updated spark-defaults.conf:
>>
>>    spark.dynamicAllocation.enabled true
>>    spark.shuffle.service.enabled   true
>>
>> 2. Updated yarn-site.xml:
>>
>>    <property>
>>      <name>yarn.nodemanager.aux-services</name>
>>      <value>mapreduce_shuffle,spark_shuffle</value>
>>    </property>
>>    <property>
>>      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>    </property>
>>    <property>
>>      <name>spark.shuffle.service.enabled</name>
>>      <value>true</value>
>>    </property>
>>
>> 3. Copied spark-1.6.1-yarn-shuffle.jar into yarn.application.classpath ($HADOOP_HOME/share/hadoop/yarn/*); this step is scripted in our Python code.
>>
>> 4. Restarted the NameNode, DataNode, ResourceManager, NodeManager... restarted everything.
>>
>> 5. The config is updated on all machines (ResourceManager and NodeManager): we update it in one place and copy it to every machine.
>>
>> What I tested:
>>
>> 1. I started a Scala Spark shell and checked its environment variables; spark.dynamicAllocation.enabled is true.
>>
>> 2. I ran the following code:
>>
>>    scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>>    line: org.apache.spark.rdd.RDD[String] = /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at <console>:27
>>
>>    scala> line.count   // this command just hung here
>>
>> 3. In the beginning there was only 1 executor (the driver); after line.count I could see 3 executors, which then dropped back to 1.
>>
>> 4. Several jobs were launched and all of them failed. Tasks (for all stages): Succeeded/Total: 0/2 (4 failed).
>>
>> Error messages:
>>
>> I found the following message in the Spark web UI, and in spark.log on the NodeManager machine as well:
>>
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1463692924309_0002_01_000002 on host: xxxxxxxxxxxxxxx.com.
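(A quick way to catch wiring mistakes in step 2 above is to grep yarn-site.xml for the two shuffle-service entries before restarting anything. The sketch below writes the snippet from this thread into a temp file purely so the check is self-contained; against a real cluster you would point YARN_SITE at your actual yarn-site.xml, e.g. under $HADOOP_CONF_DIR, which is an assumption about your layout:)

```shell
# Sanity-check that yarn-site.xml declares spark_shuffle as an aux-service
# and maps it to Spark's YarnShuffleService class.
YARN_SITE=$(mktemp)   # stand-in for the real $HADOOP_CONF_DIR/yarn-site.xml
cat > "$YARN_SITE" <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
EOF
if grep -q 'spark_shuffle' "$YARN_SITE" &&
   grep -q 'org.apache.spark.network.yarn.YarnShuffleService' "$YARN_SITE"; then
  echo "shuffle service wired up"
else
  echo "shuffle service missing from yarn-site.xml"
fi
```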
>> Exit status: 1. Diagnostics: Exception from container-launch.
>> Container id: container_1463692924309_0002_01_000002
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> Container exited with a non-zero exit code 1
>>
>> Thanks a lot for the help. We can provide more information if needed.
>>
>> Thanks,
>> Weifeng
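(An exit code 1 from container-launch with dynamic allocation enabled very often means the executor could not register with the external shuffle service, usually because the NodeManager never loaded it. The checks below are a sketch to run on the worker machine; the log path and jar location are assumptions based on the layout described in this thread:)

```shell
# Run these on the worker (NodeManager) machine.

# 1. Did the NodeManager load the Spark shuffle service at startup?
#    A healthy startup logs lines mentioning YarnShuffleService.
grep -i 'YarnShuffleService' "$HADOOP_HOME"/logs/yarn-*-nodemanager-*.log

# 2. Is the shuffle jar actually where the NodeManager classpath expects it?
ls "$HADOOP_HOME"/share/hadoop/yarn/spark-1.6.1-yarn-shuffle.jar

# 3. Is anything listening on the shuffle service port? 7337 is the
#    default value of spark.shuffle.service.port.
netstat -tln | grep 7337
```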
>
> --
> Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org