To clarify, given that the error is showing up with 
container_1432885077153_0004_01_000005 rather than the AM's own container 
(_000001), that means the AM launched properly. 

Use "bin/yarn logs -applicationId application_1432885077153_0004" to get the 
logs. See if there are any errors in the logs for 
container_1432885077153_0004_01_000005. If there are none, you will need to 
search for "Assigning container to task" against the above container ID in the 
AM's logs. That log line tells you which host the container ran on; you should 
then look at that host's NodeManager logs and search for the container ID.
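
For example, a minimal sketch of that sequence (the IDs are the ones from this 
thread; "app.log" is just a scratch file):

  # fetch the aggregated YARN logs for the application
  bin/yarn logs -applicationId application_1432885077153_0004 > app.log

  # any errors recorded under the failed container's section?
  grep -n "container_1432885077153_0004_01_000005" app.log

  # if not, find the host the AM assigned the container to
  grep "Assigning container to task" app.log | grep "container_1432885077153_0004_01_000005"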

The above would be a lot simpler if you had the UI set up to work against 0.5.3, 
though it may still require you to dig through the NodeManager logs. 

thanks
— Hitesh 

On May 29, 2015, at 3:48 AM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> 
wrote:

> 
> Could you check the YARN app logs to see what the error is? If there's 
> still no useful info, you may refer to the YARN RM/NM logs.
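> 
> For example (a sketch; the log file name below assumes the default 
> yarn-<user>-resourcemanager-<host>.log naming, so adjust it for your install):
> 
>   # on the ResourceManager host
>   grep application_1432885077153_0004 $YARN_LOG_DIR/yarn-*-resourcemanager-*.log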
> 
> Best Regards,
> Jeff Zhang
> 
> 
> From: "r7raul1...@163.com" <r7raul1...@163.com>
> Reply-To: user <user@tez.apache.org>
> Date: Friday, May 29, 2015 at 4:16 PM
> To: user <user@tez.apache.org>
> Subject: Re: Tez container launch error when using UseG1GC
> 
> BTW, my tez-site.xml content is:
> <configuration> 
>   <property> 
>     <name>tez.lib.uris</name> 
>     <value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value> 
>   </property> 
>   <property> 
>     <name>tez.task.generate.counters.per.io</name> 
>     <value>true</value> 
>   </property> 
>   <property> 
>     <description>Log history using the Timeline Server</description> 
>     <name>tez.history.logging.service.class</name> 
>     <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value> 
>   </property> 
>   <property> 
>     <description>Publish configuration information to Timeline server</description> 
>     <name>tez.runtime.convert.user-payload.to.history-text</name> 
>     <value>true</value> 
>   </property> 
>   <property> 
>     <name>tez.am.launch.cmd-opts</name> 
>     <value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/</value> 
>   </property> 
> </configuration>
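> 
> Since the container-launch failure below exited with code 1 before producing 
> any task logs, a quick sanity check (a sketch; run it with the same JDK the 
> cluster uses) is to confirm the JVM actually accepts that combination of flags:
> 
>   java -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA \
>        -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -version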
> 
> r7raul1...@163.com
>  
> From: r7raul1...@163.com
> Date: 2015-05-29 16:15
> To: user
> Subject: Tez container launch error when using UseG1GC
> I changed the value of mapreduce.map.java.opts from 
> -Djava.net.preferIPv4Stack=true -Xmx825955249 to 
> -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249.
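> 
> (For reference, the Tez-side knobs for task JVM opts are hive.tez.java.opts 
> and tez.task.launch.cmd-opts; a hedged sketch using those standard property 
> names, which are not settings taken from this job:
> 
> set hive.tez.java.opts=-Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249; 
> set tez.task.launch.cmd-opts=-XX:+UseG1GC;)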
> 
> When I run a query with Hive 1.1.0 + Tez 0.5.3 on Hadoop 2.5.0:
> 
> set mapreduce.framework.name=yarn-tez; 
> set hive.execution.engine=tez; 
> select userid,count(*) from u_data group by userid order by userid;
> The query returns an error. I found this error: 
> 2015-05-29 16:02:39,064 WARN [AsyncDispatcher event handler] container.AMContainerImpl: Container container_1432885077153_0004_01_000005 finished with diagnostics set to [Container failed. Exception from container-launch. 
> Container id: container_1432885077153_0004_01_000005 
> Exit code: 1 
> Stack trace: ExitCodeException exitCode=1: 
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
> at org.apache.hadoop.util.Shell.run(Shell.java:455) 
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
> at java.lang.Thread.run(Thread.java:745) 
> 
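> The stderr written by the container launch itself should say why the JVM 
> exited; a sketch, run on the NodeManager host that launched the container, 
> with <log-dir> standing for that host's yarn.nodemanager.log-dirs value:
> 
>   cat <log-dir>/application_1432885077153_0004/container_1432885077153_0004_01_000005/stderr
> 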
> But when I try:
> hive> set hive.execution.engine=mr; 
> hive> set mapreduce.framework.name=yarn; 
> hive> select userid,count(*) from u_data group by userid order by userid limit 1; 
> Query ID = hdfs_20150529160606_d550bca4-0341-4eb0-aace-a9018bfbb7a9 
> Total jobs = 2 
> Launching Job 1 out of 2 
> Number of reduce tasks not specified. Estimated from input data size: 1 
> In order to change the average load for a reducer (in bytes): 
> set hive.exec.reducers.bytes.per.reducer=<number> 
> In order to limit the maximum number of reducers: 
> set hive.exec.reducers.max=<number> 
> In order to set a constant number of reducers: 
> set mapreduce.job.reduces=<number> 
> Starting Job = job_1432885077153_0005, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0005/ 
> Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0005 
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 
> 2015-05-29 16:06:34,863 Stage-1 map = 0%, reduce = 0% 
> 2015-05-29 16:06:40,066 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 sec 
> 2015-05-29 16:06:48,366 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.96 sec 
> MapReduce Total cumulative CPU time: 2 seconds 960 msec 
> Ended Job = job_1432885077153_0005 
> Launching Job 2 out of 2 
> Number of reduce tasks determined at compile time: 1 
> In order to change the average load for a reducer (in bytes): 
> set hive.exec.reducers.bytes.per.reducer=<number> 
> In order to limit the maximum number of reducers: 
> set hive.exec.reducers.max=<number> 
> In order to set a constant number of reducers: 
> set mapreduce.job.reduces=<number> 
> Starting Job = job_1432885077153_0006, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0006/ 
> Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0006 
> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1 
> 2015-05-29 16:07:03,333 Stage-2 map = 0%, reduce = 0% 
> 2015-05-29 16:07:07,485 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec 
> 2015-05-29 16:07:15,739 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.35 sec 
> MapReduce Total cumulative CPU time: 2 seconds 350 msec 
> Ended Job = job_1432885077153_0006 
> MapReduce Jobs Launched: 
> Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.96 sec HDFS Read: 1985399 HDFS Write: 20068 SUCCESS 
> Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.35 sec HDFS Read: 24481 HDFS Write: 6 SUCCESS 
> Total MapReduce CPU Time Spent: 5 seconds 310 msec 
> 
> That's ok.
> 
> 
> r7raul1...@163.com
