Re: Tez lauche container error when use UseG1GC

2015-05-31 Thread Jianfeng (Jeff) Zhang
Could you also attach the resource manager log?


Best Regard,
Jeff Zhang


From: r7raul1...@163.com
Reply-To: user <user@tez.apache.org>
Date: Monday, June 1, 2015 at 9:39 AM
To: user <user@tez.apache.org>
Subject: Re: Re: Tez lauche container error when use UseG1GC

node manager log


r7raul1...@163.com

From: r7raul1...@163.com
Date: 2015-06-01 09:23
To: user <user@tez.apache.org>
Subject: Re: Re: Tez lauche container error when use UseG1GC
Attached is the log file.


r7raul1...@163.com

From: Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>
Date: 2015-06-01 08:50
To: user <user@tez.apache.org>
Subject: Re: Tez lauche container error when use UseG1GC

From the logs, the failure is that the container failed to launch. I guess it is
due to some YARN configuration issue. You had better check the node manager logs.
It also looks like you haven’t enabled log aggregation, so you can’t get the
logs with the “yarn logs” command.
You need to check each node manager machine; by default the logs are located in
$HADOOP_HOME/logs. In the node manager logs, you should be able to see why the
container failed to launch.
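
For anyone following along, the node manager log search described above could be sketched roughly like this in Python. This is only an illustration: the function name and the assumption that the logs are plain-text *.log files sitting directly under the log directory are mine, not something stated in the thread.

```python
from pathlib import Path


def find_container_lines(log_dir, container_id):
    """Return (file name, line number, text) for each log line mentioning container_id."""
    matches = []
    # Assumption: plain-text *.log files directly under log_dir.
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with log_file.open(errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if container_id in line:
                    matches.append((log_file.name, line_no, line.rstrip("\n")))
    return matches
```

It would be called on each node manager machine with that machine's log directory (for example the directory $HADOOP_HOME/logs expands to) and one of the failed container ids, such as container_1432885077153_0011_01_02.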


BTW, I would suggest enabling log aggregation. See this page for details:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html







Best Regard,
Jeff Zhang


From: r7raul1...@163.com
Reply-To: user <user@tez.apache.org>
Date: Monday, June 1, 2015 at 8:07 AM
To: user <user@tez.apache.org>
Subject: Re: Re: Tez lauche container error when use UseG1GC


Log is:
Status: Running (Executing on YARN cluster with App id 
application_1432885077153_0011)


VERTICES    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1       FAILED      1          0        0        1       4       0
Reducer 2   KILLED      1          0        0        1       0       1
Reducer 3   KILLED      1          0        0        1       0       1

VERTICES: 00/03 [--] 0% ELAPSED TIME: 16.13 s

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1432885077153_0011_1_00, 
diagnostics=[Task failed, taskId=task_1432885077153_0011_1_00_00, 
diagnostics=[TaskAttempt 0 failed, info=[Container 
container_1432885077153_0011_01_02 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_02
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
]], TaskAttempt 1 failed, info=[Container 
container_1432885077153_0011_01_03 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_03
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745

Re: Re: Tez lauche container error when use UseG1GC

2015-05-31 Thread r7raul1...@163.com

mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails 
-verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC

Map Task Maximum Heap Size (mapreduce.map.java.opts.max.heap): 825955249 B (default value)


r7raul1...@163.com
 
From: r7raul1...@163.com
Date: 2015-06-01 10:27
To: user
Subject: Re: Re: Tez lauche container error when use UseG1GC
Fair Scheduler Preemption (yarn.scheduler.fair.preemption): False




r7raul1...@163.com
 
From: Jianfeng (Jeff) Zhang
Date: 2015-06-01 10:10
To: user
Subject: Re: Tez lauche container error when use UseG1GC

Could you also attach the resource manager log? And have you enabled
preemption?

Best Regard,
Jeff Zhang



Re: Tez lauche container error when use UseG1GC

2015-05-31 Thread Jianfeng (Jeff) Zhang

Could you also attach the resource manager log? And have you enabled
preemption?

Best Regard,
Jeff Zhang



Re: Re: Tez lauche container error when use UseG1GC

2015-05-31 Thread r7raul1...@163.com
(ContainerLaunch.java:299)
 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1 
]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
vertex_1432885077153_0011_1_00 [Map 1] killed/failed due to:null] 
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1432885077153_0011_1_01, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_01 
[Reducer 2] killed/failed due to:null] 
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1432885077153_0011_1_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_02 
[Reducer 3] killed/failed due to:null] 
DAG failed due to vertex failure. failedVertices:1 killedVertices:2 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask 

yarn logs -applicationId application_1432885077153_0011 
15/06/01 08:07:01 INFO client.RMProxy: Connecting to ResourceManager at 
localhost/127.0.0.1:8032 
Logs not available at /tmp/logs/root/logs/application_1432885077153_0011 
Log aggregation has not completed or is not enabled. 




r7raul1...@163.com
 

Re: Tez lauche container error when use UseG1GC

2015-05-31 Thread Jianfeng (Jeff) Zhang
The task container fails to launch. Have you specified any JVM-related property
in tez-site.xml, such as TEZ_TASK_LAUNCH_CMD_OPTS (tez.task.launch.cmd-opts)?

Conflicting collector combinations in option list; please refer to the release 
notes for the combinations allowed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
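
The JVM prints the error above when the final command line enables more than one garbage collector. A rough way to check an option string for that, sketched in Python: the grouping of flags into "families" is a simplification I am assuming for illustration (the JVM release notes define which combinations are really allowed), and the merged option string is hypothetical, not the actual command line from this cluster.

```python
# Simplified grouping of HotSpot collector flags (an assumption for this sketch;
# the JVM release notes define the combinations that are really allowed).
COLLECTOR_FAMILIES = {
    "-XX:+UseSerialGC": "serial",
    "-XX:+UseParallelGC": "parallel",
    "-XX:+UseParallelOldGC": "parallel",
    "-XX:+UseConcMarkSweepGC": "cms",
    "-XX:+UseParNewGC": "cms",
    "-XX:+UseG1GC": "g1",
}


def collector_flags(java_opts):
    """Return the collector-selecting flags found in a JVM option string."""
    return [opt for opt in java_opts.split() if opt in COLLECTOR_FAMILIES]


def has_conflicting_collectors(java_opts):
    """True if the options enable collectors from more than one family."""
    families = {COLLECTOR_FAMILIES[flag] for flag in collector_flags(java_opts)}
    return len(families) > 1


# Hypothetical merged command line: one collector from a default setting,
# another one added by the user.
merged = (
    "-XX:+UseNUMA -XX:+UseParallelGC "
    "-Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249"
)
print(collector_flags(merged))
print(has_conflicting_collectors(merged))
```

If a check like this reports a conflict, the place to look is wherever the two flags come from, for example a launch-options property on one side and mapreduce.map.java.opts on the other.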


Best Regard,
Jeff Zhang



Tez lauche container error when use UseG1GC

2015-05-29 Thread r7raul1...@163.com
I changed the value of mapreduce.map.java.opts from
-Djava.net.preferIPv4Stack=true -Xmx825955249
to
-Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249

Then I ran a query with Hive 1.1.0 + Tez 0.5.3 on Hadoop 2.5.0:

set mapreduce.framework.name=yarn-tez; 
set hive.execution.engine=tez; 
select userid,count(*) from u_data group by userid order by userid;
The query returned an error.
I found this error:
2015-05-29 16:02:39,064 WARN [AsyncDispatcher event handler] 
container.AMContainerImpl: Container container_1432885077153_0004_01_05 
finished with diagnostics set to [Container failed. Exception from 
container-launch. 
Container id: container_1432885077153_0004_01_05 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

But when I try the MapReduce engine:
hive> set hive.execution.engine=mr;
hive> set mapreduce.framework.name=yarn;
hive> select userid,count(*) from u_data group by userid order by userid limit 1;
Query ID = hdfs_20150529160606_d550bca4-0341-4eb0-aace-a9018bfbb7a9 
Total jobs = 2 
Launching Job 1 out of 2 
Number of reduce tasks not specified. Estimated from input data size: 1 
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1432885077153_0005, Tracking URL = 
http://localhost:8088/proxy/application_1432885077153_0005/ 
Kill Command = 
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job 
-kill job_1432885077153_0005 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 
2015-05-29 16:06:34,863 Stage-1 map = 0%, reduce = 0% 
2015-05-29 16:06:40,066 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 
sec 
2015-05-29 16:06:48,366 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.96 
sec 
MapReduce Total cumulative CPU time: 2 seconds 960 msec 
Ended Job = job_1432885077153_0005 
Launching Job 2 out of 2 
Number of reduce tasks determined at compile time: 1 
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1432885077153_0006, Tracking URL = 
http://localhost:8088/proxy/application_1432885077153_0006/ 
Kill Command = 
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job 
-kill job_1432885077153_0006 
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1 
2015-05-29 16:07:03,333 Stage-2 map = 0%, reduce = 0% 
2015-05-29 16:07:07,485 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec 
2015-05-29 16:07:15,739 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.35 
sec 
MapReduce Total cumulative CPU time: 2 seconds 350 msec 
Ended Job = job_1432885077153_0006 
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.96 sec HDFS Read: 1985399 
HDFS Write: 20068 SUCCESS 
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.35 sec HDFS Read: 24481 HDFS 
Write: 6 SUCCESS 
Total MapReduce CPU Time Spent: 5 seconds 310 msec 

That's ok.




r7raul1...@163.com


Re: Tez lauche container error when use UseG1GC

2015-05-29 Thread Hitesh Shah
To clarify, given that the error is showing up with 
container_1432885077153_0004_01_05, that means that the AM launched 
properly. 

Use “bin/yarn logs -applicationId application_1432885077153_0004” to get the
logs. See if there are any errors in the logs for
container_1432885077153_0004_01_05. If there are none, you will need to
search for “Assigning container to task” for the above container in the AM’s
logs. Using this log line, you will see what host the container belongs to, and
you should then look at the NodeManager logs on that host and search for the
container id.
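
The AM log search described above could be sketched like this in Python. The function name is made up for illustration, and the sketch assumes nothing about the log line layout beyond the marker phrase and the container id appearing on the same line.

```python
def find_assignment_lines(am_log_lines, container_id,
                          marker="Assigning container to task"):
    """Return AM log lines that contain both the marker phrase and the container id."""
    return [
        line.rstrip("\n")
        for line in am_log_lines
        if marker in line and container_id in line
    ]
```

Passing an open AM log file as am_log_lines works, since a file object yields its lines one at a time. The matching line should then show which host to check.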

The above would be a lot simpler if you have the UI set up to work with 0.5.3,
but it may still require you to dig through the NodeManager logs.

thanks
— Hitesh 

On May 29, 2015, at 3:48 AM, Jianfeng (Jeff) Zhang jzh...@hortonworks.com 
wrote:

 
 Could you check the yarn app logs to see what the error is? If there’s
 still no useful info, you may refer to the yarn RM/NN logs.
 
 
 
 
 Best Regard,
 Jeff Zhang
 
 
 From: r7raul1...@163.com r7raul1...@163.com
 Reply-To: user user@tez.apache.org
 Date: Friday, May 29, 2015 at 4:16 PM
 To: user user@tez.apache.org
 Subject: Re: Tez lauche container error when use UseG1GC
 
 BTW my tez_site.xml content is:
 <configuration>
   <property>
     <name>tez.lib.uris</name>
     <value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value>
   </property>
   <property>
     <name>tez.task.generate.counters.per.io</name>
     <value>true</value>
   </property>
   <property>
     <description>Log history using the Timeline Server</description>
     <name>tez.history.logging.service.class</name>
     <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
   </property>
   <property>
     <description>Publish configuration information to Timeline server</description>
     <name>tez.runtime.convert.user-payload.to.history-text</name>
     <value>true</value>
   </property>
   <property>
     <name>tez.am.launch.cmd-opts</name>
     <value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/</value>
   </property>

 </configuration>
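
To see at a glance which JVM launch options a file like this sets, something along these lines might help. It is a sketch only: the helper names are made up, the sample is shortened from the configuration above, and filtering on names ending in "cmd-opts" is my assumption about which properties carry JVM options.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def jvm_opts_properties(props):
    """Keep only the properties whose name ends in 'cmd-opts'."""
    return {name: value for name, value in props.items() if name.endswith("cmd-opts")}


# Shortened sample based on the configuration quoted in this thread.
sample = """<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value>
  </property>
  <property>
    <name>tez.am.launch.cmd-opts</name>
    <value>-XX:+UseNUMA -XX:+UseG1GC</value>
  </property>
</configuration>"""

print(jvm_opts_properties(load_properties(sample)))
```

Note that this only lists what the file sets explicitly; properties left at their defaults will not show up.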
 