Re: Word Count examples run failed with Tez 0.8.4

2016-08-04 Thread Hitesh Shah
Hello 

I am assuming that this is the same issue as the one reported in TEZ-3396?

Based on the logs in the jira: 

2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked
2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMaster received a signal. Signaling TaskScheduler

It seems like the AM is getting killed. 

Can you provide the configs being used for:
  - tez.am.resource.memory.mb
  - tez.am.launch.cmd-opts
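
As a rough sanity check on how those two settings relate, the sketch below parses a -Xmx value out of a launch-options string and compares it with the container size. The function names, the 0.8 headroom factor and the sample values are illustrative assumptions, not Tez defaults or Tez APIs.

```python
import re

# Conversion factors from a JVM size suffix to megabytes.
_UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def xmx_mb(cmd_opts):
    """Return the -Xmx heap size in MB, or None if cmd_opts sets no -Xmx."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", cmd_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if not unit:
        return value / (1024 * 1024)  # a bare number is a size in bytes
    return value * _UNIT_TO_MB[unit]


def heap_fits_container(cmd_opts, container_mb, headroom=0.8):
    """True if the heap leaves room for non-heap memory inside the container.

    headroom is a hypothetical rule of thumb: the heap may use at most this
    fraction of the value configured as tez.am.resource.memory.mb.
    """
    heap = xmx_mb(cmd_opts)
    return heap is None or heap <= container_mb * headroom


# Made-up values: a 1024 MB AM container with two candidate heap sizes.
print(heap_fits_container("-Xmx1024m", 1024))  # heap equals the container size
print(heap_fits_container("-Xmx819m", 1024))   # heap leaves about 20% headroom
```

If the heap is as large as the container itself, the JVM's non-heap memory can push the process over the container limit.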

You should also check the NodeManager logs for 
container_1470148111230_0011_01_01. It might shed light on whether the NM 
killed the AM for exceeding memory limits. 
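
A minimal sketch of that log check, assuming the NodeManager reports such kills with a message containing "running beyond physical memory limits" or "running beyond virtual memory limits". That wording is an assumption about the NM log format, so adjust the pattern if your version differs. The function name and the log path in the usage note are hypothetical.

```python
import re

# Assumed wording of the NodeManager's memory-limit kill message.
KILL_PATTERN = re.compile(r"running beyond (physical|virtual) memory limits")


def memory_kills(log_lines, container_id):
    """Return (kind, line) pairs for log lines that mention container_id
    and report a physical or virtual memory-limit violation."""
    hits = []
    for line in log_lines:
        if container_id not in line:
            continue
        match = KILL_PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits
```

Usage, with a placeholder path for wherever your NodeManager writes its log:

    with open("/path/to/nodemanager.log") as log:
        for kind, line in memory_kills(log, "container_1470148111230_0011_01_01"):
            print(kind, line)

No output means the NM did not log a memory-limit kill for that container in that file, and the cause is probably elsewhere.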

thanks
— Hitesh 

> On Aug 3, 2016, at 8:50 PM, HuXi  wrote:
> 
> I used the default configuration, with yarn.resourcemanager.hostname set to
> 0.0.0.0 and yarn.resourcemanager.address set to 0.0.0.0:8032.
> 
> If that is really the cause, what should I do to fix it?
> 
> 
> > Date: Wed, 3 Aug 2016 20:41:31 -0700
> > Subject: Re: Word Count examples run failed with Tez 0.8.4
> > From: gop...@apache.org
> > To: user@tez.apache.org
> > 
> > 
> > > 16/08/04 09:36:00 INFO client.TezClient: The url to track the Tez AM: http://iZ25f2qedc7Z:8088/proxy/application_1470148111230_0014/
> > > 16/08/04 09:36:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> > 
> > That sounds very strange - is the resource manager really running on
> > localhost, but that resolves back to that strange hostname?
> > 
> > Cheers,
> > Gopal



RE: Word Count examples run failed with Tez 0.8.4

2016-08-03 Thread HuXi
I used the default configuration, with yarn.resourcemanager.hostname set to
0.0.0.0 and yarn.resourcemanager.address set to 0.0.0.0:8032.

If that is really the cause, what should I do to fix it?

> Date: Wed, 3 Aug 2016 20:41:31 -0700
> Subject: Re: Word Count examples run failed with Tez 0.8.4
> From: gop...@apache.org
> To: user@tez.apache.org
> 
> 
> > 16/08/04 09:36:00 INFO client.TezClient: The url to track the Tez AM: http://iZ25f2qedc7Z:8088/proxy/application_1470148111230_0014/
> > 16/08/04 09:36:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 
> That sounds very strange - is the resource manager really running on
> localhost, but that resolves back to that strange hostname?
> 
> Cheers,
> Gopal
  

Re: Word Count examples run failed with Tez 0.8.4

2016-08-03 Thread Gopal Vijayaraghavan

> 16/08/04 09:36:00 INFO client.TezClient: The url to track the Tez AM: http://iZ25f2qedc7Z:8088/proxy/application_1470148111230_0014/
> 16/08/04 09:36:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

That sounds very strange - is the resource manager really running on
localhost, but that resolves back to that strange hostname?

Cheers,
Gopal








Word Count examples run failed with Tez 0.8.4

2016-08-03 Thread HuXi
Hi,


I am using Tez 0.8.4 integrated into my Apache Hadoop 2.7.2 cluster. When I
run the word count example with "hadoop jar tez-examples-0.8.4.jar
orderedwordcount /apps/tez/NOTICE.txt /out", it fails with the log below:


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/disk/huxi/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/disk/huxi/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/04 09:35:58 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/04 09:35:58 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/04 09:35:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/04 09:35:58 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.8.4, revision=ef70407682918c022dffea86d6fa0571ccebcd8b, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20160705-1449 ]
16/08/04 09:35:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/04 09:36:00 INFO examples.OrderedWordCount: Running OrderedWordCount
16/08/04 09:36:00 INFO client.TezClient: Submitting DAG application with id: application_1470148111230_0014
16/08/04 09:36:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://localhost:8500/apps/tez/tez.tar.gz
16/08/04 09:36:00 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/04 09:36:00 INFO client.TezClient: Tez system stage directory hdfs://localhost:8500/tmp/work/tez/staging/.tez/application_1470148111230_0014 doesn't exist and is created
16/08/04 09:36:00 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1470148111230_0014, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
16/08/04 09:36:00 INFO impl.YarnClientImpl: Submitted application application_1470148111230_0014
16/08/04 09:36:00 INFO client.TezClient: The url to track the Tez AM: http://iZ25f2qedc7Z:8088/proxy/application_1470148111230_0014/
16/08/04 09:36:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/04 09:36:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
16/08/04 09:36:06 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/08/04 09:36:06 INFO client.DAGClientImpl:VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/08/04 09:36:06 INFO client.DAGClientImpl:VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/08/04 09:36:06 INFO client.DAGClientImpl:VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/08/04 09:36:08 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:09 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:10 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:11 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:12 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:13 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:14 INFO ipc.Client: Retrying connect to server: iZ25f2qedc7Z/10.171.49.61:45880. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/08/04 09:36:15 INFO ipc.Client: Retrying conn