RE: Tez log location?

2015-05-21 Thread Bikas Saha
Perhaps we should consider creating a TEE in the AM and always do 
SimpleHistoryLogging?

From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Thursday, May 21, 2015 5:33 PM
To: user@tez.apache.org
Subject: Re: Tez log location?

In that case you are using SimpleHistoryLoggingService rather than 
ATSHistoryLoggingService.

SimpleHistoryLoggingService logs all the messages to the container logs, so 
you will find something like the following there, which is the same data as in 
/ws/v1/timeline/TEZ_DAG_ID. But this data is only meant for diagnosis; if you 
are trying to consume it for offline analysis, I would encourage you to use 
the data from ATS.


2015-05-21 18:52:06,245 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1432205503669_0001_1][Event:DAG_FINISHED]: 
dagId=dag_1432205503669_0001_1, startTime=1432205516089, 
finishTime=1432205526204, timeTaken=10115, status=SUCCEEDED, diagnostics=, 
counters=Counters: 14, org.apache.tez.common.counters.DAGCounter, 
NUM_KILLED_TASKS=1, NUM_SUCCEEDED_TASKS=2, TOTAL_LAUNCHED_TASKS=3, 
AM_CPU_MILLISECONDS=0, AM_GC_TIME_MILLIS=0, File System Counters, 
HDFS_BYTES_READ=0, HDFS_BYTES_WRITTEN=24, HDFS_READ_OPS=6, 
HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=4, 
org.apache.tez.common.counters.TaskCounter, NUM_SPECULATIONS=1, 
GC_TIME_MILLIS=21, COMMITTED_HEAP_BYTES=514850816, OUTPUT_RECORDS=2


Best regards,
Jeff Zhang


From: Xiaoyong Zhu <xiaoy...@microsoft.com>
Reply-To: "user@tez.apache.org" <user@tez.apache.org>
Date: Friday, May 22, 2015 at 7:14 AM
To: "user@tez.apache.org" <user@tez.apache.org>
Subject: Tez log location?

Hi, I am wondering: if I have not configured YARN ATS integration, where does 
the Tez log go (I mean the data that is available in /ws/v1/timeline/TEZ_DAG_ID 
when the integration is in place)? Does it go to HDFS? Is there a 
configuration for that?

Thanks!

Xiaoyong



Re: Tez log location?

2015-05-21 Thread Jianfeng (Jeff) Zhang
In that case you are using SimpleHistoryLoggingService rather than 
ATSHistoryLoggingService.

SimpleHistoryLoggingService logs all the messages to the container logs, so 
you will find something like the following there, which is the same data as in 
/ws/v1/timeline/TEZ_DAG_ID. But this data is only meant for diagnosis; if you 
are trying to consume it for offline analysis, I would encourage you to use 
the data from ATS.


2015-05-21 18:52:06,245 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1432205503669_0001_1][Event:DAG_FINISHED]: 
dagId=dag_1432205503669_0001_1, startTime=1432205516089, 
finishTime=1432205526204, timeTaken=10115, status=SUCCEEDED, diagnostics=, 
counters=Counters: 14, org.apache.tez.common.counters.DAGCounter, 
NUM_KILLED_TASKS=1, NUM_SUCCEEDED_TASKS=2, TOTAL_LAUNCHED_TASKS=3, 
AM_CPU_MILLISECONDS=0, AM_GC_TIME_MILLIS=0, File System Counters, 
HDFS_BYTES_READ=0, HDFS_BYTES_WRITTEN=24, HDFS_READ_OPS=6, 
HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=4, 
org.apache.tez.common.counters.TaskCounter, NUM_SPECULATIONS=1, 
GC_TIME_MILLIS=21, COMMITTED_HEAP_BYTES=514850816, OUTPUT_RECORDS=2
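
For a quick diagnostic look at a line like the one above, the key=value fields can be pulled out with a short script. This is only a sketch: the regular expressions and the function name parse_history_line are assumptions inferred from this single sample, not a format that Tez guarantees, and values that themselves contain commas would be cut short.

```python
import re

# Assumed layout, inferred from the sample line above:
#   ... [HISTORY][DAG:<dag id>][Event:<event type>]: key=value, key=value, ...
HEADER = re.compile(
    r"\[HISTORY\]\[DAG:(?P<dag>[^\]]+)\]\[Event:(?P<event>[^\]]+)\]:\s*(?P<body>.*)",
    re.S,
)
PAIR = re.compile(r"(\w+)=([^,]*)")


def parse_history_line(line):
    """Return (dag_id, event_type, fields) for a [HISTORY] line, or None."""
    match = HEADER.search(line)
    if match is None:
        return None
    # Collapse whitespace so a line that was wrapped (e.g. by a mail client) still parses.
    body = " ".join(match.group("body").split())
    fields = {key: value.strip() for key, value in PAIR.findall(body)}
    return match.group("dag"), match.group("event"), fields


if __name__ == "__main__":
    # Shortened copy of the sample line from the message above.
    sample = (
        "2015-05-21 18:52:06,245 INFO [Dispatcher thread: Central] "
        "history.HistoryEventHandler: "
        "[HISTORY][DAG:dag_1432205503669_0001_1][Event:DAG_FINISHED]: "
        "dagId=dag_1432205503669_0001_1, startTime=1432205516089, "
        "finishTime=1432205526204, timeTaken=10115, status=SUCCEEDED, "
        "diagnostics=, counters=Counters: 14, NUM_KILLED_TASKS=1"
    )
    dag_id, event, fields = parse_history_line(sample)
    print(dag_id, event, fields["status"], fields["timeTaken"])
```

Group headings such as "File System Counters" carry no "=" and are simply skipped, so the counters end up as flat keys next to dagId and status.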


Best regards,
Jeff Zhang


From: Xiaoyong Zhu <xiaoy...@microsoft.com>
Reply-To: "user@tez.apache.org" <user@tez.apache.org>
Date: Friday, May 22, 2015 at 7:14 AM
To: "user@tez.apache.org" <user@tez.apache.org>
Subject: Tez log location?

Hi, I am wondering: if I have not configured YARN ATS integration, where does 
the Tez log go (I mean the data that is available in /ws/v1/timeline/TEZ_DAG_ID 
when the integration is in place)? Does it go to HDFS? Is there a 
configuration for that?

Thanks!

Xiaoyong



Re: Tez log location?

2015-05-21 Thread Hitesh Shah
Some history logging can be enabled via the SimpleHistoryLogger, which is 
active by default if the ATS logger is not enabled. It is not fully compatible 
with the ATS data and, as it is mostly experimental, it may not have all the 
data. To use it, set “tez.history.logging.service.class” to “” or 
“org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService”. The 
config property “tez.simple.history.logging.dir” controls the path on HDFS 
where the history is written. If the directory is not configured, the logs are 
written as part of the Application Master container logs, which can then be 
pulled via “bin/yarn logs -application” 

Using the HDFS logger does imply that the UI will no longer be functional. 
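
As a rough illustration of the two properties named above, the following sketch prints them as a Hadoop-style configuration fragment of the kind that would go into tez-site.xml. The property names and the class name are taken from this message; the directory /tmp/tez-history and the helper name build_tez_site are made up for the example, so substitute a path that exists on your cluster.

```python
import xml.etree.ElementTree as ET

# Property names and the logging class come from the message above.
# The HDFS directory is a hypothetical placeholder, not a Tez default.
PROPERTIES = {
    "tez.history.logging.service.class":
        "org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService",
    "tez.simple.history.logging.dir": "/tmp/tez-history",
}


def build_tez_site(properties):
    """Build a <configuration> element with one <property> per dict entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(build_tez_site(PROPERTIES), encoding="unicode"))
```

Leaving out the tez.simple.history.logging.dir entry corresponds to the case described above where the history ends up in the Application Master container logs instead of on HDFS.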

thanks
— Hitesh





On May 21, 2015, at 4:14 PM, Xiaoyong Zhu wrote:

> Hi, I am wondering: if I have not configured YARN ATS integration, where does 
> the Tez log go (I mean the data that is available in /ws/v1/timeline/TEZ_DAG_ID 
> when the integration is in place)? Does it go to HDFS? Is there a 
> configuration for that?
>  
> Thanks!
>  
> Xiaoyong



Tez log location?

2015-05-21 Thread Xiaoyong Zhu
Hi, I am wondering: if I have not configured YARN ATS integration, where does 
the Tez log go (I mean the data that is available in /ws/v1/timeline/TEZ_DAG_ID 
when the integration is in place)? Does it go to HDFS? Is there a 
configuration for that?

Thanks!

Xiaoyong



Re: Tez local mode hanging in big testsuite

2015-05-21 Thread Andre Kelpe
There you go: https://issues.apache.org/jira/browse/TEZ-2475

- André

On Thu, May 21, 2015 at 5:44 PM, Hitesh Shah wrote:

> Hello Andre,
>
> Could you file a JIRA for this and upload the logs around the point where
> it hangs?
>
> thanks
> — Hitesh
>
> On May 21, 2015, at 7:55 AM, Andre Kelpe wrote:
>
> > Hi,
> >
> > We have a big test suite for Lingual, our SQL layer for Cascading. We
> > are trying very hard to make it work correctly on Tez, but I am stuck:
> >
> > The setup is a huge suite of SQL-based tests (6000+), which are
> > executed in order in local mode. At certain moments the whole process
> > just stops and nothing gets executed any longer. This does not happen
> > every time, but quite often. Note that it does not happen at the same
> > line of code but at random, which makes it quite hard to debug.
> >
> > What I am seeing are stack traces like this one in the middle of the run:
> >
> > 2015-05-21 16:07:42,413 ERROR [TaskHeartbeatThread] task.TezTaskRunner (TezTaskRunner.java:reportError(333)) - TaskReporter reported error
> > java.lang.InterruptedException
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2188)
> >     at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:187)
> >     at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:118)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > This looks like it could be related to the hang, but the hang does not
> > happen immediately afterwards, only some time later.
> >
> > I have gone through quite a few JIRAs and saw that there were problems
> > with locks and hanging threads before, which should be fixed, but it
> > still happens.
> >
> > I have tried 0.6.1 and 0.7.0. Both show the same behaviour.
> >
> > This gist contains a thread dump of a hanging build:
> > https://gist.github.com/fs111/1ee44469bf5cc31e5a52
> >
> > Does anyone have an idea what could be wrong?
> >
> > - André
> >
> >
> > --
> > André Kelpe
> > an...@concurrentinc.com
> > http://concurrentinc.com
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Tez local mode hanging in big testsuite

2015-05-21 Thread Hitesh Shah
Hello Andre, 

Could you file a JIRA for this and upload the logs around the point where it 
hangs?
 
thanks
— Hitesh

On May 21, 2015, at 7:55 AM, Andre Kelpe wrote:

> Hi,
> 
> We have a big test suite for Lingual, our SQL layer for Cascading. We are 
> trying very hard to make it work correctly on Tez, but I am stuck:
> 
> The setup is a huge suite of SQL-based tests (6000+), which are 
> executed in order in local mode. At certain moments the whole process just 
> stops and nothing gets executed any longer. This does not happen every time, 
> but quite often. Note that it does not happen at the same line of code but at 
> random, which makes it quite hard to debug.
> 
> What I am seeing are stack traces like this one in the middle of the run:
> 
> 2015-05-21 16:07:42,413 ERROR [TaskHeartbeatThread] task.TezTaskRunner (TezTaskRunner.java:reportError(333)) - TaskReporter reported error
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2188)
>     at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:187)
>     at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:118)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 
> This looks like it could be related to the hang, but the hang does not 
> happen immediately afterwards, only some time later.
> 
> I have gone through quite a few JIRAs and saw that there were problems with 
> locks and hanging threads before, which should be fixed, but it still happens.
> 
> I have tried 0.6.1 and 0.7.0. Both show the same behaviour.
> 
> This gist contains a thread dump of a hanging build: 
> https://gist.github.com/fs111/1ee44469bf5cc31e5a52
> 
> Does anyone have an idea what could be wrong?
> 
> - André
> 
> 
> -- 
> André Kelpe
> an...@concurrentinc.com
> http://concurrentinc.com



Tez local mode hanging in big testsuite

2015-05-21 Thread Andre Kelpe
Hi,

We have a big test suite for Lingual, our SQL layer for Cascading. We are
trying very hard to make it work correctly on Tez, but I am stuck:

The setup is a huge suite of SQL-based tests (6000+), which are
executed in order in local mode. At certain moments the whole process just
stops and nothing gets executed any longer. This does not happen every time,
but quite often. Note that it does not happen at the same line of code but
at random, which makes it quite hard to debug.

What I am seeing are stack traces like this one in the middle of the run:

2015-05-21 16:07:42,413 ERROR [TaskHeartbeatThread] task.TezTaskRunner (TezTaskRunner.java:reportError(333)) - TaskReporter reported error
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2188)
    at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:187)
    at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.call(TaskReporter.java:118)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This looks like it could be related to the hang, but the hang does not
happen immediately afterwards, only some time later.
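
To put a number on "some time later", one option is to measure how long the log keeps going after each such error. The sketch below assumes unwrapped log lines that start with a timestamp in the format shown above; the function names and the default marker string are choices made for this example, not anything Tez provides.

```python
import sys
from datetime import datetime

# Timestamp layout as seen in the log line above, e.g. "2015-05-21 16:07:42,413".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def seconds_from_errors_to_last_line(lines, marker="TaskReporter reported error"):
    """For each timestamped line containing marker, return the number of
    seconds between it and the last timestamped line of the log."""
    stamped = [(parse_ts(line), line) for line in lines]
    stamped = [(ts, line) for ts, line in stamped if ts is not None]
    if not stamped:
        return []
    last = stamped[-1][0]
    return [(last - ts).total_seconds() for ts, line in stamped if marker in line]


if __name__ == "__main__":
    # Usage: python error_gap.py <path to log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for gap in seconds_from_errors_to_last_line(handle):
            print(f"log continued for {gap:.1f}s after the error")
```

Continuation lines of a stack trace have no timestamp and are ignored, so only the ERROR line itself is matched against the marker.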

I have gone through quite a few JIRAs and saw that there were problems with
locks and hanging threads before, which should be fixed, but it still
happens.

I have tried 0.6.1 and 0.7.0. Both show the same behaviour.

This gist contains a thread dump of a hanging build:
https://gist.github.com/fs111/1ee44469bf5cc31e5a52

Does anyone have an idea what could be wrong?

- André


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com