To pinpoint the issue, one approach would be to change the history logger to
SimpleHistoryLogger, i.e. comment out the property
tez.history.logging.service.class in the configs so that it falls back to the
default value. This should generate a history log file as part of the
application logs, which should help us understand whether Tez itself is not
generating the data or YARN timeline is somehow losing it. Are there any
exceptions in the DAGAppMaster log and/or the YARN timeline logs when this job
runs?

— Hitesh
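A minimal Java sketch of the fallback described above, using a plain Map as a stand-in for the Tez configuration object. The property name and the ATS logger class come from this thread; the default class name (org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService) is an assumption and should be checked against TezConfiguration for the Tez version in use. HistoryLoggerConfig is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

final class HistoryLoggerConfig {
    static final String KEY = "tez.history.logging.service.class";
    // ASSUMPTION: the default history logger Tez falls back to; verify against
    // TezConfiguration for your Tez version.
    static final String ASSUMED_DEFAULT =
        "org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService";
    // Logger class reported in the DAGAppMaster log quoted in this thread.
    static final String ATS_LOGGER =
        "org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService";

    // Returns the configured logger class, or the default when the key is unset.
    static String resolve(Map<String, String> conf) {
        return conf.getOrDefault(KEY, ASSUMED_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, ATS_LOGGER);
        System.out.println(resolve(conf)); // the ATS logger is in effect
        conf.remove(KEY);                  // "comment out" the property
        System.out.println(resolve(conf)); // falls back to the default
    }
}
```

Removing the key, rather than setting it to an empty string, is what triggers the fallback in this sketch.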



> On Sep 28, 2016, at 1:30 PM, Madhusudan Ramanna <m.rama...@ymail.com> wrote:
> 
> Hitesh,
> 
> Some information, like the appId, is getting through to the timeline server,
> but not all of it. See attached.
> 
> Here is the output of 
> 
> http://timelinehost:port/ws/v1/timeline/TEZ_DAG_ID/
> {"entities":[{"events":[{"timestamp":1475094093409,"eventtype":"DAG_FINISHED","eventinfo":{}},{"timestamp":1475094062692,"eventtype":"DAG_STARTED","eventinfo":{}},{"timestamp":1475094062688,"eventtype":"DAG_INITIALIZED","eventinfo":{}},{"timestamp":1475094062055,"eventtype":"DAG_SUBMITTED","eventinfo":{}}],"entitytype":"TEZ_DAG_ID","entity":"dag_1475091857089_0007_1","starttime":1475094062055,"domain":"DEFAULT","relatedentities":{},"primaryfilters":{},"otherinfo":{}}]}
> 
> http://host:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1475091857089_0007_1
> 
> {"events":[{"timestamp":1475094093409,"eventtype":"DAG_FINISHED","eventinfo":{}},{"timestamp":1475094062692,"eventtype":"DAG_STARTED","eventinfo":{}},{"timestamp":1475094062688,"eventtype":"DAG_INITIALIZED","eventinfo":{}},{"timestamp":1475094062055,"eventtype":"DAG_SUBMITTED","eventinfo":{}}],"entitytype":"TEZ_DAG_ID","entity":"dag_1475091857089_0007_1","starttime":1475094062055,"domain":"DEFAULT","relatedentities":{},"primaryfilters":{},"otherinfo":{}}
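In the output above, the four DAG events did reach the timeline server, but every eventinfo, primaryfilters and otherinfo object is empty. A quick stdlib-only Java sketch for checking a saved response for this pattern follows; it uses regular expressions rather than a real JSON parser, so it assumes compact output like the one quoted. TimelineEntityCheck is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimelineEntityCheck {
    // Matches "eventtype":"DAG_STARTED" and similar (compact JSON assumed).
    private static final Pattern EVENT_TYPE =
        Pattern.compile("\"eventtype\":\"([A-Z_]+)\"");

    // Lists the event types present in a timeline entity response.
    static List<String> eventTypes(String json) {
        List<String> types = new ArrayList<>();
        Matcher m = EVENT_TYPE.matcher(json);
        while (m.find()) {
            types.add(m.group(1));
        }
        return types;
    }

    // True if the named field is present as an empty object, e.g. "otherinfo":{}.
    static boolean isEmptyObject(String json, String field) {
        return json.contains("\"" + field + "\":{}");
    }

    public static void main(String[] args) {
        // Trimmed copy of the response quoted above.
        String json = "{\"events\":[{\"timestamp\":1475094093409,\"eventtype\":\"DAG_FINISHED\",\"eventinfo\":{}},"
            + "{\"timestamp\":1475094062055,\"eventtype\":\"DAG_SUBMITTED\",\"eventinfo\":{}}],"
            + "\"entitytype\":\"TEZ_DAG_ID\",\"primaryfilters\":{},\"otherinfo\":{}}";
        System.out.println(eventTypes(json));
        System.out.println("primaryfilters empty: " + isEmptyObject(json, "primaryfilters"));
        System.out.println("otherinfo empty: " + isEmptyObject(json, "otherinfo"));
    }
}
```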
> 
> 
> 
> On Wednesday, September 28, 2016 8:44 AM, Hitesh Shah <hit...@apache.org> 
> wrote:
> 
> 
> Hello Madhusudan, 
> 
> Thanks for your patience. Let's take this to a JIRA; once you attach more
> logs there, we can root-cause the issue.
> 
> A few things to attach to the jira:
>   - yarn-site.xml
>   - tez-site.xml
>   - hadoop version
>   - timeline server log for the time period in question
>   - application logs for any tez app which fails to display
>   - output of http://timelinehost:port/ws/v1/timeline/TEZ_DAG_ID/<dag_id>/
>     (e.g. dag_1475014682883_0027_1)
> 
> thanks
> — Hitesh
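A DAG ID embeds the ID of the YARN application it ran in, which helps when collecting the application logs listed above: dag_1475014682883_0027_1 belongs to application_1475014682883_0027, as the Tez UI URL in the DAGAppMaster log quoted in this thread shows. The sketch below assumes the four-part dag_<clusterTimestamp>_<appSequence>_<dagSequence> layout seen here; TezIds is a hypothetical helper and timelinehost:8188 is a placeholder.

```java
final class TezIds {
    // Derives the YARN application ID from a Tez DAG ID by dropping the trailing
    // DAG sequence number (assumes the four-part layout seen in this thread).
    static String applicationIdFor(String dagId) {
        String[] parts = dagId.split("_");
        if (parts.length != 4 || !parts[0].equals("dag")) {
            throw new IllegalArgumentException("Unexpected DAG ID: " + dagId);
        }
        return "application_" + parts[1] + "_" + parts[2];
    }

    // Builds the timeline REST URL for a single DAG entity.
    static String timelineDagUrl(String timelineBase, String dagId) {
        String base = timelineBase.endsWith("/") ? timelineBase : timelineBase + "/";
        return base + "ws/v1/timeline/TEZ_DAG_ID/" + dagId;
    }

    public static void main(String[] args) {
        String dagId = "dag_1475014682883_0027_1";
        System.out.println(applicationIdFor(dagId));
        // Placeholder host; substitute the real timeline server address.
        System.out.println(timelineDagUrl("http://timelinehost:8188", dagId));
    }
}
```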
> 
> > On Sep 27, 2016, at 10:42 PM, Madhusudan Ramanna <m.rama...@ymail.com> 
> > wrote:
> > 
> > So I downloaded Tez commit 91a397b0ba and built the dist package.  We're 
> > not seeing the zip exception anymore.
> > 
> > However, now Tez UI is completely broken. Not at all sure what is happening 
> > here. Please see attached screenshots.
> > 
> > 
> > 2016-09-28 05:11:40,903 [INFO] [main] |web.WebUIService|: Tez UI History 
> > URL: http://dev-cv2.aws:8080/tez-ui/#/tez-app/application_1475014682883_0027
> > 2016-09-28 05:11:40,908 [INFO] [main] |history.HistoryEventHandler|: 
> > Initializing HistoryEventHandler withrecoveryEnabled=true, 
> > historyServiceClassName=org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
> > 2016-09-28 05:11:41,474 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> > service address: http://ts-ip.aws:8188/ws/v1/timeline/
> > 2016-09-28 05:11:41,474 [INFO] [main] |ats.ATSHistoryLoggingService|: 
> > Initializing ATSHistoryLoggingService with maxEventsPerBatch=5, 
> > maxPollingTime(ms)=10, waitTimeForShutdown(ms)=-1, 
> > TimelineACLManagerClass=org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager
> > 2016-09-28 05:11:41,644 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> > service address: http://ts-ip.aws:8188/ws/v1/timeline/
> > 
> > 
> > >>> DAG Execution
> > 
> > 2016-09-28 05:11:52,779 [INFO] [IPC Server handler 0 on 44039] 
> > |history.HistoryEventHandler|: 
> > [HISTORY][DAG:dag_1475014682883_0027_1][Event:DAG_SUBMITTED]: 
> > dagID=dag_1475014682883_0027_1, submitTime=1475039511185
> > 
> > 
> > The timeline server is up and running. Tez UI, however, is not able to
> > display the DAG and other details.
> > 
> > thanks,
> > Madhu
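To compare what Tez generated with what reached the timeline server, the [HISTORY] lines in the DAGAppMaster log can be listed. A minimal sketch, assuming each history line carries the [HISTORY][DAG:...][Event:...] marker in the form shown above (lines without that marker are skipped); HistoryLogScan and its log-path argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HistoryLogScan {
    // Matches the [HISTORY][DAG:<id>][Event:<type>] marker in a log line.
    private static final Pattern HISTORY =
        Pattern.compile("\\[HISTORY\\]\\[DAG:([^\\]]+)\\]\\[Event:([A-Z_]+)\\]");

    // Returns "dagId eventType" for every history line found.
    static List<String> historyEvents(List<String> logLines) {
        List<String> events = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = HISTORY.matcher(line);
            if (m.find()) {
                events.add(m.group(1) + " " + m.group(2));
            }
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a DAGAppMaster log file (hypothetical argument).
        for (String event : historyEvents(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(event);
        }
    }
}
```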
> > 
> > 
> > 
> > On Saturday, September 24, 2016 12:01 PM, Hitesh Shah <hit...@apache.org> 
> > wrote:
> > 
> > 
> > tez-dist tarballs are not published to Maven today; only the
> > module-specific jars are. But yes, you could just try a local build to see
> > if you can reproduce the issue with the commit in question.
> > 
> > — Hitesh
> > 
> > 
> > > On Sep 23, 2016, at 6:23 PM, Madhusudan Ramanna <m.rama...@ymail.com> 
> > > wrote:
> > > 
> > > Hitesh and Zhiyuan,
> > > 
> > > The Apache snapshots repository doesn't seem to have tez-dist:
> > > 
> > > http://repository.apache.org/content/groups/snapshots/org/apache/tez/tez-dist/
> > > 
> > > The last one seems to be 0.2.0-SNAPSHOT
> > > 
> > > Should I just download the source at that commit and recompile?
> > > 
> > > thanks,
> > > Madhu
> > > 
> > > 
> > > On Friday, September 23, 2016 5:19 PM, Hitesh Shah <hit...@apache.org> 
> > > wrote:
> > > 
> > > 
> > > Hello Madhusudan,
> > > 
> > > If you look at the MANIFEST.MF inside any of the tez jars, it will 
> > > provide the commit hash via the SCM-Revision field.
> > > 
> > > The tez client and the DAGAppMaster also log this info at runtime.
> > > 
> > > — Hitesh 
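Reading the SCM-Revision field needs only java.util.jar. A minimal sketch, assuming the attribute sits in the manifest's main section; ScmRevision is a hypothetical helper and the jar path is supplied by the caller.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

final class ScmRevision {
    // Returns the SCM-Revision attribute from a jar's MANIFEST.MF, or null if absent.
    static String scmRevision(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue("SCM-Revision");
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to any Tez jar from the snapshot (hypothetical argument).
        System.out.println(scmRevision(args[0]));
    }
}
```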
> > > 
> > > > On Sep 23, 2016, at 4:08 PM, Madhusudan Ramanna <m.rama...@ymail.com> 
> > > > wrote:
> > > > 
> > > > Zhiyuan,
> > > > 
> > > > We just pulled down the latest snapshot from the Apache repository.
> > > > My question is: how can I figure out the branch and commit information
> > > > from the snapshot artifact?
> > > > 
> > > > thanks,
> > > > Madhu
> > > > 
> > > > 
> > > > On Friday, September 23, 2016 10:38 AM, zhiyuan yang 
> > > > <sjtu....@gmail.com> wrote:
> > > > 
> > > > 
> > > > Hi Madhu,
> > > > 
> > > > It looks like an Inflater/Deflater mismatch to me. From the stack traces,
> > > > I see you cherry-picked this patch instead of using the master branch.
> > > > Would you mind double-checking whether the patch was cherry-picked
> > > > correctly?
> > > > 
> > > > Thanks!
> > > > Zhiyuan
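The "Caused by: java.util.zip.ZipException: incorrect header check" in the stack trace quoted in this thread is what java.util.zip reports when InflaterInputStream is handed bytes that fail the zlib header check. The sketch below reproduces that symptom with plain JDK classes, using gzip-framed data as a stand-in for whatever mismatched format a writer might produce. It illustrates the failure mode Zhiyuan describes; it is not the Tez code path or the change made in the commit. InflaterMismatchDemo is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

final class InflaterMismatchDemo {
    // Compresses with zlib framing, which InflaterInputStream expects by default.
    static byte[] zlib(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Compresses with gzip framing, which a default Inflater does not accept.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Decompresses zlib data with InflaterInputStream, reading until EOF.
    static byte[] inflate(byte[] compressed) throws IOException {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "vertex manager payload".getBytes(StandardCharsets.UTF_8);
        // Matching writer and reader formats round-trip cleanly.
        System.out.println(new String(inflate(zlib(payload)), StandardCharsets.UTF_8));
        // Mismatched formats fail while the reader checks the header bytes.
        try {
            inflate(gzip(payload));
        } catch (ZipException e) {
            System.out.println("ZipException: " + e.getMessage());
        }
    }
}
```

Checking that the writer and reader agree on framing (zlib, gzip or raw deflate) is the quickest way to rule this class of error in or out.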
> > > > 
> > > >> On Sep 23, 2016, at 10:21 AM, Madhusudan Ramanna <m.rama...@ymail.com> 
> > > >> wrote:
> > > >> 
> > > >> Hello,
> > > >> 
> > > >> We're using the Apache snapshot repository to pull latest tez 
> > > >> snapshots. 
> > > >> 
> > > >> We've started seeing this exception:
> > > >> 
> > > >> org.apache.tez.dag.api.TezUncheckedException: 
> > > >> java.util.zip.ZipException: incorrect header check
> > > >> at 
> > > >> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.handleVertexManagerEvent(ShuffleVertexManager.java:622)
> > > >> at 
> > > >> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:579)
> > > >> at 
> > > >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:606)
> > > >> at 
> > > >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
> > > >> at 
> > > >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > >> at javax.security.auth.Subject.doAs(Subject.java:422)
> > > >> at 
> > > >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> > > >> at 
> > > >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
> > > >> at 
> > > >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
> > > >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > >> at 
> > > >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > >> at 
> > > >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > >> at java.lang.Thread.run(Thread.java:745)
> > > >> Caused by: java.util.zip.ZipException: incorrect header check
> > > >> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
> > > >> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> > > >> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
> > > >> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
> > > >> at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
> > > >> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:462)
> > > >> 
> > > >> 
> > > >> since this commit
> > > >> 
> > > >> https://github.com/apache/tez/commit/da4098b9d6f72e6d4aacc1623622a0875408d2ba
> > > >> 
> > > >> 
> > > >> Wanted to bring this to your attention. For now we've locked the 
> > > >> snapshot version down.
> > > >> 
> > > >> thanks,
> > > >> Madhu
> 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > <Screen Shot 2016-09-27 at 10.27.02 PM.png><Screen Shot 2016-09-27 at 
> > 10.27.13 PM.png><Screen Shot 2016-09-27 at 10.39.20 PM.png>
> 
> 
> 
> <Screen Shot 2016-09-28 at 1.26.35 PM.png>
