Hi Douglas,

For the timeline service, please also set "tez.yarn.ats.enabled" to false in tez-site.xml if the timeline service is not running. Would you mind filing a JIRA for the errors that you saw when it was enabled?
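A minimal sketch of that setting, assuming tez-site.xml follows the usual Hadoop layout in which each property sits in a <property> element inside the top-level <configuration> element (only the property name and value come from this thread):

```xml
<!-- tez-site.xml: stop Tez from publishing to the YARN timeline service (ATS) -->
<!-- Place this inside the existing <configuration> element. -->
<property>
  <name>tez.yarn.ats.enabled</name>
  <value>false</value>
</property>
```

This mirrors the yarn.timeline-service.enabled change already made in yarn-site.xml, so both sides agree that the timeline service is off.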
As for the hung job (assuming you have already killed it), can you provide the application logs obtained via "bin/yarn logs -applicationId" and also the Hive query explain plan? This should help us diagnose the potential problems. Apache mailing lists do not support attachments, so feel free to file a JIRA and attach the logs there.

thanks
-- Hitesh

On Thu, May 29, 2014 at 8:50 AM, Douglas Moore <[email protected]> wrote:

> I'm on HDP 2.1 build running a Hive job that has created 3 stages.
> The first stage has 1045 maps, the second has 2 reducers the 3rd has 1
> reducer.
> The job churns through the first stage and never starts the second.
>
> I can see from the log file syslog_dag_.... that the job releases the
> containers and gets down to heldContainers=3 (which makes sense to me).
>
> How can I diagnose this further?
> How can I run the job in a safer mode, low gear, something to get through
> this stall?
>
> I should note, that I turned off the timeline service because of numerous
> errors by modifying
>
> yarn-site.xml: via Ambari
>
> <name>yarn.timeline-service.enabled</name>
>
> <value>false</value>
>
>
> Thanks in advance,
>
> Douglas
