[jira] [Created] (TEZ-3368) NPE in DelayedContainerManager
Jason Lowe created TEZ-3368:
---

             Summary: NPE in DelayedContainerManager
                 Key: TEZ-3368
                 URL: https://issues.apache.org/jira/browse/TEZ-3368
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.1
            Reporter: Jason Lowe

Saw a Tez AM hang due to an NPE in the DelayedContainerManager:
{noformat}
2016-07-17 01:53:23,157 [ERROR] [DelayedContainerManager] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[DelayedContainerManager,5,main] threw an Exception.
java.lang.NullPointerException
	at org.apache.tez.dag.app.rm.TezAMRMClientAsync.getMatchingRequestsForTopPriority(TezAMRMClientAsync.java:142)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getMatchingRequestWithoutPriority(YarnTaskSchedulerService.java:1474)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$500(YarnTaskSchedulerService.java:84)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$NodeLocalContainerAssigner.assignReUsedContainer(YarnTaskSchedulerService.java:1869)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainerWithLocation(YarnTaskSchedulerService.java:1753)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:733)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:84)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:2030)
{noformat}

After the DelayedContainerManager thread exited the AM proceeded to receive requested containers that would go unused until the container allocations expired. Then they would be re-requested, and the cycle repeated indefinitely.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
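The failure mode reported above (one uncaught exception ends a long-running worker thread while the rest of the process keeps going) can be illustrated with a minimal, self-contained Java sketch. Everything in it is hypothetical: the class WorkerLoopSketch, its methods, and the sample items are invented for illustration, and none of it is Tez code or the fix for TEZ-3368. It only contrasts a loop that lets a NullPointerException escape run() with one that catches the exception per item and continues.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Tez code and not the TEZ-3368 fix.
final class WorkerLoopSketch {

    // Stand-in for per-item work; throws NullPointerException on a null item.
    static int handle(String item) {
        return item.length();
    }

    // Unguarded loop: the first exception escapes run(), the thread dies,
    // and the remaining items are never processed.
    static int runUnguarded(String[] items) {
        AtomicInteger handled = new AtomicInteger();
        Thread worker = new Thread(() -> {
            for (String item : items) {
                handle(item);
                handled.incrementAndGet();
            }
        }, "unguarded-worker");
        // Keep the sketch quiet; a real handler would at least log the error.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        return startAndJoin(worker, handled);
    }

    // Guarded loop: a failure on one item is caught and the loop moves on.
    static int runGuarded(String[] items) {
        AtomicInteger handled = new AtomicInteger();
        Thread worker = new Thread(() -> {
            for (String item : items) {
                try {
                    handle(item);
                    handled.incrementAndGet();
                } catch (RuntimeException e) {
                    // A real service would log this and decide how to recover.
                }
            }
        }, "guarded-worker");
        return startAndJoin(worker, handled);
    }

    // Starts the worker, waits for it to finish, and returns the item count.
    private static int startAndJoin(Thread worker, AtomicInteger handled) {
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        String[] items = {"a", null, "b", "c"};
        System.out.println("unguarded handled: " + runUnguarded(items));
        System.out.println("guarded handled: " + runGuarded(items));
    }
}
```

Whether catching and continuing is appropriate depends on the service. In some cases failing the whole process fast is preferable to a silently dead thread, which is the situation the report describes.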
[jira] [Created] (TEZ-3367) Add support for Multiple Files Fetch from the Shuffle Handler
Kuhu Shukla created TEZ-3367:

             Summary: Add support for Multiple Files Fetch from the Shuffle Handler
                 Key: TEZ-3367
                 URL: https://issues.apache.org/jira/browse/TEZ-3367
             Project: Apache Tez
          Issue Type: Sub-task
            Reporter: Kuhu Shukla
            Assignee: Kuhu Shukla

Equip the Custom Shuffle Handler to read multiple file.out(s) at once. One of the possible ways is to fetch all files from a given directory. The design may need to address the possible scenario of too many files exhausting the Inodes on a given node.
[jira] [Resolved] (TEZ-3366) Tez timeline client reporting different domains for same entity
     [ https://issues.apache.org/jira/browse/TEZ-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah resolved TEZ-3366.
--
    Resolution: Duplicate

> Tez timeline client reporting different domains for same entity
> ---
>
>                 Key: TEZ-3366
>                 URL: https://issues.apache.org/jira/browse/TEZ-3366
>             Project: Apache Tez
>          Issue Type: Bug
>         Environment: centos 6.6
> apache hadoop 2.6.4
> tez 0.6.2
>            Reporter: Nikhil Mulley
>
> Hi,
> Timeline server service logs on 2.6.4 cluster (no security, no acls) show
> often these error and then an exception follows when tez job runs. Closely
> inspecting the code shows there is a possibility of tez itself reporting
> different domain for the same entity (one that is already also in the
> timeline store) and then getting skipped to handle the event and store the
> event timeline information.
>
> ERROR org.apache.hadoop.yarn.server.timeline.TimelineDataManager: Skip the
> timeline entity: { id: tez_container_1468970783049_0021_01_02, type:
> TEZ_CONTAINER_ID }
> >>>
> >>>
> org.apache.hadoop.yarn.exceptions.YarnException: The domain of the timeline
> entity { id: tez_container_1468970783049_0021_01_02, type:
> TEZ_CONTAINER_ID } is not allowed to be changed.
> >>>