[jira] [Created] (TEZ-3368) NPE in DelayedContainerManager

2016-07-20 Jason Lowe (JIRA)
Jason Lowe created TEZ-3368:
---

             Summary: NPE in DelayedContainerManager
                 Key: TEZ-3368
                 URL: https://issues.apache.org/jira/browse/TEZ-3368
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.1
            Reporter: Jason Lowe


Saw a Tez AM hang due to an NPE in the DelayedContainerManager:
{noformat}
2016-07-17 01:53:23,157 [ERROR] [DelayedContainerManager] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[DelayedContainerManager,5,main] threw an Exception.
java.lang.NullPointerException
	at org.apache.tez.dag.app.rm.TezAMRMClientAsync.getMatchingRequestsForTopPriority(TezAMRMClientAsync.java:142)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getMatchingRequestWithoutPriority(YarnTaskSchedulerService.java:1474)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$500(YarnTaskSchedulerService.java:84)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$NodeLocalContainerAssigner.assignReUsedContainer(YarnTaskSchedulerService.java:1869)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainerWithLocation(YarnTaskSchedulerService.java:1753)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:733)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:84)
	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:2030)
{noformat}

After the DelayedContainerManager thread exited, the AM continued to receive the
containers it had requested, but they went unused until their allocations
expired. The AM then re-requested them, and the cycle repeated indefinitely.
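The failure mode described above, where one unchecked exception ends a background thread and every later assignment silently stops, can be illustrated with a small Java sketch. It is hypothetical and is not the Tez code or the fix for this issue: `ResilientDelayedLoop`, `processAll`, and the string container IDs are invented for illustration. It shows one common hardening pattern, catching a failure per item so the remaining items are still processed; it would not remove the root cause of the NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Tez code: process delayed items one at a time and
 * keep going when a single item throws, instead of letting the exception
 * escape and terminate the worker thread.
 */
public class ResilientDelayedLoop {

  /** Runs assign on every item and returns the items whose assignment threw. */
  public static <T> List<T> processAll(List<T> items, Consumer<T> assign) {
    List<T> failed = new ArrayList<>();
    for (T item : items) {
      try {
        assign.accept(item);
      } catch (RuntimeException e) {
        // Record and log the failure; the remaining items are still processed.
        System.err.println("assignment failed for " + item + ": " + e);
        failed.add(item);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> assigned = new ArrayList<>();
    List<String> failed = processAll(Arrays.asList("c1", "c2", "c3"), c -> {
      if (c.equals("c2")) {
        // Simulate the NPE from a lookup that unexpectedly returned null.
        throw new NullPointerException("simulated missing request");
      }
      assigned.add(c);
    });
    System.out.println("assigned=" + assigned + " failed=" + failed);
  }
}
```

Running `main` prints `assigned=[c1, c3] failed=[c2]` on standard output (plus a log line on standard error): the simulated NPE on `c2` is recorded while `c1` and `c3` are still assigned. Without the `try`/`catch`, the exception would propagate out of the loop and, inside a thread's `run` method, terminate that thread, which is what happened to DelayedContainerManager.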



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3367) Add support for Multiple Files Fetch from the Shuffle Handler

2016-07-20 Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-3367:


             Summary: Add support for Multiple Files Fetch from the Shuffle Handler
                 Key: TEZ-3367
                 URL: https://issues.apache.org/jira/browse/TEZ-3367
             Project: Apache Tez
          Issue Type: Sub-task
            Reporter: Kuhu Shukla
            Assignee: Kuhu Shukla


Equip the Custom Shuffle Handler to read multiple file.out files at once. One
possible approach is to fetch all files from a given directory. The design may
need to handle the scenario where too many files exhaust the inodes on a given
node.
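One possible reading of "fetch all files from a given directory" is sketched here in Java, as an assumption rather than the proposed design: the handler walks a bounded depth under a root directory and collects every file named `file.out`. The class and method names, the one-subdirectory-per-attempt layout, and the companion `file.out.index` file are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: gather every file.out under one directory for a single fetch. */
public class MultiFileFetchSketch {

  /** Returns all regular files named "file.out" at most maxDepth levels below root, sorted. */
  public static List<Path> findOutputs(Path root, int maxDepth) throws IOException {
    // Files.walk streams entries lazily; try-with-resources closes the directory handles.
    try (Stream<Path> paths = Files.walk(root, maxDepth)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().equals("file.out"))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a hypothetical layout: one subdirectory per attempt, each with an output and an index.
    Path root = Files.createTempDirectory("shuffle");
    for (String attempt : new String[] {"attempt_1", "attempt_2"}) {
      Path dir = Files.createDirectories(root.resolve(attempt));
      Files.write(dir.resolve("file.out"), new byte[] {1, 2, 3});
      Files.write(dir.resolve("file.out.index"), new byte[] {0});
    }
    for (Path p : findOutputs(root, 2)) {
      System.out.println(root.relativize(p) + " " + Files.size(p) + " bytes");
    }
  }
}
```

Bounding the walk depth with the second argument of `Files.walk` keeps one request from traversing an arbitrarily deep tree; it does not change how many files the node creates, which is where the inode concern lies.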


[jira] [Resolved] (TEZ-3366) Tez timeline client reporting different domains for same entity

2016-07-20 Hitesh Shah (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah resolved TEZ-3366.
--
Resolution: Duplicate

> Tez timeline client reporting different domains for same entity
> ---
>
>                 Key: TEZ-3366
>                 URL: https://issues.apache.org/jira/browse/TEZ-3366
>             Project: Apache Tez
>          Issue Type: Bug
>         Environment: centos 6.6
> apache hadoop 2.6.4
> tez 0.6.2
>            Reporter: Nikhil Mulley
>
> Hi,
> Timeline server logs on a 2.6.4 cluster (no security, no ACLs) often show
> the following error, followed by an exception, when a Tez job runs. Closely
> inspecting the code suggests that Tez itself may report a different domain
> for an entity that already exists in the timeline store; the timeline server
> then skips the entity and does not store its event information.
> 
> ERROR org.apache.hadoop.yarn.server.timeline.TimelineDataManager: Skip the timeline entity: { id: tez_container_1468970783049_0021_01_02, type: TEZ_CONTAINER_ID }
> >>>
> >>>
> org.apache.hadoop.yarn.exceptions.YarnException: The domain of the timeline entity { id: tez_container_1468970783049_0021_01_02, type: TEZ_CONTAINER_ID } is not allowed to be changed.
> >>>
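The rule behind the error above can be modeled in a few lines of Java. This is a hypothetical, simplified sketch, not the YARN TimelineDataManager implementation: an entity is keyed by its type and ID, the first domain recorded for it is kept, and a later write naming a different domain is rejected, which corresponds to the "is not allowed to be changed" exception. `DomainGuard`, `put`, and the domain names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (not the YARN implementation): the first
 * domain recorded for an entity is kept, and a later write that names a
 * different domain is rejected.
 */
public class DomainGuard {
  private final Map<String, String> domainByEntity = new HashMap<>();

  /** Returns true if the write is accepted, false if it would change the domain. */
  public boolean put(String entityType, String entityId, String domainId) {
    // Timeline entities are identified by (type, id), so key on both.
    String key = entityType + "/" + entityId;
    String existing = domainByEntity.putIfAbsent(key, domainId);
    return existing == null || existing.equals(domainId);
  }

  public static void main(String[] args) {
    DomainGuard store = new DomainGuard();
    String id = "tez_container_1468970783049_0021_01_02";
    System.out.println(store.put("TEZ_CONTAINER_ID", id, "domain_a")); // true: first write
    System.out.println(store.put("TEZ_CONTAINER_ID", id, "domain_a")); // true: same domain
    System.out.println(store.put("TEZ_CONTAINER_ID", id, "domain_b")); // false: domain change
  }
}
```

The third call returns `false`, matching the situation in this report, where Tez posts an entity that already exists in the store with a different domain.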
