[jira] [Created] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce

2016-06-09 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created TEZ-3297:
-

 Summary: Deadlock scenario in AM during ShuffleVertexManager auto 
reduce
 Key: TEZ-3297
 URL: https://issues.apache.org/jira/browse/TEZ-3297
 Project: Apache Tez
  Issue Type: Bug
Reporter: Zhiyuan Yang
Priority: Critical


Here is what's happening in the attached thread dump.

App Pool thread #9 performs the auto reduce on V2 and initializes the new edge 
manager. It holds the V2 write lock and wants the read lock of the source 
vertex V1. 

At the same time, another App Pool thread, #2, schedules a task of V1 and gets 
the output spec, so it holds the V1 read lock and wants the V2 read lock. 

Meanwhile, the dispatcher thread wants the V1 write lock to begin the state 
machine transition. Since the dispatcher thread is at the head of the V1 
ReadWriteLock queue, thread #9 cannot get the V1 read lock even though thread 
#2 is already holding it. 

This is a circular wait: #2 blocks the dispatcher, the dispatcher blocks #9, 
and #9 blocks #2.

The ReadWriteLock itself is behaving as designed in this case. Please see this 
Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565
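
The lock behavior described above can be reproduced in isolation. The sketch 
below is not Tez code: the class name, the v1Lock variable, and the thread 
roles are illustrative assumptions. It assumes a default (non-fair) 
ReentrantReadWriteLock and uses a timed tryLock so the demo terminates instead 
of hanging. One thread holds the read lock, a writer queues behind it, and a 
newly arriving reader then fails to get the read lock even though the lock is 
only read-held.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterBlocksReader {

    /** Returns true if a new reader got the read lock while a writer was queued. */
    static boolean newReaderAcquires() {
        // Stand-in for the V1 lock (assumption: default, non-fair mode).
        final ReentrantReadWriteLock v1Lock = new ReentrantReadWriteLock();
        final boolean[] acquired = {false};
        try {
            // The calling thread plays thread #2: it already holds the read lock.
            v1Lock.readLock().lock();

            // Plays the dispatcher: queues up for the write lock.
            Thread writer = new Thread(() -> {
                try {
                    v1Lock.writeLock().lockInterruptibly();
                    v1Lock.writeLock().unlock();
                } catch (InterruptedException e) {
                    // Expected: the demo interrupts the writer to clean up.
                }
            });
            writer.start();
            // Wait until the writer is enqueued and parked on the lock.
            while (!(v1Lock.hasQueuedThread(writer)
                    && writer.getState() == Thread.State.WAITING)) {
                Thread.sleep(10);
            }

            // Plays thread #9: a new reader arriving behind the queued writer.
            Thread reader = new Thread(() -> {
                try {
                    // The timed tryLock honors the queueing policy;
                    // the untimed tryLock() would barge past the writer.
                    if (v1Lock.readLock().tryLock(300, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        v1Lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.start();
            reader.join();

            // Clean up: release the queued writer, then the held read lock.
            writer.interrupt();
            writer.join();
            v1Lock.readLock().unlock();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("new reader acquired: " + newReaderAcquires());
    }
}
```

In the AM the new reader (#9) also holds the V2 write lock that the existing 
reader (#2) is waiting for, so the wait never ends.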



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3296:
---

 Summary: Tez job can hang if two vertices at the same root 
distance have different task requirements
 Key: TEZ-3296
 URL: https://issues.apache.org/jira/browse/TEZ-3296
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Jason Lowe
Priority: Critical


When two vertices are the same distance from the root, Tez schedules their 
containers at the same priority.  However, those vertices could have different 
task requirements and therefore different container capabilities.  As 
documented in YARN-314, YARN currently doesn't support requests for multiple 
sizes at the same priority.  In practice, this leads to one vertex's allocation 
requests clobbering the other's, which can leave the Tez AM waiting on 
containers it will never receive from the RM.
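
The clobbering can be pictured with a deliberately simplified toy model. This 
is not YARN's or Tez's actual data structure: the class, the method names, the 
priority value, and the memory sizes are all hypothetical. It only assumes, per 
the YARN-314 limitation, that a single container size is tracked per priority.

```java
import java.util.HashMap;
import java.util.Map;

public class SamePriorityClobber {
    // Toy model only: one outstanding container size (MB) per priority.
    private final Map<Integer, Integer> memoryMbByPriority = new HashMap<>();

    /** Records a request; a later request at the same priority replaces the earlier one. */
    void request(int priority, int memoryMb) {
        memoryMbByPriority.put(priority, memoryMb);
    }

    /** Returns the container size currently outstanding at this priority. */
    int outstanding(int priority) {
        return memoryMbByPriority.get(priority);
    }

    static int demo() {
        SamePriorityClobber rm = new SamePriorityClobber();
        int priority = 5;           // both vertices share a root distance, hence a priority
        rm.request(priority, 1024); // hypothetical vertex A asks for 1 GB containers
        rm.request(priority, 4096); // hypothetical vertex B asks for 4 GB containers
        return rm.outstanding(priority); // only vertex B's size survives
    }

    public static void main(String[] args) {
        System.out.println("outstanding size at shared priority: " + demo() + " MB");
    }
}
```

In this model vertex A's 1 GB request is lost, so anything waiting on 
containers of that size would wait forever.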



