[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102353#comment-17102353
 ] 

László Bodor commented on TEZ-3967:
---

[~jeagles], [~gopalv], [~jeagles]: could you please look back to this patch 
once TEZ-4171 is resolved?
I can see the [^TEZ-3967.03.patch] basically solves a lock contention problem 
by generating dag status periodically and returning a cached value, however, 
regarding TEZ-4171 I realized again that the dag state enum is a "privileged" 
property, which should contain the most up-to-date value which is possible...
currently, in [^TEZ-3967.03.patch], i simply return the cached value like:
{code}
public DAGStatusBuilder getStatus() {
      countGetDagStatus += 1;
      return cachedDagStatusBuilder;
}
{code}

but I would change it in a way that force the returned state to be the latest 
(no matter of the additional read-lock compared to the original optimization), 
otherwise, this patch could ruin the optimization of e.g. TEZ-4171 I think
{code}
public DAGStatusBuilder getStatus() {
      countGetDagStatus += 1;
      cachedDagStatusBuilder.setState(getState());
      return cachedDagStatusBuilder;
}
{code}

I don't think I can optimize this further at the moment, only if we implement 
an endpoint which returns only the state, I guess most of the time while 
optimizing this part (e.g. TEZ-4171), client is only interested in the state, 
not the counters (querying it in every ~100ms)

 

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch, TEZ-3967.03.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-29 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095928#comment-17095928
 ] 

Jonathan Turner Eagles commented on TEZ-3967:
-

Going to prioritize this review for tomorrow. Thanks for your patience 

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch, TEZ-3967.03.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-21 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088683#comment-17088683
 ] 

TezQA commented on TEZ-3967:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
9s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  9s{color} | {color:orange} tez-common: The patch generated 1 new + 12 
unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} tez-dag: The patch generated 2 new + 129 
unchanged - 0 fixed = 131 total (was 129) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
42s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} tez-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  4m 38s{color} 
| {color:red} tez-dag in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | tez.dag.app.TestSpeculation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/362/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13000702/TEZ-3967.03.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux a85563ff7ae8 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / e7c24f0 |
| Default Java | 1.8.0_242 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/362/artifact/out/di

[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088651#comment-17088651
 ] 

László Bodor commented on TEZ-3967:
---

[~jeagles]: in  [^TEZ-3967.03.patch], there is a pluggable "instrumented" 
read-write lock, what I've used in the unit tests, for instance:
{code}
mvn test -Dtest=TestDAGImpl#testGroupDAGCompletionWithCommitSuccess -pl tez-dag
{code}

I can see that the numbers were better in case of the new approach, let me 
share some numbers
I believe the most important is "AVG_LOCK_BLOCKED" kind of numbers, which shows 
the average time spent before and after calling .lock() (I guess this is the 
best way to check for how much time a thread is blocked blocked because of 
locks)

with an average 1ms improvement per lock, it can add up into a major 
improvement for a bigger/longer dag (considering that 
testGroupDAGCompletionWithCommitSuccess is a quite short-living unit test)

WITHOUT THE PATCH:
average blocked time for write lock: 2.8, 3.42, 2.82
{code}
READ_AVG_LOCK_BLOCKED_TIMES_MS: 2.1760985915492954, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 71, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 15.031, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 154.503, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 2.809090909090909, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 11, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 16.09, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 30.9]

READ_AVG_LOCK_BLOCKED_TIMES_MS: 2.330492957746479, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 71, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 18.983, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 165.465, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 3.4274545454545455, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 11, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 17.092, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 37.702]

READ_AVG_LOCK_BLOCKED_TIMES_MS: 2.427774647887324, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 71, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 15.377, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 172.372, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 2.8256363636363635, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 11, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 16.377, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 31.082]
{code}

WITH THE PATCH:
average blocked time for write lock: 2.01, 1.51, 1.65
{code}
READ_AVG_LOCK_BLOCKED_TIMES_MS: 1.8155, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 60, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 6.974, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 108.93, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 2.0169, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 10, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 4.047, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 20.169]

READ_AVG_LOCK_BLOCKED_TIMES_MS: 1.46175, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 60, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 5.533, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 87.705, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 1.5105, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 10, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 3.642, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 15.105]

READ_AVG_LOCK_BLOCKED_TIMES_MS: 1.50095, 
READ_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 60, 
READ_MAX_LOCK_BLOCKED_TIME_MS: 4.009, 
READ_SUM_LOCK_BLOCKED_TIMES_MS: 90.057, 
WRITE_AVG_LOCK_BLOCKED_TIMES_MS: 1.6573, 
WRITE_COUNT_LOCK_ACQUIRED_BY_LOCK_METHOD: 10, 
WRITE_MAX_LOCK_BLOCKED_TIME_MS: 5.042, 
WRITE_SUM_LOCK_BLOCKED_TIME_MS: 16.573]
{code}


> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch, TEZ-3967.03.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org

[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087862#comment-17087862
 ] 

László Bodor commented on TEZ-3967:
---

[~jeagles]: to be honest, I haven't tested this on cluster, but if patch looks 
good, I'll check it, let me find a way to measure lock contention somehow

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-20 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087806#comment-17087806
 ] 

Jonathan Turner Eagles commented on TEZ-3967:
-

[~abstractdog], what did you find as far as lock contention goes with the patch 
vs without. I seems like this kind of approach will be better as reads should 
not need to build the cache with the readlock anymore, but numbers will prove 
this better.

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087761#comment-17087761
 ] 

László Bodor commented on TEZ-3967:
---

[~gopalv], [~jeagles]: could you please take a look?

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-16 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085027#comment-17085027
 ] 

TezQA commented on TEZ-3967:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
8s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  9s{color} | {color:orange} tez-common: The patch generated 1 new + 12 
unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} tez-dag: The patch generated 4 new + 129 
unchanged - 0 fixed = 133 total (was 129) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
2s{color} | {color:red} tez-dag generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
25s{color} | {color:green} tez-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
27s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:tez-dag |
|  |  Unread field:field be static?  At DAGImpl.java:[line 2477] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/355/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13000167/TEZ-3967.02.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 209ac2d76222 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 214c544 |
| Default Java | 1.8.0_242 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/355/artifact/out/diff-checkstyle-tez-common.txt
 |
| checkstyle | 

[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-16 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085020#comment-17085020
 ] 

TezQA commented on TEZ-3967:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m  
3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
8s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  9s{color} | {color:orange} tez-common: The patch generated 1 new + 12 
unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} tez-dag: The patch generated 4 new + 129 
unchanged - 0 fixed = 133 total (was 129) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
4s{color} | {color:red} tez-dag generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} tez-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
19s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:tez-dag |
|  |  Unread field:field be static?  At DAGImpl.java:[line 2477] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/354/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13000167/TEZ-3967.02.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 0fb113e146dd 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 214c544 |
| Default Java | 1.8.0_242 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/354/artifact/out/diff-checkstyle-tez-common.txt
 |
| checkstyle | 

[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084977#comment-17084977
 ] 

László Bodor commented on TEZ-3967:
---

[~jeagles]: attached  [^TEZ-3967.02.patch] , which contains the following 
basically:
1. DagStatusHandler: which encapsulates the dag status refresh logic
2. I noticed that getCachedCounters were invalid, [it didn't refresh 
cachedCounters at 
all|https://github.com/apache/tez/blob/214c544472f80997755b62c382f016fbe8373177/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java]...but
 with this patch I started to use cached counters

the basic idea is to refresh different fields with a different frequency 
(please let me know which should be configurable in your opinion)
2a) dag status refresh: 500ms - DAG_STATUS_BUILD_INTERVAL_MS
2b) counters refresh: >1000ms - COUNTER_CACHE_INTERVAL_THRESHOLD_FAST_MS

3. DrainDispatcher minor fix (I've found sometimes that it was hanging in the 
while loop, even when queue was empty), this class is used only in tests

4. TestDAGImpl: added some dag.entityUpdateTracker.stop(), becuase we manually 
set entityUpdateTracker into DAGImpl, but there is another one, which already 
started its thread (I haven't indentified error because of this, so just for 
clarity's sake)

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076646#comment-17076646
 ] 

László Bodor commented on TEZ-3967:
---

thanks for the quick response [~jeagles]! 

"Alternatively, a new approach could be made where the calculations are done on 
a periodic basis and then the readcalls would return only cached values."
I was thinking about the same, I think it's not much more effort than hunting 
for a better locking mechanism, I'll try that

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-06 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076543#comment-17076543
 ] 

Jonathan Turner Eagles commented on TEZ-3967:
-

Perhaps, this fails to address the issue at hand. For the getAllCounters case, 
by moving the getInternalState outside of the readlock, the cost of the call is 
actually more. getInternalState has to lock and unlock the readlock. While 
splitting the lock is more costly, there is opportunity for another call to 
acquire the read lock, perhaps reducing the average latency time of acquiring 
the lock for other calls. On the other hand, you all the possibility of getting 
inconsistent results as a write could have taken place in between the two 
readlocks.

Similar for the getDAGStatus calls that were changed as well. The difference 
there is that some of the changes inside of the readlock could actually be 
moved outside of the readlock without risk, though not sure how much benefit it 
will be.

{code:title=Safe to access outside of the readlock}
  ProgressBuilder dagProgress = new ProgressBuilder();
  dagProgress.setTotalTaskCount(totalTaskCount);
  dagProgress.setSucceededTaskCount(totalSucceededTaskCount);
  dagProgress.setRunningTaskCount(totalRunningTaskCount);
  dagProgress.setFailedTaskCount(totalFailedTaskCount);
  dagProgress.setKilledTaskCount(totalKilledTaskCount);
  dagProgress.setFailedTaskAttemptCount(totalFailedTaskAttemptCount);
  dagProgress.setKilledTaskAttemptCount(totalKilledTaskAttemptCount);
  dagProgress.setRejectedTaskAttemptCount(totalRejectedTaskAttemptCount);
{code}

Alternatively, a new approach could be made where the calculations are done on 
a periodic basis and then the readcalls would return only cached values.

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076468#comment-17076468
 ] 

László Bodor commented on TEZ-3967:
---

[~gopalv], [~jeagles]: could you please take a look at  [^TEZ-3967.01.patch] 
it's basically about narrowing the scope of the read locking

1. removed read lock from getDAGStatus, as it doesn't contain sensitive 
operation I think (except counters, but it's handled in called getAllCounters 
method)
-> first I thought that iterating vertexMap's entryset could be dangerous 
without lock, but I don't expect ConcurrentModificationException as vertexMap 
is initialized initializeDAG and then the number of entries are not supposed to 
change

2. getAllCounters
removed readlock from the beginning of the method, as 
mayBeConstructFinalFullCounters is protected with a fullCountersLock itself + 
getInternalState method also has its own locking

do you think this patch can help getDAGStatus lock footprint? if so, let me 
know if you want me to do any performance/sanity check

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2020-04-04 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075164#comment-17075164
 ] 

TezQA commented on TEZ-3967:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
9s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  4m 57s{color} 
| {color:red} tez-dag in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | tez.dag.app.TestSpeculation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/335/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998833/TEZ-3967.01.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 0f7807f2bd32 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / b5b432b |
| Default Java | 1.8.0_242 |
| unit | 
https://builds.apache.org/job/PreCommit-TEZ-Build/335/artifact/out/patch-unit-tez-dag.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-TEZ-Build/335/testReport/ |
| Max. process+thread count | 228 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output | 
https://builds.apache.org/job/PreCommit-TEZ-Build/335/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |


This message was automatically generated.



> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>  

[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2018-07-12 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542205#comment-16542205
 ] 

Gopal V commented on TEZ-3967:
--

{code}
TEZ_DAG_STATUS_CHECK_INTERVAL("hive.tez.dag.status.check.interval", "500ms",
  new TimeValidator(TimeUnit.MILLISECONDS), "Interval between subsequent 
DAG status invocation."),
{code}

bq. I can see unfair locking being better for through put for the exact exact 
same reason you are suggesting

The trouble is that overlapping read-locks are starving out the write-lock 
(because the read-lock can lock over the other read-lock) right now.

Yes, fairness is not free - if you have a different write-preferred locking for 
this, I think that's what we are really looking for.


> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2018-07-12 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542031#comment-16542031
 ] 

Jonathan Eagles commented on TEZ-3967:
--

[~gopalv], can you indicate what the dag status poll interval is and also the 
number of vertices and task counts. Depending on the configuration and scale, I 
can see unfair locking being better for through put for the exact exact same 
reason you are suggesting and current locking method can be better for getting 
"real work" done.

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3967) DAGImpl: dag lock is unfair and can starve the writers

2018-07-12 Thread Jaume M (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541957#comment-16541957
 ] 

Jaume M commented on TEZ-3967:
--

[~gopalv] the description of the problem makes sense to me but I'm having 
problems experimentally proving that it actually leads to better results.
* I'm starting a tez job with many vertices and failures (I believe the 
vertices ending and failing will require access to the writelock)
* Also at the same time I'm polling with one thread the status of the DAG. I'd 
like to do it with more but the {{DAGClientImpl}} is not thread safe.
* I'm measuring the average time waiting for the writelock when the polling 
thread waits {0, 10, 100, 1000} milliseconds after each poll but I'm not seeing 
any significant results

Is there a scenario in which many threads could poll the state?

> DAGImpl: dag lock is unfair and can starve the writers
> --
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
> try {
>   writeLock.lock();
> {code}
> {code}
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)