[jira] [Comment Edited] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497985#comment-14497985
 ] 

Jonathan Eagles edited comment on TEZ-2329 at 4/16/15 12:30 PM:


[~Sreenath], [~pramachandran], can you take a look?


was (Author: jeagles):
[~Sreenath], can you take a look?

 UI Query on final dag status performance improvement
 

 Key: TEZ-2329
 URL: https://issues.apache.org/jira/browse/TEZ-2329
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: TEZ-2329.1.patch


 Final dag status is a primary filter for the TEZ_DAG_ID entity. However, 
 intermediate dag status is not.
 By conditionally selecting between primaryFilter and secondaryFilter for 
 status, we can dramatically speed up the FAILED, ERROR, KILLED dag status 
 queries that are a common debugging operation for users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-2329:


 Summary: UI Query on final dag status performance improvement
 Key: TEZ-2329
 URL: https://issues.apache.org/jira/browse/TEZ-2329
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


Final dag status is a primary filter for the TEZ_DAG_ID entity. However, 
intermediate dag status is not.

By conditionally selecting between primaryFilter and secondaryFilter for 
status, we can dramatically speed up the FAILED, ERROR, KILLED dag status 
queries that are a common debugging operation for users.
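The selection described above can be sketched as a small helper: final DAG states are the ones indexed as a primary filter on TEZ_DAG_ID, so queries for them can take the fast primaryFilter path, while intermediate states fall back to secondaryFilter. All names below are illustrative assumptions, not the actual Tez UI / Timeline Server code.

```java
import java.util.Set;

// Hedged sketch of the conditional primaryFilter/secondaryFilter selection.
// FINAL_STATES, FilterType, and filterFor are illustrative names only.
public class DagStatusFilterChooser {
    enum FilterType { PRIMARY, SECONDARY }

    // Final DAG states are recorded as a primary filter on the TEZ_DAG_ID
    // entity, so they can be looked up via the indexed primaryFilter path.
    static final Set<String> FINAL_STATES =
            Set.of("SUCCEEDED", "FAILED", "KILLED", "ERROR");

    static FilterType filterFor(String status) {
        return FINAL_STATES.contains(status)
                ? FilterType.PRIMARY    // indexed lookup: fast
                : FilterType.SECONDARY; // scan-and-filter: slower
    }

    public static void main(String[] args) {
        System.out.println(filterFor("FAILED"));  // PRIMARY
        System.out.println(filterFor("RUNNING")); // SECONDARY
    }
}
```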





[jira] [Commented] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498029#comment-14498029
 ] 

Prakash Ramachandran commented on TEZ-2329:
---

+1 LGTM.






[jira] [Updated] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-2329:
-
Attachment: TEZ-2329.1.patch

[~Sreenath], can you take a look?






[jira] [Resolved] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles resolved TEZ-2329.
--
Resolution: Fixed






[jira] [Commented] (TEZ-2329) UI Query on final dag status performance improvement

2015-04-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498063#comment-14498063
 ] 

Jonathan Eagles commented on TEZ-2329:
--

Thanks, [~pramachandran]. Committed to master and branch-0.6






[jira] [Updated] (TEZ-986) Make conf set on DAG and vertex available in tez UI

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-986:

Summary: Make conf set on DAG and vertex available in tez UI  (was: Make 
conf set on DAG and vertex available in jobhistory)

 Make conf set on DAG and vertex available in tez UI
 ---

 Key: TEZ-986
 URL: https://issues.apache.org/jira/browse/TEZ-986
 Project: Apache Tez
  Issue Type: Sub-task
  Components: UI
Reporter: Rohini Palaniswamy
Priority: Blocker

 Would like to have the conf set on DAG and Vertex:
   1) viewable in the Tez UI after the job completes. This is essential for 
 debugging jobs.
   2) We have processes that parse jobconf.xml from job history (hdfs) and 
 load it into hive tables for analysis. Would like Tez to also make all 
 the configuration (byte array) available in job history so that we can 
 parse it similarly. 1) mandates that it be stored in hdfs; 2) just asks that 
 the stored format be a contract others can rely on for parsing.





Failed: TEZ-1969 PreCommit Build #473

2015-04-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1969
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/473/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2770 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12725864/TEZ-1969.3.patch
  against master revision bfb34af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/473//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/473//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
aa10720f77d8923179e1ae0f66932fd481d58bb1 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #472
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2623423 bytes
Compression is 4.8%
Took 4.4 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498134#comment-14498134
 ] 

Jonathan Eagles commented on TEZ-2317:
--

[~hitesh], this might be a good candidate for 0.6.1. Patch is simple enough and 
there is a big benefit for complex jobs.

 Successful task attempts getting killed
 ---

 Key: TEZ-2317
 URL: https://issues.apache.org/jira/browse/TEZ-2317
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
 Fix For: 0.7.0

 Attachments: AM-taskkill.log, TEZ-2317.1.patch








[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498104#comment-14498104
 ] 

Rohini Palaniswamy commented on TEZ-2317:
-

+1. Don't see killed tasks with this patch anymore.









[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped

2015-04-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498110#comment-14498110
 ] 

TezQA commented on TEZ-1969:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12725864/TEZ-1969.3.patch
  against master revision bfb34af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/473//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/473//console

This message is automatically generated.

 Stop the DAGAppMaster when a local mode client is stopped
 -

 Key: TEZ-1969
 URL: https://issues.apache.org/jira/browse/TEZ-1969
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Prakash Ramachandran
 Attachments: TEZ-1969.1.patch, TEZ-1969.2.patch, TEZ-1969.3.patch


 https://issues.apache.org/jira/browse/TEZ-1661?focusedCommentId=14275366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14275366
 Running multiple local clients in a single JVM will leak DAGAppMaster and 
 related threads.
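A sketch of the fix idea: in local mode the DAGAppMaster runs inside the client JVM, so stopping the client should also tear down the AM's threads rather than leak them. The names and structure below are assumptions for illustration, not the actual patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hedged sketch: a local-mode client owns its in-process AM threads and
// stops them on close(), so repeated clients in one JVM do not leak.
public class LocalModeClient implements AutoCloseable {
    // Stands in for the in-process DAGAppMaster's dispatcher/heartbeat threads.
    private final ExecutorService amThreads = Executors.newCachedThreadPool();

    void start() {
        amThreads.submit(() -> {            // long-running AM loop
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;                 // exit cleanly on shutdownNow()
                }
            }
        });
    }

    boolean isStopped() {
        return amThreads.isShutdown();
    }

    @Override
    public void close() {
        amThreads.shutdownNow();            // interrupt AM threads on client stop
        try {
            amThreads.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LocalModeClient client = new LocalModeClient();
        client.start();
        client.close();   // without this, the AM thread would outlive the client
        System.out.println("stopped=" + client.isStopped());
    }
}
```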





[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498187#comment-14498187
 ] 

Hitesh Shah commented on TEZ-2317:
--

[~bikassaha] does this impact 0.5 too?









[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498230#comment-14498230
 ] 

Rohini Palaniswamy commented on TEZ-2300:
-

There are a couple of issues with the behavior, after talking to [~jlowe] and 
comparing with what is done in MR:
- Kill is put in the event queue and is processed like any other event. 
When there are millions of events in the queue it takes a long time to get to 
it, and I see the AM even scheduling new tasks. MR also does it this way; the 
problem is too many events, and TEZ-776 should reduce that. But with large 
jobs there are still going to be many events in the queue.
- TezClient.stop() returns immediately after the kill. It should not; it 
should poll and wait on the client side. MR does that.
- If the DAG is not killed and the session is not shut down even after a 
certain timeout, yarn kill should be called. MR does that.

This is an important issue: people might kill a script, assume the 
application is killed, and proceed with running a new one, which could cause 
a lot of issues while the old one is still running. So the kill needs to be 
synchronous and reliable.
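The points above amount to a synchronous, escalating stop: issue the kill, poll for a terminal state on the client side, and fall back to a forced (yarn-level) kill if the timeout expires. A minimal sketch with hypothetical names; this is not the actual TezClient API.

```java
import java.util.function.BooleanSupplier;

// Hedged sketch of a synchronous, escalating stop. stopAndWait and both
// hooks are illustrative names only.
public class SynchronousStop {

    /**
     * @param isTerminal polls whether the DAG/app has reached a terminal state
     * @param forceKill  escalation hook, e.g. the equivalent of "yarn application -kill"
     * @return true if the app stopped within the timeout, false if it was force-killed
     */
    static boolean stopAndWait(BooleanSupplier isTerminal, Runnable forceKill,
                               long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (isTerminal.getAsBoolean()) {
                return true;                // graceful shutdown observed
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;                      // stop waiting, escalate below
            }
        }
        forceKill.run();                    // timeout or interrupt: escalate
        return false;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated AM that reaches a terminal state after ~50 ms.
        boolean graceful = stopAndWait(
                () -> System.currentTimeMillis() - start > 50,
                () -> System.out.println("escalating to forced kill"),
                1000, 10);
        System.out.println("graceful=" + graceful);
    }
}
```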

 TezClient.stop() takes a lot of time or does not work sometimes
 ---

 Key: TEZ-2300
 URL: https://issues.apache.org/jira/browse/TEZ-2300
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
 Attachments: syslog_dag_1428329756093_325099_1_post 


   Noticed this with a couple of pig scripts which were not behaving well (AM 
 close to OOM, etc) and even with some that were running fine. Pig calls 
 TezClient.stop() in a shutdown hook. Ctrl+C to the pig script either exits 
 immediately or hangs; in both cases it takes a long time for the yarn 
 application to go to the KILLED state. Many times I just end up calling yarn 
 application -kill separately after waiting for 5 mins or more for it to get 
 killed.





[jira] [Updated] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2317:
-
Fix Version/s: (was: 0.7.0)









[jira] [Commented] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498083#comment-14498083
 ] 

Rohini Palaniswamy commented on TEZ-2314:
-

[~bikassaha],
   I don't see this issue with tez 0.6 for the same script even for multiple 
runs. Should be something introduced in master. 

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy

 {code}
 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: 
 Unable to read call parameters for client 10.216.13.112 on connection protocol 
 org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException: 1935896432
 at 
 org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
 at 
 org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
 at 
 org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
 at 
 org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
 at 
 org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
 at 
 org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
 {code}
 cc/ [~hitesh] and [~bikassaha]
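One hint in the trace above: the failing index 1935896432 is exactly the big-endian int for the ASCII bytes "scop", which suggests readFields was interpreting text bytes from a misaligned or corrupted stream as a small integer. A hypothetical minimal reproduction (not Tez code) of how that produces the ArrayIndexOutOfBoundsException:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Hedged, hypothetical reproduction of the failure mode: a Writable-style
// readFields reads an int from a corrupted stream and indexes an array with it.
public class BadOrdinalDemo {
    enum EventType { DATA_MOVEMENT, INPUT_FAILED }

    static EventType readEventType(DataInput in) throws IOException {
        int ordinal = in.readInt();         // corrupt bytes -> a huge "ordinal"
        return EventType.values()[ordinal]; // ArrayIndexOutOfBoundsException
    }

    public static void main(String[] args) throws IOException {
        byte[] corrupt = {0x73, 0x63, 0x6F, 0x70}; // ASCII "scop" == 1935896432
        try {
            readEventType(new DataInputStream(new ByteArrayInputStream(corrupt)));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("bad ordinal read from corrupt stream");
        }
    }
}
```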





Failed: TEZ-2317 PreCommit Build #474

2015-04-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2317
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/474/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by remote host 127.0.0.1
Building remotely on H8 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in 
workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git 
  # timeout=10
Cleaning workspace
  git rev-parse --verify HEAD # timeout=10
Resetting working tree
  git reset --hard # timeout=10
  git clean -fdx # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git
  git --version # timeout=10
  git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/tez.git 
  +refs/heads/*:refs/remotes/origin/*
  git rev-parse refs/remotes/origin/master^{commit} # timeout=10
  git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision e1968681cee821103e0105e4948c4fc6dc949776 
(refs/remotes/origin/master)
  git config core.sparsecheckout # timeout=10
  git checkout -f e1968681cee821103e0105e4948c4fc6dc949776
  git rev-list bfb34afba0edfb254b05037b3b2ab37e3d3e44cf # timeout=10
No emails were triggered.
[PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson6745202919307307605.sh
Running in Jenkins mode


==
==
Testing patch for TEZ-2317.
==
==


HEAD is now at e196868 TEZ-2317. Event processing backlog can result in task 
failures for short tasks (bikas)
Previous HEAD position was e196868... TEZ-2317. Event processing backlog can 
result in task failures for short tasks (bikas)
Switched to branch 'master'
Your branch is behind 'origin/master' by 5 commits, and can be fast-forwarded.
  (use git pull to update your local branch)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to e1968681cee821103e0105e4948c4fc6dc949776.
TEZ-2317 is not Patch Available.  Exiting.


==
==
Finished build.
==
==


Archiving artifacts
ERROR: No artifacts found that match the file pattern patchprocess/*.*. 
Configuration error?
ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps 
that's what you mean?
Build step 'Archive the artifacts' changed build result to FAILURE
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498766#comment-14498766
 ] 

Hitesh Shah commented on TEZ-1969:
--

Might be relevant to FLINK-1892 \cc [~ktzoumas]






[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2314:
-
Target Version/s: 0.7.0

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Rohini Palaniswamy
 Attachments: TEZ-2314.log.patch


 {code}
 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: 
 Unable to read call parameters for client 10.216.13.112 on connection protocol 
 org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException: 1935896432
 at 
 org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
 at 
 org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
 at 
 org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
 at 
 org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
 at 
 org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
 at 
 org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
 {code}
 cc/ [~hitesh] and [~bikassaha]





[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2314:
-
Fix Version/s: (was: 0.7.0)






[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped

2015-04-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498713#comment-14498713
 ] 

Siddharth Seth commented on TEZ-1969:
-

Thanks for the clarification.
+1 Looks good.






[jira] [Updated] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events

2015-04-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated TEZ-2282:
---
Attachment: TEZ-2282.3.master.patch

Attached patches:
TEZ-2282.3.patch - for branch-0.6
TEZ-2282.3.master.patch - for master

 Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt 
 start/stop events
 ---

 Key: TEZ-2282
 URL: https://issues.apache.org/jira/browse/TEZ-2282
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: TEZ-2282.1.patch, TEZ-2282.2.patch, 
 TEZ-2282.3.master.patch, TEZ-2282.3.patch, TEZ-2282.master.1.patch


 This could help with debugging in some cases where logging is task specific. 
 For example, when the GC log goes to stdout, it will be nice to see task 
 attempt start/stop times.
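The delimiting idea can be sketched as follows: on each task attempt start/stop, the reused container writes a marker line to each of its log streams (stderr, stdout, syslog) so interleaved output such as GC logging can be attributed to an attempt. The marker format and all names here are hypothetical, not the format used by the actual patch.

```java
import java.io.PrintStream;
import java.time.Instant;

// Hedged sketch: stamp a delimiter line into each container log stream at
// task attempt start/stop. Illustrative format and names only.
public class LogDelimiter {
    static String marker(String event, String attemptId, Instant ts) {
        return "=== " + event + " " + attemptId + " at " + ts + " ===";
    }

    static void stamp(String event, String attemptId, PrintStream... streams) {
        String line = marker(event, attemptId, Instant.now());
        for (PrintStream s : streams) {
            s.println(line);  // e.g. System.out, System.err, a syslog writer
        }
    }

    public static void main(String[] args) {
        stamp("START", "attempt_1428329756093_0001_1_00_000000_0",
              System.out, System.err);
    }
}
```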





[jira] [Updated] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events

2015-04-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated TEZ-2282:
---
Attachment: TEZ-2282.3.patch

 Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt 
 start/stop events
 ---

 Key: TEZ-2282
 URL: https://issues.apache.org/jira/browse/TEZ-2282
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: TEZ-2282.1.patch, TEZ-2282.2.patch, TEZ-2282.3.patch, 
 TEZ-2282.master.1.patch


 This could help with debugging in some cases where logging is task specific. 
 For example, when the GC log goes to stdout, it will be nice to see task 
 attempt start/stop times.





[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events

2015-04-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498761#comment-14498761
 ] 

TezQA commented on TEZ-2282:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12725970/TEZ-2282.3.master.patch
  against master revision e196868.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/475//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/475//console

This message is automatically generated.






Failed: TEZ-2282 PreCommit Build #475

2015-04-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2282
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/475/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2769 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12725970/TEZ-2282.3.master.patch
  against master revision e196868.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/475//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/475//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
6aa8655ddaee8fd371e3f7e14bb2f1db1a5c4324 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #472
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2649800 bytes
Compression is 4.7%
Took 1 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498707#comment-14498707
 ] 

Hitesh Shah commented on TEZ-2282:
--

Mostly looks good. I will defer the final review to [~jeagles] as he requested 
this change. 

 Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt 
 start/stop events
 ---

 Key: TEZ-2282
 URL: https://issues.apache.org/jira/browse/TEZ-2282
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: TEZ-2282.1.patch, TEZ-2282.2.patch, 
 TEZ-2282.3.master.patch, TEZ-2282.3.patch, TEZ-2282.master.1.patch


 This could help with debugging in some cases where logging is task specific. 
 For example, when the GC log is going to stdout, it would be nice to see task 
 attempt start/stop times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events

2015-04-16 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498671#comment-14498671
 ] 

Mit Desai commented on TEZ-2282:


[~hitesh], [~jeagles], [~knoguchi]. This is how the log files look after 
the patch.

{noformat}
Log Type: stderr

Log Upload Time: 16-Apr-2015 20:17:19

Log Length: 376

2015-04-16 20:17:07 Starting to run new task attempt: 
attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Completed running task attempt: 
attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Starting to run new task attempt: 
attempt_1429195759237_0018_1_02_00_0
2015-04-16 20:17:08 Completed running task attempt: 
attempt_1429195759237_0018_1_02_00_0


Log Type: stdout

Log Upload Time: 16-Apr-2015 20:17:19

Log Length: 1860

0.202: [GC [PSYoungGen: 5440K->893K(6336K)] 5440K->1517K(64640K), 0.0046680 
secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
0.353: [GC [PSYoungGen: 6333K->893K(11776K)] 6957K->2293K(70080K), 0.0049120 
secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
0.517: [GC [PSYoungGen: 11773K->885K(11776K)] 13173K->3554K(70080K), 0.0040680 
secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
0.690: [GC [PSYoungGen: 11765K->885K(22656K)] 14434K->4622K(80960K), 0.0034990 
secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
1.144: [GC [PSYoungGen: 22645K->885K(22656K)] 26382K->6884K(80960K), 0.0054460 
secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
1.669: [GC [PSYoungGen: 22645K->3056K(45632K)] 28644K->9986K(103936K), 
0.0093110 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2015-04-16 20:17:07 Starting to run new task attempt: 
attempt_1429195759237_0018_1_01_00_0
2.227: [GC [PSYoungGen: 45616K->4017K(46592K)] 52546K->11854K(104896K), 
0.0231380 secs] [Times: user=0.05 sys=0.00, real=0.03 secs] 
2015-04-16 20:17:08 Completed running task attempt: 
attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Starting to run new task attempt: 
attempt_1429195759237_0018_1_02_00_0
2015-04-16 20:17:08 Completed running task attempt: 
attempt_1429195759237_0018_1_02_00_0
Heap
 PSYoungGen  total 46592K, used 46577K [0xed28, 0xf2f8, 0xf444)
  eden space 42560K, 100% used [0xed28,0xefc1,0xefc1)
  from space 4032K, 99% used [0xefc1,0xefffc768,0xf000)
  to   space 5056K, 0% used [0xf2a9,0xf2a9,0xf2f8)
 ParOldGen   total 125952K, used 75420K [0xb444, 0xbbf4, 0xed28)
  object space 125952K, 59% used [0xb444,0xb8de73f0,0xbbf4)
 PSPermGen   total 16384K, used 12647K [0xb044, 0xb144, 0xb444)
  object space 16384K, 77% used [0xb044,0xb1099f90,0xb144)
{noformat}

{noformat}
 Log Type: dag_1429195759237_0018_1.dot

Log Upload Time: 16-Apr-2015 20:17:19

Log Length: 1154

digraph MRRSleepJob {
graph [ label="MRRSleepJob", fontsize=24, fontname=Helvetica];
node [fontsize=12, fontname=Helvetica];
edge [fontsize=9, fontcolor=blue, fontname=Arial];
"MRRSleepJob.reduce" [ label = "reduce[ReduceProcessor]" ];
"MRRSleepJob.reduce" -> "MRRSleepJob.reduce_MROutput" [ label = "Output 
[outputClass=MROutputLegacy,\n initializer=MROutputCommitter]" ];
"MRRSleepJob.ireduce1" [ label = "ireduce1[ReduceProcessor]" ];
"MRRSleepJob.ireduce1" -> "MRRSleepJob.reduce" [ label = 
"[input=OrderedPartitionedKVOutput,\n output=OrderedGroupedInputLegacy,\n 
dataMovement=SCATTER_GATHER,\n schedulingType=SEQUENTIAL]" ];
"MRRSleepJob.reduce_MROutput" [ label = "reduce[MROutput]", shape = box ];
"MRRSleepJob.map_MRInput" [ label = "map[MRInput]", shape = box ];
"MRRSleepJob.map_MRInput" -> "MRRSleepJob.map" [ label = "Input 
[inputClass=MRInputLegacy,\n initializer=MRInputSplitDistributor]" ];
"MRRSleepJob.map" [ label = "map[MapProcessor]" ];
"MRRSleepJob.map" -> "MRRSleepJob.ireduce1" [ label = 
"[input=OrderedPartitionedKVOutput,\n output=OrderedGroupedInputLegacy,\n 
dataMovement=SCATTER_GATHER,\n schedulingType=SEQUENTIAL]" ];
}


Log Type: stderr

Log Upload Time: 16-Apr-2015 20:17:19

Log Length: 118

2015-04-16 20:16:54 Running Dag: dag_1429195759237_0018_1
2015-04-16 20:17:08 Completed Dag: dag_1429195759237_0018_1


Log Type: stdout

Log Upload Time: 16-Apr-2015 20:17:19

Log Length: 1757

0.395: [GC [PSYoungGen: 16448K->2680K(19136K)] 16448K->3177K(62848K), 0.0062140 
secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
0.599: [GC [PSYoungGen: 19128K->2679K(35584K)] 19625K->3268K(79296K), 0.0072310 
secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
1.247: [GC [PSYoungGen: 35575K->2683K(35584K)] 36164K->5911K(79296K), 0.0100360 
secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
1.636: [GC [PSYoungGen: 35579K->2675K(68480K)] 38807K->7810K(112192K), 
0.0093970 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2.169: [GC [PSYoungGen: 68467K->2685K(68480K)] 73602K->12997K(112192K), 
0.0152030 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2.845: [GC [PSYoungGen: 68477K->7379K(138688K)] 78789K->17695K(182400K), 
0.0137060 secs] [Times: user=0.02 sys=0.00, 

[jira] [Commented] (TEZ-2333) enable local fetch optimization by default.

2015-04-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499196#comment-14499196
 ] 

TezQA commented on TEZ-2333:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12726050/TEZ-2333.1.patch
  against master revision 3e6fc35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestSecureShuffle

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/478//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/478//console

This message is automatically generated.

 enable local fetch optimization by default.
 ---

 Key: TEZ-2333
 URL: https://issues.apache.org/jira/browse/TEZ-2333
 Project: Apache Tez
  Issue Type: Task
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Attachments: TEZ-2333.1.patch


 enable TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH by default.
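For reference, turning the flag on explicitly per job would look roughly like the following; the key string mirrors TezRuntimeConfiguration's "tez.runtime.optimize.local.fetch", and a plain Map stands in for a Hadoop/Tez Configuration object (illustrative sketch only, not from the patch).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the key name mirrors
// TezRuntimeConfiguration's "tez.runtime.optimize.local.fetch";
// a plain Map stands in for a Hadoop/Tez Configuration object.
public class LocalFetchConfig {
    static final String TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH =
        "tez.runtime.optimize.local.fetch";

    // Returns the same conf with local fetch enabled.
    static Map<String, String> withLocalFetch(Map<String, String> conf) {
        conf.put(TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, "true");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(withLocalFetch(new HashMap<>()));
    }
}
```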



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-2331) Container Stop Info Always Missing When Container Reuse Enabled

2015-04-16 Thread Chang Li (JIRA)
Chang Li created TEZ-2331:
-

 Summary: Container Stop Info Always Missing When Container Reuse 
Enabled
 Key: TEZ-2331
 URL: https://issues.apache.org/jira/browse/TEZ-2331
 Project: Apache Tez
  Issue Type: Bug
Reporter: Chang Li


Inside otherinfo, the container's exit status and end time are always missing 
when container reuse is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498996#comment-14498996
 ] 

TezQA commented on TEZ-2310:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12726012/TEZ-2310.2.patch
  against master revision e196868.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/477//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/477//console

This message is automatically generated.

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch


 See the following deadlock in testing:
 Thread#1:
 {code}
 Daemon Thread [App Shared Pool - #3] (Suspended)  
   owns: VertexManager$VertexManagerPluginContextImpl  (id=327)
   owns: ShuffleVertexManager  (id=328)
   owns: VertexManager  (id=329)   
   waiting for: VertexManager$VertexManagerPluginContextImpl  (id=326) 
   
 VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate)
  line: 344
   
 StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) 
 line: 138  
   
 StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer,
  VertexStateUpdate) line: 122
   StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) 
 line: 116   
   StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 
 106  
   VertexImpl.maybeSendConfiguredEvent() line: 3385
   VertexImpl.doneReconfiguringVertex() line: 1634 
   VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() 
 line: 339
   ShuffleVertexManager.schedulePendingTasks(int) line: 561
   ShuffleVertexManager.schedulePendingTasks() line: 620   
   ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 
 731   
   ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744  
   VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527  
   VertexManager$VertexManagerEvent$1.run() line: 612  
   VertexManager$VertexManagerEvent$1.run() line: 607  
   AccessController.doPrivileged(PrivilegedExceptionAction<T>, 
 AccessControlContext) line: not available [native method]   
   Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415   
   UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548  
   
 VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call()
  line: 607  
   
 VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call()
  line: 596  
   ListenableFutureTask<V>(FutureTask<V>).run() line: 262  
   ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145  
   ThreadPoolExecutor$Worker.run() line: 615   
   Thread.run() line: 745  
 {code}
 Thread #2
 {code}
 Daemon Thread [App Shared Pool - #2] (Suspended)  
   owns: VertexManager$VertexManagerPluginContextImpl  (id=326)
   owns: PigGraceShuffleVertexManager  (id=344)
   owns: VertexManager  (id=345)   
   Unsafe.park(boolean, long) line: not available [native method]  
   LockSupport.park(Object) line: 186  
   
 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt()
  line: 834
   
 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int)
  line: 964   
   
 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int)
  line: 1282
   ReentrantReadWriteLock$ReadLock.lock() line: 731
   VertexImpl.getTotalTasks() line: 952
   VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) 
 

[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498980#comment-14498980
 ] 

Hitesh Shah commented on TEZ-2310:
--

+1. Please open a jira for failing the dag instead of triggering the internal 
error for the handler exception scenario.
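The inversion in the attached traces reduces to a small pattern: one thread holds the plugin-context monitor and then needs the vertex read lock, while another holds the vertex write path and then needs the plugin-context monitor. A generic sketch (not Tez code; all names are made up) of the usual remedy, snapshotting state under the lock and delivering notifications only after the lock is released:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.IntConsumer;

// Generic lock-inversion remedy sketch (hypothetical, not Tez code):
// mutate and snapshot state while holding the write lock, but invoke
// listeners only after releasing it, so a listener that re-enters a
// read-locked getter cannot deadlock against a concurrent writer.
public class VertexSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<IntConsumer> listeners = new ArrayList<>();
    private int totalTasks;

    void register(IntConsumer l) { listeners.add(l); }

    void setTotalTasks(int n) {
        int snapshot;
        lock.writeLock().lock();
        try {
            totalTasks = n;
            snapshot = totalTasks; // copy state while locked
        } finally {
            lock.writeLock().unlock();
        }
        // Notify with no lock held.
        for (IntConsumer l : listeners) l.accept(snapshot);
    }

    int getTotalTasks() {
        lock.readLock().lock();
        try { return totalTasks; } finally { lock.readLock().unlock(); }
    }
}
```

Because the listener runs lock-free, it may safely call back into `getTotalTasks()`, which is exactly the re-entrant call that blocks in Thread #2 of the traces.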

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch



[jira] [Updated] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2310:
-
Fix Version/s: (was: 0.7.0)

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch



[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498917#comment-14498917
 ] 

Hitesh Shah commented on TEZ-2310:
--

Comments: 

{code}
  } catch (Throwable t) {
108     LOG.error("Error in state update notification for " + event, t);
109     return;
110   }
{code}
   - catch Exception instead of Throwable
   - if the state change notification is going to user code, this should be 
caught, handled as needed, and the thread should remain alive to process other 
notifications. What is the behavior for handling exceptions thrown from user 
code at this point? Also, how should errors thrown by framework code be 
handled? 

Why is the exception in enqueueNotification() ignored?

s/static final Logger LOG/private .../
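The first review point can be sketched as follows; SafeNotifier and its listener shape are hypothetical stand-ins (not the StateChangeNotifier code) illustrating catch-Exception-per-listener delivery, so one bad handler neither kills the notifying thread nor starves the remaining listeners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the reviewed notifier, not Tez code:
// catch Exception (not Throwable) around each listener so the
// notification thread stays alive for the other listeners.
public class SafeNotifier {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    int delivered; // count of successful deliveries, for illustration

    void register(Consumer<String> l) { listeners.add(l); }

    void sendStateUpdate(String event) {
        for (Consumer<String> l : listeners) {
            try {
                l.accept(event);
                delivered++;
            } catch (Exception e) {
                // Log and keep going; remaining listeners still get
                // the update and the thread remains alive.
                System.err.println("Error in state update notification for "
                    + event + ": " + e);
            }
        }
    }
}
```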

 
   

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch



Failed: TEZ-2330 PreCommit Build #476

2015-04-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2330
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/476/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2531 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12726006/TEZ-2330.1.patch
  against master revision e196868.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 177 javac 
compiler warnings (more than the master's current 176 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestTezJobs

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/476//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/476//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/476//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
c751099bd88ec394aef11ba54bd96ceab1ef8ee9 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #472
Archived 45 artifacts
Archive block size is 32768
Received 18 blocks and 2153638 bytes
Compression is 21.5%
Took 0.52 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
REGRESSION:  org.apache.tez.test.TestTezJobs.testSortMergeJoinExample

Error Message:
test timed out after 6 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.ipc.Client.call(Client.java:1454)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source)
at 
org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175)
at 
org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94)
at 
org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:346)
at 
org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:213)
at 
org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:200)
at 
org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:484)
at 
org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletionWithStatusUpdates(DAGClientImpl.java:324)
at 
org.apache.tez.examples.TezExampleBase.runDag(TezExampleBase.java:134)
at 
org.apache.tez.examples.SortMergeJoinExample.runJob(SortMergeJoinExample.java:120)
at 
org.apache.tez.examples.TezExampleBase._execute(TezExampleBase.java:179)
at org.apache.tez.examples.TezExampleBase.run(TezExampleBase.java:82)
at 
org.apache.tez.test.TestTezJobs.testSortMergeJoinExample(TestTezJobs.java:295)




[jira] [Issue Comment Deleted] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2310:
-
Comment: was deleted

(was: bq. Because we are not using a bounded queue and will never block on the 
put method. But the base API has an exception that must be caught for 
compilation.

Any reason why we cannot catch the exception and let the calling code 
handle it?

 )

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch


 See the following deadlock in testing:
 Thread#1:
 {code}
 Daemon Thread [App Shared Pool - #3] (Suspended)  
   owns: VertexManager$VertexManagerPluginContextImpl  (id=327)
   owns: ShuffleVertexManager  (id=328)
   owns: VertexManager  (id=329)   
   waiting for: VertexManager$VertexManagerPluginContextImpl  (id=326) 
   VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344
   StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138
   StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122
   StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116
   StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106
   VertexImpl.maybeSendConfiguredEvent() line: 3385
   VertexImpl.doneReconfiguringVertex() line: 1634 
   VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339
   ShuffleVertexManager.schedulePendingTasks(int) line: 561
   ShuffleVertexManager.schedulePendingTasks() line: 620   
   ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731
   ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744  
   VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527  
   VertexManager$VertexManagerEvent$1.run() line: 612  
   VertexManager$VertexManagerEvent$1.run() line: 607  
   AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
   Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
   UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
   VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607
   VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596
   ListenableFutureTask<V>(FutureTask<V>).run() line: 262
   ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145  
   ThreadPoolExecutor$Worker.run() line: 615   
   Thread.run() line: 745  
 {code}
 Thread #2
 {code}
 Daemon Thread [App Shared Pool - #2] (Suspended)  
   owns: VertexManager$VertexManagerPluginContextImpl  (id=326)
   owns: PigGraceShuffleVertexManager  (id=344)
   owns: VertexManager  (id=345)   
   Unsafe.park(boolean, long) line: not available [native method]  
   LockSupport.park(Object) line: 186  
   ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834
   ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964
   ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282
   ReentrantReadWriteLock$ReadLock.lock() line: 731
   VertexImpl.getTotalTasks() line: 952
   VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162
   PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435
   PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String, List<Integer>>) line: 353
   VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541  
   VertexManager$VertexManagerEvent$1.run() line: 612  
   VertexManager$VertexManagerEvent$1.run() line: 607  
   AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
   Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
   UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
   VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 
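The two traces above boil down to a lock-ordering inversion: one thread holds the vertex's write lock and calls into the manager context's monitor, while the other thread holds the manager context's monitor and waits for the vertex's read lock. A minimal, self-contained sketch of that pattern (all names are illustrative, not the actual Tez classes; the manager monitor is modeled as a ReentrantLock so the demo can time out instead of hanging forever):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeadlockSketch {
    // Stand-ins for VertexImpl's ReadWriteLock and the VertexManager
    // context's monitor (a ReentrantLock here, so we can use tryLock
    // with a timeout instead of blocking forever).
    static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();
    static final ReentrantLock managerLock = new ReentrantLock();

    static boolean wouldDeadlock() throws InterruptedException {
        // Thread 2's half: manager lock first, then the vertex read lock
        // (like onVertexStarted -> getVertexNumTasks -> getTotalTasks).
        Thread t2 = new Thread(() -> {
            managerLock.lock();
            try {
                // Blocks while the writer below holds the vertex lock.
                if (vertexLock.readLock().tryLock(2, TimeUnit.SECONDS)) {
                    vertexLock.readLock().unlock();
                }
            } catch (InterruptedException ignored) {
            } finally {
                managerLock.unlock();
            }
        });

        // Thread 1's half: vertex write lock first, then the manager lock
        // (like doneReconfiguringVertex -> onStateUpdated on the context).
        vertexLock.writeLock().lock();
        try {
            t2.start();
            Thread.sleep(200); // let t2 grab the manager lock first
            // With plain lock()/synchronized this would be a real deadlock;
            // the timed tryLock lets the demo observe it and back out.
            boolean stuck = !managerLock.tryLock(300, TimeUnit.MILLISECONDS);
            if (!stuck) {
                managerLock.unlock();
            }
            return stuck;
        } finally {
            vertexLock.writeLock().unlock();
            t2.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("would deadlock: " + wouldDeadlock());
    }
}
```

With untimed acquisitions in place of the tryLock calls, both threads would block forever, which is exactly the suspended state the thread dump shows.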

[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498955#comment-14498955
 ] 

Bikas Saha commented on TEZ-2310:
-

bq. is the state change notification is going to user code, this should be ca
There is no error handling in the state change code right now, but for now I 
can send an internal error to the DAG. We should follow up to change it to a user 
code exception where we know it is coming from user code.

bq. Why is the exception in enqueueNotification() ignored?
Because we are not using a bounded queue, we will never block on the put 
method. But the base API has a checked exception that must be caught for compilation.
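That compile-time requirement can be seen in a minimal sketch (class and method names here are illustrative, not the actual Tez code): BlockingQueue.put declares InterruptedException even though it can never block on an unbounded LinkedBlockingQueue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class NotificationQueue {
    // Unbounded: put() never blocks, yet its signature still
    // declares the checked InterruptedException.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public void enqueueNotification(String update) {
        try {
            queue.put(update); // cannot block on an unbounded queue
        } catch (InterruptedException e) {
            // Must be caught (or declared) for compilation; restore
            // the interrupt status rather than dropping it silently.
            Thread.currentThread().interrupt();
        }
    }

    public int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        NotificationQueue q = new NotificationQueue();
        q.enqueueNotification("vertex-configured");
        System.out.println(q.pending()); // 1
    }
}
```

Restoring the interrupt flag in the catch block is one conventional way to "ignore" the exception without losing the interruption signal.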

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch



[jira] [Comment Edited] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498980#comment-14498980
 ] 

Hitesh Shah edited comment on TEZ-2310 at 4/16/15 11:49 PM:


+1. Please open a jira for failing the dag instead of triggering the internal 
error for the handler exception scenario. (An internal error will cause the AM to 
shut down.)


was (Author: hitesh):
+1. Please open a jira for failing the dag instead of triggering the internal 
error for the handler exception scenario.

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch



[jira] [Updated] (TEZ-2330) Create reconfigureVertex() API for input based initialization

2015-04-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2330:

Attachment: TEZ-2330.1.patch

 Create reconfigureVertex() API for input based initialization 
 --

 Key: TEZ-2330
 URL: https://issues.apache.org/jira/browse/TEZ-2330
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2330.1.patch


 TEZ-2233 added a reconfigureVertex() to enable a cleaner API to change 
 parallelism of a vertex. Adding a variant to do the same for input 
 initialization based parallelism change would allow us to deprecate the older 
 overloaded setParallelism() API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2331) Container Stop Info Always Missing When Container Reuse Enabled

2015-04-16 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498920#comment-14498920
 ] 

Chang Li commented on TEZ-2331:
---

Have done some investigation and found that the container is never released 
and stays idle even when all tasks finish running. The container's exit 
status and end time info are only added if the container is released and a 
C_STOP_REQUEST occurs.

 Container Stop Info Always Missing When Container Reuse Enabled
 ---

 Key: TEZ-2331
 URL: https://issues.apache.org/jira/browse/TEZ-2331
 Project: Apache Tez
  Issue Type: Bug
Reporter: Chang Li

 Inside otherinfo the container's exit status and end time is always missing 
 when container reuse is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-2330) Create reconfigureVertex() API for input based initialization

2015-04-16 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-2330:
---

 Summary: Create reconfigureVertex() API for input based 
initialization 
 Key: TEZ-2330
 URL: https://issues.apache.org/jira/browse/TEZ-2330
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha


TEZ-2233 added a reconfigureVertex() to enable a cleaner API to change 
parallelism of a vertex. Adding a variant to do the same for input 
initialization based parallelism change would allow us to deprecate the older 
overloaded setParallelism() API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2333) enable local fetch optimization by default.

2015-04-16 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2333:
--
Attachment: TEZ-2333.1.patch

 enable local fetch optimization by default.
 ---

 Key: TEZ-2333
 URL: https://issues.apache.org/jira/browse/TEZ-2333
 Project: Apache Tez
  Issue Type: Task
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Attachments: TEZ-2333.1.patch


 enable TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH by default.
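 Until the default flips, the optimization can be enabled per cluster or per job; a sketch of the tez-site.xml entry, assuming the property key behind the TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH constant is {{tez.runtime.optimize.local.fetch}}:
 {code}
 <property>
   <name>tez.runtime.optimize.local.fetch</name>
   <value>true</value>
   <description>Fetch shuffle data via local disk reads instead of HTTP
   when the input lives on the same node.</description>
 </property>
 {code}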



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-2332) StateChangeNotifier should send out user code exception instead of internal error

2015-04-16 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-2332:
---

 Summary: StateChangeNotifier should send out user code exception 
instead of internal error
 Key: TEZ-2332
 URL: https://issues.apache.org/jira/browse/TEZ-2332
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha


https://issues.apache.org/jira/browse/TEZ-2310?focusedCommentId=14498955&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14498955



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2310:

Attachment: TEZ-2310.2.patch

Patch addresses review comments. Please take a look.

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch



[jira] [Commented] (TEZ-1897) Allow higher concurrency in AsyncDispatcher

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498974#comment-14498974
 ] 

Hitesh Shah commented on TEZ-1897:
--

This might need a mini benchmark run to verify the benefits of this change when 
used, and also to verify correctness. 

 Allow higher concurrency in AsyncDispatcher
 ---

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch


 Currently, it processes events on a single thread. For events that can be 
 executed in parallel, e.g. vertex manager events, allowing higher concurrency 
 may be beneficial.
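 One common way to allow that concurrency while keeping per-key event ordering is to hash events onto a small pool of single-threaded lanes. A minimal sketch (illustrative only, not the actual AsyncDispatcher code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Events that share a key keep their order on one single-threaded
// lane, while events with different keys may run concurrently.
public class ConcurrentDispatcher {
    private final ExecutorService[] lanes;

    public ConcurrentDispatcher(int concurrency) {
        lanes = new ExecutorService[concurrency];
        for (int i = 0; i < concurrency; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same key -> same lane -> per-key ordering is preserved.
    public void dispatch(Object key, Runnable handler) {
        lanes[Math.floorMod(key.hashCode(), lanes.length)].execute(handler);
    }

    public void shutdown() throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
            lane.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentDispatcher d = new ConcurrentDispatcher(4);
        for (int i = 0; i < 8; i++) {
            int id = i;
            // Events keyed by vertex land on that vertex's lane.
            d.dispatch("vertex-" + (id % 2), () -> System.out.println("handled event " + id));
        }
        d.shutdown();
    }
}
```

 Whether a given event type tolerates running on a different thread than its neighbors is exactly what the correctness verification suggested above would need to establish.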



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl

2015-04-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498985#comment-14498985
 ] 

Bikas Saha commented on TEZ-2310:
-

TEZ-2332 created

 AM Deadlock in VertexImpl
 -

 Key: TEZ-2310
 URL: https://issues.apache.org/jira/browse/TEZ-2310
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Bikas Saha
 Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch



Failed: TEZ-2310 PreCommit Build #477

2015-04-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2310
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/477/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2378 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12726012/TEZ-2310.2.patch
  against master revision e196868.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/477//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/477//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
24a3232a13a5636bc9d621d9e8353e55753ddd40 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #472
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2599144 bytes
Compression is 4.8%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Publish JUnit test result report is waiting for a checkpoint on 
PreCommit-TEZ-Build #476
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
REGRESSION:  org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_RouteInputErrorEventToSource

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_RouteInputErrorEventToSource(TestDAGImpl.java:1098)


REGRESSION:  org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs(TestDAGImpl.java:965)




[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498464#comment-14498464
 ] 

Bikas Saha commented on TEZ-2317:
-

Yes. I will pull this all the way to 0.5 Thanks.

 Successful task attempts getting killed
 ---

 Key: TEZ-2317
 URL: https://issues.apache.org/jira/browse/TEZ-2317
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
 Attachments: AM-taskkill.log, TEZ-2317.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2317:

Attachment: TEZ-2317.2.patch

 Successful task attempts getting killed
 ---

 Key: TEZ-2317
 URL: https://issues.apache.org/jira/browse/TEZ-2317
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
 Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498475#comment-14498475
 ] 

Bikas Saha commented on TEZ-2317:
-

Added the commit patch, which has a minor update to the status update event serde code. 
The code is actually dead because the value is always non-null, but it is 
included for correctness.

 Successful task attempts getting killed
 ---

 Key: TEZ-2317
 URL: https://issues.apache.org/jira/browse/TEZ-2317
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
 Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2314:
-
Attachment: TEZ-2314.log.patch

[~rohini] Mind trying to reproduce the problem with the log patch? This will 
help drill down into which event is causing the problem. Feel free to add a try/catch 
around the whole deserialization block too if that helps. 

If there is a simple pig script we can use to reproduce this locally, that 
would help too. 
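The try/catch suggested above would wrap the whole readFields call so that one corrupt event is logged rather than killing the RPC reader with an uncaught ArrayIndexOutOfBoundsException. A hypothetical sketch of the idea (the real class would be TezEvent/EventMetaData; the field, bounds, and logging here are illustrative only):

```java
import java.io.*;

public class DefensiveRead {
    // Hypothetical event holder standing in for the real TezEvent.
    static class Event {
        int kind;
        void readFields(DataInput in) throws IOException {
            kind = in.readInt();
            if (kind < 0 || kind > 16) {        // validate before using as an index
                throw new IOException("corrupt event kind: " + kind);
            }
        }
    }

    // Wrap deserialization so a corrupt event is logged and skipped
    // instead of propagating up through the IPC reader thread.
    static Event readEventSafely(DataInput in) {
        Event e = new Event();
        try {
            e.readFields(in);
            return e;
        } catch (IOException | RuntimeException ex) {
            System.err.println("Skipping unreadable event: " + ex.getMessage());
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(3);            // a valid event kind
        out.writeInt(1935896432);   // garbage, like the value in the stack trace
        DataInput in = new DataInputStream(
            new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(readEventSafely(in) != null); // valid event parses
        System.out.println(readEventSafely(in) != null); // corrupt event is dropped
    }
}
```

Validating the decoded value before using it as an array or enum index also converts the opaque ArrayIndexOutOfBoundsException into an IOException that names the bad value, which is exactly the kind of drill-down the log patch is after.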

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy
 Attachments: TEZ-2314.log.patch


 {code}
 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: 
 Unable to read call parameters for client 10.216.13.112 on connection protocol 
 org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException: 1935896432
 at 
 org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
 at 
 org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
 at 
 org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
 at 
 org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
 at 
 org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
 at 
 org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
 {code}
 cc/ [~hitesh] and [~bikassaha]





[jira] [Commented] (TEZ-2317) Successful task attempts getting killed

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498481#comment-14498481
 ] 

Rohini Palaniswamy commented on TEZ-2317:
-

+1




[jira] [Updated] (TEZ-2317) Event processing backlog can result task failures for short tasks

2015-04-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2317:

Summary: Event processing backlog can result task failures for short tasks  
(was: Successful task attempts getting killed)




[jira] [Updated] (TEZ-2317) Event processing backlog can result in task failures for short tasks

2015-04-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2317:

Summary: Event processing backlog can result in task failures for short 
tasks  (was: Event processing backlog can result task failures for short tasks)




[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-2314:

Affects Version/s: 0.7.0
Fix Version/s: 0.7.0

bq. If there is a simple pig script we can use to reproduce this locally, that 
would help too.
I don't have one. I noticed it in two of the large Pig scripts that I ran. I 
will debug it with log statements and update.



