[jira] [Comment Edited] (TEZ-2329) UI Query on final dag status performance improvement
[ https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497985#comment-14497985 ]

Jonathan Eagles edited comment on TEZ-2329 at 4/16/15 12:30 PM:

[~Sreenath], [~pramachandran], can you take a look?

was (Author: jeagles): [~Sreenath], can you take a look?

UI Query on final dag status performance improvement
----------------------------------------------------
Key: TEZ-2329
URL: https://issues.apache.org/jira/browse/TEZ-2329
Project: Apache Tez
Issue Type: Bug
Components: UI
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Attachments: TEZ-2329.1.patch

Final dag status is a primary filter for the TEZ_DAG_ID entity. However, intermediate dag status is not. By conditionally selecting between primaryFilter and secondaryFilter for status, we can dramatically speed up the FAILED, ERROR, KILLED dag status queries that are a common debugging operation for users.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2329) UI Query on final dag status performance improvement
Jonathan Eagles created TEZ-2329:

Summary: UI Query on final dag status performance improvement
Key: TEZ-2329
URL: https://issues.apache.org/jira/browse/TEZ-2329
Project: Apache Tez
Issue Type: Bug
Components: UI
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
[jira] [Commented] (TEZ-2329) UI Query on final dag status performance improvement
[ https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498029#comment-14498029 ]

Prakash Ramachandran commented on TEZ-2329:

+1 LGTM.
[jira] [Updated] (TEZ-2329) UI Query on final dag status performance improvement
[ https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-2329:

Attachment: TEZ-2329.1.patch

[~Sreenath], can you take a look?
[jira] [Resolved] (TEZ-2329) UI Query on final dag status performance improvement
[ https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles resolved TEZ-2329.

Resolution: Fixed
[jira] [Commented] (TEZ-2329) UI Query on final dag status performance improvement
[ https://issues.apache.org/jira/browse/TEZ-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498063#comment-14498063 ]

Jonathan Eagles commented on TEZ-2329:

Thanks, [~pramachandran]. Committed to master and branch-0.6.
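The optimization discussed in this thread is small but worth spelling out. A minimal sketch of the idea, in Python with hypothetical helper names (the actual change lives in the Tez UI's Timeline Server query construction, and the exact set of final statuses is an assumption here):

```python
# Sketch of TEZ-2329's idea: final DAG statuses are indexed as Timeline
# primary filters and can be pushed down as a primaryFilter (fast,
# indexed lookup), while intermediate statuses must fall back to a
# secondaryFilter (an unindexed scan over entities).
# FINAL_STATUSES and build_dag_status_query are illustrative names,
# not the actual Tez UI API.
FINAL_STATUSES = {"SUCCEEDED", "FAILED", "KILLED", "ERROR"}

def build_dag_status_query(status):
    """Return query params for the TEZ_DAG_ID entity type."""
    params = {"entityType": "TEZ_DAG_ID"}
    if status in FINAL_STATUSES:
        # Indexed lookup: fast even with millions of DAG entities.
        params["primaryFilter"] = "status:" + status
    else:
        # Unindexed: the store filters on the field while scanning.
        params["secondaryFilter"] = "status:" + status
    return params
```

Since FAILED, ERROR, and KILLED are all final, the debugging queries the description mentions take the fast primaryFilter path.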
[jira] [Updated] (TEZ-986) Make conf set on DAG and vertex available in tez UI
[ https://issues.apache.org/jira/browse/TEZ-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-986:

Summary: Make conf set on DAG and vertex available in tez UI (was: Make conf set on DAG and vertex available in jobhistory)

Make conf set on DAG and vertex available in tez UI
---------------------------------------------------
Key: TEZ-986
URL: https://issues.apache.org/jira/browse/TEZ-986
Project: Apache Tez
Issue Type: Sub-task
Components: UI
Reporter: Rohini Palaniswamy
Priority: Blocker

Would like to have the conf set on DAG and Vertex:
1) viewable in Tez UI after the job completes. This is very essential for debugging jobs.
2) We have processes that parse jobconf.xml from job history (hdfs) and load them into hive tables for analysis. Would like to have Tez also make all the configuration (byte array) available in job history so that we can similarly parse them.

1) mandates that you store it in hdfs. 2) is just to say: make the stored format a contract others can rely on for parsing.
Failed: TEZ-1969 PreCommit Build #473
Jira: https://issues.apache.org/jira/browse/TEZ-1969 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/473/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2770 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725864/TEZ-1969.3.patch against master revision bfb34af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/473//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/473//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. aa10720f77d8923179e1ae0f66932fd481d58bb1 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #472 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2623423 bytes Compression is 4.8% Took 4.4 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498134#comment-14498134 ]

Jonathan Eagles commented on TEZ-2317:

[~hitesh], this might be a good candidate for 0.6.1. The patch is simple enough and there is a big benefit for complex jobs.

Successful task attempts getting killed
---------------------------------------
Key: TEZ-2317
URL: https://issues.apache.org/jira/browse/TEZ-2317
Project: Apache Tez
Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
Fix For: 0.7.0
Attachments: AM-taskkill.log, TEZ-2317.1.patch
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498104#comment-14498104 ]

Rohini Palaniswamy commented on TEZ-2317:

+1. Don't see killed tasks with this patch anymore.
[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped
[ https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498110#comment-14498110 ]

TezQA commented on TEZ-1969:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12725864/TEZ-1969.3.patch
against master revision bfb34af.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/473//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/473//console

This message is automatically generated.

Stop the DAGAppMaster when a local mode client is stopped
---------------------------------------------------------
Key: TEZ-1969
URL: https://issues.apache.org/jira/browse/TEZ-1969
Project: Apache Tez
Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Prakash Ramachandran
Attachments: TEZ-1969.1.patch, TEZ-1969.2.patch, TEZ-1969.3.patch

https://issues.apache.org/jira/browse/TEZ-1661?focusedCommentId=14275366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14275366

Running multiple local clients in a single JVM will leak DAGAppMaster and related threads.
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498187#comment-14498187 ]

Hitesh Shah commented on TEZ-2317:

[~bikassaha] does this impact 0.5 too?
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498230#comment-14498230 ]

Rohini Palaniswamy commented on TEZ-2300:

There are a couple of issues with the behavior, after talking to [~jlowe] and comparing with what is done in MR:

- Kill is put in the event queue and is processed like any other event. When there are millions of events in the queue it takes a long time to get to it, and I see the AM even scheduling new tasks in the meantime. MR also does it this way. The problem is too many events, and TEZ-776 should reduce that; but with large jobs there are still going to be many events in the queue.
- TezClient.stop() returns immediately after the kill. It should not; it should poll and wait on the client side. MR does that.
- If the DAG is not killed and the session not shut down even after a certain timeout, yarn kill should be called. MR does that.

This is an important issue: people might kill a script, think the application is killed, and proceed with running a new one, which could cause a lot of issues while the old one is still running. So the kill needs to be synchronous and reliable.

TezClient.stop() takes a lot of time or does not work sometimes
---------------------------------------------------------------
Key: TEZ-2300
URL: https://issues.apache.org/jira/browse/TEZ-2300
Project: Apache Tez
Issue Type: Bug
Reporter: Rohini Palaniswamy
Attachments: syslog_dag_1428329756093_325099_1_post

Noticed this with a couple of pig scripts which were not behaving well (AM close to OOM, etc.) and even with some that were running fine. Pig calls TezClient.stop() in a shutdown hook. Ctrl+C to the pig script either exits immediately or is hung. In both cases it takes a long time for the yarn application to go to KILLED state. Many times I just end up calling yarn application -kill separately after waiting for 5 mins or more for it to get killed.
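The three fixes Rohini lists amount to one control-flow change on the client. A minimal Python sketch of that flow, with all callables as hypothetical stand-ins for TezClient/YarnClient operations (this is not the actual Tez API):

```python
import time

def stop_dag_client(send_shutdown, is_app_finished, force_kill_app,
                    poll_interval=1.0, timeout=30.0):
    """Sketch of the proposed behavior: send the kill, then poll
    client-side until the application finishes, and fall back to a
    hard YARN kill if the timeout expires. All four callables are
    illustrative stand-ins, not real Tez methods."""
    send_shutdown()  # enqueue the kill in the AM's event queue
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_app_finished():
            return "STOPPED"
        time.sleep(poll_interval)
    # The AM never got to the kill event in time; do the equivalent
    # of `yarn application -kill` so the caller can rely on the stop.
    force_kill_app()
    return "FORCE_KILLED"
```

The point of the design is that the caller only returns once the application is actually gone, one way or the other, which is exactly the "synchronous and reliable" property the comment asks for.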
[jira] [Updated] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2317:

Fix Version/s: (was: 0.7.0)
[jira] [Commented] (TEZ-2314) Tez task attempt failures due to bad event serialization
[ https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498083#comment-14498083 ]

Rohini Palaniswamy commented on TEZ-2314:

[~bikassaha], I don't see this issue with tez 0.6 for the same script, even across multiple runs. It should be something introduced in master.

Tez task attempt failures due to bad event serialization
--------------------------------------------------------
Key: TEZ-2314
URL: https://issues.apache.org/jira/browse/TEZ-2314
Project: Apache Tez
Issue Type: Bug
Reporter: Rohini Palaniswamy

{code}
2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: Unable to read call parameters for client 10.216.13.112 on connection protocol org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
java.lang.ArrayIndexOutOfBoundsException: 1935896432
        at org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
        at org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
        at org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
        at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
        at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
{code}

cc/ [~hitesh] and [~bikassaha]
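The huge index value in the trace (1935896432) suggests an int read from a corrupt or misaligned stream being used directly to index a table in EventMetaData.readFields. The general failure mode, and the kind of guard that turns it into a clear error instead of an ArrayIndexOutOfBoundsException, can be illustrated in a few lines (names here are hypothetical, not the actual Tez deserialization code):

```python
def read_event_source_type(ordinal, source_types):
    """Illustrative guard for the failure mode in TEZ-2314: an integer
    decoded from a corrupt stream is used to index an enum table.
    Validating the ordinal first converts a confusing
    ArrayIndexOutOfBoundsException into an explicit error. The names
    are hypothetical and not the actual Tez API."""
    if not 0 <= ordinal < len(source_types):
        raise ValueError(
            "corrupt event stream: source type ordinal %d out of range"
            % ordinal)
    return source_types[ordinal]
```

A range check like this does not fix the corruption, of course; it only makes the real problem (bad bytes on the wire) visible at the point of decode.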
Failed: TEZ-2317 PreCommit Build #474
Jira: https://issues.apache.org/jira/browse/TEZ-2317 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/474/ ### ## LAST 60 LINES OF THE CONSOLE ### Started by remote host 127.0.0.1 Building remotely on H8 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git # timeout=10 Cleaning workspace git rev-parse --verify HEAD # timeout=10 Resetting working tree git reset --hard # timeout=10 git clean -fdx # timeout=10 Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git git --version # timeout=10 git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/tez.git +refs/heads/*:refs/remotes/origin/* git rev-parse refs/remotes/origin/master^{commit} # timeout=10 git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision e1968681cee821103e0105e4948c4fc6dc949776 (refs/remotes/origin/master) git config core.sparsecheckout # timeout=10 git checkout -f e1968681cee821103e0105e4948c4fc6dc949776 git rev-list bfb34afba0edfb254b05037b3b2ab37e3d3e44cf # timeout=10 No emails were triggered. [PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson6745202919307307605.sh Running in Jenkins mode == == Testing patch for TEZ-2317. == == HEAD is now at e196868 TEZ-2317. Event processing backlog can result in task failures for short tasks (bikas) Previous HEAD position was e196868... TEZ-2317. Event processing backlog can result in task failures for short tasks (bikas) Switched to branch 'master' Your branch is behind 'origin/master' by 5 commits, and can be fast-forwarded. (use git pull to update your local branch) First, rewinding head to replay your work on top of it... Fast-forwarded master to e1968681cee821103e0105e4948c4fc6dc949776. TEZ-2317 is not Patch Available. Exiting. 
== == Finished build. == == Archiving artifacts ERROR: No artifacts found that match the file pattern patchprocess/*.*. Configuration error? ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps that's what you mean? Build step 'Archive the artifacts' changed build result to FAILURE [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped
[ https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498766#comment-14498766 ]

Hitesh Shah commented on TEZ-1969:

Might be relevant to FLINK-1892. \cc [~ktzoumas]
[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization
[ https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2314:

Target Version/s: 0.7.0
Affects Versions: 0.7.0
Attachments: TEZ-2314.log.patch
[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization
[ https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2314:

Fix Version/s: (was: 0.7.0)
[jira] [Commented] (TEZ-1969) Stop the DAGAppMaster when a local mode client is stopped
[ https://issues.apache.org/jira/browse/TEZ-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498713#comment-14498713 ]

Siddharth Seth commented on TEZ-1969:

Thanks for the clarification. +1. Looks good.
[jira] [Updated] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
[ https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated TEZ-2282:

Attachment: TEZ-2282.3.master.patch

Attached patches:
TEZ-2282.3.patch - for branch-0.6
TEZ-2282.3.master.patch - for master

Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
-----------------------------------------------------------------------------------------------
Key: TEZ-2282
URL: https://issues.apache.org/jira/browse/TEZ-2282
Project: Apache Tez
Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Mit Desai
Attachments: TEZ-2282.1.patch, TEZ-2282.2.patch, TEZ-2282.3.master.patch, TEZ-2282.3.patch, TEZ-2282.master.1.patch

This could help with debugging in cases where logging is task specific. For example, when the GC log is going to stdout, it is nice to see task attempt start/stop times.
[jira] [Updated] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
[ https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated TEZ-2282:

Attachment: TEZ-2282.3.patch
[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
[ https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498761#comment-14498761 ]

TezQA commented on TEZ-2282:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12725970/TEZ-2282.3.master.patch
against master revision e196868.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/475//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/475//console

This message is automatically generated.
Failed: TEZ-2282 PreCommit Build #475
Jira: https://issues.apache.org/jira/browse/TEZ-2282 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/475/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2769 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725970/TEZ-2282.3.master.patch against master revision e196868. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/475//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/475//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 6aa8655ddaee8fd371e3f7e14bb2f1db1a5c4324 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #472 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2649800 bytes Compression is 4.7% Took 1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
[ https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498707#comment-14498707 ]

Hitesh Shah commented on TEZ-2282:

Mostly looks good. I will defer the final review to [~jeagles] as he requested this change.
[jira] [Commented] (TEZ-2282) Delimit reused yarn container logs (stderr, stdout, syslog) with task attempt start/stop events
[ https://issues.apache.org/jira/browse/TEZ-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498671#comment-14498671 ] Mit Desai commented on TEZ-2282: [~hitesh], [~jeagles], [~knoguchi]. This is how the log files look after the patch.
{noformat}
Log Type: stderr
Log Upload Time: 16-Apr-2015 20:17:19
Log Length: 376
2015-04-16 20:17:07 Starting to run new task attempt: attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Completed running task attempt: attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Starting to run new task attempt: attempt_1429195759237_0018_1_02_00_0
2015-04-16 20:17:08 Completed running task attempt: attempt_1429195759237_0018_1_02_00_0

Log Type: stdout
Log Upload Time: 16-Apr-2015 20:17:19
Log Length: 1860
0.202: [GC [PSYoungGen: 5440K->893K(6336K)] 5440K->1517K(64640K), 0.0046680 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
0.353: [GC [PSYoungGen: 6333K->893K(11776K)] 6957K->2293K(70080K), 0.0049120 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
0.517: [GC [PSYoungGen: 11773K->885K(11776K)] 13173K->3554K(70080K), 0.0040680 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
0.690: [GC [PSYoungGen: 11765K->885K(22656K)] 14434K->4622K(80960K), 0.0034990 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
1.144: [GC [PSYoungGen: 22645K->885K(22656K)] 26382K->6884K(80960K), 0.0054460 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
1.669: [GC [PSYoungGen: 22645K->3056K(45632K)] 28644K->9986K(103936K), 0.0093110 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2015-04-16 20:17:07 Starting to run new task attempt: attempt_1429195759237_0018_1_01_00_0
2.227: [GC [PSYoungGen: 45616K->4017K(46592K)] 52546K->11854K(104896K), 0.0231380 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
2015-04-16 20:17:08 Completed running task attempt: attempt_1429195759237_0018_1_01_00_0
2015-04-16 20:17:08 Starting to run new task attempt: attempt_1429195759237_0018_1_02_00_0
2015-04-16 20:17:08 Completed running task attempt: attempt_1429195759237_0018_1_02_00_0
Heap
 PSYoungGen      total 46592K, used 46577K [0xed28, 0xf2f8, 0xf444)
  eden space 42560K, 100% used [0xed28,0xefc1,0xefc1)
  from space 4032K, 99% used [0xefc1,0xefffc768,0xf000)
  to   space 5056K, 0% used [0xf2a9,0xf2a9,0xf2f8)
 ParOldGen       total 125952K, used 75420K [0xb444, 0xbbf4, 0xed28)
  object space 125952K, 59% used [0xb444,0xb8de73f0,0xbbf4)
 PSPermGen       total 16384K, used 12647K [0xb044, 0xb144, 0xb444)
  object space 16384K, 77% used [0xb044,0xb1099f90,0xb144)
{noformat}
{noformat}
Log Type: dag_1429195759237_0018_1.dot
Log Upload Time: 16-Apr-2015 20:17:19
Log Length: 1154
digraph MRRSleepJob {
  graph [ label=MRRSleepJob, fontsize=24, fontname=Helvetica];
  node [fontsize=12, fontname=Helvetica];
  edge [fontsize=9, fontcolor=blue, fontname=Arial];
  MRRSleepJob.reduce [ label = reduce[ReduceProcessor] ];
  MRRSleepJob.reduce -> MRRSleepJob.reduce_MROutput [ label = Output [outputClass=MROutputLegacy,\n initializer=MROutputCommitter] ];
  MRRSleepJob.ireduce1 [ label = ireduce1[ReduceProcessor] ];
  MRRSleepJob.ireduce1 -> MRRSleepJob.reduce [ label = [input=OrderedPartitionedKVOutput,\n output=OrderedGroupedInputLegacy,\n dataMovement=SCATTER_GATHER,\n schedulingType=SEQUENTIAL] ];
  MRRSleepJob.reduce_MROutput [ label = reduce[MROutput], shape = box ];
  MRRSleepJob.map_MRInput [ label = map[MRInput], shape = box ];
  MRRSleepJob.map_MRInput -> MRRSleepJob.map [ label = Input [inputClass=MRInputLegacy,\n initializer=MRInputSplitDistributor] ];
  MRRSleepJob.map [ label = map[MapProcessor] ];
  MRRSleepJob.map -> MRRSleepJob.ireduce1 [ label = [input=OrderedPartitionedKVOutput,\n output=OrderedGroupedInputLegacy,\n dataMovement=SCATTER_GATHER,\n schedulingType=SEQUENTIAL] ];
}

Log Type: stderr
Log Upload Time: 16-Apr-2015 20:17:19
Log Length: 118
2015-04-16 20:16:54 Running Dag: dag_1429195759237_0018_1
2015-04-16 20:17:08 Completed Dag: dag_1429195759237_0018_1

Log Type: stdout
Log Upload Time: 16-Apr-2015 20:17:19
Log Length: 1757
0.395: [GC [PSYoungGen: 16448K->2680K(19136K)] 16448K->3177K(62848K), 0.0062140 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
0.599: [GC [PSYoungGen: 19128K->2679K(35584K)] 19625K->3268K(79296K), 0.0072310 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.247: [GC [PSYoungGen: 35575K->2683K(35584K)] 36164K->5911K(79296K), 0.0100360 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
1.636: [GC [PSYoungGen: 35579K->2675K(68480K)] 38807K->7810K(112192K), 0.0093970 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2.169: [GC [PSYoungGen: 68467K->2685K(68480K)] 73602K->12997K(112192K), 0.0152030 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2.845: [GC [PSYoungGen: 68477K->7379K(138688K)] 78789K->17695K(182400K), 0.0137060 secs] [Times: user=0.02 sys=0.00,
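The delimiting shown in the logs above can be sketched as a small helper that writes a timestamped marker to both stdout and stderr around each task attempt. This is a hypothetical illustration of the idea, not the actual TEZ-2282 patch code; the class and method names are made up.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch: before and after each task attempt runs in a reused
// container, print a timestamped marker to both stdout and stderr so that
// interleaved GC/application output can be attributed to a specific attempt.
public class TaskAttemptLogDelimiter {
    private static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    static String marker(String event, String attemptId) {
        return FMT.format(new Date()) + " " + event + " task attempt: " + attemptId;
    }

    static void logAttemptStart(String attemptId) {
        String line = marker("Starting to run new", attemptId);
        System.out.println(line); // marker in the stdout stream
        System.err.println(line); // same marker in the stderr stream
    }

    static void logAttemptEnd(String attemptId) {
        String line = marker("Completed running", attemptId);
        System.out.println(line);
        System.err.println(line);
    }

    public static void main(String[] args) {
        logAttemptStart("attempt_1429195759237_0018_1_01_00_0");
        // ... the task attempt body runs here; its output lands between the markers ...
        logAttemptEnd("attempt_1429195759237_0018_1_01_00_0");
    }
}
```

Writing the marker to every log stream is what makes the unsynchronized GC lines in the stdout excerpt attributable to an attempt window.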
[jira] [Commented] (TEZ-2333) enable local fetch optimization by default.
[ https://issues.apache.org/jira/browse/TEZ-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499196#comment-14499196 ] TezQA commented on TEZ-2333: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726050/TEZ-2333.1.patch against master revision 3e6fc35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestSecureShuffle Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/478//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/478//console This message is automatically generated. enable local fetch optimization by default. --- Key: TEZ-2333 URL: https://issues.apache.org/jira/browse/TEZ-2333 Project: Apache Tez Issue Type: Task Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Attachments: TEZ-2333.1.patch enable TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2331) Container Stop Info Always Missing When Container Reuse Enabled
Chang Li created TEZ-2331: - Summary: Container Stop Info Always Missing When Container Reuse Enabled Key: TEZ-2331 URL: https://issues.apache.org/jira/browse/TEZ-2331 Project: Apache Tez Issue Type: Bug Reporter: Chang Li Inside otherinfo, the container's exit status and end time are always missing when container reuse is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498996#comment-14498996 ] TezQA commented on TEZ-2310: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726012/TEZ-2310.2.patch against master revision e196868. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.dag.impl.TestDAGImpl Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/477//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/477//console This message is automatically generated. 
AM Deadlock in VertexImpl - Key: TEZ-2310 URL: https://issues.apache.org/jira/browse/TEZ-2310 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Bikas Saha Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch See the following deadlock in testing: Thread#1:
{code}
Daemon Thread [App Shared Pool - #3] (Suspended)
owns: VertexManager$VertexManagerPluginContextImpl (id=327)
owns: ShuffleVertexManager (id=328)
owns: VertexManager (id=329)
waiting for: VertexManager$VertexManagerPluginContextImpl (id=326)
VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344
StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138
StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122
StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116
StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106
VertexImpl.maybeSendConfiguredEvent() line: 3385
VertexImpl.doneReconfiguringVertex() line: 1634
VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339
ShuffleVertexManager.schedulePendingTasks(int) line: 561
ShuffleVertexManager.schedulePendingTasks() line: 620
ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731
ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744
VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527
VertexManager$VertexManagerEvent$1.run() line: 612
VertexManager$VertexManagerEvent$1.run() line: 607
AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607
VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596
ListenableFutureTask<V>(FutureTask<V>).run() line: 262
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 745
{code}
Thread #2
{code}
Daemon Thread [App Shared Pool - #2] (Suspended)
owns: VertexManager$VertexManagerPluginContextImpl (id=326)
owns: PigGraceShuffleVertexManager (id=344)
owns: VertexManager (id=345)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 186
ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834
ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964
ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282
ReentrantReadWriteLock$ReadLock.lock() line: 731
VertexImpl.getTotalTasks() line: 952
VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162
PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435
PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String, List<Integer>>) line: 353
VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541
VertexManager$VertexManagerEvent$1.run() line: 612
VertexManager$VertexManagerEvent$1.run() line: 607
AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 607
VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 596
ListenableFutureTask<V>(FutureTask<V>).run() line: 262
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 745
{code}
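The two thread dumps above are a classic lock-ordering cycle: thread #1 holds context lock 327 and waits for 326, while thread #2 holds 326 and blocks inside VertexImpl. As an illustration only (this is not Tez code), the standard remedy is to acquire shared locks in one global order so no cycle can form:

```java
import java.util.concurrent.CountDownLatch;

// Illustrative sketch of lock-order discipline. The two lock objects stand in
// for the two VertexManagerPluginContextImpl instances (ids 326 and 327) from
// the thread dumps; the names are invented for this example.
public class LockOrderSketch {
    private static final Object CTX_326 = new Object();
    private static final Object CTX_327 = new Object();

    // Every caller takes CTX_326 before CTX_327, so a cycle is impossible.
    static void doWork(Runnable critical) {
        synchronized (CTX_326) {
            synchronized (CTX_327) {
                critical.run();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(2);
        Thread a = new Thread(() -> doWork(done::countDown));
        Thread b = new Thread(() -> doWork(done::countDown));
        a.start();
        b.start();
        done.await(); // completes because both threads use the same order
        System.out.println("no deadlock");
    }
}
```

The actual fix in the patches takes a different route (dispatching state updates outside the held lock), but the invariant being restored is the same: never wait for a lock while holding one that another thread may need first.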
[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498980#comment-14498980 ] Hitesh Shah commented on TEZ-2310: -- +1. Please open a jira for failing the dag instead of triggering the internal error for the handler exception scenario.
[jira] [Updated] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2310: - Fix Version/s: (was: 0.7.0)
[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498917#comment-14498917 ] Hitesh Shah commented on TEZ-2310: -- Comments:
{code}
} catch (Throwable t) {
  LOG.error("Error in state update notification for " + event, t);
  return;
}
{code}
- catch Exception instead of Throwable
- if the state change notification is going to user code, this should be caught, handled as needed, and the thread should remain alive to process other notifications. What is the behavior for handling exceptions thrown from user code at this point? Also, how should errors thrown by framework code be handled?
- Why is the exception in enqueueNotification() ignored?
- s/static final Logger LOG/private .../
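The review point above, that a notification thread should catch Exception (not Throwable) around listener code and keep running, can be sketched as below. This is a hedged illustration with invented names, not the actual StateChangeNotifier API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a dispatch loop that survives a misbehaving listener.
// A throwing notification is logged and skipped; the loop keeps draining.
public class NotifierLoopSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    final AtomicInteger delivered = new AtomicInteger();

    void enqueue(Runnable notification) {
        queue.add(notification);
    }

    // Drains pending notifications; a failing listener does not kill the loop.
    void drain() {
        Runnable n;
        while ((n = queue.poll()) != null) {
            try {
                n.run();
                delivered.incrementAndGet();
            } catch (Exception e) { // Exception, not Throwable: let Errors propagate
                System.err.println("Error in state update notification: " + e);
            }
        }
    }

    public static void main(String[] args) {
        NotifierLoopSketch s = new NotifierLoopSketch();
        s.enqueue(() -> { throw new RuntimeException("listener bug"); });
        s.enqueue(() -> {});
        s.drain();
        System.out.println("delivered=" + s.delivered.get()); // prints delivered=1
    }
}
```

Catching Throwable would also swallow JVM Errors such as OutOfMemoryError, which is why the review asks for Exception only.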
Failed: TEZ-2330 PreCommit Build #476
Jira: https://issues.apache.org/jira/browse/TEZ-2330 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/476/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2531 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726006/TEZ-2330.1.patch against master revision e196868. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 177 javac compiler warnings (more than the master's current 176 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestTezJobs Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/476//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/476//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/476//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. c751099bd88ec394aef11ba54bd96ceab1ef8ee9 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #472 Archived 45 artifacts Archive block size is 32768 Received 18 blocks and 2153638 bytes Compression is 21.5% Took 0.52 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. 
REGRESSION: org.apache.tez.test.TestTezJobs.testSortMergeJoinExample Error Message: test timed out after 6 milliseconds Stack Trace: java.lang.Exception: test timed out after 6 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at org.apache.hadoop.ipc.Client.call(Client.java:1454) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:346) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:213) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:200) at org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:484) at org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletionWithStatusUpdates(DAGClientImpl.java:324) at org.apache.tez.examples.TezExampleBase.runDag(TezExampleBase.java:134) at org.apache.tez.examples.SortMergeJoinExample.runJob(SortMergeJoinExample.java:120) at org.apache.tez.examples.TezExampleBase._execute(TezExampleBase.java:179) at org.apache.tez.examples.TezExampleBase.run(TezExampleBase.java:82) at org.apache.tez.test.TestTezJobs.testSortMergeJoinExample(TestTezJobs.java:295)
[jira] [Issue Comment Deleted] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2310: - Comment: was deleted (was: bq. Because we are not using a bounded queue and will never block on the put method. But the base API has an exception that must be caught for compilation. Any reason why we cannot catch the exception and let the calling code handle it? )
[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498955#comment-14498955 ] Bikas Saha commented on TEZ-2310: - bq. if the state change notification is going to user code, this should be caught There is no error handling in the state change code right now, but for now I can send an internal error to the DAG. We should follow up to change it to a user code exception where we know it is coming from user code. bq. Why is the exception in enqueueNotification() ignored? Because we are not using a bounded queue and will never block on the put method. But the base API declares an exception that must be caught for compilation. AM Deadlock in VertexImpl - Key: TEZ-2310 URL: https://issues.apache.org/jira/browse/TEZ-2310 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Bikas Saha Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch See the following deadlock in testing: Thread #1: {code} Daemon Thread [App Shared Pool - #3] (Suspended) owns: VertexManager$VertexManagerPluginContextImpl (id=327) owns: ShuffleVertexManager (id=328) owns: VertexManager (id=329) waiting for: VertexManager$VertexManagerPluginContextImpl (id=326) VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344 StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138 StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122 StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116 StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106 VertexImpl.maybeSendConfiguredEvent() line: 3385 VertexImpl.doneReconfiguringVertex() line: 1634 VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339 ShuffleVertexManager.schedulePendingTasks(int) line: 561 ShuffleVertexManager.schedulePendingTasks() line: 620 ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731 ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744 VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527 VertexManager$VertexManagerEvent$1.run() line: 612 VertexManager$VertexManagerEvent$1.run() line: 607 AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method] Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415 UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548 VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607 VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596 ListenableFutureTask<V>(FutureTask<V>).run() line: 262 ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145 ThreadPoolExecutor$Worker.run() line: 615 Thread.run() line: 745 {code} Thread #2 {code} Daemon Thread [App Shared Pool - #2] (Suspended) owns: VertexManager$VertexManagerPluginContextImpl (id=326) owns: PigGraceShuffleVertexManager (id=344) owns: VertexManager (id=345) Unsafe.park(boolean, long) line: not available [native method] LockSupport.park(Object) line: 186 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964 ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282 ReentrantReadWriteLock$ReadLock.lock() line: 731 VertexImpl.getTotalTasks() line: 952 VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162 PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435 PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String, List<Integer>>) line: 353 VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541 VertexManager$VertexManagerEvent$1.run() line: 612 VertexManager$VertexManagerEvent$1.run() line: 607 AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method] Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
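The two traces boil down to a classic lock-order inversion. Below is an illustrative reconstruction, not the actual Tez code: class and method names mirror the stack trace, but the bodies are hypothetical. Thread 1 holds the vertex's write lock and then calls into the synchronized plugin context; Thread 2 holds the plugin context's monitor and then needs the vertex's read lock, so each waits on what the other owns.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the TEZ-2310 lock inversion (hypothetical bodies).
class Vertex {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final PluginContext context;
    private final int totalTasks;

    Vertex(PluginContext context, int totalTasks) {
        this.context = context;
        this.totalTasks = totalTasks;
    }

    // Thread 2 (onVertexStarted path) blocks here while Thread 1 holds the write lock.
    int getTotalTasks() {
        lock.readLock().lock();
        try { return totalTasks; } finally { lock.readLock().unlock(); }
    }

    // Thread 1 (doneReconfiguringVertex path): write lock first, plugin monitor second.
    void doneReconfiguring() {
        lock.writeLock().lock();
        try { context.onStateUpdated(); } finally { lock.writeLock().unlock(); }
    }
}

class PluginContext {
    // Thread 1 waits for this monitor while Thread 2 already holds it...
    synchronized void onStateUpdated() { }

    // ...and Thread 2, inside the monitor, waits for the vertex read lock.
    synchronized int getVertexNumTasks(Vertex v) {
        return v.getTotalTasks();
    }
}
```

Run single-threaded the calls are safe; the deadlock only appears when two threads enter from opposite ends, which is why it surfaced intermittently in testing.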
[jira] [Comment Edited] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498980#comment-14498980 ] Hitesh Shah edited comment on TEZ-2310 at 4/16/15 11:49 PM: +1. Please open a jira for failing the dag instead of triggering the internal error for the handler exception scenario. (internal error will cause the AM to shutdown) was (Author: hitesh): +1. Please open a jira for failing the dag instead of triggering the internal error for the handler exception scenario. AM Deadlock in VertexImpl - Key: TEZ-2310 URL: https://issues.apache.org/jira/browse/TEZ-2310 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Bikas Saha Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch
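Bikas's point above about the ignored exception in enqueueNotification() is visible directly in the BlockingQueue API: put() declares InterruptedException regardless of capacity, so even an unbounded queue that never blocks on space still forces a catch block at the call site. A minimal sketch, with a hypothetical class name:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the notifier's event queue (not Tez code).
class NotifierQueue {
    // No capacity argument: the queue is unbounded, so put() never blocks on space.
    private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();

    void enqueueNotification(String event) {
        try {
            eventQueue.put(event);
        } catch (InterruptedException e) {
            // put() declares this checked exception even for an unbounded queue;
            // restore the interrupt flag rather than swallow it silently.
            Thread.currentThread().interrupt();
        }
    }

    int size() { return eventQueue.size(); }
}
```

For an unbounded LinkedBlockingQueue the interrupt path is nearly unreachable in practice, which matches the rationale for ignoring it, though re-setting the interrupt flag is the safer idiom.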
[jira] [Updated] (TEZ-2330) Create reconfigureVertex() API for input based initialization
[ https://issues.apache.org/jira/browse/TEZ-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2330: Attachment: TEZ-2330.1.patch Create reconfigureVertex() API for input based initialization -- Key: TEZ-2330 URL: https://issues.apache.org/jira/browse/TEZ-2330 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-2330.1.patch TEZ-2233 added a reconfigureVertex() to enable a cleaner API to change parallelism of a vertex. Adding a variant to do the same for input initialization based parallelism change would allow us to deprecate the older overloaded setParallelism() API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2331) Container Stop Info Always Missing When Container Reuse Enabled
[ https://issues.apache.org/jira/browse/TEZ-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498920#comment-14498920 ] Chang Li commented on TEZ-2331: --- Did some investigation and found that the container is never released and stays idle even after all tasks finish running. The container's exit status and end time info are only added if the container is released and a C_STOP_REQUEST occurs. Container Stop Info Always Missing When Container Reuse Enabled --- Key: TEZ-2331 URL: https://issues.apache.org/jira/browse/TEZ-2331 Project: Apache Tez Issue Type: Bug Reporter: Chang Li Inside otherinfo the container's exit status and end time are always missing when container reuse is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2330) Create reconfigureVertex() API for input based initialization
Bikas Saha created TEZ-2330: --- Summary: Create reconfigureVertex() API for input based initialization Key: TEZ-2330 URL: https://issues.apache.org/jira/browse/TEZ-2330 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha TEZ-2233 added a reconfigureVertex() to enable a cleaner API to change parallelism of a vertex. Adding a variant to do the same for input initialization based parallelism change would allow us to deprecate the older overloaded setParallelism() API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2333) enable local fetch optimization by default.
[ https://issues.apache.org/jira/browse/TEZ-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2333: -- Attachment: TEZ-2333.1.patch enable local fetch optimization by default. --- Key: TEZ-2333 URL: https://issues.apache.org/jira/browse/TEZ-2333 Project: Apache Tez Issue Type: Task Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Attachments: TEZ-2333.1.patch enable TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
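A sketch of how a job could read the flag once the default flips. The key string `tez.runtime.optimize.local.fetch` is inferred from the constant name TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH and should be verified against your Tez release; plain Properties stands in for the Hadoop Configuration object.

```java
import java.util.Properties;

// Hypothetical helper mirroring the TEZ-2333 behavior: default on, explicit override wins.
class LocalFetchConfig {
    // Key inferred from TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH; check your Tez version.
    static final String OPTIMIZE_LOCAL_FETCH = "tez.runtime.optimize.local.fetch";

    static boolean isLocalFetchEnabled(Properties conf) {
        // After TEZ-2333 the default becomes true; an explicit "false" still disables it.
        return Boolean.parseBoolean(conf.getProperty(OPTIMIZE_LOCAL_FETCH, "true"));
    }
}
```

Local fetch lets a consumer task read shuffle data directly from local disk instead of going through the shuffle HTTP handler when producer and consumer land on the same node, so defaulting it on is a pure win unless a deployment relies on the remote path.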
[jira] [Created] (TEZ-2332) StateChangeNotifier should send out user code exception instead of internal error
Bikas Saha created TEZ-2332: --- Summary: StateChangeNotifier should send out user code exception instead of internal error Key: TEZ-2332 URL: https://issues.apache.org/jira/browse/TEZ-2332 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha https://issues.apache.org/jira/browse/TEZ-2310?focusedCommentId=14498955page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14498955 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2310: Attachment: TEZ-2310.2.patch Patch addresses review comments. Please take a look. AM Deadlock in VertexImpl - Key: TEZ-2310 URL: https://issues.apache.org/jira/browse/TEZ-2310 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Bikas Saha Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch
[jira] [Commented] (TEZ-1897) Allow higher concurrency in AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498974#comment-14498974 ] Hitesh Shah commented on TEZ-1897: -- This might need a mini benchmark run to verify the benefits of this change when used and also to verify correctness. Allow higher concurrency in AsyncDispatcher --- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
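The concurrency idea behind TEZ-1897 can be sketched as routing independent events onto parallel single-threaded lanes keyed by vertex: throughput goes up, while hashing on the key preserves per-vertex event ordering. Names below are hypothetical; the actual AsyncDispatcher API differs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a multi-lane dispatcher (not the real AsyncDispatcher).
class ConcurrentDispatcher {
    private final ExecutorService[] lanes;

    ConcurrentDispatcher(int numLanes) {
        lanes = new ExecutorService[numLanes];
        for (int i = 0; i < numLanes; i++) {
            // Each lane is single-threaded, so events on one lane stay ordered.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Stable routing: the same vertex always maps to the same lane.
    int laneFor(String vertexId) {
        return Math.floorMod(vertexId.hashCode(), lanes.length);
    }

    Future<?> dispatch(String vertexId, Runnable event) {
        return lanes[laneFor(vertexId)].submit(event);
    }

    void shutdown() {
        for (ExecutorService lane : lanes) lane.shutdown();
    }
}
```

Hitesh's benchmarking caveat applies exactly here: the win depends on events for different vertices truly being independent, and a correctness run is needed to confirm no hidden cross-vertex ordering assumptions.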
[jira] [Commented] (TEZ-2310) AM Deadlock in VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498985#comment-14498985 ] Bikas Saha commented on TEZ-2310: - TEZ-2332 created AM Deadlock in VertexImpl - Key: TEZ-2310 URL: https://issues.apache.org/jira/browse/TEZ-2310 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Bikas Saha Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch, TEZ-2310.2.patch
Failed: TEZ-2310 PreCommit Build #477
Jira: https://issues.apache.org/jira/browse/TEZ-2310 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/477/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2378 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726012/TEZ-2310.2.patch against master revision e196868. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.dag.impl.TestDAGImpl Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/477//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/477//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 24a3232a13a5636bc9d621d9e8353e55753ddd40 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #472 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2599144 bytes Compression is 4.8% Took 1.6 sec [description-setter] Could not determine description. Recording test results Publish JUnit test result report is waiting for a checkpoint on PreCommit-TEZ-Build #476 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 2 tests failed. 
REGRESSION: org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_RouteInputErrorEventToSource Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_RouteInputErrorEventToSource(TestDAGImpl.java:1098) REGRESSION: org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs(TestDAGImpl.java:965)
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498464#comment-14498464 ] Bikas Saha commented on TEZ-2317: - Yes. I will pull this all the way to 0.5 Thanks. Successful task attempts getting killed --- Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2317: Attachment: TEZ-2317.2.patch Successful task attempts getting killed --- Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498475#comment-14498475 ] Bikas Saha commented on TEZ-2317: - Added commit patch that had a minor update to status update event serde code. The code is actually dead because the value is always non null but putting the code in for correctness. Successful task attempts getting killed --- Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization
[ https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2314: - Attachment: TEZ-2314.log.patch [~rohini] Mind trying to reproduce the problem with the log patch? This will help drill into which event is causing the problem. Feel free to add a try/catch around the whole deserialization block too if that helps. If there is a simple pig script we can use to reproduce this locally, that would help too. Tez task attempt failures due to bad event serialization Key: TEZ-2314 URL: https://issues.apache.org/jira/browse/TEZ-2314 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Attachments: TEZ-2314.log.patch {code} 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: Unable to read call parameters for client 10.216.13.112 on connection protocol org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE java.lang.ArrayIndexOutOfBoundsException: 1935896432 at org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120) at org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271) at org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285) at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160) at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884) at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644) {code} cc/ [~hitesh] and [~bikassaha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
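The huge ArrayIndexOutOfBoundsException value in the trace is the signature of a Writable readFields() trusting a length read off a corrupt or misaligned wire stream. A defensive bound check turns that garbage into a clean IOException instead of an array crash. This is a simplified, hypothetical event class to show the failure mode, not the actual EventMetaData code; the 64 MB bound is an arbitrary assumed sanity limit.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical simplified event illustrating the TEZ-2314 failure mode.
class SafeEvent {
    static final int MAX_LEN = 64 << 20; // assumed sanity bound: 64 MB

    byte[] payload;

    void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        // Without this check, a corrupt length (e.g. 1935896432 as seen in the
        // trace) would be used to size an array and index into the stream.
        if (len < 0 || len > MAX_LEN) {
            throw new IOException("Corrupt event stream: bad payload length " + len);
        }
        payload = new byte[len];
        in.readFully(payload);
    }
}
```

This is the same spirit as Hitesh's suggestion of a try/catch around the deserialization block, but it also reports which field was corrupt rather than letting the JVM throw an unchecked exception deep inside the IPC reader.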
[jira] [Commented] (TEZ-2317) Successful task attempts getting killed
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498481#comment-14498481 ] Rohini Palaniswamy commented on TEZ-2317: - +1 Successful task attempts getting killed --- Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2317) Event processing backlog can result task failures for short tasks
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2317: Summary: Event processing backlog can result task failures for short tasks (was: Successful task attempts getting killed) Event processing backlog can result task failures for short tasks - Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2317) Event processing backlog can result in task failures for short tasks
[ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2317: Summary: Event processing backlog can result in task failures for short tasks (was: Event processing backlog can result task failures for short tasks) Event processing backlog can result in task failures for short tasks Key: TEZ-2317 URL: https://issues.apache.org/jira/browse/TEZ-2317 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Bikas Saha Attachments: AM-taskkill.log, TEZ-2317.1.patch, TEZ-2317.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2314) Tez task attempt failures due to bad event serialization
[ https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated TEZ-2314: Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 bq. If there is a simple pig script we can use to reproduce this locally, that would help too. I don't have any. I noticed it in two of the large pig scripts that I ran. I will debug it with log statements and update. Tez task attempt failures due to bad event serialization Key: TEZ-2314 URL: https://issues.apache.org/jira/browse/TEZ-2314 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Rohini Palaniswamy Fix For: 0.7.0 Attachments: TEZ-2314.log.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)