[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532138#comment-14532138
 ] 

TezQA commented on TEZ-1961:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731085/TEZ-1961-3.patch
  against master revision 02870f0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/649//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/649//console

This message is automatically generated.

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-1961 PreCommit Build #649

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1961
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/649/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2850 lines...]
[INFO] Final Memory: 70M/931M
[INFO] 





==
==
Adding comment to Jira.
==
==


Comment added.
2a8b86df1ccfb4cd7e51a1a513e609b74e98353f logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #646
Archived 44 artifacts
Archive block size is 32768
Received 2 blocks and 2706810 bytes
Compression is 2.4%
Took 1.1 sec
Description set: TEZ-1961
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533094#comment-14533094
 ] 

Siddharth Seth commented on TEZ-2426:
-

[~bikassaha] - do you have additional logs, specifically the entire AM log? 
There also seems to be a discrepancy between the AM and task log times; I'm 
assuming the nodes' clocks are out of sync. 

I can see how the exception happens during execution of the next task - since 
we don't join on the eventRouter thread.
However, I'm not sure how the FAILED message will go through for the previous 
attempt as a result of this. It should have gone through for the currently 
running task. If it went for the previous task - the AM should have thrown an 
error related to an invalid taskAttemptId. That leads me to believe something 
else is broken at the same time.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
 Attachments: am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though I'm not sure whether it's related to the failed event being sent.
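
 The sequence above can be illustrated with a toy state machine. This is only 
 a sketch under stated assumptions: ToyAttemptStateMachine, its states, and its 
 event names are hypothetical and are not Tez's actual task state machine. It 
 shows why a failure event that arrives after success (steps 5 and 6) has no 
 valid transition.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical toy model; not Tez's real task or attempt state machine.
final class ToyAttemptStateMachine {
    enum State { RUNNING, SUCCEEDED, FAILED }
    enum Event { COMPLETED, INPUT_FAILED }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
        new EnumMap<>(State.class);
    static {
        Map<Event, State> fromRunning = new EnumMap<>(Event.class);
        fromRunning.put(Event.COMPLETED, State.SUCCEEDED);
        fromRunning.put(Event.INPUT_FAILED, State.FAILED);
        TRANSITIONS.put(State.RUNNING, fromRunning);
        // No transitions are registered out of SUCCEEDED or FAILED.
    }

    private State state = State.RUNNING;

    State getState() {
        return state;
    }

    void handle(Event event) {
        Map<Event, State> allowed = TRANSITIONS.get(state);
        State next = (allowed == null) ? null : allowed.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + state);
        }
        state = next;
    }

    /** Replays steps 2 and 5 and reports whether the late failure was rejected. */
    static boolean lateFailureIsRejected() {
        ToyAttemptStateMachine taskA = new ToyAttemptStateMachine();
        taskA.handle(Event.COMPLETED);         // step 2: Task A completes
        try {
            taskA.handle(Event.INPUT_FAILED);  // step 5: late input failure
            return false;
        } catch (IllegalStateException expected) {
            return true;                       // step 6: no such transition
        }
    }
}
```

 In this toy model the late event is rejected with an exception; whether the 
 real fix should ignore, reject, or prevent such events is a separate question.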





[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533096#comment-14533096
 ] 

Siddharth Seth commented on TEZ-2426:
-

The status update event after the task failed is also strange. Will look into 
that. The thread for the last running task may not be exiting properly.



[jira] [Updated] (TEZ-2418) TASK_ATTEMPT_FAILED_EVENT and TASK_COMPLETED_EVENT should move back to direct routing to attempt

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2418:

Description: Due to recovery code path, they are currently double routed to 
the vertex first and then the attempt.

 TASK_ATTEMPT_FAILED_EVENT and TASK_COMPLETED_EVENT should move back to direct 
 routing to attempt
 

 Key: TEZ-2418
 URL: https://issues.apache.org/jira/browse/TEZ-2418
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2418.1.patch


 Due to recovery code path, they are currently double routed to the vertex 
 first and then the attempt.





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533177#comment-14533177
 ] 

Hitesh Shah commented on TEZ-776:
-

+1 on the patch.

Though as [~sseth] pointed out, there are potential concerns around 
BroadcastEdgeManager thread safety. From a practical point of view, it likely 
should not be hit as the prepare function is invoked long before the edge is 
used and the rpc threads will likely not have looked up the event route 
metadata object before this point. Theoretically, there is a possibility of 
visibility issues given that there is no lock on any function inside 
BroadcastEdgeManager ( and the happens-before semantics would not kick in ).
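
The visibility concern can be illustrated with a generic sketch. 
RouteMetadataHolder and its members are hypothetical and are not the actual 
BroadcastEdgeManager code; the sketch only shows one common way (a volatile 
field) to get a happens-before edge between a thread that prepares state and 
threads that read it later.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical holder; stands in for any object that is prepared on one thread
// and read on RPC threads. Not the real BroadcastEdgeManager.
final class RouteMetadataHolder {
    // volatile: the write in prepare() happens-before any later read that sees it.
    private volatile int[] targetIndices;

    void prepare(int numTargets) {
        int[] indices = new int[numTargets];
        for (int i = 0; i < numTargets; i++) {
            indices[i] = i;
        }
        targetIndices = indices; // publish only after the array is fully built
    }

    int[] targets() {
        return targetIndices; // null until prepare() has been published
    }

    /** Prepares on the calling thread, then reads the result on another thread. */
    static int readFromAnotherThread(int numTargets) {
        RouteMetadataHolder holder = new RouteMetadataHolder();
        holder.prepare(numTargets);
        ExecutorService rpcThread = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> seen = rpcThread.submit(() -> holder.targets().length);
            return seen.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcThread.shutdown();
        }
    }
}
```

Note that handing work to an executor already establishes a happens-before 
edge on its own, so this demo cannot reproduce the problem; the volatile field 
matters when the reader thread reaches the object by some unsynchronized path.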




 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.14.patch, TEZ-776.2.patch, 
 TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, 
 TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, 
 TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, 
 TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, 
 TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.





[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533202#comment-14533202
 ] 

Bikas Saha commented on TEZ-1961:
-

bq. Previously DAGClientAMProtocol#getAMStatus is not supported for non-session 
mode
That was because it returns session status, which makes no sense in non-session 
mode. Please don't change that.



[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533129#comment-14533129
 ] 

Bikas Saha commented on TEZ-776:


[~hitesh] [~rajesh.balamohan] Any further comments?



[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533146#comment-14533146
 ] 

Bikas Saha commented on TEZ-2410:
-

Shouldn't vertexGroup.isCommitted=true be replaced by 
vertexGroup.commitStarted=true? This is when the commit process starts, right? 
Without this, vertexGroup.isInCommitting() will return false. I'm not sure how 
the tests are passing with this.
{code}
 for (final VertexGroupInfo groupInfo : vertexGroups.values()) {
   if (!groupInfo.outputs.isEmpty()) {
-    groupInfo.committed = true;
     final Vertex v = getVertex(groupInfo.groupMembers.iterator().next());
     for (final String outputName : groupInfo.outputs) {
       final OutputKey outputKey = new OutputKey(outputName, groupInfo.groupName, true);
@@ -1920,7 +1931,6 @@ public class DAGImpl implements org.apache.tez.dag.app.dag.DAG,
             + " data, groupName=" + groupInfo.groupName);
         continue;
       }
-      groupInfo.committed = true;
{code}

This is probably not going to work with the above code
{code}
  // partial output may already have been in committing or committed. fail if so
  List<VertexGroupInfo> groupList = vertexGroupInfo.get(vertex.getName());
  if (groupList != null) {
    for (VertexGroupInfo groupInfo : groupList) {
      if (groupInfo.isInCommitting()) {
        String msg = "Aborting job as committing vertex: "
            + vertex.getLogIdentifier() + " is re-running";
        LOG.info(msg);
        addDiagnostic(msg);
        enactKill(DAGTerminationCause.VERTEX_RERUN_IN_COMMITTING,
            VertexTerminationCause.VERTEX_RERUN_IN_COMMITTING);
        return true;
      } else if (groupInfo.isCommitted()) {
{code}

1) succeededCommits looks unused - we could remove it
2) Why is vertexGroup.commitStarted=true here? this is where commit finishes, 
right?
3) if condition can be replaced by vertexGroup.isCommitted();
4) unnecessary space before ++
5) missing { after if stmt
{code}
+  OutputKey outputKey = commitCompletedEvent.getOutputKey();
+  succeededCommits.add(outputKey);  // unused
+  if (outputKey.isVertexGroupOutput){
+    VertexGroupInfo vertexGroup = vertexGroups.get(outputKey.getEntityName());
+    vertexGroup.commitStarted = true;  // why here at finish time?
+    vertexGroup.successfulCommits ++;  // space
+    if (vertexGroup.successfulCommits == vertexGroup.outputs.size()) {  // replace with isCommitted()
+      if (!commitAllOutputsOnSuccess)  // missing {
+      try {
{code}

Which test case covers the VertexImpl change? 
testVertexCommit_OnVertexSuccess()?

Which test or check verifies that a vertex group commit event is not written 
for a non-group vertex when all commits happen on DAG success?

Rename testVertexSucceed_OnDAGSuccess() to testVertexCommit_OnDAGSuccess()?
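
The flag semantics questioned in this review can be sketched as follows. This 
is only an illustration of one reading of the comments: ToyGroupCommit and its 
methods are hypothetical, not the VertexGroupInfo class from the patch. The 
assumption is that commitStarted is set when the commit begins, and that 
isCommitted() is derived from the count of successful output commits.

```java
// Hypothetical model of the flags discussed above; not the real VertexGroupInfo.
final class ToyGroupCommit {
    private final int numOutputs;
    private boolean commitStarted;
    private int successfulCommits;

    ToyGroupCommit(int numOutputs) {
        this.numOutputs = numOutputs;
    }

    /** Called once, when the group commit begins. */
    void onCommitStarted() {
        commitStarted = true;
    }

    /** Called once per output whose commit succeeded. */
    void onOutputCommitted() {
        if (!commitStarted) {
            throw new IllegalStateException("commit has not started");
        }
        if (successfulCommits == numOutputs) {
            throw new IllegalStateException("all outputs already committed");
        }
        successfulCommits++;
    }

    boolean isCommitted() {
        return commitStarted && successfulCommits == numOutputs;
    }

    boolean isInCommitting() {
        return commitStarted && !isCommitted();
    }
}
```

With this shape, the completion handler only increments the counter and asks 
isCommitted(), which matches review points 2 and 3.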



 VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of vertex





[jira] [Updated] (TEZ-2418) TASK_ATTEMPT_FAILED_EVENT and TASK_COMPLETED_EVENT should move back to direct routing to attempt

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2418:

Summary: TASK_ATTEMPT_FAILED_EVENT and TASK_COMPLETED_EVENT should move 
back to direct routing to attempt  (was: TASK_ATTEMPT_FAILED_EVENT missed in 
TEZ-2325)

 TASK_ATTEMPT_FAILED_EVENT and TASK_COMPLETED_EVENT should move back to direct 
 routing to attempt
 

 Key: TEZ-2418
 URL: https://issues.apache.org/jira/browse/TEZ-2418
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2418.1.patch








[jira] [Updated] (TEZ-2430) Add test for RecoveryEvent Spec

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2430:

Issue Type: Sub-task  (was: Improvement)
Parent: TEZ-15

 Add test for RecoveryEvent Spec
 ---

 Key: TEZ-2430
 URL: https://issues.apache.org/jira/browse/TEZ-2430
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 * Ordering of RecoveryEvents.
 ** DataMovementEvent must be logged before TaskAttemptFinishedEvent
 ** InputDataInfoEvent must be logged before VertexInitializedEvent (already 
 covered in the existing test)
 * Frequency of RecoveryEvents, e.g. TaskAttemptStartedEvent can only be 
 logged once, but TaskAttemptFinishedEvent can be logged twice (when a 
 TaskAttempt transitions from SUCCEEDED to FAILED).
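
 The two kinds of rule above could be checked along these lines. This is a 
 sketch, not the test proposed in this issue: RecoveryLogChecker is a 
 hypothetical helper, the log is modeled as a plain list of event names, and 
 "logged before" is assumed to mean that every occurrence of the first event 
 precedes every occurrence of the second.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the recovery log is modeled as a list of event names.
final class RecoveryLogChecker {
    /** True if every occurrence of first precedes every occurrence of second. */
    static boolean loggedBefore(List<String> log, String first, String second) {
        int lastFirst = log.lastIndexOf(first);
        int firstSecond = log.indexOf(second);
        // If either event is absent, the ordering rule is vacuously satisfied.
        return lastFirst < 0 || firstSecond < 0 || lastFirst < firstSecond;
    }

    /** True if event appears at most max times in the log. */
    static boolean loggedAtMost(List<String> log, String event, int max) {
        return Collections.frequency(log, event) <= max;
    }

    /** Checks a sample log in which an attempt succeeds and later fails. */
    static boolean sampleLogIsValid() {
        List<String> log = Arrays.asList(
            "TaskAttemptStartedEvent",
            "DataMovementEvent",
            "TaskAttemptFinishedEvent",   // SUCCEEDED
            "TaskAttemptFinishedEvent");  // later transition to FAILED
        return loggedBefore(log, "DataMovementEvent", "TaskAttemptFinishedEvent")
            && loggedAtMost(log, "TaskAttemptStartedEvent", 1)
            && loggedAtMost(log, "TaskAttemptFinishedEvent", 2);
    }
}
```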





[jira] [Created] (TEZ-2431) Recovery of task events (eg. datamovement events) should not depend on ordering of task attempt events

2015-05-07 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-2431:
---

 Summary: Recovery of task events (eg. datamovement events) should 
not depend on ordering of task attempt events
 Key: TEZ-2431
 URL: https://issues.apache.org/jira/browse/TEZ-2431
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha


Today, task attempt events need to go through verteximpl before reaching the 
task in order to maintain ordering guarantees for recovery. This causes these 
events to be routed twice through the dispatcher. This can cause overhead 
delays in large jobs. Also, this makes assumptions about event ordering which 
make the system fragile. Recovery should work independently of other system 
interactions so that evolution of other components is not affected by recovery 
unless it affects recovery logically. 





[jira] [Updated] (TEZ-2418) TASK_ATTEMPT_FAILED_EVENT missed in TEZ-2325

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2418:

Priority: Major  (was: Blocker)

 TASK_ATTEMPT_FAILED_EVENT missed in TEZ-2325
 

 Key: TEZ-2418
 URL: https://issues.apache.org/jira/browse/TEZ-2418
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2418.1.patch








[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-05-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533354#comment-14533354
 ] 

Jonathan Eagles commented on TEZ-2076:
--

[~rajesh.balamohan], regarding the TezTaskAttemptID.fromString slowness: I have 
put up a patch to TEZ-1526 that may help in understanding some of the parameters 
of the slowness above. Unfortunately, I haven't been able to fully codify a 
solution, but perhaps you could take a look to see if it addresses some of the 
issues you see above?

 Tez framework to extract/analyze data stored in ATS for specific dag
 

 Key: TEZ-2076
 URL: https://issues.apache.org/jira/browse/TEZ-2076
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, 
 TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, 
 TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, 
 TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch


 - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
 (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
 - This can be plugged into an analyzer which parses the data, adds semantics, 
 and provides an in-memory representation for further analysis.
 - This will make it possible to write different analyzer rules, which can be 
 run on top of this in-memory representation to produce an analysis of the DAG.
 - Results of these analyzer rules can be rendered in a UI (standalone webapp) 
 at a later point in time.





[jira] [Updated] (TEZ-1526) LoadingCache for TezTaskID slow for large jobs

2015-05-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1526:
-
Attachment: TEZ-1526.4.patch

 LoadingCache for TezTaskID slow for large jobs
 --

 Key: TEZ-1526
 URL: https://issues.apache.org/jira/browse/TEZ-1526
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
  Labels: performance
 Attachments: 10-TezTaskIDs.patch, TEZ-1526-v1.patch, 
 TEZ-1526-v2.patch, TEZ-1526.3.patch, TEZ-1526.4.patch


 Using the LoadingCache with default builder settings, 100,000 TezTaskIDs are 
 created in 10 seconds on my setup. With a LoadingCache initialCapacity of 
 10,000 they are created in 300 ms. With no LoadingCache, they are created in 
 10 ms. A test case is attached to illustrate the condition I would like to be 
 sped up.
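
 The comparison above (cached versus direct construction) can be sketched with 
 the JDK alone. This is not the attached test case: ToyTaskId and TaskIdBench 
 are hypothetical, and a ConcurrentHashMap stands in for Guava's LoadingCache, 
 so the timings it produces say nothing about the numbers quoted above.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an ID type; not the real TezTaskID.
final class ToyTaskId {
    final int vertexId;
    final int taskIndex;

    ToyTaskId(int vertexId, int taskIndex) {
        this.vertexId = vertexId;
        this.taskIndex = taskIndex;
    }
}

final class TaskIdBench {
    // Stand-in for a cache; the issue concerns Guava's LoadingCache, not used here.
    private static final ConcurrentHashMap<Long, ToyTaskId> CACHE =
        new ConcurrentHashMap<>();

    /** Returns one shared instance per (vertexId, taskIndex) pair. */
    static ToyTaskId cached(int vertexId, int taskIndex) {
        long key = ((long) vertexId << 32) | (taskIndex & 0xFFFFFFFFL);
        return CACHE.computeIfAbsent(key, k -> new ToyTaskId(vertexId, taskIndex));
    }

    /** Allocates a fresh instance on every call. */
    static ToyTaskId direct(int vertexId, int taskIndex) {
        return new ToyTaskId(vertexId, taskIndex);
    }

    /** Returns elapsed nanoseconds for creating count IDs either way. */
    static long timeNanos(int count, boolean useCache) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            ToyTaskId id = useCache ? cached(0, i) : direct(0, i);
            if (id.taskIndex != i) {
                throw new IllegalStateException("unexpected id");
            }
        }
        return System.nanoTime() - start;
    }
}
```

 Calling TaskIdBench.timeNanos(100000, true) and 
 TaskIdBench.timeNanos(100000, false) gives a rough, unwarmed comparison; a 
 proper measurement would need a benchmarking harness.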





[jira] [Updated] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2421:

Attachment: TEZ-2421.1.patch

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2421.1.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.
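
 A minimal sketch of the "locks go one way" idea described above. 
 ParentVertex and ChildAttempt are hypothetical classes, not the Tez 
 VertexImpl and TaskAttemptImpl: the child receives the data it needs at 
 construction, so it never has to call back up into the parent while holding 
 its own lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical child object; it never calls back up into its parent.
final class ChildAttempt {
    private final ReentrantLock lock = new ReentrantLock();
    private final String vertexName; // copied in at construction

    ChildAttempt(String vertexName) {
        this.vertexName = vertexName;
    }

    String describe() {
        lock.lock();
        try {
            return vertexName + "/attempt"; // holds only its own lock
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical parent object; it may lock itself and then call down.
final class ParentVertex {
    private final ReentrantLock lock = new ReentrantLock();
    private final ChildAttempt attempt;

    ParentVertex(String name) {
        this.attempt = new ChildAttempt(name); // immutable data passed down once
    }

    String describeAttempt() {
        lock.lock(); // parent lock first, then child lock: always "down"
        try {
            return attempt.describe();
        } finally {
            lock.unlock();
        }
    }
}
```

 Because no code path takes the child lock and then the parent lock, the two 
 locks are always acquired in the same order and cannot form a cycle.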





[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533607#comment-14533607
 ] 

Jeff Zhang commented on TEZ-1961:
-

bq. That was because it returns session status, which makes no sense in 
non-session mode.
I checked DAGClientHandler#getTezAppMasterStatus; it just returns the 
DAGAppMaster's state. That state should also be valid in non-session mode, 
right?




[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533628#comment-14533628
 ] 

Jeff Zhang commented on TEZ-2221:
-

[~rohini] It's been committed; I suppose it won't affect Pig any more.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch, TEZ-2221-5-revert.patch


 VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. The current equals & hashCode of VertexGroup, however, use 
 both the vertex group name and the member names.





[jira] [Comment Edited] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533653#comment-14533653
 ] 

Jeff Zhang edited comment on TEZ-1961 at 5/8/15 12:17 AM:
--

I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
and made it support non-session mode in this patch to avoid confusion. If that 
is valid, I can do it in this patch, because it is a very simple change. 


was (Author: zjffdu):
I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
in this patch to avoid confusion. If that is valid, I can do it in this patch, 
because it is a very simple change. 



[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533653#comment-14533653
 ] 

Jeff Zhang commented on TEZ-1961:
-

I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
in this patch to avoid confusion. If that is acceptable, I can do it in this patch, 
because it is a very simple change. 

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.





[jira] [Updated] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2426:

Attachment: TEZ-2426.1.txt

This should fix it. Main changes in the patch:
- Wait for the eventRouter thread to complete before considering a task as done 
and accepting the next one.
- Fixed visibility concerns in *Context.
- Moved some of the cleanup into LogicalIOProcessorRuntimeTask - since 
progress() etc can happen often and shouldn't hit a volatile.
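
The first change above (waiting for the event router before a task counts as done) can be illustrated with a minimal sketch. This is not the code from the patch: the class name, the queue, and the poison-pill shutdown are all assumptions made for illustration. The point is only that the caller joins the router thread before reporting completion, so no event handling from the old task can overlap with the next one.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from the Tez code base.
final class EventRouterJoinSketch {
    private static final String POISON = "__STOP__";

    /** Runs one "task" and returns how many events its router handled. */
    static int runTask(int eventCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger handled = new AtomicInteger();

        Thread eventRouter = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();
                    if (POISON.equals(event)) {
                        return; // everything queued before shutdown is drained
                    }
                    handled.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "event-router");
        eventRouter.start();

        try {
            for (int i = 0; i < eventCount; i++) {
                queue.put("event-" + i);
            }
            // Signal shutdown, then block until the router has really finished.
            // Only after join() returns is the task treated as done.
            queue.put(POISON);
            eventRouter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for router", e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled=" + runTask(100));
    }
}
```

Because join() establishes a happens-before edge, the count read afterwards reflects every event the router processed.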

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
 Attachments: TEZ-2426.1.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether that is related to the failed event being sent.





[jira] [Updated] (TEZ-2410) VertexGroupCommitFinishedEvent VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2410:

Attachment: TEZ-2410-3.patch

 VertexGroupCommitFinishedEvent  VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of vertex
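
One way to picture the intended behaviour (one finished event per group, none for non-group vertices) is the sketch below. It is an assumption-laden illustration, not the VertexImpl/DAGImpl change in the attached patches: the class, the string-based history list, and the member counting are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and structure are illustrative, not Tez's.
final class GroupCommitLogSketch {
    // Number of member vertices that still have to commit, per group.
    private final Map<String, Integer> pendingMembers = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    void registerGroup(String groupName, int memberCount) {
        pendingMembers.put(groupName, memberCount);
    }

    /** groupName is null for a vertex that does not belong to a group. */
    void onVertexCommitted(String groupName) {
        if (groupName == null) {
            return; // non-group commit: no group event at all
        }
        Integer left = pendingMembers.get(groupName);
        if (left == null) {
            return; // unknown group, or its event was already recorded
        }
        if (left <= 1) {
            // Last member committed: record the event exactly once.
            pendingMembers.remove(groupName);
            history.add("VertexGroupCommitFinished:" + groupName);
        } else {
            pendingMembers.put(groupName, left - 1);
        }
    }

    static List<String> demo() {
        GroupCommitLogSketch log = new GroupCommitLogSketch();
        log.registerGroup("g1", 2);
        log.onVertexCommitted("g1");  // first member, nothing recorded yet
        log.onVertexCommitted("g1");  // second member, event recorded
        log.onVertexCommitted(null);  // non-group vertex, nothing recorded
        return log.history;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```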





[jira] [Updated] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2421:

Priority: Blocker  (was: Major)

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.





[jira] [Commented] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533734#comment-14533734
 ] 

Bikas Saha commented on TEZ-2421:
-

The main issue is that the attempt takes a lock upwards into the vertex while 
vertex takes locks downwards into the attempt. One way has to be broken to 
prevent deadlock. The key culprits are getting the remoteTaskSpec and getting 
the taskLocation.
Instead of the attempt up-calling into the vertex to get these after getting 
scheduled, the vertex is now sending these to the task when it schedules the 
task. [~zjffdu] [~sseth] [~hitesh] Please review.
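
A minimal sketch of the one-way locking described above, assuming simplified Vertex and Attempt classes that are not the real Tez ones. The vertex pushes the task spec and location down when it schedules the attempt, so the attempt never needs the vertex lock while holding its own.

```java
// Hypothetical sketch: class and field names are illustrative, not Tez's.
final class LockOrderSketch {

    static final class Attempt {
        private String taskSpec;      // guarded by this
        private String taskLocation;  // guarded by this

        // The vertex hands over what the attempt needs. The attempt does not
        // call back up into the vertex, so it never waits on the vertex lock.
        synchronized void schedule(String spec, String location) {
            this.taskSpec = spec;
            this.taskLocation = location;
        }

        synchronized String describe() {
            return taskSpec + "@" + taskLocation;
        }
    }

    static final class Vertex {
        private final String name;

        Vertex(String name) {
            this.name = name;
        }

        // Lock order is always vertex, then attempt (downwards only).
        synchronized void scheduleAttempt(Attempt attempt, int index) {
            attempt.schedule(name + "-spec-" + index, "node-" + index);
        }
    }

    static String demo() {
        Vertex vertex = new Vertex("v1");
        Attempt attempt = new Attempt();
        vertex.scheduleAttempt(attempt, 0);
        return attempt.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With every thread acquiring locks in the same vertex-then-attempt order, the cyclic wait that a deadlock requires cannot form between these two objects.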

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2421.1.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.





[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533733#comment-14533733
 ] 

Jeff Zhang commented on TEZ-2410:
-

bq. Did not find any testDAGCommitSucceeded_OnDAGSuccess. Somewhere there 
should be this check for vertex v1 commit on dag success, right? 
historyEventHandler.verifyVertexGroupCommitFinishedEvent(v1, 0);

What do you mean? The check 
historyEventHandler.verifyVertexGroupCommitFinishedEvent(v1, 0); is in 
TestCommit#testDAGCommitSucceeded_OnDAGSuccess.

 VertexGroupCommitFinishedEvent  VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of vertex





[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533647#comment-14533647
 ] 

Bikas Saha commented on TEZ-1961:
-

It's actually session state. Perhaps we should do that in a separate jira.

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.





[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533677#comment-14533677
 ] 

Jeff Zhang commented on TEZ-2410:
-

[~bikassaha] Sorry for the ugly mistake in my last patch. I have uploaded a new 
patch to address the issues raised in the comments.

bq. Which test case is covered the VertexImpl change? 
testVertexCommit_OnVertexSuccess()?
All the verifications of VertexCommitStartedEvent cover this (especially 
testVertexCommit_OnDAGSuccess and testVertexCommit_OnVertexSuccess). With the 
change in VertexImpl, VertexCommitStartedEvent may be logged multiple times 
(once for each output).

bq. Which test/check is covering that vertexgroupcommit event is not written 
for a non-group vertex when all commits happen on dag success?
testDAGCommitSucceeded_OnDAGSuccess

 VertexGroupCommitFinishedEvent  VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of vertex





[jira] [Comment Edited] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533653#comment-14533653
 ] 

Jeff Zhang edited comment on TEZ-1961 at 5/8/15 12:48 AM:
--

I also renamed DAGClientHandler#getSessionStatus to 
DAGClientHandler#getTezAppMasterStatus and made it support non-session mode in 
this patch to avoid confusion. If that is acceptable, I can do it in this patch, 
because it is a very simple change. 

{noformat}
-  public synchronized TezAppMasterStatus getSessionStatus() throws TezException {
-    if (!dagAppMaster.isSession()) {
-      throw new TezException("Unsupported operation as AM not running in"
-          + " session mode");
-    }
+  public synchronized TezAppMasterStatus getTezAppMasterStatus() throws TezException {
     switch (dagAppMaster.getState()) {
     case NEW:
     case INITED:
{noformat}


was (Author: zjffdu):
I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
and made it support non-session mode in this patch to avoid confusion. If that 
is acceptable, I can do it in this patch, because it is a very simple change. 

{noformat}
-  public synchronized TezAppMasterStatus getSessionStatus() throws TezException {
-    if (!dagAppMaster.isSession()) {
-      throw new TezException("Unsupported operation as AM not running in"
-          + " session mode");
-    }
+  public synchronized TezAppMasterStatus getTezAppMasterStatus() throws TezException {
     switch (dagAppMaster.getState()) {
     case NEW:
     case INITED:
{noformat}

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.





[jira] [Comment Edited] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533653#comment-14533653
 ] 

Jeff Zhang edited comment on TEZ-1961 at 5/8/15 12:47 AM:
--

I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
and made it support non-session mode in this patch to avoid confusion. If that 
is acceptable, I can do it in this patch, because it is a very simple change. 

{noformat}
-  public synchronized TezAppMasterStatus getSessionStatus() throws TezException {
-    if (!dagAppMaster.isSession()) {
-      throw new TezException("Unsupported operation as AM not running in"
-          + " session mode");
-    }
+  public synchronized TezAppMasterStatus getTezAppMasterStatus() throws TezException {
     switch (dagAppMaster.getState()) {
     case NEW:
     case INITED:
{noformat}


was (Author: zjffdu):
I also renamed DAGClientHandler#getSessionState to DAGClientHandler#getAMState 
and made it support non-session mode in this patch to avoid confusion. If that 
is acceptable, I can do it in this patch, because it is a very simple change. 

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.





[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533726#comment-14533726
 ] 

Bikas Saha commented on TEZ-2410:
-

+1 lgtm.

Did not find any testDAGCommitSucceeded_OnDAGSuccess.
Somewhere there should be this check for vertex v1 commit on dag success, right?
historyEventHandler.verifyVertexGroupCommitFinishedEvent(v1, 0);

 VertexGroupCommitFinishedEvent  VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of vertex





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533428#comment-14533428
 ] 

Bikas Saha commented on TEZ-776:


Thanks for the reviews. Uploading rebased patch for a jenkins run before 
committing.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.14.patch, TEZ-776.15.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.
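
 To make the scale concrete, here is a rough back-of-the-envelope sketch. Only the 64 bytes per event figure comes from the description above. The two retention models (one event kept per source/destination pair for an all-to-all edge, one event per task for a one-to-one edge) and the task counts are assumptions chosen for illustration, not measurements of the AM.

```java
// Hypothetical estimate: only BYTES_PER_EVENT comes from the issue text.
final class EventHeapEstimate {
    static final long BYTES_PER_EVENT = 64L;

    // Assumption: an all-to-all edge retains one event per
    // (source task, destination task) pair.
    static long allToAllBytes(long sourceTasks, long destinationTasks) {
        long events = Math.multiplyExact(sourceTasks, destinationTasks);
        return Math.multiplyExact(events, BYTES_PER_EVENT);
    }

    // Assumption: a one-to-one edge retains one event per task.
    static long oneToOneBytes(long tasks) {
        return Math.multiplyExact(tasks, BYTES_PER_EVENT);
    }

    public static void main(String[] args) {
        // 10,000 x 10,000 pairs at 64 bytes each is about 6.4 GB of heap.
        System.out.println("all-to-all: " + allToAllBytes(10_000, 10_000) + " bytes");
        // The same task count on a one-to-one edge is about 640 KB.
        System.out.println("one-to-one: " + oneToOneBytes(10_000) + " bytes");
    }
}
```

 Under these assumptions the connection pattern, not the per-event size, dominates the total.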





[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.15.patch

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.14.patch, TEZ-776.15.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.





[jira] [Commented] (TEZ-1961) Remove misleading exception No running dag from AM logs

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532244#comment-14532244
 ] 

Jeff Zhang commented on TEZ-1961:
-

[~sseth] Please help review it.

* Wait for DAGAppMaster to go to RUNNING/SHUTDOWN, then return the DAGClient in 
non-session mode. This ensures that the DAG has started to run. [~bikassaha] 
Previously DAGClientAMProtocol#getAMStatus was not supported in non-session 
mode. Was there any particular reason for that? I made it supported in 
non-session mode in this patch.
* The patch causes a small difference in the tracking URL of the application. 
This is a YARN bug that has been fixed in YARN-2246 (fixed in hadoop-2.7). 
The bug is that a suffix may be appended to the trackingURL when the app moves 
from SUBMITTED to RUNNING. So after this patch, the trackingURL will change 
from http://localhost:53419/proxy/application_1430963524753_0005 to 
http://localhost:53419/proxy/application_1430963524753_0005/ui/
* The null currentDAG check in DAGClientHandler is kept as a sanity check. 
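
The first point can be sketched as a simple polling loop. This is only an illustration under assumptions: the AmStatus enum, the Supplier used as a status source, and the poll limits are hypothetical stand-ins, not the client code in the patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch: AmStatus and the status source are stand-ins.
final class WaitForAmSketch {
    enum AmStatus { INITIALIZING, READY, RUNNING, SHUTDOWN }

    /** Polls until the AM reports RUNNING or SHUTDOWN, or gives up. */
    static AmStatus waitUntilStarted(Supplier<AmStatus> statusSource,
                                     long pollMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            AmStatus status = statusSource.get();
            if (status == AmStatus.RUNNING || status == AmStatus.SHUTDOWN) {
                return status; // safe to hand a client back to the caller
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for AM", e);
            }
        }
        throw new IllegalStateException("AM did not reach RUNNING/SHUTDOWN in time");
    }

    static AmStatus demo() {
        // Simulated sequence of status answers from the AM.
        Iterator<AmStatus> polls = Arrays.asList(
                AmStatus.INITIALIZING, AmStatus.READY, AmStatus.RUNNING).iterator();
        return waitUntilStarted(polls::next, 1L, 10);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Returning the client only after this wait means a later status query should no longer arrive before a DAG exists, which is what produced the misleading exception.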

 Remove misleading exception No running dag from AM logs
 -

 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-1961-1.patch, TEZ-1961-2.patch, TEZ-1961-3.patch


 {code}
 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
 from  Call#0 Retry#0
 org.apache.tez.dag.api.TezException: No running dag at present
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
   at 
 org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
 CurrentState=Running
 {code}
 This exception shows up fairly often and isn't very relevant - queries before 
 a DAG is submitted to the AM.
 This is very misleading, especially for folks new to Tez, and should be 
 removed.





[jira] [Updated] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2426:

Attachment: TEZ-2426.2.txt

Updated the patch to remove some unnecessary synchronization that caused the 
findbugs warning.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Bikas Saha
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2426.1.txt, TEZ-2426.2.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether that is related to the failed event being sent.





[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533861#comment-14533861
 ] 

Siddharth Seth commented on TEZ-2426:
-

[~rajesh.balamohan], [~bikassaha], [~zjffdu] - please review.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Bikas Saha
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2426.1.txt, TEZ-2426.2.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether that is related to the failed event being sent.





[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533862#comment-14533862
 ] 

Siddharth Seth commented on TEZ-2426:
-

Tested on a large noop job - ran through without any issues.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Bikas Saha
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2426.1.txt, TEZ-2426.2.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether that is related to the failed event being sent.





[jira] [Updated] (TEZ-2412) Should kill vertex in DAGImpl#VertexRerunWhileCommitting

2015-05-07 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2412:

Attachment: TEZ-2412-2.patch

Rebased the patch and added some comments in the code. 

 Should kill vertex in DAGImpl#VertexRerunWhileCommitting
 

 Key: TEZ-2412
 URL: https://issues.apache.org/jira/browse/TEZ-2412
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2412-1.patch, TEZ-2412-2.patch


 * When a vertex is rerun, it moves to the RUNNING state, so it should be killed in 
 DAGImpl#VertexRerunWhileCommitting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533764#comment-14533764
 ] 

TezQA commented on TEZ-2426:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731316/TEZ-2426.1.txt
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/651//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/651//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/651//console

This message is automatically generated.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Bikas Saha
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2426.1.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether it's related to the failed event being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2429) Tez AM does not die after hitting internal error

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533776#comment-14533776
 ] 

Jeff Zhang edited comment on TEZ-2429 at 5/8/15 2:47 AM:
-

Can reproduce the InvalidTransition in TestFaultTolerance, looking at the cause
{code}
2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't 
handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
DAG_VERTEX_RERUNNING at SUCCEEDED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
at 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
at 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:662)
{code}


 Tez AM does not die after hitting internal error 
 -

 Key: TEZ-2429
 URL: https://issues.apache.org/jira/browse/TEZ-2429
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker
 Attachments: syslog_dag_1430956448478_0001_16_post, 
 syslog_dag_1430956448478_0001_17


 From https://builds.apache.org/job/Tez-Build/1055/: 
 2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 DAG_VERTEX_RERUNNING at SUCCEEDED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
   at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
   at java.lang.Thread.run(Thread.java:662)
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Cleaning up DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Completed cleanup for DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,424 INFO [Dispatcher thread: Central] impl.DAGImpl: 
 dag_1430956448478_0001_16 terminating due to internal error
 2015-05-06 23:55:54,433 INFO [IPC Server handler 0 on 47432] 
 app.DAGAppMaster: Starting DAG submitted via 

[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533824#comment-14533824
 ] 

TezQA commented on TEZ-2426:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731348/TEZ-2426.2.txt
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/654//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/654//console

This message is automatically generated.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Bikas Saha
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2426.1.txt, TEZ-2426.2.txt, am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, a status update event may also be sent after 
 completion, though it is not clear whether it's related to the failed event being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-2426 PreCommit Build #654

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2426
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/654/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2843 lines...]
[INFO] Final Memory: 76M/933M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731348/TEZ-2426.2.txt
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/654//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/654//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
be199b38066bd334d1edd0003b0bd729e1106855 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #652
Archived 44 artifacts
Archive block size is 32768
Received 22 blocks and 2056276 bytes
Compression is 26.0%
Took 2 sec
Description set: TEZ-2426
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: TEZ-2421 PreCommit Build #655

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2421
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/655/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2665 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731352/TEZ-2421.3.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestAMRecovery
  org.apache.tez.test.TestDAGRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/655//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/655//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
822086f2d482afc631ba47942dee53d56cefd63e logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #654
Archived 44 artifacts
Archive block size is 32768
Received 18 blocks and 2176850 bytes
Compression is 21.3%
Took 1.3 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
8 tests failed.
REGRESSION:  
org.apache.tez.test.TestAMRecovery.testVertexPartiallyFinished_Broadcast

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestAMRecovery.runDAGAndVerify(TestAMRecovery.java:412)
at 
org.apache.tez.test.TestAMRecovery.testVertexPartiallyFinished_Broadcast(TestAMRecovery.java:206)


REGRESSION:  
org.apache.tez.test.TestAMRecovery.testVertexPartialFinished_One2One

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestAMRecovery.runDAGAndVerify(TestAMRecovery.java:412)
at 
org.apache.tez.test.TestAMRecovery.testVertexPartialFinished_One2One(TestAMRecovery.java:268)


REGRESSION:  
org.apache.tez.test.TestAMRecovery.testVertexPartiallyFinished_ScatterGather

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestAMRecovery.runDAGAndVerify(TestAMRecovery.java:412)
at 
org.apache.tez.test.TestAMRecovery.testVertexPartiallyFinished_ScatterGather(TestAMRecovery.java:332)


REGRESSION:  
org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_ScatterGather

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
at org.junit.Assert.fail(Assert.java:88)
at 

[jira] [Commented] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533857#comment-14533857
 ] 

TezQA commented on TEZ-2421:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731352/TEZ-2421.3.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestAMRecovery
  org.apache.tez.test.TestDAGRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/655//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/655//console

This message is automatically generated.

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.
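
The one-way locking idea above can be sketched as follows. This is a minimal illustration, assuming two hypothetical classes, `Vertex` (parent) and `Attempt` (child), that are not the Tez implementations. The child receives the data it needs at construction, so it never has to call back up into the parent while holding its own lock, and locks are only ever taken in the parent-then-child direction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Tez code. */
final class LockDirectionSketch {

    /** Parent object. It may lock itself and then call down into its children. */
    static final class Vertex {
        private final String name;
        private final List<Attempt> attempts = new ArrayList<Attempt>();

        Vertex(String name) {
            this.name = name;
        }

        synchronized Attempt newAttempt(int id) {
            // Hand the child what it needs now, so it never calls back up later.
            Attempt attempt = new Attempt(id, name);
            attempts.add(attempt);
            return attempt;
        }

        synchronized List<String> describeAll() {
            List<String> out = new ArrayList<String>();
            for (Attempt attempt : attempts) {
                out.add(attempt.describe()); // lock order: Vertex, then Attempt
            }
            return out;
        }
    }

    /** Child object. It holds a copy of the parent's name, not the parent itself. */
    static final class Attempt {
        private final int id;
        private final String vertexName;

        Attempt(int id, String vertexName) {
            this.id = id;
            this.vertexName = vertexName;
        }

        synchronized String describe() {
            return vertexName + "/attempt_" + id; // no call into Vertex here
        }
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex("v1");
        vertex.newAttempt(0);
        vertex.newAttempt(1);
        System.out.println(vertex.describeAll()); // prints [v1/attempt_0, v1/attempt_1]
    }
}
```

Because `Attempt` has no reference to `Vertex`, a thread holding an attempt lock cannot block on a vertex lock, which removes the cycle that a two-way design would allow.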



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Failed: TEZ-2426 PreCommit Build #651

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2426
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/651/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2851 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731316/TEZ-2426.1.txt
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/651//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/651//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/651//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
49244363923f36d5d16c2f408d9b04fc45947e43 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #650
Archived 44 artifacts
Archive block size is 32768
Received 22 blocks and 2066358 bytes
Compression is 25.9%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533775#comment-14533775
 ] 

TezQA commented on TEZ-2410:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731315/TEZ-2410-3.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/652//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/652//console

This message is automatically generated.

 VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of the vertex.
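
The "once per group" expectation can be illustrated with a small sketch. Everything here is an assumption for illustration: `GroupCommitLogSketch`, `onCommitFinished`, and the convention that a null group name means "not in a vertex group" are hypothetical, and a counter stands in for writing the history event. The sketch covers only the de-duplication and does not model when a group commit actually completes.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not Tez code. */
final class GroupCommitLogSketch {
    private final Set<String> groupsAlreadyLogged = new HashSet<String>();
    private int groupEventsLogged = 0;

    /**
     * Called when a vertex commit finishes. groupName is null when the vertex
     * does not belong to a vertex group (a convention assumed for this sketch).
     */
    void onCommitFinished(String groupName) {
        if (groupName == null) {
            return; // non-group commit: no group-level event is logged
        }
        // Set.add returns false when the group was already recorded.
        if (groupsAlreadyLogged.add(groupName)) {
            groupEventsLogged++; // stands in for writing the history event
        }
    }

    int count() {
        return groupEventsLogged;
    }

    public static void main(String[] args) {
        GroupCommitLogSketch log = new GroupCommitLogSketch();
        log.onCommitFinished("groupA"); // member vertex 1
        log.onCommitFinished("groupA"); // member vertex 2, same group
        log.onCommitFinished(null);     // vertex outside any group
        System.out.println(log.count()); // prints 1
    }
}
```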



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-2410 PreCommit Build #652

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2410
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/652/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2842 lines...]
[INFO] Final Memory: 71M/945M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731315/TEZ-2410-3.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/652//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/652//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
47b641bb1b1931096578ad216acbafa6b125f8bc logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #650
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2647421 bytes
Compression is 4.7%
Took 1.1 sec
Description set: TEZ-2410
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: TEZ-2421 PreCommit Build #653

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2421
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/653/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2452 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731341/TEZ-2421.2.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/653//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/653//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
6dfefc7cbf88aa396327d10f8c9ae72fe629af3f logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #650
Archived 44 artifacts
Archive block size is 32768
Received 19 blocks and 2123407 bytes
Compression is 22.7%
Took 1.3 sec
[description-setter] Could not determine description.
Recording test results
Publish JUnit test result report is waiting for a checkpoint on 
PreCommit-TEZ-Build #652
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
REGRESSION:  
org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumSourceTaskPhysicalOutputs

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at 
org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumSourceTaskPhysicalOutputs(TestDAGImpl.java:1004)


REGRESSION:  
org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at 
org.apache.tez.dag.app.dag.impl.TestDAGImpl.testEdgeManager_GetNumDestinationTaskPhysicalInputs(TestDAGImpl.java:982)




[jira] [Commented] (TEZ-2429) Tez AM does not die after hitting internal error

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533776#comment-14533776
 ] 

Jeff Zhang commented on TEZ-2429:
-

Can reproduce the InvalidTransition in TestFaultTolerance, looking at the cause
{code}
2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't 
handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
DAG_VERTEX_RERUNNING at SUCCEEDED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
at 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
at 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:662)
{code}

 Tez AM does not die after hitting internal error 
 -

 Key: TEZ-2429
 URL: https://issues.apache.org/jira/browse/TEZ-2429
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker
 Attachments: syslog_dag_1430956448478_0001_16_post, 
 syslog_dag_1430956448478_0001_17


 From https://builds.apache.org/job/Tez-Build/1055/: 
 2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 DAG_VERTEX_RERUNNING at SUCCEEDED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
   at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
   at java.lang.Thread.run(Thread.java:662)
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Cleaning up DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Completed cleanup for DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,424 INFO [Dispatcher thread: Central] impl.DAGImpl: 
 dag_1430956448478_0001_16 terminating due to internal error
 2015-05-06 23:55:54,433 INFO [IPC Server handler 0 on 47432] 
 app.DAGAppMaster: Starting DAG submitted via RPC: 
 testBasicInputFailureWithExit
 2015-05-06 23:55:54,455 ERROR [Dispatcher thread: Central] 
 common.AsyncDispatcher: Error in dispatcher thread
 java.lang.NullPointerException
   at 
 org.apache.tez.dag.history.recovery.RecoveryService.doFlush(RecoveryService.java:458)
   at 
 org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:289)
   at 
 org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:102)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryUnsuccesfulEvent(DAGImpl.java:1161)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1275)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2600(DAGImpl.java:144)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2151)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2140)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 

[jira] [Commented] (TEZ-2410) VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged correctly

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533851#comment-14533851
 ] 

Jeff Zhang commented on TEZ-2410:
-

Committed to branch-0.7 and master. 

 VertexGroupCommitFinishedEvent & VertexCommitStartedEvent is not logged 
 correctly
 -

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Fix For: 0.7.0

 Attachments: TEZ-2410-1.patch, TEZ-2410-1.patch, TEZ-2410-2.patch, 
 TEZ-2410-3.patch


 VertexGroupCommitFinishedEvent may be logged for non-vertex group commits.
 VertexGroupCommitFinishedEvent may be logged for each member vertex of the 
 group instead of once per group.
 VertexCommitStartedEvent may be logged for each output of the vertex.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2421:

Attachment: TEZ-2421.2.patch

Patch with a few more tests.

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.
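
One possible reading of the suggestion above: a child object (for example a task attempt) never calls back up into its parent (the vertex) while holding its own lock, because the parent data it needs was handed over at construction. The following is a minimal sketch under that assumption. `AttemptSketch`, `VertexSketch`, and their fields are hypothetical names for illustration and are not the Tez classes touched by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of one-way (downward) lock acquisition.
class LockOrderingSketch {

  // Child: keeps its own copies of the parent data it needs.
  static final class AttemptSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String vertexName;      // passed in at construction
    private final int vertexParallelism;  // passed in at construction
    private String state = "NEW";

    AttemptSketch(String vertexName, int vertexParallelism) {
      this.vertexName = vertexName;
      this.vertexParallelism = vertexParallelism;
    }

    // Takes only the attempt lock; never reaches up to the vertex.
    String describe() {
      lock.readLock().lock();
      try {
        return vertexName + "/" + vertexParallelism + ":" + state;
      } finally {
        lock.readLock().unlock();
      }
    }

    void setState(String newState) {
      lock.writeLock().lock();
      try {
        state = newState;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  // Parent: takes its own lock first, then a child's lock (downward only).
  static final class VertexSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<AttemptSketch> attempts = new ArrayList<>();

    VertexSketch(String name, int parallelism) {
      for (int i = 0; i < parallelism; i++) {
        attempts.add(new AttemptSketch(name, parallelism));
      }
    }

    List<String> describeAttempts() {
      lock.readLock().lock();
      try {
        List<String> out = new ArrayList<>();
        for (AttemptSketch a : attempts) {
          out.add(a.describe());
        }
        return out;
      } finally {
        lock.readLock().unlock();
      }
    }
  }

  public static void main(String[] args) {
    VertexSketch v = new VertexSketch("v2", 2);
    System.out.println(v.describeAttempts());
  }
}
```

Because locks are only ever taken in the order vertex then attempt, two threads cannot each hold one lock while waiting for the other.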





[jira] [Updated] (TEZ-2430) Add test for RecoveryEvent Spec

2015-05-07 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2430:

Target Version/s: 0.7.1

 Add test for RecoveryEvent Spec
 ---

 Key: TEZ-2430
 URL: https://issues.apache.org/jira/browse/TEZ-2430
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 * Ordering of RecoveryEvents.
 ** DataMovementEvent must be logged before TaskAttemptFinishedEvent
 ** InputDataInfoEvent must be logged before VertexInitializedEvent (already 
 covered in the existing test)
 * Frequency of RecoveryEvents, e.g. TaskAttemptStartedEvent can only be 
 logged once, but TaskAttemptFinishedEvent can be logged twice (when a 
 TaskAttempt transitions from SUCCEEDED to FAILED).
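
The rules listed above could be checked against a recorded event sequence roughly as follows. This is a sketch only: events are modelled as plain strings for a single task attempt, and `RecoveryEventSpecSketch`, `MUST_PRECEDE`, and `MAX_COUNT` are hypothetical names, not part of the Tez test suite.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker; event names are plain strings, not Tez history event classes.
class RecoveryEventSpecSketch {

  // {earlier, later}: "earlier" must not be logged once "later" has been logged.
  private static final String[][] MUST_PRECEDE = {
      {"DataMovementEvent", "TaskAttemptFinishedEvent"},
      {"InputDataInfoEvent", "VertexInitializedEvent"},
  };

  // Maximum number of times an event type may be logged.
  private static final Map<String, Integer> MAX_COUNT = new HashMap<>();
  static {
    MAX_COUNT.put("TaskAttemptStartedEvent", 1);
    MAX_COUNT.put("TaskAttemptFinishedEvent", 2);
  }

  // Returns null when the sequence satisfies the rules, else a description of the violation.
  static String check(List<String> events) {
    Map<String, Integer> counts = new HashMap<>();
    for (String e : events) {
      for (String[] rule : MUST_PRECEDE) {
        if (e.equals(rule[0]) && counts.containsKey(rule[1])) {
          return rule[0] + " logged after " + rule[1];
        }
      }
      int seen = counts.merge(e, 1, Integer::sum);
      Integer max = MAX_COUNT.get(e);
      if (max != null && seen > max) {
        return e + " logged " + seen + " times, limit " + max;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> ok = Arrays.asList(
        "TaskAttemptStartedEvent", "DataMovementEvent",
        "TaskAttemptFinishedEvent", "TaskAttemptFinishedEvent");
    List<String> bad = Arrays.asList(
        "TaskAttemptStartedEvent", "TaskAttemptFinishedEvent", "DataMovementEvent");
    System.out.println(check(ok));   // null means no violation
    System.out.println(check(bad));
  }
}
```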





[jira] [Commented] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533771#comment-14533771
 ] 

TezQA commented on TEZ-2421:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731341/TEZ-2421.2.patch
  against master revision 05f77fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/653//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/653//console

This message is automatically generated.

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.





[jira] [Commented] (TEZ-2429) Tez AM does not die after hitting internal error

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533815#comment-14533815
 ] 

Bikas Saha commented on TEZ-2429:
-

The main issue, though, is whether the AM fails to shut down after the 
InternalError. If it does shut down, this should not be a blocker for 0.7.0.

 Tez AM does not die after hitting internal error 
 -

 Key: TEZ-2429
 URL: https://issues.apache.org/jira/browse/TEZ-2429
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker
 Attachments: syslog_dag_1430956448478_0001_16_post, 
 syslog_dag_1430956448478_0001_17


 From https://builds.apache.org/job/Tez-Build/1055/: 
 2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 DAG_VERTEX_RERUNNING at SUCCEEDED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
   at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
   at java.lang.Thread.run(Thread.java:662)
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Cleaning up DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
 Completed cleanup for DAG: name=testRandomFailingInputs, with 
 id=dag_1430956448478_0001_16
 2015-05-06 23:55:54,424 INFO [Dispatcher thread: Central] impl.DAGImpl: 
 dag_1430956448478_0001_16 terminating due to internal error
 2015-05-06 23:55:54,433 INFO [IPC Server handler 0 on 47432] 
 app.DAGAppMaster: Starting DAG submitted via RPC: 
 testBasicInputFailureWithExit
 2015-05-06 23:55:54,455 ERROR [Dispatcher thread: Central] 
 common.AsyncDispatcher: Error in dispatcher thread
 java.lang.NullPointerException
   at 
 org.apache.tez.dag.history.recovery.RecoveryService.doFlush(RecoveryService.java:458)
   at 
 org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:289)
   at 
 org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:102)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryUnsuccesfulEvent(DAGImpl.java:1161)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1275)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2600(DAGImpl.java:144)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2151)
   at 
 org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2140)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
   at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
   at java.lang.Thread.run(Thread.java:662)
 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: 
 Killing tasks in vertex: vertex_1430956448478_0001_16_10 [l4v1] due to 
 trigger: INTERNAL_ERROR
 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: 
 vertex_1430956448478_0001_16_10 

[jira] [Comment Edited] (TEZ-2429) Tez AM does not die after hitting internal error

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533815#comment-14533815
 ] 

Bikas Saha edited comment on TEZ-2429 at 5/8/15 3:58 AM:
-

The main issue, though, is whether the AM fails to shut down after the 
InternalError. If it does shut down, this should not be a blocker for 0.7.0. 
It can be fixed after 0.7.0 unless the hang is a regression.


was (Author: bikassaha):
The main issue though, is whether the AM does not shutdown after the 
InternalError. If it shuts down then this should not be a blocker for 0.7.0.

 Tez AM does not die after hitting internal error 
 -

 Key: TEZ-2429
 URL: https://issues.apache.org/jira/browse/TEZ-2429
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker
 Attachments: syslog_dag_1430956448478_0001_16_post, 
 syslog_dag_1430956448478_0001_17


 

[jira] [Updated] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2421:

Attachment: TEZ-2421.3.patch

Patch fixes Jenkins test failure.

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.





[jira] [Commented] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out

2015-05-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533904#comment-14533904
 ] 

Jeff Zhang commented on TEZ-2421:
-

It causes TestAMRecovery to fail.

{code}
2015-05-08 13:35:25,672 INFO [Dispatcher thread: Central] impl.VertexImpl: 
Source task attempt completed for vertex: vertex_1431063298340_0001_1_01 [v2] 
attempt: attempt_1431063298340_0001_1_00_00_0 with state: SUCCEEDED 
vertexState: RUNNING
2015-05-08 13:35:25,672 ERROR [Dispatcher thread: Central] 
common.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.createRemoteTaskSpec(TaskAttemptImpl.java:461)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1012)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:673)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1920)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1)
at 
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
{code}

 Deadlock in AM because attempt and vertex locking each other out
 

 Key: TEZ-2421
 URL: https://issues.apache.org/jira/browse/TEZ-2421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch


 Ideally locks should be taken one way - either going down or up. Preferably 
 not going up because most such data can be passed in during object 
 construction.





[jira] [Updated] (TEZ-1526) LoadingCache for TezTaskID slow for large jobs

2015-05-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1526:
-
Attachment: TEZ-1526.3.patch

 LoadingCache for TezTaskID slow for large jobs
 --

 Key: TEZ-1526
 URL: https://issues.apache.org/jira/browse/TEZ-1526
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
  Labels: performance
 Attachments: 10-TezTaskIDs.patch, TEZ-1526-v1.patch, 
 TEZ-1526-v2.patch, TEZ-1526.3.patch


 Using the LoadingCache with default builder settings, 100,000 TezTaskIDs are 
 created in 10 seconds on my setup. With a LoadingCache initialCapacity of 
 10,000 they are created in 300 ms. With no LoadingCache, they are created in 
 10 ms. A test case is attached to illustrate the condition I would like to be 
 sped up.
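
For readers without the attached test case, the comparison described above can be approximated with a small timing harness. This is a sketch under loud assumptions: `TaskIdSketch` is a hypothetical stand-in for TezTaskID, and `java.util.concurrent.ConcurrentHashMap` is used in place of Guava's LoadingCache to keep the example dependency-free, so the timings it prints will not reproduce the figures quoted in the description.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in: TaskIdSketch is not TezTaskID, and ConcurrentHashMap
// replaces Guava's LoadingCache so the sketch needs no external dependency.
class TaskIdCacheSketch {

  static final class TaskIdSketch {
    final int vertexId;
    final int taskId;

    TaskIdSketch(int vertexId, int taskId) {
      this.vertexId = vertexId;
      this.taskId = taskId;
    }
  }

  // Creates n IDs directly, with no cache lookup.
  static TaskIdSketch[] createDirect(int n) {
    TaskIdSketch[] ids = new TaskIdSketch[n];
    for (int i = 0; i < n; i++) {
      ids[i] = new TaskIdSketch(0, i);
    }
    return ids;
  }

  // Creates n IDs through an interning cache with the given initial capacity.
  static TaskIdSketch[] createCached(int n, int initialCapacity) {
    ConcurrentHashMap<Long, TaskIdSketch> cache = new ConcurrentHashMap<>(initialCapacity);
    TaskIdSketch[] ids = new TaskIdSketch[n];
    for (int i = 0; i < n; i++) {
      final int task = i;
      ids[i] = cache.computeIfAbsent((long) task, k -> new TaskIdSketch(0, task));
    }
    return ids;
  }

  public static void main(String[] args) {
    int n = 100_000;
    long t0 = System.nanoTime();
    createDirect(n);
    long t1 = System.nanoTime();
    createCached(n, 16);
    long t2 = System.nanoTime();
    createCached(n, 10_000);
    long t3 = System.nanoTime();
    System.out.printf("no cache:               %d ms%n", (t1 - t0) / 1_000_000);
    System.out.printf("cache, capacity 16:     %d ms%n", (t2 - t1) / 1_000_000);
    System.out.printf("cache, capacity 10,000: %d ms%n", (t3 - t2) / 1_000_000);
  }
}
```

A single unwarmed run like this is only indicative; a benchmark harness with warm-up iterations would be needed before drawing conclusions.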





[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533404#comment-14533404
 ] 

Siddharth Seth commented on TEZ-2426:
-

Alright. Have a theory on what's happening. Lots of threads involved. This 
ignores the LOG lines showing up in the wrong log files (assuming the logger 
doesn't guarantee ordering when logging from different threads).

- TaskEventRouter for 456 sees an error. (This can happen because of cleanup / 
some fields not being volatile in inputContext.)
- TaskEventRouter is swapped out.
- Task completes, sends out its success message (heartbeat).
- TaskEventRouter thread regains control and tries sending out the TaskFailed 
message. (This is all before the next task has started. It may or may not have 
got an interrupt by this point.)
- Main thread falls off. Starts running another task. This thread can heartbeat 
since it doesn't synchronize with the previous task's heartbeats.
- The TaskEventRouter for 465 regains control. Goes into the IPC layer and 
tries sending the FAILED message (via a future). There's a context switch 
before the future.get(). The future runs. future.get() is interrupted, because 
the thread has seen its interrupt status by this point. This leads to the 
various errors in the logs.

This doesn't however explain a status_update after the failed message is sent. 
Don't really see what can cause that.

A couple of things need fixing here:
1) Join on the TaskEventRouter
2) Join on the last task's heartbeat thread
3) Fixes to *Context to revert fields back to final, or volatile
4) Avoid sending any more messages once any one final message has been sent.
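
Items 1 and 4 in the list above could look roughly like the following. This is a sketch under assumptions, not the eventual fix: `Reporter`, `sendFinal`, and `statusUpdate` are hypothetical names and do not correspond to the Tez task reporter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporter; not the Tez task reporter API.
class FinalMessageGuardSketch {

  static final class Reporter {
    private boolean finalMessageSent = false;
    private final List<String> sent = new ArrayList<>();

    // Sends a terminal message only if none has been sent yet.
    synchronized boolean sendFinal(String kind) {
      if (finalMessageSent) {
        return false;
      }
      finalMessageSent = true;
      sent.add(kind);
      return true;
    }

    // Non-terminal updates are dropped once a terminal message has gone out.
    synchronized boolean statusUpdate(String msg) {
      if (finalMessageSent) {
        return false;
      }
      sent.add("STATUS:" + msg);
      return true;
    }

    synchronized List<String> sent() {
      return new ArrayList<>(sent);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Reporter reporter = new Reporter();
    // Event router thread racing with the main task thread.
    Thread router = new Thread(() -> reporter.sendFinal("FAILED"), "event-router");
    router.start();
    reporter.sendFinal("SUCCEEDED");
    // Item 1: join on the router before the container moves on to the next task.
    router.join();
    // Exactly one terminal message is recorded, whichever thread won the race.
    System.out.println(reporter.sent());
  }
}
```

Guarding both methods with the same monitor avoids a check-then-send race in which a status update slips out after the terminal message.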

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
 Attachments: am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, it could be that a status update event is also sent after 
 completion, though not sure if it's related to the failed event being sent.
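
Step 6 of the sequence above can be pictured with a toy state machine that accepts no further events once it has reached a terminal state. This is a sketch only: `TerminalStateSketch` and its states and events are hypothetical and far simpler than the real Tez task state machine.

```java
// Hypothetical three-state machine; not the Tez task transition table.
class TerminalStateSketch {
  enum State { RUNNING, SUCCEEDED, FAILED }
  enum Event { T_SUCCEEDED, T_FAILED }

  private State state = State.RUNNING;

  // Only RUNNING accepts events; terminal states reject everything.
  State handle(Event e) {
    if (state != State.RUNNING) {
      throw new IllegalStateException("Invalid event: " + e + " at " + state);
    }
    state = (e == Event.T_SUCCEEDED) ? State.SUCCEEDED : State.FAILED;
    return state;
  }

  public static void main(String[] args) {
    TerminalStateSketch taskA = new TerminalStateSketch();
    taskA.handle(Event.T_SUCCEEDED);   // step 2: Task A completes
    try {
      taskA.handle(Event.T_FAILED);    // step 5: late failure event arrives
    } catch (IllegalStateException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```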





[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533406#comment-14533406
 ] 

Siddharth Seth commented on TEZ-2426:
-

Longer term - 0.8, may be worthwhile to rework some of this, along with 
protocol changes.

 Task input not complete before sending Task completed event
 ---

 Key: TEZ-2426
 URL: https://issues.apache.org/jira/browse/TEZ-2426
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
 Attachments: am.log, container.log


 Sequence of events
 1) Task A starts in a container
 2) Task A complete event comes to AM
 3) Task B starts in the same container
 4) Task A's input calls some method on its context. Crashes with NPE
 5) The crash sends an input failed event for Task A to the AM
 6) Task A state machine crashes saying cannot handle failed after success
 In some cases, it could be that a status update event is also sent after 
 completion, though not sure if it's related to the failed event being sent.





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533495#comment-14533495
 ] 

TezQA commented on TEZ-776:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731275/TEZ-776.15.patch
  against master revision a382324.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/650//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/650//console

This message is automatically generated.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.14.patch, TEZ-776.15.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.
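
To give a feel for the scale involved, here is a back-of-the-envelope estimate. It is a sketch under a stated assumption that is not taken from the issue: on a fully connected edge, one event is held for every producer and consumer task pair. Only the 64 bytes per event comes from the description above; `estimateBytes` and the task counts are hypothetical.

```java
// Hypothetical estimate; the producer x consumer fan-out is an assumption.
class EventHeapEstimateSketch {
  static final long BYTES_PER_EVENT = 64L; // figure quoted in the issue description

  // Assumes each producer task's event is held once per consumer task.
  static long estimateBytes(long producers, long consumers) {
    long events = Math.multiplyExact(producers, consumers);
    return Math.multiplyExact(events, BYTES_PER_EVENT);
  }

  public static void main(String[] args) {
    // 10,000 producers x 10,000 consumers x 64 bytes
    System.out.println(estimateBytes(10_000, 10_000)); // prints 6400000000
  }
}
```

Under that assumption the heap cost grows with the product of the two task counts, which is why the connection pattern matters more than the raw number of tasks.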





[jira] [Commented] (TEZ-2429) Tez AM does not die after hitting internal error

2015-05-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533587#comment-14533587
 ] 

Bikas Saha commented on TEZ-2429:
-

Not seeing this on an internal error (due to WIP code) in a cluster
{noformat}2015-05-07 16:17:04,876 INFO [main] impl.DAGImpl: Using DAG 
Scheduler: org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder 


2015-05-07 16:17:04,878 INFO [main] history.HistoryEventHandler: 
[HISTORY][DAG:dag_1429683757595_0799_1][Event:DAG_INITIALIZED]: 
dagID=dag_1429683757595_0799_1, initTime=1431040624805  
   
2015-05-07 16:17:04,878 INFO [main] impl.DAGImpl: dag_1429683757595_0799_1 
transitioned from NEW to INITED  
2015-05-07 16:17:04,884 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1429683757595_0799_1][Event:DAG_STARTED]: 
dagID=dag_1429683757595_0799_1, startTime=1431040624883 
 
2015-05-07 16:17:04,884 INFO [Dispatcher thread: Central] impl.DAGImpl: Added 
additional resources : [[]] to classpath  
2015-05-07 16:17:04,885 INFO [Dispatcher thread: Central] impl.DAGImpl: 
dag_1429683757595_0799_1 transitioned from INITED to RUNNING


2015-05-07 16:17:04,886 INFO [Dispatcher thread: Central] impl.VertexImpl: 
Setting vertexManager to ImmediateStartVertexManager for 
vertex_1429683757595_0799_1_00 [map]

2015-05-07 16:17:04,894 INFO [Dispatcher thread: Central] impl.VertexImpl: 
Creating 1 tasks for vertex: vertex_1429683757595_0799_1_00 [map]   

 
2015-05-07 16:17:04,907 ERROR [Dispatcher thread: Central] 
common.AsyncDispatcher: Error in dispatcher thread   
java.lang.NullPointerException: taskAttemptID is null   

at 
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)   
 
at org.apache.tez.runtime.api.impl.TaskSpec.<init>(TaskSpec.java:55)

at 
org.apache.tez.dag.app.dag.impl.VertexImpl.createRemoteTaskSpec(VertexImpl.java:2178)

at 
org.apache.tez.dag.app.dag.impl.VertexImpl.createTask(VertexImpl.java:2195) 
 
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.createTasks(VertexImpl.java:2200)
 
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:196) 
 
at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3207)
  
at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3129)
   
at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3110)
   
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
  
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 
at 
org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)  
 
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1748) 
 
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:195)  
 
at 
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1938)
 
at 
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1924)
 
at 
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
 
at 
org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)   
 
at java.lang.Thread.run(Thread.java:745)


Success: TEZ-776 PreCommit Build #650

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-776
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/650/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2867 lines...]
[INFO] Final Memory: 68M/866M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731275/TEZ-776.15.patch
  against master revision a382324.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/650//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/650//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
70902dc5884b5c972a00f134c1a87ddcaafec793 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #649
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2646352 bytes
Compression is 4.7%
Took 1.5 sec
Description set: TEZ-776
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2428) Investigate ignored event: DAGAppMaster: ignore event when DAGAppMaster is in the state of STOPPED, eventType=NEW_DAG_SUBMITTED

2015-05-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532910#comment-14532910
 ] 

Siddharth Seth commented on TEZ-2428:
-

[~hitesh] - a little more context please. Couldn't figure out what this means 
from the linked build.

 Investigate ignored event: DAGAppMaster: ignore event when DAGAppMaster is in 
 the state of STOPPED, eventType=NEW_DAG_SUBMITTED
 ---

 Key: TEZ-2428
 URL: https://issues.apache.org/jira/browse/TEZ-2428
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 \cc [~sseth]
 From https://builds.apache.org/job/Tez-Build/1055/





[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-07 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532993#comment-14532993
 ] 

Rohini Palaniswamy commented on TEZ-2221:
-

https://issues.apache.org/jira/secure/attachment/12730678/TEZ-2221-5-revert.patch
 looks good. +1 for that patch.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch, TEZ-2221-5-revert.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the 
 vertex group name to identify the vertex group commit, so vertex groups with 
 the same name will conflict, whereas the current equals and hashCode of 
 VertexGroup use both the vertex group name and the member names.
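
The mismatch described above can be illustrated with a small model. This is a sketch only: `GroupSketch` is a hypothetical class that mirrors the described equals and hashCode behaviour, and the name-keyed map stands in for whatever structure tracks commits by group name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of a vertex group; not the Tez VertexGroup class.
class VertexGroupNameSketch {

  static final class GroupSketch {
    final String name;
    final Set<String> members;

    GroupSketch(String name, String... members) {
      this.name = name;
      this.members = new HashSet<>(Arrays.asList(members));
    }

    // Equality uses both the group name and the member names.
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof GroupSketch)) {
        return false;
      }
      GroupSketch g = (GroupSketch) o;
      return name.equals(g.name) && members.equals(g.members);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, members);
    }
  }

  // Tracks commit state by group name only, as the commit events are described to do.
  static int commitsTrackedByName(GroupSketch... groups) {
    Map<String, String> commitStateByName = new HashMap<>();
    for (GroupSketch g : groups) {
      commitStateByName.put(g.name, "COMMIT_STARTED");
    }
    return commitStateByName.size();
  }

  public static void main(String[] args) {
    GroupSketch a = new GroupSketch("union", "v1", "v2");
    GroupSketch b = new GroupSketch("union", "v3", "v4");
    System.out.println(a.equals(b));                 // false: different members
    System.out.println(commitsTrackedByName(a, b));  // 1: the two commits collide
  }
}
```

Requiring group names to be unique removes the collision without changing how the commit events identify a group.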





Failed: TEZ-2404 PreCommit Build #648

2015-05-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2404
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/648/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2638 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731076/TEZ-2404-3.patch
  against master revision 02870f0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/648//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/648//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
0be151e4b9c7abb835891328d7cfd36825324a32 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #646
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2628496 bytes
Compression is 4.7%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
6 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:135)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
at 
org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:678)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
at 
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices(TestFaultTolerance.java:672)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:678)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
at 
org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit(TestFaultTolerance.java:297)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testCascadingInputFailureWithExitSuccess

Error Message:
TezSession has already 

[jira] [Commented] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

2015-05-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532076#comment-14532076
 ] 

TezQA commented on TEZ-2404:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12731076/TEZ-2404-3.patch
  against master revision 02870f0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/648//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/648//console

This message is automatically generated.

 Handle DataMovementEvent before its TaskAttemptCompletedEvent
 -

 Key: TEZ-2404
 URL: https://issues.apache.org/jira/browse/TEZ-2404
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch, TEZ-2404-3.patch


 TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but that 
 would cause a recovery issue. Recovery needs the DataMovementEvent to be handled 
 before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be 
 lost during recovery, causing its dependent tasks to hang.
 Two ways to fix this issue:
 1. Still route TaskAttemptCompletedEvent in Vertex
 2. Route DataMovementEvent before TaskAttemptCompletedEvent in 
 TezTaskAttemptListener
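
A rough Java sketch of the ordering idea behind option 2, for illustration only. It is not the attached patch and does not use the real Tez event classes; EventOrderingSketch, Kind, and dataMovementFirst are hypothetical names, and the sketch assumes events arrive in a batch that can be reordered before routing.

```java
import java.util.ArrayList;
import java.util.List;

public class EventOrderingSketch {

    // Hypothetical event kinds; the real Tez events carry payloads and ids.
    public enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED }

    // Stable partition of a batch: every DATA_MOVEMENT event is placed
    // ahead of every other event, and relative order within each side
    // is preserved.
    public static List<Kind> dataMovementFirst(List<Kind> batch) {
        List<Kind> ordered = new ArrayList<>(batch.size());
        for (Kind kind : batch) {
            if (kind == Kind.DATA_MOVEMENT) {
                ordered.add(kind);
            }
        }
        for (Kind kind : batch) {
            if (kind != Kind.DATA_MOVEMENT) {
                ordered.add(kind);
            }
        }
        return ordered;
    }
}
```

With this ordering, a handler that processes the list front to back sees the data movement events before the completion event, which is the property recovery is said to need.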





[jira] [Updated] (TEZ-2416) Tez UI: Make tooltips display faster.

2015-05-07 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2416:
--
Summary: Tez UI: Make tooltips display faster.  (was: TEZ-UI: Make tooltips 
display faster.)

 Tez UI: Make tooltips display faster.
 -

 Key: TEZ-2416
 URL: https://issues.apache.org/jira/browse/TEZ-2416
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Attachments: TEZ-2416.1.patch, TEZ-2416.2.patch








[jira] [Commented] (TEZ-2423) Tez UI: Remove Attempt Index column from task-attempts page

2015-05-07 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532532#comment-14532532
 ] 

Prakash Ramachandran commented on TEZ-2423:
---

+1, committing.

 Tez UI: Remove Attempt Index column from task-attempts page
 

 Key: TEZ-2423
 URL: https://issues.apache.org/jira/browse/TEZ-2423
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Fix For: 0.7.0, 0.8.0

 Attachments: TEZ-2423.1.patch


 Attempt Index and Attempt No serve the same purpose.


