[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700278#comment-14700278 ]

Bikas Saha commented on TEZ-2300:

Then perhaps the patch could killDAG() and send a (new) message to the scheduler to release resources. Then proceed with normal stop (like we do today). From what I see, AM shutdown today does not kill the DAG.

TezClient.stop() takes a lot of time or does not work sometimes
Key: TEZ-2300
URL: https://issues.apache.org/jira/browse/TEZ-2300
Project: Apache Tez
Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Jonathan Eagles
Attachments: TEZ-2300.1.patch, TEZ-2300.2.patch, TEZ-2300.3.patch, TEZ-2300.4.patch, syslog_dag_1428329756093_325099_1_post

Noticed this with a couple of pig scripts which were not behaving well (AM close to OOM, etc) and even with some that were running fine. Pig calls TezClient.stop() in a shutdown hook. Ctrl+C to the pig script either exits immediately or hangs. In both cases it takes a long time for the yarn application to go to the KILLED state. Many times I just end up calling yarn application -kill separately after waiting for 5 mins or more for it to get killed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2670) Remove TaskAttempt holder used within TezTaskCommunicator
[ https://issues.apache.org/jira/browse/TEZ-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2670:

Attachment: TEZ-2670.1.txt

Simple patch to replace TaskAttempt with TaskAttemptId, and reduce unnecessary object creation.

Remove TaskAttempt holder used within TezTaskCommunicator
Key: TEZ-2670
URL: https://issues.apache.org/jira/browse/TEZ-2670
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: TEZ-2003
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Attachments: TEZ-2670.1.txt

This will rely on using IDs or the equivalent construct exposed by Tez.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-2670) Remove TaskAttempt holder used within TezTaskCommunicator
[ https://issues.apache.org/jira/browse/TEZ-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth resolved TEZ-2670.

Resolution: Fixed
Fix Version/s: TEZ-2003
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700300#comment-14700300 ]

Hitesh Shah commented on TEZ-2300:

bq. Currently there are no APIs to cancel a DAG

DAGClient::tryKillDAG()

bq. From what I see, AM shutdown today does not kill the DAG.

DAGAppMaster::shutdownTezAM() tries to kill the dag first.
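The shutdown ordering implied by the two APIs named above (kill the in-flight DAG first so the AM releases resources, then stop the session) can be sketched generically. This is an illustrative, self-contained sketch with hypothetical stand-in interfaces, not the real Tez DAGClient/TezClient classes:

```java
// Sketch of "kill the running DAG, then stop the session" — the ordering
// discussed in the comments above. DagClient and SessionClient are
// hypothetical stand-ins, not the actual Tez API.
import java.util.concurrent.atomic.AtomicBoolean;

interface DagClient {
    // Models DAGClient.tryKillDAG(): best-effort kill of the running DAG.
    void tryKillDag() throws Exception;
}

interface SessionClient {
    // Models TezClient.stop().
    void stop() throws Exception;
}

final class OrderedShutdown {
    /** Kill the in-flight DAG first so resources are released, then stop. */
    static void shutdown(DagClient dag, SessionClient session) {
        try {
            dag.tryKillDag();
        } catch (Exception e) {
            // Best effort: a kill failure should not block session shutdown.
            System.err.println("tryKillDag failed: " + e);
        }
        try {
            session.stop();
        } catch (Exception e) {
            System.err.println("stop failed: " + e);
        }
    }
}

public class Main {
    public static void main(String[] args) {
        AtomicBoolean dagKilled = new AtomicBoolean();
        AtomicBoolean stopped = new AtomicBoolean();
        OrderedShutdown.shutdown(
            () -> dagKilled.set(true),
            () -> stopped.set(true));
        System.out.println(dagKilled.get() && stopped.get()); // true
    }
}
```

The point of the try/catch around the kill is that a failed or unreachable DAG kill degrades to today's behavior (plain stop) rather than aborting shutdown.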
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700410#comment-14700410 ]

Rohini Palaniswamy commented on TEZ-2726:

There was some bug in Pig planning (yet to debug and create a jira) which was setting incorrect edge types.

Handle invalid number of partitions for SCATTER-GATHER edge
Key: TEZ-2726
URL: https://issues.apache.org/jira/browse/TEZ-2726
Project: Apache Tez
Issue Type: Improvement
Reporter: Saikat
Assignee: Saikat

Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and get invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario, or 2. write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
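The two fixes proposed in the report above can be illustrated with a small generic sketch. This is not Tez's actual DataMovementEvent/ShuffleHandler code; it is an assumed model where a source wrote `numWritten` partitions for an edge with `numConsumers` downstream tasks, and we either fail fast (option 1) or mark the missing partitions as empty in a BitSet (option 2):

```java
import java.util.BitSet;

// Illustrative sketch only (not Tez's real event code): reconcile the number
// of partitions a source task actually wrote with the number of downstream
// consumer tasks on a scatter-gather edge.
public class Main {
    /**
     * Option 1 from the report: fail fast on an invalid plan.
     * Option 2: return a BitSet marking the missing partitions as empty, so
     * consumers skip fetching them instead of hitting the shuffle handler
     * with non-existent partition ids.
     */
    static BitSet emptyPartitionBits(int numWritten, int numConsumers, boolean failFast) {
        if (numWritten > numConsumers) {
            throw new IllegalStateException(
                "source wrote " + numWritten + " partitions but edge has only "
                + numConsumers + " consumers");
        }
        if (numWritten < numConsumers && failFast) {
            throw new IllegalStateException(
                "missing partitions: wrote " + numWritten + ", expected " + numConsumers);
        }
        BitSet empty = new BitSet(numConsumers);
        empty.set(numWritten, numConsumers); // partitions [numWritten, N) hold no data
        return empty;
    }

    public static void main(String[] args) {
        // The M = 1, N = 3 case from the report: partitions 1 and 2 are empty.
        BitSet bits = emptyPartitionBits(1, 3, false);
        System.out.println(bits); // {1, 2}
    }
}
```

With the empty-partition bits in hand, a fetcher consulting the bitset would never issue a request for a partition the source never produced, which is the failure mode described in the report.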
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700319#comment-14700319 ]

Bikas Saha commented on TEZ-2300:

I mean directly in the shutdown handler, which would happen when the AM was killed by the RM. Not sure if Pig is using shutdownTezAM() or just calling killApplication on YARN.
[jira] [Updated] (TEZ-2728) Wrap IPC connection Exception as SessionNotRunning - RM crash
[ https://issues.apache.org/jira/browse/TEZ-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2728: - Attachment: hive.log.gz Wrap IPC connection Exception as SessionNotRunning - RM crash - Key: TEZ-2728 URL: https://issues.apache.org/jira/browse/TEZ-2728 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0 Reporter: Gopal V Assignee: Hitesh Shah Attachments: hive.log.gz Crashing the RM when a query session is open and restarting it does not result in a recoverable state for a Hive session. {code} 2015-08-17T22:34:21,981 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,982 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,987 ERROR [main]: exec.Task (TezTask.java:execute(195)) - Failed to execute tez graph. 
java.net.ConnectException: Call From cn041.sandbox.hortonworks.com/172.19.128.41 to cn042.sandbox.hortonworks.com:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_51]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_51]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_51]
at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_51]
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1444) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1371) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at com.sun.proxy.$Proxy41.getApplicationReport(Unknown Source) ~[?:?]
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getApplicationReport(ApplicationHistoryProtocolPBClientImpl.java:108) ~[hadoop-yarn-common-2.8.0-20150721.221214-843.jar:?]
at org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getApplicationReport(AHSClientImpl.java:101) ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:442) ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
at org.apache.tez.client.TezYarnClient.getApplicationReport(TezYarnClient.java:89) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:835) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:713) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:723) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:453) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:391) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:409) ~[hive-exec-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700329#comment-14700329 ]

Hitesh Shah commented on TEZ-2300:

[~bikassaha] As the jira title states, I believe they are invoking TezClient::stop()
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700555#comment-14700555 ]

Bikas Saha commented on TEZ-2726:

Still not sure what the exact sequence of events was for the error. A planning bug caused empty partitions, and somehow Tez handled the empty partitions erroneously? It would really help if we had logs or some sequence of events that produced the error. Tez does have some handling for empty partitions, but that's an optimization to not fetch them (since they are empty).
Success: TEZ-2727 PreCommit Build #1000
Jira: https://issues.apache.org/jira/browse/TEZ-2727
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1000/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 3620 lines...]
[INFO] Final Memory: 93M/1163M
[INFO]

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750879/2003_20150817.1.txt against master revision 6cb8206.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 51 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1000//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1000//console
This message is automatically generated.

Adding comment to Jira.
Comment added.
9789e853f14a49bfc68c5ee4acc461d5551d0d68 logged out

Finished build.

Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #988
Archived 53 artifacts
Archive block size is 32768
Received 0 blocks and 3205963 bytes
Compression is 0.0%
Took 1.3 sec
Description set: TEZ-2727
Recording test results
Email was triggered for: Success
Sending email for trigger: Success

### FAILED TESTS (if any) ###
All tests passed
[jira] [Commented] (TEZ-2728) Wrap IPC connection Exception as SessionNotRunning - RM crash
[ https://issues.apache.org/jira/browse/TEZ-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700383#comment-14700383 ]

Hitesh Shah commented on TEZ-2728:

Could you attach the full stack trace/log? This looks like the ipc call eventually timed out. I am not sure whether we can safely assume that the session is not running if the RM is down but could later come back up and recover the yarn application. Instead, should Hive consider treating any exception from submitDAG as an excuse to try killing the session and re-trying with a new one?
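The suggestion in the comment above — treat any submitDAG failure as a cue to discard the session and retry with a fresh one — amounts to a retry loop over session creation. A self-contained sketch under that assumption; the `Session` interface is a hypothetical stand-in, not the real Hive/Tez API:

```java
import java.util.function.Supplier;

// Generic sketch of "kill the session and retry with a new one" on submit
// failure, as suggested in the comment above. Session is a hypothetical
// stand-in for a Tez session handle.
public class Main {
    interface Session {
        void submitDag() throws Exception;
        void close();
    }

    static void submitWithFreshSessionRetry(Supplier<Session> factory, int maxAttempts)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Session s = factory.get();       // fresh session each attempt
            try {
                s.submitDag();
                return;                      // success
            } catch (Exception e) {
                last = e;                    // any failure: assume the session
                s.close();                   // is unusable, tear it down, retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails once (e.g. RM connection refused), succeeds on a new session.
        submitWithFreshSessionRetry(() -> new Session() {
            public void submitDag() throws Exception {
                if (++calls[0] == 1) throw new java.net.ConnectException("refused");
            }
            public void close() {}
        }, 3);
        System.out.println("submitted after " + calls[0] + " attempts");
    }
}
```

The design choice here mirrors the comment: rather than classifying which exceptions mean "session dead" (hard to do when the RM may come back), any failure pessimistically discards the session, which is cheap relative to a wedged client.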
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700334#comment-14700334 ]

Rohini Palaniswamy commented on TEZ-2300:

bq. DAGClient::tryKillDAG()

Sorry, missed the DAGClient API as I was only looking at the TezClient API.

bq. Not sure if Pig is using shutdownTezAM() or just calling killApplication on YARN.

We do not call killApplication on YARN. We call TezClient.stop(), which calls proxy.shutdownSession. TezClient.stop() tries to kill via YARN, but only if it was not able to connect and send the shutdown request to the Tez AM. I don't think I have seen cases which have gone into that condition. The problem is that in bad cases, like a big event queue backlog, the shutdown happens after 10-15 mins. It should kill via YARN if shutdown does not happen within a reasonable amount of time, in addition to when it is not able to connect.

{code}
if (!sessionShutdownSuccessful) {
  LOG.info("Could not connect to AM, killing session via YARN"
      + ", sessionName=" + clientName
      + ", applicationId=" + sessionAppId);
  try {
    frameworkClient.killApplication(sessionAppId);
  } catch (ApplicationNotFoundException e) {
    LOG.info("Failed to kill nonexistent application " + sessionAppId, e);
  } catch (YarnException e) {
    throw new TezException(e);
  }
}
{code}
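The fix Rohini proposes — kill via YARN not only when the AM is unreachable, but also when a clean shutdown does not complete within a bounded time — is the classic graceful-stop-with-deadline pattern. A self-contained plain-Java sketch, not the actual TezClient code; the timeout value is illustrative:

```java
import java.util.concurrent.*;

// Sketch of "attempt graceful shutdown, hard-kill on timeout" — the behavior
// proposed above for TezClient.stop(). The Runnables stand in for the real
// shutdownSession RPC and frameworkClient.killApplication() calls.
public class Main {
    static boolean stopWithDeadline(Runnable gracefulStop, Runnable hardKill,
                                    long timeout, TimeUnit unit) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = ex.submit(gracefulStop);
            f.get(timeout, unit);    // wait, but only up to the deadline
            return true;             // clean shutdown finished in time
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            hardKill.run();          // fall back, e.g. yarn application -kill
            return false;
        } finally {
            ex.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Graceful path never finishes (simulating a backed-up AM event
        // queue), so the deadline fires and the hard kill runs.
        boolean clean = stopWithDeadline(
            () -> { try { Thread.sleep(60_000); } catch (InterruptedException ignored) {} },
            () -> System.out.println("killed via YARN"),
            200, TimeUnit.MILLISECONDS);
        System.out.println("clean=" + clean);
    }
}
```

This keeps today's fast path (clean shutdown returns true immediately) while bounding the worst case at the deadline instead of the 10-15 minute waits described above.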
[jira] [Commented] (TEZ-2727) Fix findbugs warnings
[ https://issues.apache.org/jira/browse/TEZ-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700355#comment-14700355 ]

TezQA commented on TEZ-2727:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750879/2003_20150817.1.txt against master revision 6cb8206.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 51 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1000//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1000//console
This message is automatically generated.

Fix findbugs warnings
Key: TEZ-2727
URL: https://issues.apache.org/jira/browse/TEZ-2727
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: TEZ-2003
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: TEZ-2003
Attachments: 2003_20150817.1.txt, TEZ-2727.1.txt

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700369#comment-14700369 ]

Rajesh Balamohan commented on TEZ-2726:

[~saikatr] - Is there any repro for this? When you say invalid headers, is it something like the following? Can you please provide more info?

{noformat}
org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
java.lang.IllegalArgumentException: Invalid header received: W^s??.attempt_1399351577718_4169_1_ partition: 95
{noformat}

If so, are you using tez.runtime.intermediate-output.compress.codec = org.apache.hadoop.io.compress.DefaultCodec ?
[jira] [Commented] (TEZ-2294) Add tez-site-template.xml with description of config properties
[ https://issues.apache.org/jira/browse/TEZ-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700645#comment-14700645 ]

TezQA commented on TEZ-2294:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750911/TEZ-2294.7.patch against master revision 6cb8206.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1001//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1001//console
This message is automatically generated.

Add tez-site-template.xml with description of config properties
Key: TEZ-2294
URL: https://issues.apache.org/jira/browse/TEZ-2294
Project: Apache Tez
Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Attachments: TEZ-2294.4.patch, TEZ-2294.5.patch, TEZ-2294.6.patch, TEZ-2294.7.patch, TEZ-2294.wip.2.patch, TEZ-2294.wip.3.patch, TEZ-2294.wip.patch, TezConfiguration.html, TezRuntimeConfiguration.html, tez-default-template.xml, tez-runtime-default-template.xml

Document all tez configs with descriptions and default values. Also, document MR configs that can be easily translated to Tez configs via Tez helpers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2728) Wrap IPC connection Exception as SessionNotRunning - RM crash
Gopal V created TEZ-2728: Summary: Wrap IPC connection Exception as SessionNotRunning - RM crash Key: TEZ-2728 URL: https://issues.apache.org/jira/browse/TEZ-2728 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.2, 0.5.4, 0.7.0, 0.8.0 Reporter: Gopal V Assignee: Hitesh Shah Crashing the RM when a query session is open and restarting it does not result in a recoverable state for a Hive session. {code} 2015-08-17T22:34:21,981 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.l42scl.hortonworks.com/172.19.128.42:10200. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,982 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.l42scl.hortonworks.com/172.19.128.42:10200. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,987 ERROR [main]: exec.Task (TezTask.java:execute(195)) - Failed to execute tez graph. java.net.ConnectException: Call From cn041-10.l42scl.hortonworks.com/172.19.128.41 to cn042-10.l42scl.hortonworks.com:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_51] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_51] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_51] at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_51] at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1444) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1371) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
at com.sun.proxy.$Proxy41.getApplicationReport(Unknown Source) ~[?:?]
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getApplicationReport(ApplicationHistoryProtocolPBClientImpl.java:108) ~[hadoop-yarn-common-2.8.0-20150721.221214-843.jar:?]
at org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getApplicationReport(AHSClientImpl.java:101) ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:442) ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
at org.apache.tez.client.TezYarnClient.getApplicationReport(TezYarnClient.java:89) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:835) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:713) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:723) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:453) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:391) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:409) ~[hive-exec-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-2164: Assignee: Hitesh Shah Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700588#comment-14700588 ] TezQA commented on TEZ-2164: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750920/TEZ-2164.3.patch against master revision 6cb8206. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1002//console This message is automatically generated. Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
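The shading work in TEZ-2164 amounts to relocating guava's classes inside the Tez jars so that Tez's copy cannot clash with whatever guava version a downstream application ships. As an illustration only — the actual coordinates and relocation pattern used by TEZ-2164.3.patch are not shown in this thread — a maven-shade-plugin relocation typically looks like:

```xml
<!-- Hypothetical sketch: relocate guava inside the Tez artifact.
     The shaded package name below is illustrative, not from the patch. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>com.google.guava:guava</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.tez.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After relocation, Tez bytecode references the shaded package, so users can upgrade or downgrade their own guava freely — the goal stated in the issue description.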
[jira] [Updated] (TEZ-2728) Wrap IPC connection Exception as SessionNotRunning - RM crash
[ https://issues.apache.org/jira/browse/TEZ-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2728: - Description: Crashing the RM when a query session is open and restarting it does not result in a recoverable state for a Hive session. {code} 2015-08-17T22:34:21,981 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,982 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,987 ERROR [main]: exec.Task (TezTask.java:execute(195)) - Failed to execute tez graph. java.net.ConnectException: Call From cn041.sandbox.hortonworks.com/172.19.128.41 to cn042.sandbox.hortonworks.com:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_51] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_51] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_51] at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_51] at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] at org.apache.hadoop.ipc.Client.call(Client.java:1444) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] 
at org.apache.hadoop.ipc.Client.call(Client.java:1371) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ~[hadoop-common-2.8.0-20150722.003145-873.jar:?] at com.sun.proxy.$Proxy41.getApplicationReport(Unknown Source) ~[?:?] at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getApplicationReport(ApplicationHistoryProtocolPBClientImpl.java:108) ~[hadoop-yarn-common-2.8.0-20150721.221214-843.jar:?] at org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getApplicationReport(AHSClientImpl.java:101) ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?] at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:442) ~[hadoop-yarn-client-2.8.0-20150721.221233-841. jar:?] at org.apache.tez.client.TezYarnClient.getApplicationReport(TezYarnClient.java:89) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:835) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:713) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:723) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:453) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.tez.client.TezClient.submitDAG(TezClient.java:391) ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:409) ~[hive-exec-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] {code} was: Crashing the RM when a query session is open and restarting it does not result in a recoverable state for a Hive session. {code} 2015-08-17T22:34:21,981 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.l42scl.hortonworks.com/172.19.128.42:10200. 
Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,982 INFO [main]: ipc.Client (Client.java:handleConnectionFailure(885)) - Retrying connect to server: cn042-10.l42scl.hortonworks.com/172.19.128.42:10200. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS) 2015-08-17T22:34:22,987 ERROR [main]: exec.Task (TezTask.java:execute(195)) - Failed to execute tez graph. java.net.ConnectException: Call From cn041-10.l42scl.hortonworks.com/172.19.128.41 to cn042-10.l42scl.hortonworks.com:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:
Failed: TEZ-2164 PreCommit Build #1002
Jira: https://issues.apache.org/jira/browse/TEZ-2164 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1002/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 419 lines...] == == Determining number of patched javac warnings. == == /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt 21 {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750920/TEZ-2164.3.patch against master revision 6cb8206. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1002//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. a321b6a92ad051370e1d5133a95d47e33309a98f logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #1000 Archived 3 artifacts Archive block size is 32768 Received 0 blocks and 810886 bytes Compression is 0.0% Took 6.7 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
Failed: TEZ-2294 PreCommit Build #1001
Jira: https://issues.apache.org/jira/browse/TEZ-2294 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1001/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3423 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750911/TEZ-2294.7.patch against master revision 6cb8206. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1001//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1001//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 04570d164c2b71de8569f0b8d96cc64cfd12c6d1 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #1000 Archived 53 artifacts Archive block size is 32768 Received 16 blocks and 2612241 bytes Compression is 16.7% Took 0.87 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Comment Edited] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700309#comment-14700309 ] Hitesh Shah edited comment on TEZ-2164 at 8/17/15 10:06 PM: [~rajesh.balamohan] [~sseth] [~cchepelov] Mind trying this patch out? Check BUILDING.txt for more details. was (Author: hitesh): [~rajesh.balamohan] [~sseth] [~cchepelov] Mind trying this patch out? Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2164: - Attachment: TEZ-2164.3.patch [~rajesh.balamohan] [~sseth] [~cchepelov] Mind trying this patch out? Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699873#comment-14699873 ] Prakash Ramachandran commented on TEZ-2724: --- * if realClient.getApplicationReportInternal returns null (say, a temporary network issue) and we switch to the ATS client, should we switch back to getting status via the AM once the app report is available and the app has not completed? * minor - the switchToTimelineClient debug log can be changed. Tez Client keeps on showing old status when application is finished but RM is shutdown -- Key: TEZ-2724 URL: https://issues.apache.org/jira/browse/TEZ-2724 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.4 Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2724-1.patch, amrecovery_mutlipleamrestart.txt From the logs, it seems the ipc retry interval is set to 20 seconds and ipc max retries to 45. This means that the client will retry the RPC connection for a total of 900 (20*45) seconds. And in this period, the application may already have completed and RM restarting may be triggered, as said in the jira description. And I think RM recovery is not enabled, so even if the new RM is restarted, the original application info is lost, which means the client can never get the correct application report, which makes it show the old status forever. {code} 15/05/07 19:13:43 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. Already tried 26 time(s); maxRetries=45 Deleted /user/hadoopqa/Input1 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -ls /user/hadoopqa/Input2 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -rm -r -skipTrash /user/hadoopqa/Input2 15/05/07 19:14:03 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. 
Already tried 27 time(s); maxRetries=45 {code} Configuration to reproduce this issue * disable generic application history (yarn.timeline-service.generic-application-history.enabled) * disable rm recovery (yarn.resourcemanager.recovery.enabled) * increase the ipc retry interval and max retries (ipc.client.connect.retry.interval, ipc.client.connect.max.retries) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
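The reproduction steps listed above can be expressed as plain Hadoop/YARN configuration. The property names come straight from the comment; the retry values (20-second interval, 45 retries) are inferred from the logs quoted in the issue, so treat the exact values as an assumption:

```xml
<!-- Sketch of a repro configuration for TEZ-2724 (values assumed from the logs) -->
<property>
  <name>yarn.timeline-service.generic-application-history.enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>false</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>20000</value> <!-- milliseconds: 20 s between attempts, per the logs -->
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>45</value> <!-- 45 attempts x 20 s = 900 s of retrying -->
</property>
```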
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699945#comment-14699945 ] Rohini Palaniswamy commented on TEZ-2726: - We should raise a proper exception in Tez and not write empty partition bits, which would mask an issue that is most likely due to some DAG misconfiguration. Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2725) Tez UI: Unit tests
[ https://issues.apache.org/jira/browse/TEZ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699876#comment-14699876 ] Hitesh Shah commented on TEZ-2725: -- Is this single jira meant to create unit tests for the full existing UI code base? Tez UI: Unit tests -- Key: TEZ-2725 URL: https://issues.apache.org/jira/browse/TEZ-2725 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699942#comment-14699942 ] Saikat commented on TEZ-2726: - Adding [~jlowe] [~rohini] [~jeagles] for watch and comments. Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
Saikat created TEZ-2726: --- Summary: Handle invalid number of partitions for SCATTER-GATHER edge Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699942#comment-14699942 ] Saikat edited comment on TEZ-2726 at 8/17/15 6:00 PM: -- Adding [~jlowe] [~rohini] [~jeagles] [~rajesh.balamohan] for watch and comments. was (Author: saikatr): Adding [~jlowe] [~rohini] [~jeagles] for watch and comments. Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2670) Remove TaskAttempt holder used within TezTaskCommunicator
[ https://issues.apache.org/jira/browse/TEZ-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699966#comment-14699966 ] Siddharth Seth commented on TEZ-2670: - To be replaced with changes post TEZ-2697. For now, moving this back to TaskAttemptId to remove unnecessary object creation. Remove TaskAttempt holder used within TezTaskCommunicator - Key: TEZ-2670 URL: https://issues.apache.org/jira/browse/TEZ-2670 Project: Apache Tez Issue Type: Sub-task Affects Versions: TEZ-2003 Reporter: Siddharth Seth Assignee: Siddharth Seth This will rely on using IDs or the equivalent construct exposed by Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2727) Fix findbugs warnings
[ https://issues.apache.org/jira/browse/TEZ-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2727: Attachment: TEZ-2727.1.txt Actual patch to fix findbugs. Fix findbugs warnings - Key: TEZ-2727 URL: https://issues.apache.org/jira/browse/TEZ-2727 Project: Apache Tez Issue Type: Sub-task Affects Versions: TEZ-2003 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: TEZ-2727.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2727) Fix findbugs warnings
Siddharth Seth created TEZ-2727: --- Summary: Fix findbugs warnings Key: TEZ-2727 URL: https://issues.apache.org/jira/browse/TEZ-2727 Project: Apache Tez Issue Type: Sub-task Affects Versions: TEZ-2003 Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1475#comment-1475 ] Hitesh Shah commented on TEZ-2726: -- \cc [~bikassaha] Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2727) Fix findbugs warnings
[ https://issues.apache.org/jira/browse/TEZ-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2727: Attachment: 2003_20150817.1.txt Patch for jenkins. Fix findbugs warnings - Key: TEZ-2727 URL: https://issues.apache.org/jira/browse/TEZ-2727 Project: Apache Tez Issue Type: Sub-task Affects Versions: TEZ-2003 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: 2003_20150817.1.txt, TEZ-2727.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699963#comment-14699963 ] Jason Lowe commented on TEZ-2726: - +1 for throwing an exception. I think it could be dangerous to assume that putting in empty bits for missing partitions is the correct action to take. If that approach is mistaken we could end up with missing or corrupted outputs for a successful job. Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M) [e.g. M = 1, N = 3], and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez Exception to indicate this invalid scenario. 2. or write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
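The fail-fast option favored in the comments above amounts to a guard comparing the declared partition count against the number of downstream tasks before any DataMovementEvents are sent. The class and method names below are hypothetical illustrations, not the actual Tez patch:

```java
// Hypothetical sketch (not the actual TEZ-2726 fix): validate that the number
// of physical output partitions on a SCATTER-GATHER edge matches the number of
// downstream tasks, and fail fast instead of emitting DMEs with non-existent
// targetIds that the ShuffleHandler would later reject.
public class PartitionCheck {
    public static int validatePartitions(int numPartitions, int numDownstreamTasks) {
        if (numPartitions != numDownstreamTasks) {
            // In real Tez this would be a TezUncheckedException or similar.
            throw new IllegalStateException(
                "SCATTER-GATHER edge misconfigured: output declares " + numPartitions
                + " partitions but downstream vertex has " + numDownstreamTasks + " tasks");
        }
        return numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(validatePartitions(3, 3)); // matching counts pass
        try {
            validatePartitions(1, 3); // the M = 1, N = 3 case from the report
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at configuration time surfaces the DAG misconfiguration directly, rather than letting fetchers discover it later as invalid shuffle headers.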
[jira] [Updated] (TEZ-2294) Add tez-site-template.xml with description of config properties
[ https://issues.apache.org/jira/browse/TEZ-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2294: - Attachment: TEZ-2294.6.patch Add findbugs-exclude file. Add tez-site-template.xml with description of config properties --- Key: TEZ-2294 URL: https://issues.apache.org/jira/browse/TEZ-2294 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Hitesh Shah Attachments: TEZ-2294.4.patch, TEZ-2294.5.patch, TEZ-2294.6.patch, TEZ-2294.wip.2.patch, TEZ-2294.wip.3.patch, TEZ-2294.wip.patch, TezConfiguration.html, TezRuntimeConfiguration.html, tez-default-template.xml, tez-runtime-default-template.xml Document all tez configs with descriptions and default values. Also, document MR configs that can be easily translated to Tez configs via Tez helpers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699116#comment-14699116 ] Jeff Zhang commented on TEZ-2724: - Uploaded patch to fix it. Verified it manually. [~pramachandran] [~hitesh] Please help review. Tez Client keeps on showing old status when application is finished but RM is shutdown -- Key: TEZ-2724 URL: https://issues.apache.org/jira/browse/TEZ-2724 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.4 Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2724-1.patch, amrecovery_mutlipleamrestart.txt From the logs, it seems the ipc retry interval is set to 20 seconds and ipc max retries to 45. This means that the client will retry the RPC connection for a total of 900 (20*45) seconds. And in this period, the application may already have completed and RM restarting may be triggered, as said in the jira description. And I think RM recovery is not enabled, so even if the new RM is restarted, the original application info is lost, which means the client can never get the correct application report, which makes it show the old status forever. {code} 15/05/07 19:13:43 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. Already tried 26 time(s); maxRetries=45 Deleted /user/hadoopqa/Input1 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -ls /user/hadoopqa/Input2 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -rm -r -skipTrash /user/hadoopqa/Input2 15/05/07 19:14:03 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. 
Already tried 27 time(s); maxRetries=45 {code} Configuration to reproduce this issue * disable generic application history (yarn.timeline-service.generic-application-history.enabled) * disable rm recovery (yarn.resourcemanager.recovery.enabled) * increase the ipc retry interval and max retries (ipc.client.connect.retry.interval, ipc.client.connect.max.retries) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699116#comment-14699116 ] Jeff Zhang edited comment on TEZ-2724 at 8/17/15 7:27 AM: -- Uploaded patch to fix it. This patch cannot solve the problem completely: when ATS is not enabled the issue remains; the patch only fixes it when ATS is enabled. Verified it manually. [~pramachandran] [~hitesh] Please help review. was (Author: zjffdu): Upload patch to fix it. Verified it manually. [~pramachandran] [~hitesh] Please help review. Tez Client keeps on showing old status when application is finished but RM is shutdown -- Key: TEZ-2724 URL: https://issues.apache.org/jira/browse/TEZ-2724 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.4 Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2724-1.patch, amrecovery_mutlipleamrestart.txt From the logs, it seems the ipc retry interval is set to 20 seconds and ipc max retries to 45. This means that the client will retry the RPC connection for a total of 900 (20*45) seconds. And in this period, the application may already have completed and RM restarting may be triggered, as said in the jira description. And I think RM recovery is not enabled, so even if the new RM is restarted, the original application info is lost, which means the client can never get the correct application report, which makes it show the old status forever. {code} 15/05/07 19:13:43 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. Already tried 26 time(s); maxRetries=45 Deleted /user/hadoopqa/Input1 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -ls /user/hadoopqa/Input2 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -rm -r -skipTrash /user/hadoopqa/Input2 15/05/07 19:14:03 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. 
Already tried 27 time(s); maxRetries=45 {code} Configuration to reproduce this issue * disable generic application history (yarn.timeline-service.generic-application-history.enabled) * disable rm recovery (yarn.resourcemanager.recovery.enabled) * increase the ipc retry interval and max retries (ipc.client.connect.retry.interval, ipc.client.connect.max.retries) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700228#comment-14700228 ] Hitesh Shah commented on TEZ-2724: -- I think this is an edge case where RM HA is not enabled or RM recovery is not enabled. I think the switch to using TimelineClient should only happen in the following condition: RM either says app finished or throws an AppNotFound exception. If the RM is down, we should just wait or throw an error if it is being done today. Switching to the TimelineClient while the RM is down is probably going to be problematic as it will not switch back to the AM after the RM comes back up ( if recovery is enabled ). Tez Client keeps on showing old status when application is finished but RM is shutdown -- Key: TEZ-2724 URL: https://issues.apache.org/jira/browse/TEZ-2724 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.4 Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2724-1.patch, amrecovery_mutlipleamrestart.txt From the logs, it seems the ipc retry interval is set to 20 seconds and ipc max retries to 45. This means that the client will retry the RPC connection for a total of 900 (20*45) seconds. And in this period, the application may already have completed and RM restarting may be triggered, as said in the jira description. And I think RM recovery is not enabled, so even if the new RM is restarted, the original application info is lost, which means the client can never get the correct application report, which makes it show the old status forever. {code} 15/05/07 19:13:43 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. 
Already tried 26 time(s); maxRetries=45 Deleted /user/hadoopqa/Input1 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -ls /user/hadoopqa/Input2 RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -rm -r -skipTrash /user/hadoopqa/Input2 15/05/07 19:14:03 INFO ipc.Client: Retrying connect to server: maint22-tez12/100.79.80.19:52822. Already tried 27 time(s); maxRetries=45 {code} Configuration to reproduce this issue * disable generic application history (yarn.timeline-service.generic-application-history.enabled) * disable rm recovery (yarn.resourcemanager.recovery.enabled) * increase the ipc retry interval and max retries (ipc.client.connect.retry.interval, ipc.client.connect.max.retries) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
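Hitesh's proposed rule — fall back to the TimelineClient only on a definitive answer from the RM (app finished, or AppNotFound), never on a mere connection failure — can be sketched as a small decision function. All names here are illustrative, not the actual TezClient code:

```java
// Hypothetical sketch of the status-source decision Hitesh describes for
// TEZ-2724. A connection failure (RM down) must NOT trigger the fallback,
// because the client could never switch back to the AM once the RM recovers.
public class StatusSourceDecision {
    public enum Source { RM, TIMELINE }

    public static Source pickSource(boolean rmReachable,
                                    boolean appFinishedPerRm,
                                    boolean appNotFoundPerRm) {
        if (!rmReachable) {
            // RM down: keep waiting/retrying against the RM rather than
            // switching, so recovery (if enabled) can restore the AM path.
            return Source.RM;
        }
        // Definitive RM answers: app finished, or AppNotFound (which would
        // imply recovery is disabled) -> safe to read history from ATS.
        return (appFinishedPerRm || appNotFoundPerRm) ? Source.TIMELINE : Source.RM;
    }
}
```

This makes the fallback depend on what the RM *says*, not on whether the RM *answers*, which is the distinction the comment draws.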
[jira] [Comment Edited] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700228#comment-14700228 ] Hitesh Shah edited comment on TEZ-2724 at 8/17/15 9:05 PM: --- I think this is an edge case where RM HA is not enabled or RM recovery is not enabled. I think the switch to using TimelineClient should only happen in the following condition: the RM either says the app finished or throws an AppNotFound exception (AppNotFound would imply recovery is disabled). If the RM is down, we should just wait, or throw an error if that is what is done today. Switching to the TimelineClient while the RM is down is probably going to be problematic, as it will not switch back to the AM after the RM comes back up (if recovery is enabled). was (Author: hitesh): I think this is an edge case where RM HA is not enabled or RM recovery is not enabled. I think the switch to using TimelineClient should only happen in the following condition: the RM either says the app finished or throws an AppNotFound exception. If the RM is down, we should just wait, or throw an error if that is what is done today. Switching to the TimelineClient while the RM is down is probably going to be problematic, as it will not switch back to the AM after the RM comes back up (if recovery is enabled). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2294) Add tez-site-template.xml with description of config properties
[ https://issues.apache.org/jira/browse/TEZ-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2294: - Attachment: TEZ-2294.7.patch Re-upload to trigger pre-commit Add tez-site-template.xml with description of config properties --- Key: TEZ-2294 URL: https://issues.apache.org/jira/browse/TEZ-2294 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Hitesh Shah Attachments: TEZ-2294.4.patch, TEZ-2294.5.patch, TEZ-2294.6.patch, TEZ-2294.7.patch, TEZ-2294.wip.2.patch, TEZ-2294.wip.3.patch, TEZ-2294.wip.patch, TezConfiguration.html, TezRuntimeConfiguration.html, tez-default-template.xml, tez-runtime-default-template.xml Document all tez configs with descriptions and default values. Also, document MR configs that can be easily translated to Tez configs via Tez helpers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700237#comment-14700237 ] Saikat edited comment on TEZ-2726 at 8/17/15 9:10 PM: -- One possible place to raise a proper exception is in sendTezEventToDestinationTasks() in Edge.java, before sending out the DME (for a scatter-gather edge manager). We can raise AMUserCodeException with the source set to the edge manager and an appropriate message. was (Author: saikatr): One possible place to raise a proper exception is in sendTezEventToDestinationTasks() in Edge.java, before sending out the DME. We can raise AMUserCodeException with the source set to the edge manager and an appropriate message. Handle invalid number of partitions for SCATTER-GATHER edge --- Key: TEZ-2726 URL: https://issues.apache.org/jira/browse/TEZ-2726 Project: Apache Tez Issue Type: Improvement Reporter: Saikat Assignee: Saikat Encountered an issue where the source vertex has M tasks and the sink vertex has N tasks (N > M, e.g. M = 1, N = 3) and the edge is of type SCATTER-GATHER. This resulted in the sink vertex receiving DMEs with non-existent targetIds. The fetchers for the sink vertex tasks then try to retrieve the map outputs and receive invalid headers due to an exception in the ShuffleHandler. Possible fixes: 1. raise a proper Tez exception to indicate this invalid scenario, or 2. write appropriate empty partition bits for the missing partitions before sending out the DMEs to the sink vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700237#comment-14700237 ] Saikat commented on TEZ-2726: - One possible place to raise a proper exception is in sendTezEventToDestinationTasks() in Edge.java, before sending out the DME. We can raise AMUserCodeException with the source set to the edge manager and an appropriate message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700262#comment-14700262 ] Bikas Saha commented on TEZ-2726: - Are there any details as to what exactly happened? I am not clear on that. It seems to be some issue where user misconfiguration caused empty partitions that were not handled correctly? //cc [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
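Fix option 1 from the issue (raising AMUserCodeException from Edge.sendTezEventToDestinationTasks() before the DME goes out) amounts to a bounds check on the partition count versus the destination task count. Below is a hypothetical Python sketch of that check; the function and exception names are invented stand-ins for the Java types named in the comments.

```python
# Hypothetical sketch of validating DME targets on a SCATTER-GATHER edge.
# The real fix would be Java code in Edge.java raising AMUserCodeException.

class AMUserCodeError(Exception):
    """Stand-in for Tez's AMUserCodeException (source: the edge manager)."""

def validate_dme_targets(num_partitions, num_destination_tasks):
    # On a scatter-gather edge, partition i of each source task's output is
    # fetched by destination task i, so a partition index >= N can never be
    # consumed and would surface downstream as a DME with a non-existent
    # targetId. Failing fast here gives a clear AM-side error instead of a
    # fetch failure in the ShuffleHandler.
    if num_partitions > num_destination_tasks:
        raise AMUserCodeError(
            "Edge produced %d partitions but destination vertex has only %d tasks"
            % (num_partitions, num_destination_tasks))
```

The same check could equally drive option 2 (emitting empty-partition bits) instead of raising.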
[jira] [Commented] (TEZ-2723) Tez UI: Breadcrumb changes
[ https://issues.apache.org/jira/browse/TEZ-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699064#comment-14699064 ] Sreenath Somarajapuram commented on TEZ-2723: - Sorry, my bad. Adding the framework as part of TEZ-2725. Tez UI: Breadcrumb changes -- Key: TEZ-2723 URL: https://issues.apache.org/jira/browse/TEZ-2723 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Priority: Minor Attachments: TEZ-2723.1.patch - Update breadcrumb on tab change - Tune breadcrumb font-size -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2607) SIMD-based bitonic merge sorting
[ https://issues.apache.org/jira/browse/TEZ-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699070#comment-14699070 ] Tsuyoshi Ozawa commented on TEZ-2607: - Implemented the bitonic algorithm with [~maropu]: https://github.com/oza/bitonic_sort A flash report of the micro benchmark is as follows:
||algorithm||speed (million sorts per sec)||
|qsort(C)|5.9883126432|
|bitonic_sort(C)|29.1652639347|
I've started to work on integrating this code with Tez. SIMD-based bitonic merge sorting Key: TEZ-2607 URL: https://issues.apache.org/jira/browse/TEZ-2607 Project: Apache Tez Issue Type: Sub-task Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: map_phase.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
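For reference, the algorithm being benchmarked above can be sketched (without SIMD) as a recursive bitonic merge sort. The linked repository implements it in C with SIMD intrinsics; this Python version only illustrates the structure of the sorting network, whose fixed compare-exchange pattern is what makes it amenable to SIMD.

```python
# Minimal, non-SIMD bitonic sort sketch for illustration only.
# Bitonic networks require a power-of-two input length.

def bitonic_sort(values, ascending=True):
    n = len(values)
    if n <= 1:
        return list(values)
    assert n & (n - 1) == 0, "bitonic sort needs a power-of-two length"
    # Build a bitonic sequence: first half ascending, second half descending.
    first = bitonic_sort(values[: n // 2], True)
    second = bitonic_sort(values[n // 2 :], False)
    return _bitonic_merge(first + second, ascending)

def _bitonic_merge(values, ascending):
    # Compare-exchange elements a fixed stride apart, then recurse on halves.
    # This data-independent pattern is what SIMD implementations vectorize.
    n = len(values)
    if n == 1:
        return values
    half = n // 2
    for i in range(half):
        if (values[i] > values[i + half]) == ascending:
            values[i], values[i + half] = values[i + half], values[i]
    return (_bitonic_merge(values[:half], ascending)
            + _bitonic_merge(values[half:], ascending))
```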
Failed: TEZ-2724 PreCommit Build #998
Jira: https://issues.apache.org/jira/browse/TEZ-2724 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/998/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3288 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750751/TEZ-2724-1.patch against master revision 6cb8206. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/998//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/998//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. e8a16e5069ee061dadad7d571ce396b1548bc200 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #988 Archived 50 artifacts Archive block size is 32768 Received 0 blocks and 3094207 bytes Compression is 0.0% Took 0.92 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown
[ https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699107#comment-14699107 ] TezQA commented on TEZ-2724: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750751/TEZ-2724-1.patch against master revision 6cb8206. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/998//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/998//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700194#comment-14700194 ] Rohini Palaniswamy commented on TEZ-2300: - When a user aborts a Pig script, Pig kills the jobs it launched in the shutdown hook. What I am looking for is the same behaviour as killing a mapreduce job: the job should stop whatever it is doing and the AM should exit in less than half a minute. bq. Are we waiting for the DAG to be finished? No. We are trying to kill it. It should be interrupted and processing stopped. bq. Are we waiting until the AM is closed as well? Currently the call is not blocking. It should block and exit after the kill succeeds. bq. Or is the most important aspect to reduce the amount of time it takes to shutdown an AM with a DAG running? That as well. The AM should be terminated after a timeout period if graceful kill/shutdown does not work, similar to mapreduce. bq. With the pig interactive command line, will pig want to cancel a DAG and run another in the same AM? Currently there are no APIs to cancel a DAG and I don't see the need at this point to cancel a DAG and reuse that AM. TezClient.stop() takes a lot of time or does not work sometimes --- Key: TEZ-2300 URL: https://issues.apache.org/jira/browse/TEZ-2300 Project: Apache Tez Issue Type: Bug Reporter: Rohini Palaniswamy Assignee: Jonathan Eagles Attachments: TEZ-2300.1.patch, TEZ-2300.2.patch, TEZ-2300.3.patch, TEZ-2300.4.patch, syslog_dag_1428329756093_325099_1_post Noticed this with a couple of pig scripts which were not behaving well (AM close to OOM, etc.) and even with some that were running fine. Pig calls TezClient.stop() in a shutdown hook. Ctrl+C to the pig script either exits immediately or hangs. In both cases it takes a long time for the yarn application to go to the KILLED state. Many times I just end up calling yarn application -kill separately after waiting for 5 mins or more for it to get killed.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
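The stop semantics requested above (a blocking kill that interrupts the DAG, with a hard-kill fallback after a timeout, as MapReduce does) can be sketched roughly as follows. This is not TezClient's actual implementation; the `am` object and its methods are hypothetical stand-ins for the client-to-AM protocol.

```python
# Illustrative sketch of a blocking stop() with a hard-kill timeout.
# The am object and its kill_dag/is_terminated/force_kill methods are
# hypothetical; they stand in for the real client-AM RPCs.

import time

def stop_blocking(am, timeout_seconds=30.0, poll_interval=0.1):
    am.kill_dag()  # interrupt DAG processing, like killing an MR job
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if am.is_terminated():
            return "KILLED"  # graceful kill succeeded; safe to return
        time.sleep(poll_interval)
    # Graceful kill/shutdown did not complete within the timeout:
    # fall back to a hard termination (the yarn application -kill analogue).
    am.force_kill()
    return "FORCE_KILLED"
```

The key point from the discussion is that the call only returns once DAG processing has verifiably stopped, one way or the other.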
Failed: TEZ-2294 PreCommit Build #999
Jira: https://issues.apache.org/jira/browse/TEZ-2294 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/999/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 38 lines...] TEZ-2294 patch is being downloaded at Mon Aug 17 20:59:59 UTC 2015 from http://issues.apache.org/jira/secure/attachment/12750871/TEZ-2294.6.patch == == Pre-build master to verify master stability and javac warnings == == Compiling /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/masterJavacWarnings.txt 21 master compilation is broken? {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750871/TEZ-2294.6.patch against master revision 6cb8206. {color:red}-1 patch{color}. master compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/999//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 876db1de0274684c51f7e0136549c906f0661902 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #988 Archived 1 artifacts Archive block size is 32768 Received 0 blocks and 180377 bytes Compression is 0.0% Took 0.43 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (TEZ-2294) Add tez-site-template.xml with description of config properties
[ https://issues.apache.org/jira/browse/TEZ-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700226#comment-14700226 ] TezQA commented on TEZ-2294: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750871/TEZ-2294.6.patch against master revision 6cb8206. {color:red}-1 patch{color}. master compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/999//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2629) LimitExceededException in Tez client when DAG exceeds the default max
[ https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned TEZ-2629: --- Assignee: Siddharth Seth LimitExceededException in Tez client when DAG exceeds the default max - Key: TEZ-2629 URL: https://issues.apache.org/jira/browse/TEZ-2629 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.0 Reporter: Jason Dere Assignee: Siddharth Seth Attachments: TEZ-2629.1.txt Original issue was HIVE-11303: seeing LimitExceededException when the client tries to get the counters for a completed job:
{noformat}
2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph.
org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200
    at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87)
    at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93)
    at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104)
    at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567)
    at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
It looks like Limits.ensureInitialized() is defaulting to an empty configuration, resulting in COUNTERS_MAX being set to the default of 1200 (even though Hive's configuration specified tez.counters.max=16000). Per [~sseth]:
{quote}
I think the Tez client does need to make this call to set up the Configuration correctly. We do this for the AM and the executing task, which is why it works. Could you please open a Tez jira for this? Also, Limits is making use of Configuration instead of TezConfiguration for default initialization, which implies changes to tez-site on the local node won't be picked up.
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
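The failing check in the stack trace is a simple size-versus-limit comparison; the bug is that the client-side Limits was initialized from an empty configuration, so tez.counters.max was ignored and the 1200 default applied. A Python sketch of that behaviour follows; the names mirror the Java ones from the trace, but the code is illustrative, not the actual Limits implementation.

```python
# Illustrative sketch of the counter-limit check from Limits.checkCounters.

DEFAULT_MAX_COUNTERS = 1200  # the default that applied because conf was empty

class LimitExceededError(Exception):
    """Stand-in for org.apache.tez.common.counters.LimitExceededException."""

class Limits:
    def __init__(self, conf=None):
        conf = conf or {}
        # The reported bug in a nutshell: initializing from an empty conf
        # means tez.counters.max (e.g. 16000 in tez-site) is never seen.
        self.max_counters = int(conf.get("tez.counters.max", DEFAULT_MAX_COUNTERS))

    def check_counters(self, size):
        if size > self.max_counters:
            raise LimitExceededError(
                "Too many counters: %d max=%d" % (size, self.max_counters))
```

With an empty configuration, adding the 1201st counter raises exactly the "Too many counters: 1201 max=1200" error seen above; passing the real configuration through, as the fix proposes, lifts the limit.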
[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes
[ https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700203#comment-14700203 ] Rohini Palaniswamy commented on TEZ-2300: - bq. Are we waiting until the AM is closed as well? Actually, I have no concerns about the AM lingering to write history, if you can completely ensure that processing of the DAG has been terminated when the stop() call returns. The problem here is that the stop() call returns with no clue as to whether DAG processing is still happening or has terminated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)