[jira] [Created] (YARN-1005) Log aggregators should check for FSDataOutputStream close before renaming to aggregated file.

2013-07-31 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-1005:
---

 Summary: Log aggregators should check for FSDataOutputStream close 
before renaming to aggregated file.
 Key: YARN-1005
 URL: https://issues.apache.org/jira/browse/YARN-1005
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Rohith Sharma K S


If AggregatedLogFormat.LogWriter.closeWriter() is interrupted, 
remoteNodeTmpLogFileForApp is still renamed to remoteNodeLogFileForApp. The 
renamed file does not contain valid aggregated logs and may not even be in 
BCFile format. 

This causes an error when the logs are viewed from the JobHistoryServer web page.

{noformat}
2013-07-27 18:51:14,787 ERROR org.apache.hadoop.yarn.webapp.View: Error getting 
logs for job_1374918614757_0002
java.io.IOException: Not a valid BCFile.
at 
org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:628)
at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.init(AggregatedLogFormat.java:337)
at 
org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:89)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:64)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:74)
{noformat}
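
For illustration, a minimal sketch of the check suggested in the summary (an 
assumption, not the actual AppLogAggregatorImpl code): rename the temporary 
aggregated-log file only after the writer has closed successfully, so a 
half-written file is never published under the final name.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeAggregatedLogRename {

  public static void finishAggregation(FileSystem fs, Path tmpLog, Path finalLog,
      AutoCloseable writer) throws IOException {
    try {
      // Flushes and closes the underlying FSDataOutputStream.
      writer.close();
    } catch (Exception e) {
      // Closing failed (e.g. the thread was interrupted): do not rename,
      // because the temporary file may not be a valid TFile/BCFile.
      throw new IOException("Log writer did not close cleanly; keeping " + tmpLog, e);
    }
    // Only a fully written file reaches the aggregated-log path.
    if (!fs.rename(tmpLog, finalLog)) {
      throw new IOException("Failed to rename " + tmpLog + " to " + finalLog);
    }
  }
}
{code}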


 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2013-08-13 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-1061:
---

 Summary: NodeManager is indefinitely waiting for nodeHeartBeat() 
response from ResouceManager.
 Key: YARN-1061
 URL: https://issues.apache.org/jira/browse/YARN-1061
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Rohith Sharma K S


In one scenario it was observed that the NodeManager waits indefinitely for 
the nodeHeartbeat response when the ResourceManager is hung.

The NodeManager should get a timeout exception instead of waiting indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2013-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737990#comment-13737990
 ] 

Rohith Sharma K S commented on YARN-1061:
-

Thread dump extracted from the NodeManager:

{noformat}
Node Status Updater prio=10 tid=0x414dc000 nid=0x1d754 in 
Object.wait() [0x7fefa2dec000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:1231)
- locked 0xdef4f158 (a org.apache.hadoop.ipc.Client$Call)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy28.nodeHeartbeat(Unknown Source)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:70)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy30.nodeHeartbeat(Unknown Source)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:348)
{noformat}

 NodeManager is indefinitely waiting for nodeHeartBeat() response from 
 ResouceManager.
 -

 Key: YARN-1061
 URL: https://issues.apache.org/jira/browse/YARN-1061
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Rohith Sharma K S

 In one scenario it was observed that the NodeManager waits indefinitely for 
 the nodeHeartbeat response when the ResourceManager is hung.
 The NodeManager should get a timeout exception instead of waiting indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2013-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739178#comment-13739178
 ] 

Rohith Sharma K S commented on YARN-1061:
-

The actual issue I hit was on a cluster with 1 RM and 5 NMs. It is hard to 
reproduce a hung ResourceManager in a real cluster. 

The same scenario can be simulated by manually bringing the ResourceManager 
into a hung state with the Linux command kill -STOP RM_PID. All NM-to-RM 
calls then wait indefinitely. Another case where the indefinite wait can be 
observed is adding a new NodeManager while the ResourceManager is hung.



 NodeManager is indefinitely waiting for nodeHeartBeat() response from 
 ResouceManager.
 -

 Key: YARN-1061
 URL: https://issues.apache.org/jira/browse/YARN-1061
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Rohith Sharma K S

 In one scenario it was observed that the NodeManager waits indefinitely for 
 the nodeHeartbeat response when the ResourceManager is hung.
 The NodeManager should get a timeout exception instead of waiting indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2013-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748469#comment-13748469
 ] 

Rohith Sharma K S commented on YARN-1061:
-

I added all the IPC configurations to the log4j.properties file; still the 
same issue recurred.

bq. How can NM wait infinitely? I mean what is your connection timeout set to? 
When I debugged the issue, I found that it is a problem in the IPC layer. The 
same problem occurs in DataNode-to-NameNode communication as well.

When the process is in the T state (for a running process the state is Sl; 
this can be seen with ps -p pid -o pid,stat), i.e. the process has been 
stopped using kill -STOP pid, the IPC proxy does not throw any timeout 
exception.
This is because, during proxy creation, the RPC timeout is hardcoded to zero 
in the RPC.waitForProtocolProxy method. Setting the RPC timeout to zero means 
the IPC call never throws the exception; the IPC client always retries 
sendPing() to the server (RM).
This can be seen in the Client.handleTimeout method:
{noformat}
  private void handleTimeout(SocketTimeoutException e) throws IOException {
if (shouldCloseConnection.get() || !running.get() || rpcTimeout > 0) {
  throw e;
} else {
  sendPing();
}
  }
{noformat}

I think the RPC timeout should be taken from configuration instead of being 
hardcoded to 0.
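
For illustration, a minimal sketch of that suggestion (an assumption, not the 
committed fix; the property name below is hypothetical, not an existing Hadoop 
key): with a positive timeout, Client.handleTimeout() re-throws the 
SocketTimeoutException instead of calling sendPing() forever.

{code}
import org.apache.hadoop.conf.Configuration;

public class RpcTimeoutFromConfiguration {

  /**
   * 0 keeps today's behaviour (never time out); any positive value lets the
   * IPC client fail the call once the timeout elapses.
   */
  public static int resolveRpcTimeout(Configuration conf) {
    return conf.getInt("yarn.nodemanager.resourcetracker.rpc-timeout-ms", 0);
  }
}
{code}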

 NodeManager is indefinitely waiting for nodeHeartBeat() response from 
 ResouceManager.
 -

 Key: YARN-1061
 URL: https://issues.apache.org/jira/browse/YARN-1061
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Rohith Sharma K S

 In one scenario it was observed that the NodeManager waits indefinitely for 
 the nodeHeartbeat response when the ResourceManager is hung.
 The NodeManager should get a timeout exception instead of waiting indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-1112) MR AppMaster command options does not replace @taskid@ with the current task ID.

2013-08-28 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S moved MAPREDUCE-5460 to YARN-1112:


  Component/s: (was: applicationmaster)
   (was: mrv2)
 Assignee: (was: Rohith Sharma K S)
 Target Version/s:   (was: 3.0.0, 2.1.1-beta)
Affects Version/s: (was: 2.1.1-beta)
   (was: 3.0.0)
   2.1.1-beta
   3.0.0
  Key: YARN-1112  (was: MAPREDUCE-5460)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 MR AppMaster command options does not replace @taskid@ with the current task 
 ID.
 

 Key: YARN-1112
 URL: https://issues.apache.org/jira/browse/YARN-1112
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth

 The description of {{yarn.app.mapreduce.am.command-opts}} in 
 mapred-default.xml states that occurrences of {{@taskid@}} will be replaced 
 by the current task ID.  This substitution is not happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1112) MR AppMaster command options does not replace @taskid@ with the current task ID.

2013-08-28 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1112:


Attachment: YARN-1112.patch

Attaching a patch for the replacement of @appid@ in am.command-opts; @appid@ 
is replaced with the application attempt id.
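
For context, a minimal illustration of the kind of token substitution the 
issue description expects (not the attached patch; the helper and the id 
argument are assumptions):

{code}
import org.apache.hadoop.conf.Configuration;

public class CommandOptsSubstitution {

  public static String expand(Configuration conf, String token, String id) {
    String opts = conf.get("yarn.app.mapreduce.am.command-opts", "");
    // e.g. expand(conf, "@taskid@", attemptId.toString())
    return opts.replace(token, id);
  }
}
{code}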

 MR AppMaster command options does not replace @taskid@ with the current task 
 ID.
 

 Key: YARN-1112
 URL: https://issues.apache.org/jira/browse/YARN-1112
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth
 Attachments: YARN-1112.patch


 The description of {{yarn.app.mapreduce.am.command-opts}} in 
 mapred-default.xml states that occurrences of {{@taskid@}} will be replaced 
 by the current task ID.  This substitution is not happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui

2013-09-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1145:


Attachment: YARN-1145.patch

Thank you Vinod Kumar Vavilapalli and Jason Lowe for reviewing the patch :-)

I have addressed Vinod's comments and attached an updated patch. Please review 
the updated patch.

 Potential file handle leak in aggregated logs web ui
 

 Key: YARN-1145
 URL: https://issues.apache.org/jira/browse/YARN-1145
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: MAPREDUCE-5486.patch, YARN-1145.patch


 If any problem occurs while getting aggregated logs for rendering on the web 
 UI, the LogReader is not closed. 
 Since the reader is not closed, many connections are left in the CLOSE_WAIT 
 state.
 hadoopuser@hadoopuser: jps
 *27909* JobHistoryServer
 The DataNode port is 50010. Grepping for the DataNode port shows many 
 connections in CLOSE_WAIT from the JHS.
 hadoopuser@hadoopuser: netstat -tanlp |grep 50010
 tcp0  0 10.18.40.48:50010   0.0.0.0:*   LISTEN
   21453/java  
 tcp1  0 10.18.40.48:20596   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19667   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:20593   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:12290   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19662   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui

2013-09-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1145:


Attachment: YARN-1145.1.patch

Handled cleanup during reader creation. The previous patch missed this cleanup.

 Potential file handle leak in aggregated logs web ui
 

 Key: YARN-1145
 URL: https://issues.apache.org/jira/browse/YARN-1145
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, YARN-1145.patch


 If any problem occurs while getting aggregated logs for rendering on the web 
 UI, the LogReader is not closed. 
 Since the reader is not closed, many connections are left in the CLOSE_WAIT 
 state.
 hadoopuser@hadoopuser: jps
 *27909* JobHistoryServer
 The DataNode port is 50010. Grepping for the DataNode port shows many 
 connections in CLOSE_WAIT from the JHS.
 hadoopuser@hadoopuser: netstat -tanlp |grep 50010
 tcp0  0 10.18.40.48:50010   0.0.0.0:*   LISTEN
   21453/java  
 tcp1  0 10.18.40.48:20596   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19667   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:20593   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:12290   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19662   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui

2013-09-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1145:


Attachment: YARN-1145.2.patch

Please ignore YARN-1145.1.patch.

All the comments have been addressed in YARN-1145.2.patch. Please consider 
this patch for review.

 Potential file handle leak in aggregated logs web ui
 

 Key: YARN-1145
 URL: https://issues.apache.org/jira/browse/YARN-1145
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, 
 YARN-1145.2.patch, YARN-1145.patch


 If any problem occurs while getting aggregated logs for rendering on the web 
 UI, the LogReader is not closed. 
 Since the reader is not closed, many connections are left in the CLOSE_WAIT 
 state.
 hadoopuser@hadoopuser: jps
 *27909* JobHistoryServer
 The DataNode port is 50010. Grepping for the DataNode port shows many 
 connections in CLOSE_WAIT from the JHS.
 hadoopuser@hadoopuser: netstat -tanlp |grep 50010
 tcp0  0 10.18.40.48:50010   0.0.0.0:*   LISTEN
   21453/java  
 tcp1  0 10.18.40.48:20596   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19667   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:20593   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:12290   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19662   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui

2013-09-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1145:


Attachment: YARN-1145.3.patch

Modified the patch to close the streams only on return from the render method.

 Potential file handle leak in aggregated logs web ui
 

 Key: YARN-1145
 URL: https://issues.apache.org/jira/browse/YARN-1145
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, 
 YARN-1145.2.patch, YARN-1145.3.patch, YARN-1145.patch


 If any problem occurs while getting aggregated logs for rendering on the web 
 UI, the LogReader is not closed. 
 Since the reader is not closed, many connections are left in the CLOSE_WAIT 
 state.
 hadoopuser@hadoopuser: jps
 *27909* JobHistoryServer
 The DataNode port is 50010. Grepping for the DataNode port shows many 
 connections in CLOSE_WAIT from the JHS.
 hadoopuser@hadoopuser: netstat -tanlp |grep 50010
 tcp0  0 10.18.40.48:50010   0.0.0.0:*   LISTEN
   21453/java  
 tcp1  0 10.18.40.48:20596   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19667   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:20593   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:12290   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19662   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809382#comment-13809382
 ] 

Rohith Sharma K S commented on YARN-1366:
-

Hi Bikas,
I have gone through the PDF file attached to YARN-556 and understood the 
overall idea behind this subtask.
I have some doubts, please clarify:

1. Resync means resetting the allocate RPC sequence number to 0 and the AM 
should send its entire outstanding request to the RM.
 My understanding is that we need to reset lastResponseID to 0 and should not 
 clear ask, release, blacklistAdditions and blacklistRemovals. Is that 
 correct? (See the sketch after these questions.)

2. During RM restart the RM gets a new AMRMTokenSecretManager, so the 
passwords will differ. Is this handled on the RM side during recovery for each 
individual application? Otherwise the impact is that the heartbeat to the 
restarted RM fails with an authentication error: password does not match.
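
To make doubt 1 concrete, a small illustrative sketch (assumptions only, not 
the AMRMClient implementation; field names are invented for illustration) of 
the resync handling I have in mind:

{code}
import java.util.ArrayList;
import java.util.List;

public class AmResyncSketch {

  private int lastResponseId = 37;                       // wherever the AM had reached
  private final List<String> ask = new ArrayList<String>();
  private final List<String> release = new ArrayList<String>();

  void onResyncCommand() {
    // The restarted RM starts counting from scratch, so the AM must too.
    lastResponseId = 0;
    // ask / release / blacklist lists are deliberately NOT cleared: the next
    // allocate call resends them wholesale so the new RM can rebuild its
    // view of this application's outstanding demand.
  }
}
{code}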


 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha

 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui

2013-11-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1145:


Attachment: YARN-1145.4.patch

Apologies for the delayed response. Thank you Vinod for reviewing the patch :-)

Attaching a patch addressing all of Vinod's comments. 

For the 5th comment, I added try{}finally{} around the whole render method in 
AggregatedLogsBlock.java.
Even though the patch shows a large diff (since the try/finally was added 
around the whole render method), the modified code is:
{noformat}
protected void render(Block html) {
+AggregatedLogFormat.LogReader reader = null;
+try{
  // render block : NO CHANGE
  Path remoteRootLogDir = new Path(conf.get(
  YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
  YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
-AggregatedLogFormat.LogReader reader = null;
 // render block : NO CHANGE

+} finally{
+  if (reader != null) {
+reader.close();
+  }
+   }
}
{noformat}
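
For reference, a compilable sketch of the same pattern (illustration only, not 
the attached patch; the method signature is simplified): the reader is 
declared before the try block so the finally clause can always close it, 
whichever path the render logic takes.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat;

public class RenderWithGuaranteedClose {

  protected void render(Configuration conf, Path remoteAppLogFile) throws IOException {
    AggregatedLogFormat.LogReader reader = null;
    try {
      reader = new AggregatedLogFormat.LogReader(conf, remoteAppLogFile);
      // ... unchanged render logic: iterate the aggregated log and emit HTML ...
    } finally {
      if (reader != null) {
        reader.close();   // always release the underlying HDFS input stream
      }
    }
  }
}
{code}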

 Potential file handle leak in aggregated logs web ui
 

 Key: YARN-1145
 URL: https://issues.apache.org/jira/browse/YARN-1145
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, 
 YARN-1145.2.patch, YARN-1145.3.patch, YARN-1145.4.patch, YARN-1145.patch


 If any problem occurs while getting aggregated logs for rendering on the web 
 UI, the LogReader is not closed. 
 Since the reader is not closed, many connections are left in the CLOSE_WAIT 
 state.
 hadoopuser@hadoopuser: jps
 *27909* JobHistoryServer
 The DataNode port is 50010. Grepping for the DataNode port shows many 
 connections in CLOSE_WAIT from the JHS.
 hadoopuser@hadoopuser: netstat -tanlp |grep 50010
 tcp0  0 10.18.40.48:50010   0.0.0.0:*   LISTEN
   21453/java  
 tcp1  0 10.18.40.48:20596   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19667   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:20593   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:12290   10.18.40.48:50010   
 CLOSE_WAIT  *27909*/java  
 tcp1  0 10.18.40.48:19662   10.18.40.152:50010  
 CLOSE_WAIT  *27909*/java  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-11-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-1366:


Attachment: YARN-1366.patch

Correct me if I am wrong: I have prepared an initial patch and attached it. 
The RM should differentiate between the Resync and Shutdown commands.
Please review whether this fulfills the expectations mentioned in the JIRA.



 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
 Attachments: YARN-1366.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call

2013-11-19 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826421#comment-13826421
 ] 

Rohith Sharma K S commented on YARN-1398:
-

Hi Sunil, 
I think this is the same as https://issues.apache.org/jira/i#browse/YARN-325. 

 Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo 
 and completedConatiner call
 ---

 Key: YARN-1398
 URL: https://issues.apache.org/jira/browse/YARN-1398
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Priority: Critical

 getQueueInfo in parentQueue will call  child.getQueueInfo().
 This will try acquire the leaf queue lock over parent queue lock.
 Now at same time if a completedContainer call comes and acquired LeafQueue 
 lock and it will wait for ParentQueue's completedConatiner call.
 This lock usage is not in synchronous and can lead to deadlock.
 With JCarder, this is showing as a potential deadlock scenario.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1469) ApplicationMaster crash cause the TaskAttemptImpl couldn't handle the TA_TOO_MANY_FETCH_FAILURE at KILLED

2013-12-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837533#comment-13837533
 ] 

Rohith Sharma K S commented on YARN-1469:
-

This is a duplicate of https://issues.apache.org/jira/i#browse/MAPREDUCE-5409.

 ApplicationMaster crash cause the TaskAttemptImpl  couldn't handle the 
 TA_TOO_MANY_FETCH_FAILURE at KILLED
 --

 Key: YARN-1469
 URL: https://issues.apache.org/jira/browse/YARN-1469
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: qus-jiawei
 Attachments: job_1384857622207_15-amlog.txt


 This bug can happen when using the decommission command to decommission a 
 NodeManager. The details are below:
 1. A job is running happily on the YARN cluster and some MapTasks finish on 
 machine A, then the reduce tasks begin to be scheduled. At this point the 
 MapTasks' state is SUCCEEDED.
 2. The Hadoop admin decommissions machine A's NodeManager.
 3. The ApplicationMaster finds that some MapTasks finished on a 
 decommissioned NodeManager and changes those MapTasks' state to KILLED.
 4. Some running ReduceTasks cannot get the data from the MapTask and throw a 
 TA_TOO_MANY_FETCH_FAILURE event to TaskAttemptImpl.
 5. TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED 
 state and throws an exception, causing the ApplicationMaster to go to ERROR.
 I think TaskAttemptImpl could just ignore the TA_TOO_MANY_FETCH_FAILURE 
 event in the KILLED state.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler

2015-06-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595372#comment-14595372
 ] 

Rohith Sharma K S commented on YARN-3790:
-

[~jianhe] Do you have any comments on the patch?

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, test
Reporter: Rohith Sharma K S
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-06-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595325#comment-14595325
 ] 

Rohith Sharma K S commented on YARN-3001:
-

Hi [~huizane], thanks for the reply. 
 Would you please attach the RM logs if you have them?
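
In the meantime, for context, here is an illustrative guard against the 
division that fails in the stack trace quoted below (an assumption, not a 
committed fix): DefaultResourceCalculator.computeAvailableContainers divides 
by the requested memory, so a zero-memory request can be treated as "zero 
containers" instead of throwing.

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class AvailableContainersGuard {

  public static int computeAvailableContainers(Resource available, Resource required) {
    if (required.getMemory() <= 0) {
      return 0;   // a zero-sized request would otherwise cause "/ by zero"
    }
    return available.getMemory() / required.getMemory();
  }
}
{code}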

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith Sharma K S

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-06-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603382#comment-14603382
 ] 

Rohith Sharma K S commented on YARN-3849:
-

I mean for TestProportionalCapacityPreemptionPolicy.

 Too much of preemption activity causing continuos killing of containers 
 across queues
 -

 Key: YARN-3849
 URL: https://issues.apache.org/jira/browse/YARN-3849
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical

 Two queues are used. Each queue has given a capacity of 0.5. Dominant 
 Resource policy is used.
 1. An app is submitted in QueueA which is consuming full cluster capacity
 2. After submitting an app in QueueB, there are some demand  and invoking 
 preemption in QueueA
 3. Instead of killing the excess of 0.5 guaranteed capacity, we observed that 
 all containers other than AM is getting killed in QueueA
 4. Now the app in QueueB is trying to take over cluster with the current free 
 space. But there are some updated demand from the app in QueueA which lost 
 its containers earlier, and preemption is kicked in QueueB now.
 Scenario in step 3 and 4 continuously happening in loop. Thus none of the 
 apps are completing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-06-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603308#comment-14603308
 ] 

Rohith Sharma K S commented on YARN-3849:
-

Below is the log trace for the issue.

In our cluster there are 3 NodeManagers, each with resource {{memory:327680, 
vCores:35}}. The total cluster resource is {{clusterResource: memory:983040, 
vCores:105}}, with the CapacityScheduler configured with queues named 
*default* and *QueueA*.


 # Application app-1 is submitted to the queue default and starts running with 
10 containers, each with {{resource: memory:1024, vCores:10}}, so the total 
used is {{usedResources=memory:10240, vCores:91}}
{noformat}
default user=spark used=memory:10240, vCores:91 numContainers=10 headroom = 
memory:1024, vCores:10 user-resources=memory:10240, vCores:91
Re-sorting assigned queue: root.default stats: default: capacity=0.5, 
absoluteCapacity=0.5, usedResources=memory:10240, vCores:91, 
usedCapacity=1.733, absoluteUsedCapacity=0.867, numApps=1, 
numContainers=10
{noformat}
*NOTE: Resource allocation is CPU-dominant*
After the 10 containers are running, the available NodeManager resources are:
{noformat}
linux-174, available: memory:323584, vCores:4
linux-175, available: memory:324608, vCores:5
linux-223, available: memory:324608, vCores:5
{noformat}
# Application app-2 is submitted to QueueA. Its ApplicationMaster container 
starts running, and the NodeManager's remaining resource is {{available: 
memory:322560, vCores:3}}
 {noformat}
Assigned container container_1435072598099_0002_01_01 of capacity 
memory:1024, vCores:1 on host linux-174:26009, which has 5 containers, 
memory:5120, vCores:32 used and memory:322560, vCores:3 available after 
allocation | SchedulerNode.java:154
linux-174, available: memory:322560, vCores:3
{noformat}
# The preemption policy does the calculation below:
{noformat}
2015-06-23 23:20:51,127 NAME: QueueA CUR: memory:0, vCores:0 PEN: memory:0, 
vCores:0 GAR: memory:491520, vCores:52 NORM: NaN IDEAL_ASSIGNED: memory:0, 
vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, 
vCores:0 UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0
2015-06-23 23:20:51,128 NAME: default CUR: memory:851968, vCores:91 PEN: 
memory:0, vCores:0 GAR: memory:491520, vCores:52 NORM: 1.0 IDEAL_ASSIGNED: 
memory:851968, vCores:91 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: 
memory:0, vCores:0 UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: 
memory:360448, vCores:39
{noformat}
In the above log, observe that for the queue default *CUR is memory:851968, 
vCores:91*, while the actual usage is *usedResources=memory:10240, vCores:91*. 
Only the CPU dimension matches, not MEMORY. CUR is calculated with the 
following formula:
#* CUR = {{clusterResource: memory:983040, vCores:105}} * 
{{absoluteUsedCapacity(0.867)}} = {{memory:851968, vCores:91}}
#* GAR = {{clusterResource: memory:983040, vCores:105}} * 
{{absoluteCapacity(0.5)}} = {{memory:491520, vCores:52}}
#* PREEMPTABLE = CUR - GAR = {{memory:360448, vCores:39}}
# App-2 requests containers with {{resource: memory:1024, vCores:10}}. So the 
preemption cycle computes how much resource is to be preempted:
{noformat}
2015-06-23 23:21:03,131 | DEBUG | SchedulingMonitor 
(ProportionalCapacityPreemptionPolicy) | 1435072863131:  NAME: default CUR: 
memory:851968, vCores:91 PEN: memory:0, vCores:0 GAR: memory:491520, 
vCores:52 NORM: NaN IDEAL_ASSIGNED: memory:491520, vCores:52 IDEAL_PREEMPT: 
memory:97043, vCores:10 ACTUAL_PREEMPT: memory:0, vCores:0 UNTOUCHABLE: 
memory:0, vCores:0 PREEMPTABLE: memory:360448, vCores:39
{noformat}
Observe that *IDEAL_PREEMPT: memory:97043, vCores:10*: app-2 in QueueA needs 
only 10 vCores to be preempted, yet 97043 MB of memory is also marked for 
preemption even though memory is plentifully available.
Below are the calculations that produce IDEAL_PREEMPT (a worked sketch of 
these numbers follows below):
#* totalPreemptionAllowed = clusterResource: memory:983040, vCores:105 * 0.1 
= memory:98304, vCores:10.5
#* totPreemptionNeeded = CUR - IDEAL_ASSIGNED = CUR: memory:851968, vCores:91
#* scalingFactor = Resources.divide(drc, memory:491520, vCores:52, 
memory:98304, vCores:10.5, memory:851968, vCores:91) = 0.114285715
#* toBePreempted = CUR: memory:851968, vCores:91 * 
scalingFactor(0.1139045128455529) = memory:97368, vCores:10, giving 
{{resource-to-obtain = memory:97043, vCores:10}}
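
To make the numbers concrete, a small worked sketch (illustration only, not 
the ProportionalCapacityPreemptionPolicy code) showing how multiplying both 
resource dimensions by the single dominant-resource fraction reproduces the 
inflated memory figure:

{code}
public class PreemptionFractionSketch {

  public static void main(String[] args) {
    long clusterMem = 983040, clusterVcores = 105;
    long usedVcores = 91;        // dominant resource for queue "default" (CPU)
    double absCap = 0.5;

    // CUR = clusterResource * absoluteUsedCapacity, where absoluteUsedCapacity
    // (~0.867) comes from the dominant resource only (91 of 105 vCores).
    long curMem = (long) (clusterMem * usedVcores / (double) clusterVcores); // 851968
    long curVcores = usedVcores;                                             // 91

    // GAR = clusterResource * absoluteCapacity
    long garMem = (long) (clusterMem * absCap);          // 491520
    long garVcores = (long) (clusterVcores * absCap);    // 52

    System.out.println("CUR = <memory:" + curMem + ", vCores:" + curVcores + ">");
    System.out.println("GAR = <memory:" + garMem + ", vCores:" + garVcores + ">");
    // The queue's real usage is <memory:10240, vCores:91>, so the memory part
    // of CUR (and hence of IDEAL_PREEMPT) is grossly overstated by the fraction.
  }
}
{code}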

*So the problem is in one of the steps below:*
# As [~sunilg] said, usedResources=memory:10240, vCores:91, but the preemption 
policy wrongly calculates the current used capacity as {{memory:851968, 
vCores:91}}. This is mainly because the preemption policy uses 
absoluteCapacity (a single fraction) for calculating current usage, which 
always gives a wrong result for one of the resources when the 
DominantResourceCalculator is used. I think the fraction should not be used, 
since it causes problems with DRC (multi-dimensional resources); instead we 
should use usedResources from CSQueue.
# Even bypassing 

[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-06-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603375#comment-14603375
 ] 

Rohith Sharma K S commented on YARN-3849:
-

For the test, how about using a parameterized test class that runs with both 
defaultRC and dominantRC?
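
Something along these lines (a sketch of the idea, an assumption rather than 
an existing test class): run the same preemption assertions once with the 
DefaultResourceCalculator and once with the DominantResourceCalculator.

{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestPreemptionWithBothCalculators {

  @Parameters
  public static Collection<Object[]> calculators() {
    return Arrays.asList(new Object[][] {
        { new DefaultResourceCalculator() },
        { new DominantResourceCalculator() }
    });
  }

  private final ResourceCalculator rc;

  public TestPreemptionWithBothCalculators(ResourceCalculator rc) {
    this.rc = rc;
  }

  @Test
  public void testIdealPreemptionMatchesUsedResources() {
    // Placeholder assertion point: drive the preemption policy with 'rc' and
    // verify the to-be-preempted resources against the queue's usedResources.
  }
}
{code}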

 Too much of preemption activity causing continuos killing of containers 
 across queues
 -

 Key: YARN-3849
 URL: https://issues.apache.org/jira/browse/YARN-3849
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical

 Two queues are used. Each queue has given a capacity of 0.5. Dominant 
 Resource policy is used.
 1. An app is submitted in QueueA which is consuming full cluster capacity
 2. After submitting an app in QueueB, there are some demand  and invoking 
 preemption in QueueA
 3. Instead of killing the excess of 0.5 guaranteed capacity, we observed that 
 all containers other than AM is getting killed in QueueA
 4. Now the app in QueueB is trying to take over cluster with the current free 
 space. But there are some updated demand from the app in QueueA which lost 
 its containers earlier, and preemption is kicked in QueueB now.
 Scenario in step 3 and 4 continuously happening in loop. Thus none of the 
 apps are completing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3790) usedResource from rootQueue metrics may get stale data for FS scheduler after recovering the container

2015-06-24 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3790:

Summary: usedResource from rootQueue metrics may get stale data for FS 
scheduler after recovering the container  (was: 
TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk 
for FS scheduler)

 usedResource from rootQueue metrics may get stale data for FS scheduler after 
 recovering the container
 --

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, test
Reporter: Rohith Sharma K S
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662754#comment-14662754
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Thanks [~eepayne] and [~leftnoteasy] for the suggestion. I have taken care of 
this pattern in the ApplicationCLI change. 
Since the current JIRA is only for the admin proto changes and RMAdminCLI, 
the ApplicationCLI changes are done in YARN-4014.
I have uploaded version-1 patches for both, i.e. the current JIRA and 
YARN-4014; kindly review both patches.

 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3250-V1.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-08-07 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4034:
---

 Summary: Render cluster Max Priority in scheduler metrics in RM 
web UI
 Key: YARN-4034
 URL: https://issues.apache.org/jira/browse/YARN-4034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


Currently the Scheduler Metrics section renders the common scheduler metrics 
in the RM web UI. It would be helpful for the user to see the configured 
cluster max priority in the web UI. 
So, on the RM web UI front page, Scheduler Metrics can render the configured 
cluster max priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662757#comment-14662757
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Wondering whether a {{getClusterMaxPriority}} API should be exposed to users, 
i.e. on ApplicationClientProtocol, even though the RM takes care of resetting 
to the cluster max priority. Any thoughts?

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-08-07 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4034:

Priority: Minor  (was: Major)

 Render cluster Max Priority in scheduler metrics in RM web UI
 -

 Key: YARN-4034
 URL: https://issues.apache.org/jira/browse/YARN-4034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Minor

 Currently the Scheduler Metrics section renders the common scheduler metrics 
 in the RM web UI. 
 It would be helpful for the user to see the configured cluster max priority 
 in the web UI. 
 So, on the RM web UI front page, Scheduler Metrics can render the configured 
 cluster max priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4035) Some tests in TestRMAdminService fails with NPE

2015-08-07 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4035:
---

 Summary: Some tests in TestRMAdminService fails with NPE 
 Key: YARN-4035
 URL: https://issues.apache.org/jira/browse/YARN-4035
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith Sharma K S



It is observed that after YARN-4019 some tests in TestRMAdminService fail 
with NullPointerExceptions; see the [build failure 
|https://builds.apache.org/job/PreCommit-YARN-Build/8792/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt]
{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 11.541 sec  
FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
testModifyLabelsOnNodesWithDistributedConfigurationDisabled(org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService)
  Time elapsed: 0.132 sec   ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.util.JvmPauseMonitor.stop(JvmPauseMonitor.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:983)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1038)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1085)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testModifyLabelsOnNodesWithDistributedConfigurationDisabled(TestRMAdminService.java:824)

testRemoveClusterNodeLabelsWithDistributedConfigurationEnabled(org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService)
  Time elapsed: 0.121 sec   ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.util.JvmPauseMonitor.stop(JvmPauseMonitor.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:983)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1038)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1085)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRemoveClusterNodeLabelsWithDistributedConfigurationEnabled(TestRMAdminService.java:867)

{noformat}
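
For illustration, a hedged null guard for the failure point above (an 
assumption, not the committed fix): the pause monitor may never have been 
created on some test paths, so serviceStop() should only stop it when it 
exists.

{code}
import org.apache.hadoop.util.JvmPauseMonitor;

public class PauseMonitorStopSketch {

  private JvmPauseMonitor pauseMonitor;   // may be null if startup failed early

  public void stopIfRunning() {
    if (pauseMonitor != null) {
      pauseMonitor.stop();   // avoids the NPE seen at JvmPauseMonitor.stop()
    }
  }
}
{code}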



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4035) Some tests in TestRMAdminService fails with NPE

2015-08-07 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4035:

Affects Version/s: 2.8.0

 Some tests in TestRMAdminService fails with NPE 
 

 Key: YARN-4035
 URL: https://issues.apache.org/jira/browse/YARN-4035
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Rohith Sharma K S

 It is observed that after YARN-4019 some tests in TestRMAdminService fail 
 with NullPointerExceptions; see the [build failure 
 |https://builds.apache.org/job/PreCommit-YARN-Build/8792/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt]
 {noformat}
 Running org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
 Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 11.541 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
 testModifyLabelsOnNodesWithDistributedConfigurationDisabled(org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService)
   Time elapsed: 0.132 sec   ERROR!
 java.lang.NullPointerException: null
   at org.apache.hadoop.util.JvmPauseMonitor.stop(JvmPauseMonitor.java:86)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:983)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1038)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1085)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testModifyLabelsOnNodesWithDistributedConfigurationDisabled(TestRMAdminService.java:824)
 testRemoveClusterNodeLabelsWithDistributedConfigurationEnabled(org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService)
   Time elapsed: 0.121 sec   ERROR!
 java.lang.NullPointerException: null
   at org.apache.hadoop.util.JvmPauseMonitor.stop(JvmPauseMonitor.java:86)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:983)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1038)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1085)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRemoveClusterNodeLabelsWithDistributedConfigurationEnabled(TestRMAdminService.java:867)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695038#comment-14695038
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Tried the syntax app-id, but the options parser does not accept app-id as 
valid input. Maybe this is the reason the other commands use camelCase.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-13 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0001-YARN-4014.patch

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695085#comment-14695085
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Updated the working patch with test cases, kindly review it.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695084#comment-14695084
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Thanks Sunil G for the review.
bq. In ApplicationCLI, public static final String SET_PRIORITY = setPriority;
Done, changed to updatePriority.

bq. In future --appId can be used with other parameters also, correct?
Yes, done.

bq. updateApplicationPriority can throw NumberFormatException
Since the exception is thrown directly back to the client CLI, I think this 
should be fine.

bq. ClientRMService.java has few commented code.
Yes. Since YARN-3887 was not committed, I used that patch to compile, but 
while uploading the patch I commented those lines out for the HadoopQA 
compilation. Now I have uncommented those lines.



 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused

2015-08-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694745#comment-14694745
 ] 

Rohith Sharma K S commented on YARN-3924:
-

bq. A more informative error message might be enough here? 
Yes, the user wants to differentiate the RM state, e.g. *standby RM* vs. *RM not 
started / attempt to connect to invalid RM ha-ids*. So a clearer error message 
would help here.
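
Purely for illustration, a small sketch of how a client-side submission path could report the two failure modes distinctly. The class, enum, and message texts are assumptions made for this example, not YARN code.

{code}
// Illustrative only; message wording and types are stand-ins.
public class SubmitErrorSketch {
  enum RmProbe { ACTIVE, STANDBY, UNREACHABLE }

  static String describeFailure(RmProbe probe, String rmId) {
    switch (probe) {
      case STANDBY:
        return "ResourceManager " + rmId + " is in standby state; "
            + "retry against the active RM";
      case UNREACHABLE:
        return "ResourceManager " + rmId + " is not reachable; "
            + "check the configured ha-ids and addresses";
      default:
        return "ResourceManager " + rmId + " accepted the request";
    }
  }

  public static void main(String[] args) {
    System.out.println(describeFailure(RmProbe.STANDBY, "rm1"));
    System.out.println(describeFailure(RmProbe.UNREACHABLE, "rm2"));
  }
}
{code}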

 Submitting an application to standby ResourceManager should respond better 
 than Connection Refused
 --

 Key: YARN-3924
 URL: https://issues.apache.org/jira/browse/YARN-3924
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

 When submitting an application directly to a standby resource manager, the 
 resource manager responds with 'Connection Refused' rather than indicating 
 that it is a standby resource manager.  Because the resource manager is aware 
 of its own state, I feel like we can have the 8032 port open for standby 
 resource managers and reject the request with something like 'Cannot process 
 application submission from this standby resource manager'.  
 This would be especially helpful for debugging oozie problems when users put 
 in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM 
 address but rather point to a specific resource manager).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused

2015-08-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694770#comment-14694770
 ] 

Rohith Sharma K S commented on YARN-3924:
-

bq. None of the RMs specified by ha-ids appear to be active.
This error message seems more appropriate to me.

 Submitting an application to standby ResourceManager should respond better 
 than Connection Refused
 --

 Key: YARN-3924
 URL: https://issues.apache.org/jira/browse/YARN-3924
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

 When submitting an application directly to a standby resource manager, the 
 resource manager responds with 'Connection Refused' rather than indicating 
 that it is a standby resource manager.  Because the resource manager is aware 
 of its own state, I feel like we can have the 8032 port open for standby 
 resource managers and reject the request with something like 'Cannot process 
 application submission from this standby resource manager'.  
 This would be especially helpful for debugging oozie problems when users put 
 in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM 
 address but rather point to a specific resource manager).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3689) FifoComparator logic is wrong. In method compare in FifoPolicy.java file, the s1 and s2 should change position when compare priority

2015-08-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694687#comment-14694687
 ] 

Rohith Sharma K S commented on YARN-3689:
-

As per the application priority design, a *higher integer* indicates *higher 
priority*, so the comparator implementation seems fine to me. The test by 
[~ajithshetty] also shows the higher priority value, i.e. 2, first in the list.
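
To make the ordering claim concrete, here is a standalone sketch that sorts applications assuming a higher integer means higher priority, with start time as the FIFO tie-breaker. The types are simplified stand-ins, not the real FifoPolicy/Schedulable classes.

{code}
// Simplified stand-in types; not the actual FairScheduler FifoPolicy code.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FifoOrderingSketch {
  static class App {
    final String name; final int priority; final long startTime;
    App(String name, int priority, long startTime) {
      this.name = name; this.priority = priority; this.startTime = startTime;
    }
  }

  public static void main(String[] args) {
    // Higher integer = higher priority: sort by priority descending,
    // then break ties by earlier start time (FIFO).
    Comparator<App> fifo = Comparator
        .comparingInt((App a) -> a.priority).reversed()
        .thenComparingLong(a -> a.startTime);

    List<App> apps = new ArrayList<>();
    apps.add(new App("app_0001", 4, 1432094170038L));
    apps.add(new App("app_0002", 2, 1432094173131L));
    apps.sort(fifo);
    // With this ordering the priority-4 application stays first.
    apps.forEach(a -> System.out.println(a.name + " priority=" + a.priority));
  }
}
{code}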

 FifoComparator logic is wrong. In method compare in FifoPolicy.java file, 
 the s1 and s2 should change position when compare priority 
 -

 Key: YARN-3689
 URL: https://issues.apache.org/jira/browse/YARN-3689
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, scheduler
Affects Versions: 2.5.0
Reporter: zhoulinlin
Assignee: Ajith S

 In method compare in FifoPolicy.java file, the s1 and s2 should 
 change position when compare priority.
 I did a test. Configured the schedulerpolicy fifo,  submitted 2 jobs to the 
 same queue.
 The result is below:
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 before sort --  
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432094103221_0001 appPririty:4  
 appStartTime:1432094170038
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432094103221_0002 appPririty:2  
 appStartTime:1432094173131
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 after sort % 
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432094103221_0001 appPririty:4  
 appStartTime:1432094170038  
 2015-05-20 11:57:41,449 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432094103221_0002 appPririty:2  
 appStartTime:1432094173131  
 But when change the s1 and s2 position like below:
 public int compare(Schedulable s1, Schedulable s2) {
   int res = s2.getPriority().compareTo(s1.getPriority());
 .}
 The result:
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 before sort -- 
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432090734333_0009 appPririty:4  
 appStartTime:1432092992503
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432090734333_0010 appPririty:2  
 appStartTime:1432092996437
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 after sort % 
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432090734333_0010 appPririty:2  
 appStartTime:1432092996437
 2015-05-20 11:36:37,119 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
 appName:application_1432090734333_0009 appPririty:4  
 appStartTime:1432092992503 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700755#comment-14700755
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Updated the patch, fixing a race condition between updating the priority and 
SchedulerApplicationAttempt creation, where the new attempt would take up the 
old priority rather than the updated priority.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0002-YARN-4017.patch

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4017.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699591#comment-14699591
 ] 

Rohith Sharma K S commented on YARN-4014:
-

bq. we can make updateApplicationPriority throw an 
ApplicationNotRunningException and let client catch the exception and prints 
“Application not running “ msg
In {{ClientRMService#updateApplicationPriority}}, the priority update is also 
not forwarded to the scheduler if the application is in the NEW or NEW_SAVING 
state. So I feel a new ApplicationNotRunningException would lead to confusion. 
I think we can throw a YarnException with the message "Application in 
<app-state> state cannot have its priority updated". Any thoughts?
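
A minimal sketch of the shape of this check follows. The allowed-state set is an assumption for illustration (the exact set is what this thread is still discussing), and the exception type is a stand-in for the YarnException the real code would throw; this is not the actual ClientRMService code.

{code}
// Stand-in sketch only; not the real ClientRMService implementation.
import java.util.EnumSet;

public class UpdatePriorityStateCheckSketch {
  enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED }

  // Assumed allowed set for illustration only.
  static final EnumSet<AppState> UPDATABLE =
      EnumSet.of(AppState.SUBMITTED, AppState.ACCEPTED, AppState.RUNNING);

  static void checkUpdatable(String appId, AppState state) {
    if (!UPDATABLE.contains(state)) {
      // The real code would throw a YarnException with a similar message.
      throw new IllegalStateException("Application " + appId + " in " + state
          + " state cannot have its priority updated");
    }
  }

  public static void main(String[] args) {
    checkUpdatable("application_1432094103221_0001", AppState.RUNNING); // passes
    checkUpdatable("application_1432094103221_0002", AppState.NEW);     // throws
  }
}
{code}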



 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0004-YARN-4014.patch

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700753#comment-14700753
 ] 

Rohith Sharma K S commented on YARN-4014:
-

bq. That means the updated priority is lost
Discussed offline with Jian He; the updated priority won't be lost if the 
application is in the ACCEPTED state.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699930#comment-14699930
 ] 

Rohith Sharma K S commented on YARN-4014:
-

I did the above check leaving out the SUBMITTED, ACCEPTED and RUNNING states, 
thinking that the application priority should be updatable in these states. 
Should we allow the update only for RUNNING? I feel all of these states should 
be allowed to change priority. What do you think?

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-18 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702421#comment-14702421
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Test failures are unrelated to this patch.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708470#comment-14708470
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I had a closer look at both of the solutions above. The potential issues with 
each are:
# Moving createAndInitService just before starting activeServices in 
transitionToActive. 
## The switch time will be impacted, since every transitionToActive then 
initializes the active services.
## And RMWebApp has a dependency on clientRMService for starting the webapps. 
Without clientRMService initialization, RMWebApp cannot be started. 
# Moving refreshAll before transitionToActive in AdminService is the same as 
triggering RMAdminCli on the standby node. That call throws a StandbyException 
and is retried against the active RM in RMAdminCli. So in 
AdminService#transitionedToActive(), refreshing before 
{{rm.transitionedToActive}} throws a standby exception. 



 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708471#comment-14708471
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I think for any configuration issue while transitioning to active, AdminService 
should not allow the JVM to continue. If AdminService throws an exception back 
to the elector, the elector again tries to make the RM active, which loops 
forever and fills the logs. 
There are 2 calls that can be points of failure: first 
{{rm.transitionedToActive}}, second {{refreshAll()}}. 
# If {{rm.transitionedToActive}} fails, the RM services are stopped and the RM 
ends up in STANDBY state.
# If {{refreshAll()}} fails, BOTH RMs end up in ACTIVE state, as per this 
defect. Continuing the RM services with an invalid configuration is not a good 
idea; moreover, invalid configurations should be reported to the user 
immediately. So it would be better to use the fail-fast configuration to exit 
the RM JVM. If that configuration is set to false, then call 
{{rm.handleTransitionToStandBy}}.
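
A small sketch of the failure handling proposed here, under the stated assumptions: the method names mirror the discussion but are stand-ins, and the fail-fast flag is shown as a plain boolean rather than a real configuration key. This is not the actual AdminService code.

{code}
// Sketch only; not the real AdminService/ResourceManager implementation.
public class TransitionToActiveSketch {
  boolean failFastEnabled = true; // e.g. driven by a fail-fast configuration key

  void transitionToActive() throws Exception {
    transitionRMToActive();   // a failure here already leaves the RM in STANDBY
    try {
      refreshAll();           // reload queues, ACLs, user groups, ...
    } catch (Exception e) {
      if (failFastEnabled) {
        // Invalid configuration: surface it immediately and stop the JVM
        // instead of leaving two RMs believing they are active.
        System.exit(1);
      } else {
        transitionRMToStandby();
      }
    }
  }

  void transitionRMToActive() {}
  void refreshAll() throws Exception {}
  void transitionRMToStandby() {}
}
{code}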

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-08-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706224#comment-14706224
 ] 

Rohith Sharma K S commented on YARN-4044:
-

Thanks [~sunilg] for the patch.. The patch mostly looks good to me.. Have you 
verified it in a real cluster?

 Running applications information changes such as movequeue is not published 
 to TimeLine server
 --

 Key: YARN-4044
 URL: https://issues.apache.org/jira/browse/YARN-4044
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical
 Attachments: 0001-YARN-4044.patch


 SystemMetricsPublisher need to expose an appUpdated api to update any change 
 for a running application.
 Events can be 
   - change of queue for a running application.
 - change of application priority for a running application.
 This ticket intends to handle both RM and timeline side changes. 
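
For illustration, a sketch of the kind of "app updated" hook described above. The appUpdated name and the queue/priority fields come from the description; everything else (class names, fields, output) is an assumption for this example.

{code}
// Illustrative stand-in; not the actual SystemMetricsPublisher code.
public class MetricsPublisherSketch {
  static class AppUpdatedEvent {
    final String appId;
    final String queue;
    final int priority;
    final long timestamp;
    AppUpdatedEvent(String appId, String queue, int priority, long timestamp) {
      this.appId = appId; this.queue = queue;
      this.priority = priority; this.timestamp = timestamp;
    }
  }

  // Called whenever a running application's queue or priority changes, so the
  // timeline server can record the new values.
  void appUpdated(AppUpdatedEvent event) {
    System.out.println("publish update for " + event.appId
        + " queue=" + event.queue + " priority=" + event.priority);
  }

  public static void main(String[] args) {
    new MetricsPublisherSketch().appUpdated(
        new AppUpdatedEvent("application_1", "default", 4, System.currentTimeMillis()));
  }
}
{code}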



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706319#comment-14706319
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Thanks [~hex108] for the patch, overall the patch looks good to me.. I verified 
the test without the source change and it fails every time.. 
nit: Can you add the public modifier to the interface API, i.e. {{void 
resetLastNodeHeartBeatResponse();}}?
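
A simplified sketch of the race and the fix direction being discussed: reset the last heartbeat response id synchronously while handling the reconnect, rather than via a later asynchronous event. The types and values here are stand-ins, not the real RMNodeImpl/ResourceTrackerService code.

{code}
// Stand-in sketch only.
public class HeartbeatResetSketch {
  static class RmNode {
    private int lastResponseId = 2506413; // value left over from before the restart

    public synchronized void resetLastNodeHeartBeatResponse() {
      lastResponseId = 0;
    }

    synchronized int getLastResponseId() {
      return lastResponseId;
    }
  }

  static void onNodeReconnect(RmNode node) {
    // Doing the reset inline here (instead of asynchronously) ensures the next
    // heartbeat sees responseId == 0 and is not treated as "too far behind".
    node.resetLastNodeHeartBeatResponse();
  }

  public static void main(String[] args) {
    RmNode node = new RmNode();
    onNodeReconnect(node);
    System.out.println("last response id = " + node.getLastResponseId());
  }
}
{code}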

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch, YARN-3896.06.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706464#comment-14706464
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Thanks for the clarification..

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch, YARN-3896.06.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3896:

Attachment: 0001-YARN-3896.patch

When applying the patch, the patch apply was failing for 2 chunks in RMNodeImpl. 
So I rebased the patch against trunk and am uploading it to check the Jenkins 
result.. Once HadoopQA runs, I will commit it.. 

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706269#comment-14706269
 ] 

Rohith Sharma K S commented on YARN-4014:
-

bq. When 2nd or subsequent AM attempt is spawned, we are never setting the old 
attempt as null in SchedulerApplication, correct? Hence there is a chance that 
we set priority to old attempt while new attempt is getting created.. 
Right.. Since the latest priority is re-applied to the attempt after the attempt 
is updated in SchedulerApplication#setCurrentAttempt, I think there is NOT any 
possibility of currentAttempt keeping the old priority. So I believe 
currentAttempt NEED NOT be volatile.
[~jianhe] Could you give your opinion on this?
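
A simplified sketch of the sequencing being described: when a new attempt replaces the old one, the application-level priority is re-applied to it, so the attempt never keeps a stale value. Class, field, and method names are stand-ins, not the real SchedulerApplication/SchedulerApplicationAttempt code.

{code}
// Stand-in sketch only.
public class PriorityHandoffSketch {
  static class Attempt {
    volatile int priority;
  }

  static class App {
    private Attempt currentAttempt;
    private int appPriority;

    synchronized void updatePriority(int newPriority) {
      appPriority = newPriority;
      if (currentAttempt != null) {
        currentAttempt.priority = newPriority; // running attempt picks it up
      }
    }

    synchronized void setCurrentAttempt(Attempt attempt) {
      currentAttempt = attempt;
      // Re-apply the latest application priority so a newly created attempt
      // never starts with a stale value, even if updatePriority raced with it.
      currentAttempt.priority = appPriority;
    }
  }
}
{code}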

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0002-YARN-4014.patch

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0002-YARN-4017.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: (was: 0002-YARN-4017.patch)

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699628#comment-14699628
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Updating the modified patch; kindly review it.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-18 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0004-YARN-4014.patch

Updating the same patch with the javadoc issues fixed.. Kicking off Jenkins.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
 0004-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-18 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701292#comment-14701292
 ] 

Rohith Sharma K S commented on YARN-3250:
-

[~sunilg] [~jianhe] would you have a look at the patch please? I will rebase the 
patch based on the review comments.

 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead

2015-08-18 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701334#comment-14701334
 ] 

Rohith Sharma K S commented on YARN-3986:
-

+1 for the latest patch..

 getTransferredContainers in AbstractYarnScheduler should be present in 
 YarnScheduler interface instead
 --

 Key: YARN-3986
 URL: https://issues.apache.org/jira/browse/YARN-3986
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3986.01.patch, YARN-3986.02.patch, 
 YARN-3986.03.patch


 Currently getTransferredContainers is present in {{AbstractYarnScheduler}}.
 *But in ApplicationMasterService, while registering AM, we are calling this 
 method by typecasting it to AbstractYarnScheduler, which is incorrect.*
 This method should be moved to YarnScheduler.
 Because if a custom scheduler is to be added, it will implement 
 YarnScheduler, not AbstractYarnScheduler.
 As ApplicationMasterService is calling getTransferredContainers by 
 typecasting it to AbstractYarnScheduler, it is imposing an indirect 
 dependency on AbstractYarnScheduler for any pluggable custom scheduler.
 We can move the method to YarnScheduler and leave the definition in 
 AbstractYarnScheduler as it is.
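
A sketch of the refactoring described above: with the method on the scheduler interface, the caller no longer needs to downcast to the abstract class. The names are simplified stand-ins; this is not the actual YARN code or the patch.

{code}
// Simplified stand-in types; not the real YarnScheduler/AbstractYarnScheduler.
import java.util.Collections;
import java.util.List;

interface YarnSchedulerSketch {
  List<String> getTransferredContainers(String appAttemptId);
}

abstract class AbstractYarnSchedulerSketch implements YarnSchedulerSketch {
  @Override
  public List<String> getTransferredContainers(String appAttemptId) {
    return Collections.emptyList(); // shared default behaviour
  }
}

class ApplicationMasterServiceSketch {
  private final YarnSchedulerSketch scheduler;

  ApplicationMasterServiceSketch(YarnSchedulerSketch scheduler) {
    this.scheduler = scheduler;
  }

  List<String> registerAm(String appAttemptId) {
    // No cast to the abstract class needed: any custom scheduler implementing
    // the interface works here.
    return scheduler.getTransferredContainers(appAttemptId);
  }
}
{code}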



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4014:

Attachment: 0003-YARN-4014.patch

Updating the patch so that it checks only for the ACCEPTED and RUNNING 
application states before updating the priority of an application.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch, 0003-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1461#comment-1461
 ] 

Rohith Sharma K S commented on YARN-4014:
-

If the application is in the SUBMITTED state, the priority update should not be 
called, because the application would not have been added to the scheduler yet. 
In the ACCEPTED state, the priority update can be called. 
One of the doubts Jian He has is: if the application is in the ACCEPTED state, 
the application attempt would not be created yet. I rechecked the code flow; we 
can do the update in the ACCEPTED state even though the attempt is not created. 
IIRC, while doing YARN-3887 we discussed this specific scenario and handled the 
*null* entry being added to the SchedulableEntity.

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
 0002-YARN-4014.patch


 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698908#comment-14698908
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Sorry for coming in very late.. This issue has become stale; we need to move forward!!
Regarding the patch, 
# Instead of setting a boolean flag for reinitActiveServices in AdminService and 
the other changes, moving {{createAndInitActiveServices();}} from 
transitionedToStandby to just before starting activeServices would solve such 
issues. And on an exception while transitioning to active, handle it by adding a 
stopActiveServices call in ResourceManager#transitionToActive() only. 
# With the above approach, we can probably remove refreshAll() from 
AdminService#transitionToActive.

Any thoughts?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3896:

Labels: resourcemanager  (was: )

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset synchronously
 -

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
  Labels: resourcemanager
 Fix For: 2.8.0

 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3896:

Summary: RMNode transitioned from RUNNING to REBOOTED because its response 
id had not been reset synchronously  (was: RMNode transitioned from RUNNING to 
REBOOTED because its response id had not been reset)

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset synchronously
 -

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Fix For: 2.8.0

 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708810#comment-14708810
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Test failures are unrelated to the patch.. committing shortly..

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
 set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
 heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE

2015-08-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709098#comment-14709098
 ] 

Rohith Sharma K S commented on YARN-842:


I verified with IE9 and greater and was able to view the applications. Is anyone 
in the community still facing this issue? Otherwise, can it be closed?

 Resource Manager & Node Manager UI's doesn't work with IE
 -

 Key: YARN-842
 URL: https://issues.apache.org/jira/browse/YARN-842
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K

 {code:xml}
 Webpage error details
 User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; 
 SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
 Center PC 6.0)
 Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
 Message: 'JSON' is undefined
 Line: 41
 Char: 218
 Code: 0
 URI: http://10.18.40.24:8088/cluster/apps
 {code}
 RM & NM UI's are not working with IE and showing the above error for every 
 link on the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start

2015-07-29 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3919:

Attachment: 0003-YARN-3919.patch

The current patch does not apply on my machine, so I am regenerating the same 
patch from my machine. Uploading it to kick off HadoopQA before commit..

 NPEs' while stopping service after exception during 
 CommonNodeLabelsManager#start
 -

 Key: YARN-3919
 URL: https://issues.apache.org/jira/browse/YARN-3919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, 
 YARN-3919.02.patch


 We get NPE during CommonNodeLabelsManager#serviceStop and 
 AsyncDispatcher#serviceStop if ConnectException on call to 
 CommonNodeLabelsManager#serviceStart occurs.
 {noformat}
 2015-07-10 19:39:37,825 WARN main-EventThread 
 org.apache.hadoop.service.AbstractService: When stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 {noformat}
 {noformat}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
 at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 {noformat}
 These NPEs' fill up the logs. Although, this doesn't cause any functional 
 issue but its a nuisance and we ideally should have null checks in 
 serviceStop.
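
A sketch of the defensive serviceStop pattern suggested above. The field names and resource types are illustrative, not the exact CommonNodeLabelsManager or AsyncDispatcher members.

{code}
// Stand-in sketch only.
public class NullSafeStopSketch {
  private AutoCloseable store;      // may still be null if start() failed early
  private Thread dispatcherThread;  // likewise

  public void serviceStop() {
    // Guard each resource: start() may have thrown before initializing it.
    if (store != null) {
      try {
        store.close();
      } catch (Exception e) {
        // log and continue stopping the remaining resources
      }
    }
    if (dispatcherThread != null) {
      dispatcherThread.interrupt();
    }
  }
}
{code}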



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646448#comment-14646448
 ] 

Rohith Sharma K S commented on YARN-3919:
-

No... {{git apply --whitespace=fix patch-file}}

 NPEs' while stopping service after exception during 
 CommonNodeLabelsManager#start
 -

 Key: YARN-3919
 URL: https://issues.apache.org/jira/browse/YARN-3919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, 
 YARN-3919.02.patch


 We get NPE during CommonNodeLabelsManager#serviceStop and 
 AsyncDispatcher#serviceStop if ConnectException on call to 
 CommonNodeLabelsManager#serviceStart occurs.
 {noformat}
 2015-07-10 19:39:37,825 WARN main-EventThread 
 org.apache.hadoop.service.AbstractService: When stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 {noformat}
 {noformat}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
 at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 {noformat}
 These NPEs' fill up the logs. Although, this doesn't cause any functional 
 issue but its a nuisance and we ideally should have null checks in 
 serviceStop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644008#comment-14644008
 ] 

Rohith Sharma K S commented on YARN-3887:
-

Hi [~sunilg]. For REST support, proto changes are not required, but for the 
admin/user CLI the proto changes have to be done. So I mean it can be done in a 
separate jira.

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644150#comment-14644150
 ] 

Rohith Sharma K S commented on YARN-3948:
-

lgtm, [~sunilg] would you have a look at the findbugs failures?

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 
 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644169#comment-14644169
 ] 

Rohith Sharma K S commented on YARN-3919:
-

+1 for the trivial change, lgtm.. will commit it.

 NPEs' while stopping service after exception during 
 CommonNodeLabelsManager#start
 -

 Key: YARN-3919
 URL: https://issues.apache.org/jira/browse/YARN-3919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3919.01.patch, YARN-3919.02.patch


 We get NPE during CommonNodeLabelsManager#serviceStop and 
 AsyncDispatcher#serviceStop if ConnectException on call to 
 CommonNodeLabelsManager#serviceStart occurs.
 {noformat}
 2015-07-10 19:39:37,825 WARN main-EventThread 
 org.apache.hadoop.service.AbstractService: When stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 {noformat}
 {noformat}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
 at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 {noformat}
 These NPEs fill up the logs. Although this doesn't cause any functional 
 issue, it is a nuisance, and we should ideally have null checks in 
 serviceStop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646510#comment-14646510
 ] 

Rohith Sharma K S commented on YARN-3979:
-

Oops, 50 lakh events!
I checked the attached logs; since you have attached only ERROR logs, I was not 
able to trace it. One observation is that there are many invalid state transition 
events (CLEAN_UP) in RMNodeImpl. 
# Would it be possible to give the RM logs? If you are not able to attach them to 
JIRA, could you send them to me through mail? 
# Would you give more info: what is the cluster size? How many apps are 
running? How many were completed? What is the state of the NodeManagers, i.e. 
are they running or in any other state? Which version of Hadoop are you 
using?

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start

2015-07-29 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3919:

Priority: Trivial  (was: Major)

 NPEs' while stopping service after exception during 
 CommonNodeLabelsManager#start
 -

 Key: YARN-3919
 URL: https://issues.apache.org/jira/browse/YARN-3919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Trivial
 Fix For: 2.8.0

 Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, 
 YARN-3919.02.patch


 We get NPE during CommonNodeLabelsManager#serviceStop and 
 AsyncDispatcher#serviceStop if ConnectException on call to 
 CommonNodeLabelsManager#serviceStart occurs.
 {noformat}
 2015-07-10 19:39:37,825 WARN main-EventThread 
 org.apache.hadoop.service.AbstractService: When stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 {noformat}
 {noformat}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
 at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
 at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
 at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
 at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
 {noformat}
 These NPEs fill up the logs. Although this doesn't cause any functional 
 issue, it is a nuisance, and we should ideally have null checks in 
 serviceStop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646490#comment-14646490
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Adding to the User API discussion, 
the ApplicationCLI command can be {{./yarn application appId --set-priority 
ApplicationId --priority value}}.

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646598#comment-14646598
 ] 

Rohith Sharma K S commented on YARN-3543:
-

I got what you mean!! Right.. I think modifying the other files like *ApplicationStartData* 
relates to the applicationhistoryservice. Is it so?

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646478#comment-14646478
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Hi [~sunilg]
   As part of this JIRA, 
# User API : 
## I am planning to introduce 
{{ApplicationClientProtocol#setPriority(SetApplicationPriorityRequest)}}. 
*SetApplicationPriorityRequest* comprises of an ApplicationId and a Priority (a 
rough sketch is given below). The ClientRMService invokes the API introduced by 
YARN-3887, i.e. updateApplicationPriority();
## Is getPriority required on the user side? I feel that, since the 
ApplicationReport can give the priority of an application, this API is NOT 
required. What do you suggest, any thoughts?

# Admin API :
## As admin, one should be able to change the *cluster-max-application-priority* 
value. Having an rmadmin API would be great!! But one issue with this api is 
that cluster-max-application-priority is in-memory: when rmadmin updates it, the 
in-memory value is updated, but in HA/restart cases the configuration 
value is taken. So I suggest storing cluster-max-application-priority in the 
state store and, whenever the RM is switched/restarted, giving higher preference 
to the store. What do you think about this approach? 

Apart from the above APIs, should any new APIs be added? Kindly share 
your thoughts?
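To make the proposal concrete, a rough sketch of the request shape; the class and method names here follow the proposal above and are not a committed API (ApplicationId and Priority are the existing YARN records):

{code}
// Rough sketch only, mirroring the usual YARN request-record pattern.
public abstract class SetApplicationPriorityRequest {

  public static SetApplicationPriorityRequest newInstance(
      ApplicationId applicationId, Priority priority) {
    SetApplicationPriorityRequest request =
        Records.newRecord(SetApplicationPriorityRequest.class);
    request.setApplicationId(applicationId);
    request.setApplicationPriority(priority);
    return request;
  }

  public abstract ApplicationId getApplicationId();
  public abstract void setApplicationId(ApplicationId applicationId);

  public abstract Priority getApplicationPriority();
  public abstract void setApplicationPriority(Priority priority);
}
{code}

ClientRMService would then simply delegate to the scheduler-side updateApplicationPriority() added by YARN-3887.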

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646568#comment-14646568
 ] 

Rohith Sharma K S commented on YARN-3543:
-

Thanks [~xgong] for review..
bq. But we still made some un-necessary changes.
Sorry, I could not get what the unnecessary changes are. Could you explain please?

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646495#comment-14646495
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Small correction to the above syntax. The correct syntax is {{./yarn application 
--set-priority ApplicationId --priority value}}.

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646622#comment-14646622
 ] 

Rohith Sharma K S commented on YARN-3543:
-

I have one doubt about whether it is able to render on the timeline web UI. I 
remember that I made these changes for the timeline web UI to fetch the data. Anyway, 
I will verify it tomorrow and confirm whether it is required. 

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-07-30 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647507#comment-14647507
 ] 

Rohith Sharma K S commented on YARN-3250:
-

bq. I think one problem is that if there's ever a value set in state-store, RM 
cannot pick up the value using the config any more
I see, I agree; configuration files would become stale after one 
restart/switch. How about having a command that reads specific configurations from 
yarn-site.xml, much like {{./yarn rmadmin refreshAdminAcls}}, which reads 
*yarn.admin.acl* from the yarn-site.xml configuration when invoked? Along the same 
lines, setting cluster-max-application-priority would be 
{{./yarn rmadmin refreshClusterMaxPriority}} or {{./yarn rmadmin 
refreshClusterPriority}} (a rough sketch is below). Thoughts?

bq. How about yarn application ApplicationId -setPriority priority ?
Makes sense.
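A rough sketch of what such a refresh command could look like on the AdminService side, purely to illustrate the idea; the config key, the request/response records and the scheduler setter used here are assumptions, not existing APIs:

{code}
// Illustrative sketch only: "yarn.cluster.max-application-priority" and
// setClusterMaxPriority(...) are assumed names, not committed APIs.
public RefreshClusterMaxPriorityResponse refreshClusterMaxPriority(
    RefreshClusterMaxPriorityRequest request)
    throws YarnException, IOException {
  // Re-read yarn-site.xml, similar to how refreshAdminAcls re-reads yarn.admin.acl
  Configuration conf = new YarnConfiguration();
  int clusterMaxPriority =
      conf.getInt("yarn.cluster.max-application-priority", 0);   // assumed key
  rmContext.getScheduler()
      .setClusterMaxPriority(Priority.newInstance(clusterMaxPriority)); // assumed setter
  return RefreshClusterMaxPriorityResponse.newInstance();        // assumed record
}
{code}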

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647177#comment-14647177
 ] 

Rohith Sharma K S commented on YARN-3979:
-

Thanks for the information!!
bq. NodeManager in one times all lost and recovery for a monment
I can think of a scenario very close to YARN-3990. Since you have 2 lakh apps 
completed and 1600 NodeManagers, when all the nodes are lost and reconnected, 
the number of events generated is {{(2 lakh completed + 550 running = 
200550) * 1600 (number of NodeManagers) = 320,880,000}} events.. Ooops!!!

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653002#comment-14653002
 ] 

Rohith Sharma K S commented on YARN-3887:
-

One comment
# TreeSet will throw a NullPointerException while adding/removing a null object. 
Suppose the SchedulerApplicationAttempt is not yet created; then 
{{application.getCurrentAppAttempt()}} will be null, which would throw an NPE. I 
think this has to be handled in 
{{AbstractComparatorOrderingPolicy#removeSchedulableEntity}} and 
{{AbstractComparatorOrderingPolicy#addSchedulableEntity}} (see the sketch below).
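A minimal sketch of the null guard being suggested, assuming the usual TreeSet-backed schedulableEntities field in AbstractComparatorOrderingPolicy; the exact method bodies in the patch may differ:

{code}
// Illustrative: skip null entities instead of letting the TreeSet throw NPE.
public void addSchedulableEntity(S s) {
  if (s == null) {
    // current app attempt not created yet; nothing to order
    return;
  }
  schedulableEntities.add(s);
}

public boolean removeSchedulableEntity(S s) {
  if (s == null) {
    return false;
  }
  return schedulableEntities.remove(s);
}
{code}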

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 
 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653078#comment-14653078
 ] 

Rohith Sharma K S commented on YARN-4014:
-

The basic API discussion was done in YARN-3250, 
[comment1|https://issues.apache.org/jira/browse/YARN-3250?focusedCommentId=14646478page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14646478].
 Just reiterating the discussion summary here (a usage sketch follows below):
# User API : 
## For changing the priority of an application, the API 
{{ApplicationClientProtocol#setApplicationPriority(SetApplicationPriorityRequest)}}
 will be added. *SetApplicationPriorityRequest comprises of ApplicationId and 
Priority*. The ClientRMService invokes the API introduced by YARN-3887, i.e. 
updateApplicationPriority();
## For getting the priority of any application, NO new API will be 
added. Retrieving the priority of any application can be done using the 
ApplicationReport after YARN-3948 is committed.
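A short client-side usage sketch, purely illustrative: the request record and protocol method follow the names above (not a committed API), and ApplicationReport#getPriority is assumed to be available once the related work is in.

{code}
// Illustrative usage against an ApplicationClientProtocol proxy (here "client").
SetApplicationPriorityRequest request =
    SetApplicationPriorityRequest.newInstance(appId, Priority.newInstance(8));
client.setApplicationPriority(request);   // proposed method, not yet committed

// Reading the priority back needs no new API: it comes with the report.
GetApplicationReportRequest reportRequest =
    GetApplicationReportRequest.newInstance(appId);
Priority current = client.getApplicationReport(reportRequest)
    .getApplicationReport().getPriority(); // assumes getPriority() on the report
{code}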

 Support user cli interface in for Application Priority
 --

 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S

 Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
 changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653024#comment-14653024
 ] 

Rohith Sharma K S commented on YARN-3948:
-

Hi [~sunilg], 
Would you rebase the patch since YARN-3543 has been committed?

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 
 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653033#comment-14653033
 ] 

Rohith Sharma K S commented on YARN-3250:
-

How about passing an option for specifying the applicationId, i.e. {{./yarn 
application --appId ApplicationId --setPriority value}}?

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653023#comment-14653023
 ] 

Rohith Sharma K S commented on YARN-3543:
-

Thanks [~xgong] for review and commit. I really appreciate your detailed review 
:-)

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, 
 YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4014) Support user cli interface in for Application Priority

2015-08-03 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4014:
---

 Summary: Support user cli interface in for Application Priority
 Key: YARN-4014
 URL: https://issues.apache.org/jira/browse/YARN-4014
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-03 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3250:

Summary: Support admin cli interface in for Application Priority  (was: 
Support admin/user cli interface in for Application Priority)

 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority

2015-08-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653072#comment-14653072
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Moving the user CLI (ApplicationClientProtocol) changes to a separate jira, YARN-4014, 
for more focused discussion and review!!

 Support admin/user cli interface in for Application Priority
 

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S

 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-07-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648817#comment-14648817
 ] 

Rohith Sharma K S commented on YARN-3543:
-

Thanks [~xgong] for identifying the ApplicationHistoryServer modifications that 
are not required at all!!

Updated the patch by removing the ApplicationHistoryServer modifications. This 
patch contains only the TimelineServer modifications. 
I verified the patch on a cluster to check that the Timeline web UI is rendering 
*unmanagedApplication*, and also verified the REST APIs for obtaining the 
applicationReport.
[~xgong] would you have a look at the updated patch please?

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, 
 YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-07-31 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3543:

Attachment: 0007-YARN-3543.patch

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, 
 YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected

2015-07-29 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-3990:
---

 Summary: AsyncDispatcher may overloaded with RMAppNodeUpdateEvent 
when Node is connected 
 Key: YARN-3990
 URL: https://issues.apache.org/jira/browse/YARN-3990
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Priority: Critical


Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent 
to all the applications that are in the rmContext. But for 
finished/killed/failed applications it is not required to send these events. An 
additional check for whether the app is finished/killed/failed would minimize the 
unnecessary events (a sketch of this check follows the code below).

{code}
public void handle(NodesListManagerEvent event) {
  RMNode eventNode = event.getNode();
  switch (event.getType()) {
  case NODE_UNUSABLE:
    LOG.debug(eventNode + " reported unusable");
    unusableRMNodesConcurrentSet.add(eventNode);
    for (RMApp app : rmContext.getRMApps().values()) {
      this.rmContext
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                  RMAppNodeUpdateType.NODE_UNUSABLE));
    }
    break;
  case NODE_USABLE:
    if (unusableRMNodesConcurrentSet.contains(eventNode)) {
      LOG.debug(eventNode + " reported usable");
      unusableRMNodesConcurrentSet.remove(eventNode);
    }
    for (RMApp app : rmContext.getRMApps().values()) {
      this.rmContext
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                  RMAppNodeUpdateType.NODE_USABLE));
    }
    break;
  default:
    LOG.error("Ignoring invalid eventtype " + event.getType());
  }
}
{code}
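A sketch of the additional check described above; the set of terminal states checked here is illustrative, and an isAppInFinalState-style helper on the app implementation could be used instead:

{code}
// Illustrative: skip applications that are already in a terminal state before
// fanning out RMAppNodeUpdateEvents (shown for the NODE_USABLE branch).
for (RMApp app : rmContext.getRMApps().values()) {
  RMAppState state = app.getState();
  if (state == RMAppState.FINISHED || state == RMAppState.FAILED
      || state == RMAppState.KILLED) {
    continue;  // completed apps do not need node usability updates
  }
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
          RMAppNodeUpdateType.NODE_USABLE));
}
{code}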



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645425#comment-14645425
 ] 

Rohith Sharma K S commented on YARN-3887:
-

Thanks [~sunilg] for updating the patch.
Some comments
# The invocation 
{{rmContext.getStateStore().updateApplicationState(appState);}} is 
asynchronous. So I feel there is still a corner case where the 
priority has been set in the scheduler but not yet updated in the RMStateStore. Any RM 
switch/restart would then end up with the old priority set. I think this 
particular invocation should be synchronous like the other APIs, 
e.g. {{storeRMDelegationToken}}, {{storeRMDTMasterKey}} (a sketch is below).
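A sketch of the synchronous variant being suggested, modelled on how storeRMDelegationToken/storeRMDTMasterKey drive the store on the calling thread; the method name follows this discussion, not a committed API:

{code}
// Illustrative: run the store transition on the caller's thread so the caller
// only returns once the updated priority has been persisted.
public void updateApplicationStateSynchronously(ApplicationStateData appState) {
  handleStoreEvent(new RMStateUpdateAppEvent(appState));
}
{code}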

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645404#comment-14645404
 ] 

Rohith Sharma K S commented on YARN-3979:
-

[~piaoyu zhang] In the description you have given NM logs, but in the previous 
comment you have given a stack trace of the RM. It would be easier to analyze if you can 
provide more info like the RM logs, NM logs and AM logs (if the AM started). An NM 
stack trace would also help a lot, since the NM side is holding for 10 mins. 

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao

 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected

2015-07-29 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3990:

Summary: AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node 
is connected/disconnected  (was: AsyncDispatcher may overloaded with 
RMAppNodeUpdateEvent when Node is connected )

 AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is 
 connected/disconnected
 

 Key: YARN-3990
 URL: https://issues.apache.org/jira/browse/YARN-3990
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Bibin A Chundatt
Priority: Critical

 Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent 
 to all the applications that are in the rmContext. But for 
 finished/killed/failed applications it is not required to send these events. An 
 additional check for whether the app is finished/killed/failed would minimize 
 the unnecessary events
 {code}
 public void handle(NodesListManagerEvent event) {
   RMNode eventNode = event.getNode();
   switch (event.getType()) {
   case NODE_UNUSABLE:
     LOG.debug(eventNode + " reported unusable");
     unusableRMNodesConcurrentSet.add(eventNode);
     for (RMApp app : rmContext.getRMApps().values()) {
       this.rmContext
           .getDispatcher()
           .getEventHandler()
           .handle(
               new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                   RMAppNodeUpdateType.NODE_UNUSABLE));
     }
     break;
   case NODE_USABLE:
     if (unusableRMNodesConcurrentSet.contains(eventNode)) {
       LOG.debug(eventNode + " reported usable");
       unusableRMNodesConcurrentSet.remove(eventNode);
     }
     for (RMApp app : rmContext.getRMApps().values()) {
       this.rmContext
           .getDispatcher()
           .getEventHandler()
           .handle(
               new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                   RMAppNodeUpdateType.NODE_USABLE));
     }
     break;
   default:
     LOG.error("Ignoring invalid eventtype " + event.getType());
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645535#comment-14645535
 ] 

Rohith Sharma K S commented on YARN-3979:
-

How many applications have completed? How many applications are running? How many NMs 
are running? When does this event queue become full? Any observations you have made?

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao

 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645566#comment-14645566
 ] 

Rohith Sharma K S commented on YARN-3887:
-

Your understanding is correct. I meant to say we should have a new synchronous API 
like {{updateApplicationStateSynchronously}} in RMStateStore.
[~jianhe] what do you think about having a new synchronous api in RMStateStore?

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected

2015-07-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645583#comment-14645583
 ] 

Rohith Sharma K S commented on YARN-3990:
-

Thanks [~bibinchundatt] for reproducing the issue. I believe in your cluster 
appsCompleted/appsRunning are 2 and the max number of completed apps to keep is 
set to 20k? 

 AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is 
 connected/disconnected
 

 Key: YARN-3990
 URL: https://issues.apache.org/jira/browse/YARN-3990
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Bibin A Chundatt
Priority: Critical

 Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent 
 to all the applications that are in the rmContext. But for 
 finished/killed/failed applications it is not required to send these events. An 
 additional check for whether the app is finished/killed/failed would minimize 
 the unnecessary events
 {code}
 public void handle(NodesListManagerEvent event) {
   RMNode eventNode = event.getNode();
   switch (event.getType()) {
   case NODE_UNUSABLE:
     LOG.debug(eventNode + " reported unusable");
     unusableRMNodesConcurrentSet.add(eventNode);
     for (RMApp app : rmContext.getRMApps().values()) {
       this.rmContext
           .getDispatcher()
           .getEventHandler()
           .handle(
               new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                   RMAppNodeUpdateType.NODE_UNUSABLE));
     }
     break;
   case NODE_USABLE:
     if (unusableRMNodesConcurrentSet.contains(eventNode)) {
       LOG.debug(eventNode + " reported usable");
       unusableRMNodesConcurrentSet.remove(eventNode);
     }
     for (RMApp app : rmContext.getRMApps().values()) {
       this.rmContext
           .getDispatcher()
           .getEventHandler()
           .handle(
               new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                   RMAppNodeUpdateType.NODE_USABLE));
     }
     break;
   default:
     LOG.error("Ignoring invalid eventtype " + event.getType());
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-07-28 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643931#comment-14643931
 ] 

Rohith Sharma K S commented on YARN-3887:
-

Hi Jian He, 
bq. Do you plan to do client side changes as part of this jira ?
YARN-3250 is planning to do the changes for the admin and user CLI, i.e. 
ApplicationClientProtocol. This jira is intended only for the scheduler-side 
changes supporting the APIs. YARN-3250 will be using these exposed APIs and 
implementing it. 
Current plan: Admin/User both have privileges to change the priority of 
applications. More APIs for Admin and User are to be discussed in YARN-3250. 

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4015) Is there any way to dynamically change container size after allocation.

2015-08-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-4015.
-
Resolution: Invalid

Hi [~dhruv007], for any queries please post to the hadoop user mailing list 
u...@hadoop.apache.org. JIRA is for tracking development issues.

 Is there any way to dynamically change container size after allocation.
 ---

 Key: YARN-4015
 URL: https://issues.apache.org/jira/browse/YARN-4015
 Project: Hadoop YARN
  Issue Type: Wish
Reporter: dhruv
Priority: Minor

 Hadoop YARN assumes that the container size won't be changed after allocation.
 It is possible that a job does not fully use the resources allocated, or requires 
 more resources for a container. So is there any way for the container size to 
 change at run time after allocation of the container, i.e. elasticity for both 
 memory and CPU? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-08-04 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654769#comment-14654769
 ] 

Rohith Sharma K S commented on YARN-3992:
-

The patch looks good overall. 
nit: Can you add a new API with an additional parameter for the host, instead of 
changing the existing {{allocateAndWaitForContainers}} API's arguments?

 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:<7> but was:<5>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime

2015-07-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648969#comment-14648969
 ] 

Rohith Sharma K S commented on YARN-3887:
-

[~sunilg] thanks for updating the patch.
One comment
# The below code should not be synchronized. If it is synchronized, then 
there is a very high chance of deadlock. The locking order should always be 
{{stateMachine --> RMStateStore}}, but the below code locks in the order {{RMStateStore --> 
stateMachine --> RMStateStore}}, which causes a deadlock. For more discussion, 
refer to YARN-2946.
{code}
+  public synchronized void updateApplicationStateSynchronously(
+  ApplicationStateData appState) {
+handleStoreEvent(new RMStateUpdateAppEvent(appState));
+  }
{code}

 Support for changing Application priority during runtime
 

 Key: YARN-3887
 URL: https://issues.apache.org/jira/browse/YARN-3887
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 
 0003-YARN-3887.patch, 0004-YARN-3887.patch


 After YARN-2003, adding support to change priority of an application after 
 submission. This ticket will handle the server side implementation for same.
 A new RMAppEvent will be created to handle this, and will be common for all 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-07-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648944#comment-14648944
 ] 

Rohith Sharma K S commented on YARN-3996:
-

CIIAW, SchedulerUtils.normalizeRequests() is being called in allocate() in CS 
and FS, where the resourceRequest is normalized (reset) to minimumAllocation. So it 
should not matter for the AM container resource request, where normalization is 
done in RMAppManager instead of at the scheduler. Is this impacting anything? (A 
worked example of the zero-minimum case is sketched below.)

 YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
 YARN-3305
 ---

 Key: YARN-3996
 URL: https://issues.apache.org/jira/browse/YARN-3996
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical

 RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
 with mininumResource for the incrementResource. This causes normalize to 
 return zero if minimum is set to zero as per YARN-789



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-07-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649001#comment-14649001
 ] 

Rohith Sharma K S commented on YARN-3992:
-

Thanks [~sunilg] for providing the patch!!
One comment
# Instead of rewriting the below code twice, can you use the method 
{{MockAM#allocateAndWaitForContainers}} so that many lines of code can be avoided?
{code}
+int NUM_CONTAINERS = 7;
+// allocate NUM_CONTAINERS containers
+am1.allocate("127.0.0.1", 2 * GB, NUM_CONTAINERS,
+    new ArrayList<ContainerId>());
 nm1.nodeHeartbeat(true);
-while (alloc1Response.getAllocatedContainers().size() < 1) {
-  LOG.info("Waiting for containers to be created for app 1...");
-  Thread.sleep(100);
-  alloc1Response = am1.schedule();
+
+// wait for containers to be allocated.
+List<Container> allocated1 = am1.allocate(new ArrayList<ResourceRequest>(),
+    new ArrayList<ContainerId>()).getAllocatedContainers();
+while (allocated1.size() != NUM_CONTAINERS) {
+  nm1.nodeHeartbeat(true);
+  allocated1.addAll(am1.allocate(new ArrayList<ResourceRequest>(),
+      new ArrayList<ContainerId>()).getAllocatedContainers());
+  Thread.sleep(200);
 }
{code}

 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:<7> but was:<5>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

