[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377655#comment-14377655
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating the app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks whether appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
     getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null; it throws 
 ApplicationAttemptNotFoundException instead:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The two pieces of code aren't coupled well.
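 For illustration, a minimal sketch of how the report-generation code could 
 handle this mismatch by catching the exception instead of relying on a null 
 return (the names follow the snippets above; the surrounding class structure 
 is assumed):
 {code}
 ApplicationAttemptReport appAttempt = null;
 try {
   appAttempt =
       getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 } catch (ApplicationAttemptNotFoundException e) {
   // The app finished before any attempt reached the timeline store;
   // leave the attempt-specific fields of the report unset.
 }
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}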



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377656#comment-14377656
 ] 

Hudson commented on YARN-2777:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Fix For: 2.7.0

 Attachments: YARN-2777.001.patch, YARN-2777.02.patch


 Below is a snippet of an aggregated log showing the HBase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated into one log file, it would be 
 desirable to mark the end of one log before starting the next,
 e.g. with a line such as:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}
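 For illustration, a minimal sketch of the idea (the helper below is 
 hypothetical, not the actual AggregatedLogFormat API):
 {code}
 import java.io.DataOutputStream;
 import java.io.File;
 import java.io.IOException;
 import java.nio.file.Files;

 class LogMarkerSketch {
   // After copying the contents of one per-daemon log, emit an explicit
   // terminator line before the next "LogType:" header so readers can tell
   // where each individual log ends.
   static void writeLogFile(DataOutputStream out, File logFile)
       throws IOException {
     out.writeBytes("LogType: " + logFile.getName() + "\n");
     out.writeBytes("LogLength: " + logFile.length() + "\n");
     out.writeBytes("Log Contents:\n");
     Files.copy(logFile.toPath(), out);
     out.writeBytes("\nEnd of LogType: " + logFile.getName() + "\n");
   }
 }
 {code}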



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377659#comment-14377659
 ] 

Hudson commented on YARN-3384:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


 TestLogAggregationService.verifyContainerLogs fails after YARN-2777
 ---

 Key: YARN-3384
 URL: https://issues.apache.org/jira/browse/YARN-3384
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
  Labels: test-fail
 Fix For: 2.7.0

 Attachments: YARN-3384.20150321-1.patch


 The following test cases of TestLogAggregationService are failing:
 testMultipleAppsLogAggregation
 testLogAggregationServiceWithRetention
 testLogAggregationServiceWithInterval
 testLogAggregationServiceWithPatterns



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377649#comment-14377649
 ] 

Hudson commented on YARN-3241:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu 
via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java


 FairScheduler handles invalid queue names inconsistently
 --

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
 YARN-3241.002.patch


 Leading spaces, trailing spaces, or an empty sub-queue name may cause a 
 MetricsException ("Metrics source XXX already exists!") when adding an 
 application to the FairScheduler.
 The reason is that QueueMetrics parses the queue name differently from 
 QueueManager.
 QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and 
 trailing spaces from each sub-queue name and also drops empty sub-queue 
 names:
 {code}
   static final Splitter Q_SPLITTER =
       Splitter.on('.').omitEmptyStrings().trimResults();
 {code}
 But QueueManager won't remove leading spaces, trailing spaces, or empty 
 sub-queue names.
 This causes FSQueue and FSQueueMetrics to get out of sync: QueueManager 
 considers two such queue names different and tries to create a new queue, 
 while FSQueueMetrics treats them as the same queue, which raises the 
 "Metrics source XXX already exists!" MetricsException.
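 For illustration, a minimal sketch of the kind of name validation that keeps 
 the two components consistent (the method name and exact rules are 
 illustrative, not necessarily those in the committed patch):
 {code}
 // Illustrative check: reject any queue name whose dot-separated components
 // are empty or carry leading/trailing whitespace, so that QueueManager and
 // QueueMetrics end up agreeing on the same normalized name.
 static boolean isQueueNameValid(String queueName) {
   for (String part : queueName.split("\\.", -1)) {
     if (part.isEmpty() || !part.equals(part.trim())) {
       return false;
     }
   }
   return true;
 }
 {code}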



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377657#comment-14377657
 ] 

Hudson commented on YARN-3336:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 
6ca1f12024fd7cec7b01df0f039ca59f3f365dc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


 FileSystem memory leak in DelegationTokenRenewer
 

 Key: YARN-3336
 URL: https://issues.apache.org/jira/browse/YARN-3336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
 YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch


 FileSystem memory leak in DelegationTokenRenewer.
 Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
 FileSystem entry is added to FileSystem#CACHE and is never garbage collected.
 This is the implementation of obtainSystemTokensForUser:
 {code}
   protected Token<?>[] obtainSystemTokensForUser(String user,
       final Credentials credentials) throws IOException, InterruptedException {
     // Get new hdfs tokens on behalf of this user
     UserGroupInformation proxyUser =
         UserGroupInformation.createProxyUser(user,
           UserGroupInformation.getLoginUser());
     Token<?>[] newTokens =
         proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
           @Override
           public Token<?>[] run() throws Exception {
             return FileSystem.get(getConfig()).addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
           }
         });
     return newTokens;
   }
 {code}
 The memory leak happens when FileSystem.get(getConfig()) is called with a 
 new proxy user, because createProxyUser always creates a new Subject.
 The calling sequence is 
 FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => 
 FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) 
 => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
 {code}
   public static UserGroupInformation createProxyUser(String user,
       UserGroupInformation realUser) {
     if (user == null || user.isEmpty()) {
       throw new IllegalArgumentException("Null user");
     }
     if (realUser == null) {
       throw new IllegalArgumentException("Null real user");
     }
     Subject subject = new Subject();
     Set<Principal> principals = subject.getPrincipals();
     principals.add(new User(user));
     principals.add(new RealUser(realUser));
     UserGroupInformation result = new UserGroupInformation(subject);
     result.setAuthenticationMethod(AuthenticationMethod.PROXY);
     return result;
   }
 {code}
 FileSystem#Cache#Key.equals compares the ugi:
 {code}
   Key(URI uri, Configuration conf, long unique) throws IOException {
     scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
     authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
     this.unique = unique;
     this.ugi = UserGroupInformation.getCurrentUser();
   }

   public boolean equals(Object obj) {
     if (obj == this) {
       return true;
     }
     if (obj != null && obj instanceof Key) {
       Key that = (Key) obj;
       return isEqual(this.scheme, that.scheme)
           && isEqual(this.authority, that.authority)
           && isEqual(this.ugi, that.ugi)
           && (this.unique == that.unique);
     }
     return false;
   }
 {code}
 UserGroupInformation.equals compares the subject by reference:
 {code}
   public boolean equals(Object o) {
     if (o == this) {
       return true;
     } else if (o == null || getClass() != o.getClass()) {
       return false;
     } else {
       return subject == ((UserGroupInformation) o).subject;
     }
   }
 {code}
 So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
 are called, a new FileSystem is created and a new entry is added to 
 FileSystem.CACHE.
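 For illustration, one way to keep the cache from growing (a sketch only, not 
 necessarily the committed fix) is to close the FileSystem obtained under the 
 proxy UGI once the tokens have been fetched, which removes its entry from 
 FileSystem.CACHE:
 {code}
 Token<?>[] newTokens =
     proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
       @Override
       public Token<?>[] run() throws Exception {
         // Closing the FileSystem when done removes its cache entry, so one
         // entry per proxy Subject can no longer accumulate.
         FileSystem fs = FileSystem.get(getConfig());
         try {
           return fs.addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
         } finally {
           fs.close();
         }
       }
     });
 {code}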



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377653#comment-14377653
 ] 

Hudson commented on YARN-2868:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-2868. FairScheduler: Metric for latency to allocate first container for an 
application. (Ray Chiang via kasha) (kasha: rev 
972f1f1ab94a26ec446a272ad030fe13f03ed442)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between the start of container allocation 
 and the first container actually being allocated.
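 For illustration, a minimal sketch of the idea (the field and metric-method 
 names below are hypothetical, not necessarily those used in the patch):
 {code}
 // Hypothetical bookkeeping on the application attempt: remember when the
 // attempt first asked for containers, then publish the elapsed time when its
 // first container is actually allocated.
 private long firstAllocationRequestTime = -1;
 private boolean firstContainerAllocated = false;

 void onContainerRequested(long now) {
   if (firstAllocationRequestTime < 0) {
     firstAllocationRequestTime = now;
   }
 }

 void onContainerAllocated(QueueMetrics metrics, long now) {
   if (!firstContainerAllocated && firstAllocationRequestTime >= 0) {
     firstContainerAllocated = true;
     // Hypothetical metric method recording the first-allocation latency.
     metrics.addAppAttemptFirstContainerAllocationDelay(
         now - firstAllocationRequestTime);
   }
 }
 {code}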



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377924#comment-14377924
 ] 

Hudson commented on YARN-3241:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu 
via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java


 FairScheduler handles invalid queue names inconsistently
 --

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
 YARN-3241.002.patch


 Leading spaces, trailing spaces, or an empty sub-queue name may cause a 
 MetricsException ("Metrics source XXX already exists!") when adding an 
 application to the FairScheduler.
 The reason is that QueueMetrics parses the queue name differently from 
 QueueManager.
 QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and 
 trailing spaces from each sub-queue name and also drops empty sub-queue 
 names:
 {code}
   static final Splitter Q_SPLITTER =
       Splitter.on('.').omitEmptyStrings().trimResults();
 {code}
 But QueueManager won't remove leading spaces, trailing spaces, or empty 
 sub-queue names.
 This causes FSQueue and FSQueueMetrics to get out of sync: QueueManager 
 considers two such queue names different and tries to create a new queue, 
 while FSQueueMetrics treats them as the same queue, which raises the 
 "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377931#comment-14377931
 ] 

Hudson commented on YARN-2777:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Fix For: 2.7.0

 Attachments: YARN-2777.001.patch, YARN-2777.02.patch


 Below is a snippet of an aggregated log showing the HBase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated into one log file, it would be 
 desirable to mark the end of one log before starting the next,
 e.g. with a line such as:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377932#comment-14377932
 ] 

Hudson commented on YARN-3336:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 
6ca1f12024fd7cec7b01df0f039ca59f3f365dc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 FileSystem memory leak in DelegationTokenRenewer
 

 Key: YARN-3336
 URL: https://issues.apache.org/jira/browse/YARN-3336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
 YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch


 FileSystem memory leak in DelegationTokenRenewer.
 Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
 FileSystem entry is added to FileSystem#CACHE and is never garbage collected.
 This is the implementation of obtainSystemTokensForUser:
 {code}
   protected Token<?>[] obtainSystemTokensForUser(String user,
       final Credentials credentials) throws IOException, InterruptedException {
     // Get new hdfs tokens on behalf of this user
     UserGroupInformation proxyUser =
         UserGroupInformation.createProxyUser(user,
           UserGroupInformation.getLoginUser());
     Token<?>[] newTokens =
         proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
           @Override
           public Token<?>[] run() throws Exception {
             return FileSystem.get(getConfig()).addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
           }
         });
     return newTokens;
   }
 {code}
 The memory leak happens when FileSystem.get(getConfig()) is called with a 
 new proxy user, because createProxyUser always creates a new Subject.
 The calling sequence is 
 FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => 
 FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) 
 => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
 {code}
   public static UserGroupInformation createProxyUser(String user,
       UserGroupInformation realUser) {
     if (user == null || user.isEmpty()) {
       throw new IllegalArgumentException("Null user");
     }
     if (realUser == null) {
       throw new IllegalArgumentException("Null real user");
     }
     Subject subject = new Subject();
     Set<Principal> principals = subject.getPrincipals();
     principals.add(new User(user));
     principals.add(new RealUser(realUser));
     UserGroupInformation result = new UserGroupInformation(subject);
     result.setAuthenticationMethod(AuthenticationMethod.PROXY);
     return result;
   }
 {code}
 FileSystem#Cache#Key.equals compares the ugi:
 {code}
   Key(URI uri, Configuration conf, long unique) throws IOException {
     scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
     authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
     this.unique = unique;
     this.ugi = UserGroupInformation.getCurrentUser();
   }

   public boolean equals(Object obj) {
     if (obj == this) {
       return true;
     }
     if (obj != null && obj instanceof Key) {
       Key that = (Key) obj;
       return isEqual(this.scheme, that.scheme)
           && isEqual(this.authority, that.authority)
           && isEqual(this.ugi, that.ugi)
           && (this.unique == that.unique);
     }
     return false;
   }
 {code}
 UserGroupInformation.equals compares the subject by reference:
 {code}
   public boolean equals(Object o) {
     if (o == this) {
       return true;
     } else if (o == null || getClass() != o.getClass()) {
       return false;
     } else {
       return subject == ((UserGroupInformation) o).subject;
     }
   }
 {code}
 So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
 are called, a new FileSystem is created and a new entry is added to 
 FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377934#comment-14377934
 ] 

Hudson commented on YARN-3384:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 TestLogAggregationService.verifyContainerLogs fails after YARN-2777
 ---

 Key: YARN-3384
 URL: https://issues.apache.org/jira/browse/YARN-3384
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
  Labels: test-fail
 Fix For: 2.7.0

 Attachments: YARN-3384.20150321-1.patch


 The following test cases of TestLogAggregationService are failing:
 testMultipleAppsLogAggregation
 testLogAggregationServiceWithRetention
 testLogAggregationServiceWithInterval
 testLogAggregationServiceWithPatterns



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377930#comment-14377930
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating the app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks whether appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
     getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null; it throws 
 ApplicationAttemptNotFoundException instead:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The two pieces of code aren't coupled well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377928#comment-14377928
 ] 

Hudson commented on YARN-2868:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-2868. FairScheduler: Metric for latency to allocate first container for an 
application. (Ray Chiang via kasha) (kasha: rev 
972f1f1ab94a26ec446a272ad030fe13f03ed442)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between the start of container allocation 
 and the first container actually being allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377920#comment-14377920
 ] 

Junping Du commented on YARN-3034:
--

Thanks [~Naganarasimha] for updating the patch! Latest patch LGTM. [~zjshen] 
and [~sjlee0], do you have further comments? If not, I will go ahead and commit 
it today.

 [Collector wireup] Implement RM starting its timeline collector
 ---

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, 
 YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, 
 YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377874#comment-14377874
 ] 

Hudson commented on YARN-2777:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Fix For: 2.7.0

 Attachments: YARN-2777.001.patch, YARN-2777.02.patch


 Below is a snippet of an aggregated log showing the HBase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated into one log file, it would be 
 desirable to mark the end of one log before starting the next,
 e.g. with a line such as:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377875#comment-14377875
 ] 

Hudson commented on YARN-3336:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 
6ca1f12024fd7cec7b01df0f039ca59f3f365dc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 FileSystem memory leak in DelegationTokenRenewer
 

 Key: YARN-3336
 URL: https://issues.apache.org/jira/browse/YARN-3336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
 YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch


 FileSystem memory leak in DelegationTokenRenewer.
 Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
 FileSystem entry is added to FileSystem#CACHE and is never garbage collected.
 This is the implementation of obtainSystemTokensForUser:
 {code}
   protected Token<?>[] obtainSystemTokensForUser(String user,
       final Credentials credentials) throws IOException, InterruptedException {
     // Get new hdfs tokens on behalf of this user
     UserGroupInformation proxyUser =
         UserGroupInformation.createProxyUser(user,
           UserGroupInformation.getLoginUser());
     Token<?>[] newTokens =
         proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
           @Override
           public Token<?>[] run() throws Exception {
             return FileSystem.get(getConfig()).addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
           }
         });
     return newTokens;
   }
 {code}
 The memory leak happens when FileSystem.get(getConfig()) is called with a 
 new proxy user, because createProxyUser always creates a new Subject.
 The calling sequence is 
 FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => 
 FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) 
 => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
 {code}
   public static UserGroupInformation createProxyUser(String user,
       UserGroupInformation realUser) {
     if (user == null || user.isEmpty()) {
       throw new IllegalArgumentException("Null user");
     }
     if (realUser == null) {
       throw new IllegalArgumentException("Null real user");
     }
     Subject subject = new Subject();
     Set<Principal> principals = subject.getPrincipals();
     principals.add(new User(user));
     principals.add(new RealUser(realUser));
     UserGroupInformation result = new UserGroupInformation(subject);
     result.setAuthenticationMethod(AuthenticationMethod.PROXY);
     return result;
   }
 {code}
 FileSystem#Cache#Key.equals compares the ugi:
 {code}
   Key(URI uri, Configuration conf, long unique) throws IOException {
     scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
     authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
     this.unique = unique;
     this.ugi = UserGroupInformation.getCurrentUser();
   }

   public boolean equals(Object obj) {
     if (obj == this) {
       return true;
     }
     if (obj != null && obj instanceof Key) {
       Key that = (Key) obj;
       return isEqual(this.scheme, that.scheme)
           && isEqual(this.authority, that.authority)
           && isEqual(this.ugi, that.ugi)
           && (this.unique == that.unique);
     }
     return false;
   }
 {code}
 UserGroupInformation.equals compares the subject by reference:
 {code}
   public boolean equals(Object o) {
     if (o == this) {
       return true;
     } else if (o == null || getClass() != o.getClass()) {
       return false;
     } else {
       return subject == ((UserGroupInformation) o).subject;
     }
   }
 {code}
 So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
 are called, a new FileSystem is created and a new entry is added to 
 FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377869#comment-14377869
 ] 

Hudson commented on YARN-3384:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 TestLogAggregationService.verifyContainerLogs fails after YARN-2777
 ---

 Key: YARN-3384
 URL: https://issues.apache.org/jira/browse/YARN-3384
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
  Labels: test-fail
 Fix For: 2.7.0

 Attachments: YARN-3384.20150321-1.patch


 The following test cases of TestLogAggregationService are failing:
 testMultipleAppsLogAggregation
 testLogAggregationServiceWithRetention
 testLogAggregationServiceWithInterval
 testLogAggregationServiceWithPatterns



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377862#comment-14377862
 ] 

Hudson commented on YARN-3241:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu 
via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java


 FairScheduler handles invalid queue names inconsistently
 --

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
 YARN-3241.002.patch


 Leading spaces, trailing spaces, or an empty sub-queue name may cause a 
 MetricsException ("Metrics source XXX already exists!") when adding an 
 application to the FairScheduler.
 The reason is that QueueMetrics parses the queue name differently from 
 QueueManager.
 QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and 
 trailing spaces from each sub-queue name and also drops empty sub-queue 
 names:
 {code}
   static final Splitter Q_SPLITTER =
       Splitter.on('.').omitEmptyStrings().trimResults();
 {code}
 But QueueManager won't remove leading spaces, trailing spaces, or empty 
 sub-queue names.
 This causes FSQueue and FSQueueMetrics to get out of sync: QueueManager 
 considers two such queue names different and tries to create a new queue, 
 while FSQueueMetrics treats them as the same queue, which raises the 
 "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377868#comment-14377868
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating the app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks whether appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
     getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null; it throws 
 ApplicationAttemptNotFoundException instead:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The two pieces of code aren't coupled well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377867#comment-14377867
 ] 

Hudson commented on YARN-2868:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-2868. FairScheduler: Metric for latency to allocate first container for an 
application. (Ray Chiang via kasha) (kasha: rev 
972f1f1ab94a26ec446a272ad030fe13f03ed442)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between the start of container allocation 
 and the first container actually being allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377876#comment-14377876
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with 
 &&. We should separate them because it's difficult to identify which 
 condition failed.
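 For illustration, the pattern being discussed (the assertion below is 
 hypothetical, not the actual test code):
 {code}
 // Before: a single assertion on a conjunction hides which clause failed.
 assertTrue(report.getHost() != null && report.getYarnApplicationState() != null);

 // After: separate assertions point directly at the failing condition.
 assertNotNull(report.getHost());
 assertNotNull(report.getYarnApplicationState());
 {code}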



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377886#comment-14377886
 ] 

Hadoop QA commented on YARN-1902:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638931/YARN-1902.v3.patch
  against trunk revision 3ca5bd1.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7090//console

This message is automatically generated.

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl:
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called, and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is made and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls.
 Analyzing debug logs of AMRMClientImpl, I have found that (z+1) containers 
 are indeed requested in both scenarios, but only in the second scenario is 
 the correct behavior observed.
 Looking at the implementation, I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. A consequence of the 
 Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does 
 not hold any information about whether a request has already been sent to 
 the RM or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test covering scenario one.
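 For illustration, the call pattern of scenario 1 against the AMRMClient API (a 
 simplified sketch; the resource size, priority, and progress values are 
 arbitrary, and amRMClient is assumed to be an already initialized, started, 
 and registered client):
 {code}
 // z requests for the same capability, then an allocate round-trip.
 Resource capability = Resource.newInstance(1024, 1);
 Priority priority = Priority.newInstance(0);
 for (int i = 0; i < z; i++) {
   amRMClient.addContainerRequest(
       new AMRMClient.ContainerRequest(capability, null, null, priority));
 }
 AllocateResponse first = amRMClient.allocate(0.1f);  // z containers granted

 // ... at least one of the z containers is started ...

 // One more request with the same capability.
 amRMClient.addContainerRequest(
     new AMRMClient.ContainerRequest(capability, null, null, priority));
 AllocateResponse second = amRMClient.allocate(0.2f);
 // Reported behavior: z+1 additional containers end up allocated instead of 1,
 // because the Map<Resource, ResourceRequestInfo> table does not track what
 // has already been sent to the RM.
 {code}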



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377935#comment-14377935
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java
* hadoop-yarn-project/CHANGES.txt


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with 
 &&. We should separate them because it's difficult to identify which 
 condition failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377589#comment-14377589
 ] 

Hadoop QA commented on YARN-3034:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12706868/YARN-3024.20150324-1.patch
  against trunk revision c6c396f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7089//console

This message is automatically generated.

 [Collector wireup] Implement RM starting its timeline collector
 ---

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, 
 YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, 
 YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377604#comment-14377604
 ] 

Naganarasimha G R commented on YARN-3034:
-

[~zjshen], 
I have uploaded the patch with the changes which you mentioned for the 
configuration. Please review.

 [Collector wireup] Implement RM starting its timeline collector
 ---

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, 
 YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, 
 YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377637#comment-14377637
 ] 

Hudson commented on YARN-2868:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-2868. FairScheduler: Metric for latency to allocate first container for an 
application. (Ray Chiang via kasha) (kasha: rev 
972f1f1ab94a26ec446a272ad030fe13f03ed442)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java


 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between starting container allocation 
 and the first container actually being allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377633#comment-14377633
 ] 

Hudson commented on YARN-3241:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu 
via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java


 FairScheduler handles invalid queue names inconsistently
 --

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
 YARN-3241.002.patch


 A leading space, a trailing space, or an empty sub-queue name may cause a 
 MetricsException ("Metrics source XXX already exists!") when adding an 
 application to the FairScheduler.
 The reason is that QueueMetrics parses the queue name differently from 
 QueueManager.
 QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and 
 trailing spaces in each sub-queue name and also removes empty sub-queue 
 names.
 {code}
   static final Splitter Q_SPLITTER =
   Splitter.on('.').omitEmptyStrings().trimResults(); 
 {code}
 But QueueManager won't remove leading spaces, trailing spaces, or empty 
 sub-queue names.
 This causes FSQueue and FSQueueMetrics to get out of sync:
 QueueManager thinks the two queue names are different, so it tries to 
 create a new queue, but FSQueueMetrics treats them as the same queue, 
 which triggers the "Metrics source XXX already exists!" MetricsException.
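To make the mismatch concrete, here is a minimal sketch (assuming Guava's Splitter, as used by Q_SPLITTER above; the queue name is made up) showing how the two parsing styles disagree on the same input:
{code}
import com.google.common.base.Splitter;
import java.util.Arrays;

public class QueueNameSplitDemo {
  // Same splitter configuration as QueueMetrics.Q_SPLITTER above.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String queueName = "root. alpha ..beta";
    // Trimmed and empty components removed: [root, alpha, beta]
    System.out.println(Q_SPLITTER.splitToList(queueName));
    // Raw split keeps the spaces and the empty component: [root,  alpha , , beta]
    System.out.println(Arrays.asList(queueName.split("\\.")));
  }
}
{code}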



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377643#comment-14377643
 ] 

Hudson commented on YARN-3384:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 TestLogAggregationService.verifyContainerLogs fails after YARN-2777
 ---

 Key: YARN-3384
 URL: https://issues.apache.org/jira/browse/YARN-3384
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
  Labels: test-fail
 Fix For: 2.7.0

 Attachments: YARN-3384.20150321-1.patch


 The following test cases of TestLogAggregationService are failing:
 testMultipleAppsLogAggregation
 testLogAggregationServiceWithRetention
 testLogAggregationServiceWithInterval
 testLogAggregationServiceWithPatterns 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377639#comment-14377639
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks if appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
 getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   
 app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null but throws 
 ApplicationAttemptNotFoundException:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The two code paths aren't coupled well.
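A minimal sketch of how the caller could reconcile the two, treating the exception simply as "no attempt available yet" (illustrative only, not necessarily the committed YARN-3393 change):
{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before any attempt reached the timeline store;
  // leave the attempt-derived fields (host, RPC port, tracking URLs) unset.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}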



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377641#comment-14377641
 ] 

Hudson commented on YARN-3336:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 
6ca1f12024fd7cec7b01df0f039ca59f3f365dc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 FileSystem memory leak in DelegationTokenRenewer
 

 Key: YARN-3336
 URL: https://issues.apache.org/jira/browse/YARN-3336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
 YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch


 FileSystem memory leak in DelegationTokenRenewer.
 Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
 FileSystem entry is added to FileSystem#CACHE and is never 
 garbage collected.
 This is the implementation of obtainSystemTokensForUser:
 {code}
 protected Token<?>[] obtainSystemTokensForUser(String user,
     final Credentials credentials) throws IOException, InterruptedException {
   // Get new hdfs tokens on behalf of this user
   UserGroupInformation proxyUser =
       UserGroupInformation.createProxyUser(user,
           UserGroupInformation.getLoginUser());
   Token<?>[] newTokens =
       proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
         @Override
         public Token<?>[] run() throws Exception {
           return FileSystem.get(getConfig()).addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
         }
       });
   return newTokens;
 }
 {code}
 The memory leak happens when FileSystem.get(getConfig()) is called with a 
 new proxy user, because createProxyUser always creates a new Subject.
 The calling sequence is 
 FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => 
 FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => 
 FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf).
 {code}
 public static UserGroupInformation createProxyUser(String user,
     UserGroupInformation realUser) {
   if (user == null || user.isEmpty()) {
     throw new IllegalArgumentException("Null user");
   }
   if (realUser == null) {
     throw new IllegalArgumentException("Null real user");
   }
   Subject subject = new Subject();
   Set<Principal> principals = subject.getPrincipals();
   principals.add(new User(user));
   principals.add(new RealUser(realUser));
   UserGroupInformation result = new UserGroupInformation(subject);
   result.setAuthenticationMethod(AuthenticationMethod.PROXY);
   return result;
 }
 {code}
 FileSystem#Cache#Key.equals will compare the ugi
 {code}
 Key(URI uri, Configuration conf, long unique) throws IOException {
   scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
   authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
   this.unique = unique;
   this.ugi = UserGroupInformation.getCurrentUser();
 }

 public boolean equals(Object obj) {
   if (obj == this) {
     return true;
   }
   if (obj != null && obj instanceof Key) {
     Key that = (Key) obj;
     return isEqual(this.scheme, that.scheme)
         && isEqual(this.authority, that.authority)
         && isEqual(this.ugi, that.ugi)
         && (this.unique == that.unique);
   }
   return false;
 }
 {code}
 UserGroupInformation.equals will compare subject by reference.
 {code}
   public boolean equals(Object o) {
 if (o == this) {
   return true;
 } else if (o == null || getClass() != o.getClass()) {
   return false;
 } else {
   return subject == ((UserGroupInformation) o).subject;
 }
   }
 {code}
 So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
 are called, a new FileSystem will be created and a new entry will be added to 
 FileSystem.CACHE.
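One possible mitigation, sketched as a rework of the doAs block quoted above: evict the per-proxy-user FileSystem instances once the tokens are obtained, using FileSystem.closeAllForUGI. This is illustrative and not necessarily the committed YARN-3336 fix.
{code}
Token<?>[] newTokens =
    proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        try {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          // Evict every FileSystem cached under this proxy user's UGI so the
          // CACHE entry keyed by the new Subject can be garbage collected.
          FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser());
        }
      }
    });
{code}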



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377644#comment-14377644
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. We 
 should separate them because it's difficult to identify which condition is 
 violated.
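For illustration only (the response object and its getters are hypothetical), splitting the combined assertion makes the failing condition obvious:
{code}
// Before: one combined assertion; a failure only says "expected true".
// Assert.assertTrue(report != null && report.getYarnApplicationState() != null);

// After: one assertion per condition; the failure pinpoints what was null.
Assert.assertNotNull(report);
Assert.assertNotNull(report.getYarnApplicationState());
{code}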



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377640#comment-14377640
 ] 

Hudson commented on YARN-2777:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Fix For: 2.7.0

 Attachments: YARN-2777.001.patch, YARN-2777.02.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated into one log file, it would be 
 desirable to mark the end of one log before starting the next, 
 e.g. with a line such as:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377660#comment-14377660
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/876/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. We 
 should separate them because it's difficult to identify which condition is 
 violated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3034:

Attachment: YARN-3024.20150324-1.patch

 [Collector wireup] Implement RM starting its timeline collector
 ---

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, 
 YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, 
 YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-03-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377753#comment-14377753
 ] 

Junping Du commented on YARN-3304:
--

Thanks for comments, [~kasha]!
bq. In previous releases, we have never called these APIs Public even if they 
were intended to be sub-classed. In my mind, this is the last opportunity to 
decide on what the API should do? I think consistent and reasonable return 
values should be given a higher priority over compatibility.
Agree on the priority here. However, having consistent and reasonable return 
values doesn't have to break compatibility (or consistent behavior). Just 
like the approach I proposed above, we can consistently return a resource value 
of 0 when it is unavailable and add a flag to mark whether the resource is 
available or not. 

bq. I am okay with adding boolean methods to capture unavailability, but that 
seems a little overboard. Using -1 in the ResourceCalculatorProcessTree is okay 
by me. My concern was with logging this -1 value in the metrics. In either 
case, I would like for the container usage metrics to see if the usage is 
available before logging the same.
I agree both ways can work. However, I think adding a boolean method sounds 
better, at least for the former case. More importantly, it doesn't break any 
consistent behavior of previous releases; we don't need to break it if we 
don't have to, do we?

bq. Since it is not too much work or risk, I would prefer we fix both in 2.7. 
This is solely wearing my Apache hat on. My Cloudera hat doesn't really mind it 
being in 2.8 vs 2.7. 
My idea is simple here: a fast-moving, regular and predictable release train 
could benefit our community and ecosystem in many aspects. I also have other 
wish-list items that cannot make 2.7. Once this patch gets in, I am not sure whether 
YARN-3392 is still a blocker for 2.7, and I would also prefer a fix rather than 
a pending JIRA there delaying the release unnecessarily. [~vinodkv], [~kasha] and 
[~adhoot], what do you think?
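A minimal sketch of the flag-based approach discussed above (class and method names are illustrative, not the actual ResourceCalculatorProcessTree API):
{code}
public abstract class ProcessTreeMetricsSketch {
  /** @return CPU usage in percent, or 0 when it cannot be determined. */
  public float getCpuUsagePercent() {
    return isCpuUsageAvailable() ? computeCpuUsagePercent() : 0f;
  }

  /** @return true only when the underlying platform can report CPU usage. */
  public abstract boolean isCpuUsageAvailable();

  protected abstract float computeCpuUsagePercent();
}
{code}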

 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-3304-v2.patch, YARN-3304.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for 
 unavailable case while other resource metrics are return 0 in the same case 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3040:
--
Attachment: YARN-3040.4.patch

 [Data Model] Make putEntities operation be aware of the app's context
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, 
 YARN-3040.4.patch


 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3395:
--
Summary: [Fair Scheduler] Handle the user name correctly when submit 
application and use user name as default queue name.  (was: Handle the user 
name correctly when submit application and use user name as default queue name.)

 [Fair Scheduler] Handle the user name correctly when submit application and 
 use user name as default queue name.
 

 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3395.000.patch


 Handle the user name correctly when submitting an application and when using 
 the user name as the default queue name.
 We should reject an application with an empty or whitespace-only user name, 
 because it doesn't make sense to have an empty or whitespace-only user name.
 We should remove the trailing and leading whitespace of the user name when we 
 use the user name as the default queue name; otherwise it will be rejected with 
 an InvalidQueueNameException by QueueManager. I think this change makes sense, 
 because it is compatible with the queue-name convention and we already 
 did a similar thing for '.' in user names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3395:

Attachment: YARN-3395.000.patch

 Handle the user name correctly when submit application and use user name as 
 default queue name.
 ---

 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3395.000.patch


 Handle the user name correctly when submitting an application and when using 
 the user name as the default queue name.
 We should reject an application with an empty or whitespace-only user name, 
 because it doesn't make sense to have an empty or whitespace-only user name.
 We should remove the trailing and leading whitespace of the user name when we 
 use the user name as the default queue name; otherwise it will be rejected with 
 an InvalidQueueNameException by QueueManager. I think this change makes sense, 
 because it is compatible with the queue-name convention and we already 
 did a similar thing for '.' in user names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378879#comment-14378879
 ] 

zhihai xu commented on YARN-3395:
-

I uploaded a patch, YARN-3395.000.patch, for review. I added two test cases in 
TestFairScheduler.
Without the change, both tests fail.

 Handle the user name correctly when submit application and use user name as 
 default queue name.
 ---

 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3395.000.patch


 Handle the user name correctly when submitting an application and when using 
 the user name as the default queue name.
 We should reject an application with an empty or whitespace-only user name, 
 because it doesn't make sense to have an empty or whitespace-only user name.
 We should remove the trailing and leading whitespace of the user name when we 
 use the user name as the default queue name; otherwise it will be rejected with 
 an InvalidQueueNameException by QueueManager. I think this change makes sense, 
 because it is compatible with the queue-name convention and we already 
 did a similar thing for '.' in user names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378946#comment-14378946
 ] 

Sangjin Lee commented on YARN-3040:
---

bq. It will check if the tag starts with "TIMELINE_FLOW_ID_TAG:", and then if 
the value is empty, 
"TIMELINE_FLOW_ID_TAG:".substring(TIMELINE_FLOW_ID_TAG.length() + 1) will 
return an empty value. It shouldn't throw IndexOutOfBoundsException. But it 
seems there's no need to add an empty env, I'll change the code accordingly.

Ack. I was thrown off because the code was like

{code}
if (tag.startsWith(TAG + ":")) {
  String value = tag.substring(TAG.length() + 1);
}
{code}
It works because the +1 is really for the colon.
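For reference, a tiny self-contained illustration of why the {{+ 1}} is safe even for an empty value (the constant name is taken from the discussion, not the exact source):
{code}
static final String TIMELINE_FLOW_ID_TAG = "TIMELINE_FLOW_ID_TAG";

static String parseFlowTag(String tag) {
  if (tag.startsWith(TIMELINE_FLOW_ID_TAG + ":")) {
    // The "+ 1" skips the ':' separator included in the prefix check.
    return tag.substring(TIMELINE_FLOW_ID_TAG.length() + 1);
  }
  return null; // not a flow tag
}

// parseFlowTag("TIMELINE_FLOW_ID_TAG:my_flow") -> "my_flow"
// parseFlowTag("TIMELINE_FLOW_ID_TAG:")        -> ""  (no IndexOutOfBoundsException)
{code}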

LGTM overall.

 [Data Model] Make putEntities operation be aware of the app's context
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, 
 YARN-3040.4.patch


 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378666#comment-14378666
 ] 

Zhijie Shen commented on YARN-3040:
---

Thanks for review, Sangjin and Junping! I've updated the patch accordingly.

bq. I am comfortable with continuing to work on the flow-related items in the 
separate JIRA.

Thanks. This sounds good.

bq. I'm not sure of these set calls. Are these here just to initialize the 
context to default values? 

Yes, these are the defaults. In fact, the user is sure to be updated by the RPC 
call that fetches the context info (unless there's a bug in the RPC). The current 
user used for initialization is usually not correct, but I kept it so that we 
always have a value to pass to the storage, to prevent a possible NPE that would 
crash the process. Instead, we can easily debug/inspect the storage to verify 
the user if a bug occurs. I added some code comments for the initialization.

bq. I would prefer something like yarn.cluster.id because this id is for 
identifying YARN cluster rather than ResourceManager. 

I also agree yarn.cluster.id sounds better, but 
yarn.resourcemanager.cluster-id is the legacy name, which has been used by RM HA 
for a while. As it doesn't sound so bad, how about keeping it, so that we don't 
need to deprecate the config or break compatibility?

bq. Can we add a test case that without specifying flow_id and flow_run_id and 
v2 timeline service still can work?

Added the test case in the new patch

bq. Do we need to be case-insensitive here? I think we can be strict about the 
tag names?

This is because the tag text has case-sensitive and case-insensitive modes. When 
insensitive, even if the user inputs upper-case strings, they will be normalized 
to lower-case strings. So we need to take care of this case.

bq. You might want to be a bit defensive about the tag not carrying any value 
(e.g. "TIMELINE_FLOW_ID_TAG:").

It will check if the tag starts with "TIMELINE_FLOW_ID_TAG:", and then if the 
value is empty, 
{{"TIMELINE_FLOW_ID_TAG:".substring(TIMELINE_FLOW_ID_TAG.length() + 1)}} will 
return an empty value. It shouldn't throw an IndexOutOfBoundsException. But it 
seems there's no need to add an empty env, so I'll change the code accordingly.

In addition, I fixed a couple of test failures in the new patch.
 

 [Data Model] Make putEntities operation be aware of the app's context
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch


 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378780#comment-14378780
 ] 

Hadoop QA commented on YARN-3365:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707006/YARN-3365.001.patch
  against trunk revision a16bfff.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7095//console

This message is automatically generated.

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3365.001.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378805#comment-14378805
 ] 

Hadoop QA commented on YARN-1621:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706974/YARN-1621.6.patch
  against trunk revision a16bfff.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.security.TestJHSSecurity

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7093//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7093//console

This message is automatically generated.

 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Assignee: Bartosz Ługowski
 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
 YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch


 As more applications are moved to YARN, we need a generic CLI to list rows of 
 task attempt ID, container ID, host of container, and state of container. Today, 
 if a YARN application running in a container hangs, there is no way to find 
 out more info, because a user does not know where each attempt is running.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId <appId> [-containerState <state of container>]
 where containerState is an optional filter to list containers in the given state only.
 Container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once, e.g. KILLED,FAILED.
 task attempt ID container ID host of container state of container 
 {code}
 The CLI should work with both running and completed applications. If a 
 container runs many task attempts, all attempts should be shown. That will 
 likely be the case for Tez container-reuse applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread zhihai xu (JIRA)
zhihai xu created YARN-3395:
---

 Summary: Handle the user name correctly when submit application 
and use user name as default queue name.
 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu


Handle the user name correctly when submitting an application and when using the 
user name as the default queue name.
We should reject an application with an empty or whitespace-only user name, 
because it doesn't make sense to have an empty or whitespace-only user name.
We should remove the trailing and leading whitespace of the user name when we 
use the user name as the default queue name; otherwise it will be rejected with an 
InvalidQueueNameException by QueueManager. I think this change makes sense, 
because it is compatible with the queue-name convention and we already 
did a similar thing for '.' in user names.
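A minimal sketch of the proposed handling (the helper and the '.' replacement shown here are illustrative; the exact rule should follow the existing FairScheduler code):
{code}
static String defaultQueueNameFor(String user) {
  if (user == null || user.trim().isEmpty()) {
    // Reject empty or whitespace-only user names at submission time.
    throw new IllegalArgumentException("Empty or whitespace-only user name");
  }
  // Trim so the derived queue name passes QueueManager's validation, and
  // handle '.' the same way the scheduler already does for user names.
  return user.trim().replace('.', '_');
}
{code}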




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2015-03-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378942#comment-14378942
 ] 

Vinod Kumar Vavilapalli commented on YARN-2654:
---

WON'T FIX is perhaps the right resolution..

 Revisit all shared cache config parameters to ensure quality names
 --

 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Blocker
 Attachments: shared_cache_config_parameters.txt


 Revisit all the shared cache config parameters in YarnConfiguration and 
 yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3394) WebApplication proxy documentation is incomplete

2015-03-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3394:

Attachment: YARN-3394.20150324-1.patch
WebApplicationProxy.html

Attaching the sample HTML and the patch to fix the issue.

 WebApplication  proxy documentation is incomplete
 -

 Key: YARN-3394
 URL: https://issues.apache.org/jira/browse/YARN-3394
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: WebApplicationProxy.html, YARN-3394.20150324-1.patch


 The web proxy documentation (hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html) 
 is incomplete. Missing topics:
 1. Configuration to start/stop the proxy as a separate server
 2. Steps to start it as a daemon service
 3. Secure mode for the web proxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377984#comment-14377984
 ] 

Junping Du commented on YARN-3040:
--

Some additional comments:
{code}
   <property>
+    <description>The YARN cluster ID.</description>
+    <name>yarn.resourcemanager.cluster-id</name>
+    <value>yarn_cluster</value>
+  </property>
{code}
I would prefer something like yarn.cluster.id, because this ID identifies the 
YARN cluster rather than the ResourceManager. It should stay consistent 
across RMs (active and standby) when a switchover happens. Likewise, for other 
names like RM_CLUSTER_ID and DEFAULT_RM_CLUSTER_ID, we should use YARN_CLUSTER_ID instead.

{code}
@@ -208,7 +211,11 @@ public void testDSShell(boolean haveDomain, String timelineVersion)
 if (timelineVersion.equalsIgnoreCase("v2")) {
   String[] timelineArgs = {
   "--timeline_service_version",
-  "v2"
+  "v2",
+  "--flow",
+  "test_flow_id",
+  "--flow_run",
+  "12345678"
   };
{code}
Can we add a test case in which flow_id and flow_run_id are not specified and the 
v2 timeline service still works? In my understanding, this info will still be 
optional for applications, so we should make sure it is nullable when 
launching applications and in other subsequent flows.

 [Data Model] Make putEntities operation be aware of the app's context
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch


 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377999#comment-14377999
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks if appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
 getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   
 app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null but throws 
 ApplicationAttemptNotFoundException:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The two code paths aren't coupled well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378000#comment-14378000
 ] 

Hudson commented on YARN-3384:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/])
YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. 
Contributed by Naganarasimha G R. (ozawa: rev 
82eda771e05cf2b31788ee1582551e65f1c0f9aa)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


 TestLogAggregationService.verifyContainerLogs fails after YARN-2777
 ---

 Key: YARN-3384
 URL: https://issues.apache.org/jira/browse/YARN-3384
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
  Labels: test-fail
 Fix For: 2.7.0

 Attachments: YARN-3384.20150321-1.patch


 The following test cases of TestLogAggregationService are failing:
 testMultipleAppsLogAggregation
 testLogAggregationServiceWithRetention
 testLogAggregationServiceWithInterval
 testLogAggregationServiceWithPatterns 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-03-24 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3365:

Attachment: YARN-3365.002.patch

Re-created the patch against trunk - ensuring a change that is only in trunk 
isn't undone. 

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3365.001.patch, YARN-3365.002.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379034#comment-14379034
 ] 

Hadoop QA commented on YARN-3395:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707050/YARN-3395.000.patch
  against trunk revision 53a28af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7096//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7096//console

This message is automatically generated.

 [Fair Scheduler] Handle the user name correctly when submit application and 
 use user name as default queue name.
 

 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3395.000.patch


 Handle the user name correctly when submitting an application and when using 
 the user name as the default queue name.
 We should reject an application with an empty or whitespace-only user name, 
 because it doesn't make sense to have an empty or whitespace-only user name.
 We should remove the trailing and leading whitespace of the user name when we 
 use the user name as the default queue name; otherwise it will be rejected with 
 an InvalidQueueNameException by QueueManager. I think this change makes sense, 
 because it is compatible with the queue-name convention and we already 
 did a similar thing for '.' in user names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379092#comment-14379092
 ] 

Wangda Tan commented on YARN-3214:
--

Hi [~lohit],
Thanks for reviewing the doc. Regarding your comments:
bq. If so, it would become too restrictive. Labels on nodes can be seen in 
multiple dimension (from app's resource, machine resource and also usecase 
resouce, eg backfill jobs are placed on specific set of nodes). In those cases 
we should have ability to have multiple labels on node
Yes, right now we only support one label for each node (partition). The reason we 
temporarily support only one per node is that, with multiple labels on a node, it 
becomes hard to do resource planning (as we do today, we can say queue-A can use 
40% of label-X and queue-B can use 60% of label-X). Assume a node with label-X 
and label-Y whose resource is 10G: it is hard to say whether the node has 10G of 
resource for (X+Y) or 10G each for X and Y. This also makes preemption hard to 
do. The trade-off is that, if we don't plan resource share (or capacity) on 
node labels, some resources could be wasted and queues can be starved while they 
are still under their configured capacity.
Multiple labels on a node (we call this a constraint) are in the design stage. We 
have some thoughts about it and will push it to the community once it gets into 
better shape -- it should not take too long.

bq. Also, in the documents there is mention of scheduling apps without any 
labels being scheduled on labeled nodes if resources are idle. Does that also 
cover apps which could have different label other than A/B, but still have a 
label be placed on these nodes when there is free resources available?
No, it will only try to allocate non-labeled requests on labeled nodes. If a 
resource request explicitly asks for a node label, we will only allocate the 
corresponding labeled resource for it.




 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster into sub-clusters, so resources 
 cannot be shared between partitions. 
 With the current implementation of node labels we cannot use the cluster 
 optimally, and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get preference on labeled nodes.
 2. If there is no ask for labeled resources, we can assign those nodes to 
 non-labeled apps.
 3. If there is any future ask for those resources, we will preempt the 
 non-labeled apps and give the nodes back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-03-24 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379169#comment-14379169
 ] 

Chengbing Liu commented on YARN-3024:
-

[~kasha], I created YARN-3396 to track the URISyntaxException issue.
For multiple downloads per ContainerLocalizer, I found that YARN-665 has already 
been created.
As for the other TODO, i.e. synchronization, I don't see any need for it; I 
think we can safely remove that one.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch, YARN-3024.04.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when {{pending}} 
 was not empty prior to the call. This method removes localized resources from 
 {{pending}}; therefore we should check the return value and give a DIE action 
 when it returns null.
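A minimal sketch of the described check (method and enum names loosely follow the NM localizer code, but this is illustrative rather than the committed patch):
{code}
LocalizerAction nextAction() {
  // findNextResource() drops already-localized entries from 'pending', so it
  // can return null even if 'pending' was non-empty before the call.
  LocalResource next = findNextResource();
  return next == null ? LocalizerAction.DIE : LocalizerAction.LIVE;
}
{code}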



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379195#comment-14379195
 ] 

Hadoop QA commented on YARN-3365:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707106/YARN-3365.002.patch
  against trunk revision 53a28af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7097//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7097//console

This message is automatically generated.

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3365.001.patch, YARN-3365.002.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService

2015-03-24 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3396:
--

Assignee: Brahma Reddy Battula

 Handle URISyntaxException in ResourceLocalizationService
 

 Key: YARN-3396
 URL: https://issues.apache.org/jira/browse/YARN-3396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Brahma Reddy Battula

 There are two occurrences of the following code snippet:
 {code}
 //TODO fail? Already translated several times...
 {code}
 It should be handled correctly in case the resource URI is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.

2015-03-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379240#comment-14379240
 ] 

Zhijie Shen commented on YARN-3334:
---

Junping, thanks for the patch. Here's my comments:

1. Do you want to change "initialized" to "started"?
{code}
512 // only put initialized client
{code}

2. The following method seems unnecessary, because there's 
{{getTimelineClient(ApplicationId id)}}.
{code}
499 
500 public Map<ApplicationId, TimelineClient> getTimelineClients() {
501   return this.timelineClients;
502 }
503 
{code}

3. It seems there's no need to maintain rmKnownCollectors. We can blindly put 
the service addr into the timeline client; it won't affect anything if the 
address has not changed, right? Or we can do a simple check {{client.getAddr != 
newServiceAddr}} to avoid a trivial set.

4. IMHO, it is better to use a ContainerEntity whose ID is this 
container ID.
{code}
441   TimelineEntity entity = new TimelineEntity();
442   entity.setType(NMEntity.NM_CONTAINER_METRICS.toString());
443   entity.setId(containerId.toString());
{code}

5. We need a flag to control whether the NM emits the timeline data or not.

6. Unnecessary empty string.
{code}
503  + cpuUsageTotalCoresPercentage);
{code}

7. You probably want to use addTimeSeriesData to add a single key/value pair.
{code}
526 memoryMetric.setTimeSeries(timeSeries);
{code}

8. The NM needs to remove the timelineClient of a finished app (see the sketch 
below). Otherwise, timelineClients will consume more and more resources as the 
NM keeps running while no longer being used. The difficulty is how to know 
whether an application has already finished. We need to think about it.
{code}
368 private ConcurrentHashMap<ApplicationId, TimelineClient> timelineClients = 
369     new ConcurrentHashMap<ApplicationId, TimelineClient>();
{code}
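
For illustration only, a minimal sketch of what point 8 could look like once a 
removal path exists; the class and method names below are hypothetical and not 
from the patch:
{code}
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.TimelineClient;

class PerAppTimelineClients {
  private final ConcurrentHashMap<ApplicationId, TimelineClient> timelineClients =
      new ConcurrentHashMap<ApplicationId, TimelineClient>();

  // only put a client that has already been started
  void addApplication(ApplicationId appId, TimelineClient client) {
    timelineClients.putIfAbsent(appId, client);
  }

  // called once the NM learns that the application has finished
  void removeApplication(ApplicationId appId) {
    TimelineClient client = timelineClients.remove(appId);
    if (client != null) {
      client.stop();  // release the resources held by this app's client
    }
  }
}
{code}
The hard part, as noted above, is wiring a reliable "application finished" 
signal to drive the removal.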

 [Event Producers] NM start to posting some app related metrics in early POC 
 stage of phase 2.
 -

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-03-24 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379247#comment-14379247
 ] 

Sidharta Seethana commented on YARN-3365:
-

Summary of changes included in the patch:

- Additional tests, fixes, and cleanup of TestLinuxContainerExecutor (by 
[~vvasudev])
- container-executor: changes to support superuser execution of ‘tc’ in batch 
mode (by [~sidharta-s])
- container-executor: refactored main.c to make it easier to read/maintain (by 
[~sidharta-s])

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3365.001.patch, YARN-3365.002.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379031#comment-14379031
 ] 

Zhijie Shen commented on YARN-3047:
---

[~varun_saxena], any chance to take a look at the latest comments? Thanks! - 
Zhijie

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch, YARN-3047.003.patch, 
 YARN-3047.02.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-03-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2740:
-
Summary: ResourceManager side should properly handle node label 
modifications when distributed node label configuration enabled  (was: RM 
AdminService should prevent admin change labels on nodes when distributed node 
label configuration enabled)

 ResourceManager side should properly handle node label modifications when 
 distributed node label configuration enabled
 --

 Key: YARN-2740
 URL: https://issues.apache.org/jira/browse/YARN-2740
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch


 According to YARN-2495, labels of nodes will be specified when NM do 
 heartbeat. We shouldn't allow admin modify labels on nodes when distributed 
 node label configuration enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-03-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2740:
-
Description: 
According to YARN-2495, when distributed node label configuration is enabled:
- RMAdmin / REST API should reject change-labels-on-node operations.
- RMNodeLabelsManager shouldn't persist labels on nodes when NMs do heartbeat.

  was:According to YARN-2495, labels of nodes will be specified when NM do 
heartbeat. We shouldn't allow admin modify labels on nodes when distributed 
node label configuration enabled.


 ResourceManager side should properly handle node label modifications when 
 distributed node label configuration enabled
 --

 Key: YARN-2740
 URL: https://issues.apache.org/jira/browse/YARN-2740
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch


 According to YARN-2495, when distributed node label configuration is enabled:
 - RMAdmin / REST API should reject change-labels-on-node operations.
 - RMNodeLabelsManager shouldn't persist labels on nodes when NMs do 
 heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379141#comment-14379141
 ] 

Vinod Kumar Vavilapalli commented on YARN-3214:
---

Maybe we should start calling out partitions and attributes/constraints (when 
we have a JIRA) everywhere, for clarity.

 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster to some sub-clusters so resources 
 cannot be shared between partitioned cluster. 
 With the current implementation of node labels we cannot use the cluster 
 optimally and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get the preference on Labeled nodes 
 2. If there is no ask for labeled resources we can assign those nodes to non 
 labeled apps
 3. If there is any future ask for those resources , we will preempt the non 
 labeled apps and give them back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService

2015-03-24 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379186#comment-14379186
 ] 

Brahma Reddy Battula commented on YARN-3396:


{quote}There are two occurrences of the following code snippet:{quote}

Actually there are three occurrences, at lines 951, 974 and 1014:
{code}
} catch (URISyntaxException e) {
  // TODO fail? Already translated several times...
}
{code}
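
For illustration only (not the patch to be uploaded), one way each of those 
catch blocks could surface the error instead of swallowing it, assuming it is 
acceptable to fail the localization of that resource:
{code}
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

final class ResourceUriCheck {
  // convert and validate in one step; an invalid URI now fails loudly
  static URI toUri(String resource) throws IOException {
    try {
      return new URI(resource);
    } catch (URISyntaxException e) {
      throw new IOException("Invalid resource URI: " + resource, e);
    }
  }
}
{code}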

 Handle URISyntaxException in ResourceLocalizationService
 

 Key: YARN-3396
 URL: https://issues.apache.org/jira/browse/YARN-3396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Brahma Reddy Battula

 There are two occurrences of the following code snippet:
 {code}
 //TODO fail? Already translated several times...
 {code}
 It should be handled correctly in case that the resource URI is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService

2015-03-24 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379191#comment-14379191
 ] 

Chengbing Liu commented on YARN-3396:
-

Can you check if you are using the latest code?

 Handle URISyntaxException in ResourceLocalizationService
 

 Key: YARN-3396
 URL: https://issues.apache.org/jira/browse/YARN-3396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Brahma Reddy Battula

 There are two occurrences of the following code snippet:
 {code}
 //TODO fail? Already translated several times...
 {code}
 It should be handled correctly in case that the resource URI is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService

2015-03-24 Thread Chengbing Liu (JIRA)
Chengbing Liu created YARN-3396:
---

 Summary: Handle URISyntaxException in ResourceLocalizationService
 Key: YARN-3396
 URL: https://issues.apache.org/jira/browse/YARN-3396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu


There are two occurrences of the following code snippet:
{code}
//TODO fail? Already translated several times...
{code}

It should be handled correctly in case that the resource URI is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379218#comment-14379218
 ] 

Varun Saxena commented on YARN-3047:


None actually. Was mistaken. TimelineEvents is required because we will 
continue with three of the v1 APIs, one of which requires TimelineEvents.

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch, YARN-3047.003.patch, 
 YARN-3047.02.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379249#comment-14379249
 ] 

Yongjun Zhang commented on YARN-3021:
-

Restarted my VM (the same one on which I reported the stack trace in my last 
update), reran the failed test TestCapacitySchedulerNodeLabelUpdate, and it 
passed. There is some flakiness in this test, but it is not related to this 
jira.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols

2015-03-24 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3330:

Attachment: pdiff_patch.py

Updated the script to handle file creations and removals. Reorganized the code 
a little bit to dispatch state transition functions dynamically.

 Implement a protobuf compatibility checker to check if a patch breaks the 
 compatibility with existing client and internal protocols
 ---

 Key: YARN-3330
 URL: https://issues.apache.org/jira/browse/YARN-3330
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: pdiff_patch.py, pdiff_patch.py


 Per YARN-3292, we may want to start YARN rolling upgrade test compatibility 
 verification tool by a simple script to check protobuf compatibility. The 
 script may work on incoming patch files, check if there are any changes to 
 protobuf files, and report any potentially incompatible changes (line 
 removals, etc,.). We may want the tool to be conservative: it may report 
 false positives, but we should minimize its chance to have false negatives. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379225#comment-14379225
 ] 

Varun Saxena commented on YARN-3047:


Ok.. Will upload a patch.

I meant, for the reader, do you think we can use the same config as v1?
Anyway, I am continuing with a separate config for the reader as of now. Let 
me know if you have a different opinion owing to ease of migration.

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch, YARN-3047.003.patch, 
 YARN-3047.02.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3047:
---
Attachment: YARN-3047.04.patch

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch, YARN-3047.003.patch, 
 YARN-3047.02.patch, YARN-3047.04.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379309#comment-14379309
 ] 

Wangda Tan commented on YARN-3214:
--

Hi [~lohit],
The problems of multiple labels on the same node and of at most one label per 
node are quite different:

At most one label per node turns the cluster into several disjoint 
sub-clusters, and any scheduling algorithm (no matter whether 
capacity/fair/fifo) can simply run on each sub-cluster.

If you want to divide resources among queues per label (as in the example 
above, queue-A can use 40% of label-X and queue-B can use 60% of label-X) and 
we support multiple labels (say X and Y) on the same node (say node1), the 
sub-clusters become overlapping, which makes scheduling very hard:
When qA can access X and qB can access Y, how much of node1's resource do you 
plan to allocate to qA/qB? A more complex example: node1 has X,Y; node2 has X 
only; node3 has X,Z. This is a very tough problem, and as far as I know 
(please let me know if I missed anything) no platform has solved it perfectly.

So this is why separating partitions vs. attributes/constraints becomes 
important. A partition is a way to divide the cluster; each sub-cluster has 
properties similar to a general cluster resource setting (like how it is 
shared among queues), which is useful when a set of nodes is contributed to 
and shared by only a subset of the queues in the entire cluster. An attribute 
is just a way to place containers; a simple way to implement 
attributes/constraints is FCFS (first come first serve), with no quota 
assigned to each attribute.

Mesos is different here: it doesn't do anything with node attributes on the 
scheduling side. All node attributes are passed directly to the framework, and 
the framework decides whether to accept or reject an offer according to a 
node's attributes; Mesos does not take care of balancing framework shares 
across attributes.

 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster to some sub-clusters so resources 
 cannot be shared between partitioned cluster. 
 With the current implementation of node labels we cannot use the cluster 
 optimally and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get the preference on Labeled nodes 
 2. If there is no ask for labeled resources we can assign those nodes to non 
 labeled apps
 3. If there is any future ask for those resources , we will preempt the non 
 labeled apps and give them back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.

2015-03-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3395:
---
Component/s: (was: scheduler)
 fairscheduler

 [Fair Scheduler] Handle the user name correctly when submit application and 
 use user name as default queue name.
 

 Key: YARN-3395
 URL: https://issues.apache.org/jira/browse/YARN-3395
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3395.000.patch


 Handle the user name correctly when submit application and use user name as 
 default queue name.
 We should reject the application with an empty or whitespace only user name.
 because it doesn't make sense to have an empty or whitespace only user name.
 We should remove the trailing and leading whitespace of the user name when we 
 use user name as default queue name, otherwise it will be rejected by 
 InvalidQueueNameException from QueueManager. I think this change make sense, 
 because it will be compatible with queue name convention and also we already 
 did similar thing for '.' in user name.
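 For illustration only (not the attached patch; the method name below is 
 hypothetical), a minimal sketch of the two rules described above:
 {code}
 final class UserQueueNames {
   // reject empty/whitespace-only user names, and trim the rest before the
   // name is used as the default queue name so QueueManager accepts it
   static String defaultQueueName(String userName) {
     if (userName == null || userName.trim().isEmpty()) {
       throw new IllegalArgumentException(
           "user name is empty or whitespace-only");
     }
     return userName.trim();
   }
 }
 {code}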



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG

2015-03-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2213:
---
Attachment: YARN-2213.02.patch

 Change proxy-user cookie log in AmIpFilter to DEBUG
 ---

 Key: YARN-2213
 URL: https://issues.apache.org/jira/browse/YARN-2213
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ted Yu
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-2213.001.patch, YARN-2213.02.patch


 I saw a lot of the following lines in AppMaster log:
 {code}
 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 {code}
 For long running app, this would consume considerable log space.
 Log level should be changed to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379264#comment-14379264
 ] 

Hadoop QA commented on YARN-3047:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707130/YARN-3047.04.patch
  against trunk revision 53a28af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7098//console

This message is automatically generated.

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch, YARN-3047.003.patch, 
 YARN-3047.02.patch, YARN-3047.04.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379263#comment-14379263
 ] 

Lohit Vijayarenu commented on YARN-3214:


Thanks [~wangda] for the reply. I feel partitions and constraints as two 
separate entities will cause more confusion. If allocation is the challenge 
(as you described in the example for multiple labels), then isn't that 
something which should be solved in the scheduler? This is the same problem 
one would have even without labels: for a given node which advertises 10G of 
memory, and apps/queues X and Y, how would you divide the resource between X 
and Y?
PS: the Mesos scheduler, for example, uses a term called constraints, which is 
similar to labels. In that sense I agree with [~vinodkv] that we should 
probably call this feature partition or something related.

 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster to some sub-clusters so resources 
 cannot be shared between partitioned cluster. 
 With the current implementation of node labels we cannot use the cluster 
 optimally and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get the preference on Labeled nodes 
 2. If there is no ask for labeled resources we can assign those nodes to non 
 labeled apps
 3. If there is any future ask for those resources , we will preempt the non 
 labeled apps and give them back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379292#comment-14379292
 ] 

Hadoop QA commented on YARN-2213:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707135/YARN-2213.02.patch
  against trunk revision 53a28af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7099//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7099//console

This message is automatically generated.

 Change proxy-user cookie log in AmIpFilter to DEBUG
 ---

 Key: YARN-2213
 URL: https://issues.apache.org/jira/browse/YARN-2213
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ted Yu
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-2213.001.patch, YARN-2213.02.patch


 I saw a lot of the following lines in AppMaster log:
 {code}
 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 {code}
 For long running app, this would consume considerable log space.
 Log level should be changed to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService

2015-03-24 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379345#comment-14379345
 ] 

Brahma Reddy Battula commented on YARN-3396:


I referred to the 2.6 code. Yes, it's there in only two places. Will upload a 
patch soon.

 Handle URISyntaxException in ResourceLocalizationService
 

 Key: YARN-3396
 URL: https://issues.apache.org/jira/browse/YARN-3396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Brahma Reddy Battula

 There are two occurrences of the following code snippet:
 {code}
 //TODO fail? Already translated several times...
 {code}
 It should be handled correctly in case that the resource URI is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378200#comment-14378200
 ] 

Wangda Tan commented on YARN-3362:
--

bq. If this is the case then the approach which you specified makes sense but 
by can you mean currently its not there and in future it can come in ?
Some of them already exist, like user-limit, and some of them are coming, like 
am-resource-percent.

Sorry, I may not understand your question: user-limit and queue-limit are just 
two different limits regardless of node labels; sometimes user-limit is higher 
and sometimes queue-limit is higher. Could you explain your question (maybe 
with an example)?

Thanks,

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R

 We don't have node label usage in RM CapacityScheduler web UI now, without 
 this, user will be hard to understand what happened to nodes have labels 
 assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3392) Change NodeManager metrics to not populate resource usage metrics if they are unavailable

2015-03-24 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3392:

Attachment: YARN-3392.prelim.patch

Demonstrates how returning a negative value to indicate unavailable usage 
helps in tracking usage metrics correctly.
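
For illustration only (not the attached prelim patch), the shape of the caller 
this enables; the metrics recorder below is a hypothetical stand-in:
{code}
import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;

final class CpuUsageRecorder {
  // stand-in for whatever metrics sink the NM uses
  interface Metrics {
    void recordCpuUsagePercent(float percent);
  }

  // skip the update when the tree reports the usage as unavailable (negative)
  static void maybeRecord(ResourceCalculatorProcessTree pTree, Metrics metrics) {
    float cpu = pTree.getCpuUsagePercent();
    if (cpu >= 0) {
      metrics.recordCpuUsagePercent(cpu);
    } // else: leave the metric untouched rather than reporting a bogus zero
  }
}
{code}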

 Change NodeManager metrics to not populate resource usage metrics if they are 
 unavailable 
 --

 Key: YARN-3392
 URL: https://issues.apache.org/jira/browse/YARN-3392
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3392.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3383) AdminService should use warn instead of info to log exception when operation fails

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378242#comment-14378242
 ] 

Hudson commented on YARN-3383:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7420 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7420/])
YARN-3383. AdminService should use warn instead of info to log exception when 
operation fails. (Li Lu via wangda) (wangda: rev 
97a7277a2d696474b5c8e2d814c8291d4bde246e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* hadoop-yarn-project/CHANGES.txt


 AdminService should use warn instead of info to log exception when 
 operation fails
 --

 Key: YARN-3383
 URL: https://issues.apache.org/jira/browse/YARN-3383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Li Lu
 Fix For: 2.8.0

 Attachments: YARN-3383-032015.patch, YARN-3383-032315.patch


 Now it uses info:
 {code}
   private YarnException logAndWrapException(IOException ioe, String user,
   String argName, String msg) throws YarnException {
 LOG.info(Exception  + msg, ioe);
 {code}
 But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-03-24 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378303#comment-14378303
 ] 

Anubhav Dhoot commented on YARN-3304:
-

I have a patch available now for YARN-3392 that shows how returning -1 would 
help implement it.
If we take the other approach, it would be good to validate that it's still 
possible by ensuring those changes are done in this jira. Specifically, in 
this example we should add the boolean options here to ensure we can still do 
YARN-3392. We can then compare the two approaches to see which one is better.

 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-3304-v2.patch, YARN-3304.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for 
 unavailable case while other resource metrics are return 0 in the same case 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378310#comment-14378310
 ] 

Hadoop QA commented on YARN-3136:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706936/0009-YARN-3136.patch
  against trunk revision 6413d34.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7091//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7091//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7091//console

This message is automatically generated.

 getTransferredContainers can be a bottleneck during AM registration
 ---

 Key: YARN-3136
 URL: https://issues.apache.org/jira/browse/YARN-3136
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Sunil G
 Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 
 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 
 0009-YARN-3136.patch


 While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
 stuck waiting for the scheduler lock trying to call getTransferredContainers. 
  The scheduler lock is highly contended, especially on a large cluster with 
 many nodes heartbeating, and it would be nice if we could find a way to 
 eliminate the need to grab this lock during this call.  We've already done 
 similar work during AM allocate calls to make sure they don't needlessly grab 
 the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3136:
--
Attachment: 0009-YARN-3136.patch

Hi [~jlowe] and [~jianhe]

I used a ConcurrentMap for 'applications', but findbugs warnings are reported 
for non-synchronized access on this map. I hope that is acceptable; please 
share your opinion.
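
For illustration only, not the 0009 patch itself: the general shape of the 
change being discussed, with the value type left generic since the concrete 
scheduler application class is elided here.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;

class ApplicationsMapSketch<A> {
  // a ConcurrentMap lets read paths such as getTransferredContainers() avoid
  // the scheduler lock; findbugs may still flag unsynchronized access to it
  private final ConcurrentMap<ApplicationId, A> applications =
      new ConcurrentHashMap<ApplicationId, A>();

  A lookup(ApplicationId appId) {
    return applications.get(appId);
  }
}
{code}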

 getTransferredContainers can be a bottleneck during AM registration
 ---

 Key: YARN-3136
 URL: https://issues.apache.org/jira/browse/YARN-3136
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Sunil G
 Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 
 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 
 0009-YARN-3136.patch


 While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
 stuck waiting for the scheduler lock trying to call getTransferredContainers. 
  The scheduler lock is highly contended, especially on a large cluster with 
 many nodes heartbeating, and it would be nice if we could find a way to 
 eliminate the need to grab this lock during this call.  We've already done 
 similar work during AM allocate calls to make sure they don't needlessly grab 
 the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled

2015-03-24 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378134#comment-14378134
 ] 

Tsuyoshi Ozawa commented on YARN-3127:
--

[~Naganarasimha] Thank you for taking this issue! The approach of the fix 
looks good to me. Could you add a test case to TestRMRestart to cover this 
case?

Also, can we preserve the following test cases?
{code}
-verify(writer).applicationStarted(any(RMApp.class));
-verify(publisher).appCreated(any(RMApp.class), anyLong());
{code}


 Apphistory url crashes when RM switches with ATS enabled
 

 Key: YARN-3127
 URL: https://issues.apache.org/jira/browse/YARN-3127
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
 Environment: RM HA with ATS
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
 Attachments: YARN-3127.20150213-1.patch


 1.Start RM with HA and ATS configured and run some yarn applications
 2.Once applications are finished sucessfully start timeline server
 3.Now failover HA form active to standby
 4.Access timeline server URL IP:PORT/applicationhistory
 Result: Application history URL fails with below info
 {quote}
 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to 
 read the applications.
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   ...
 Caused by: 
 org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The 
 entity for application attempt appattempt_1422972608379_0001_01 doesn't 
 exist in the timeline store
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   ... 51 more
 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /applicationhistory
 org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: 
 nestLevel=6 expected 5
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
 {quote}
 Behaviour with AHS with file based history store
   -Apphistory url is working 
   -No attempt entries are shown for each application.
   
 Based on initial analysis, when the RM switches, application attempts from 
 the state store are not replayed, only applications are.
 So when the /applicationhistory url is accessed, it tries to fetch every 
 attempt id and fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378203#comment-14378203
 ] 

Wangda Tan commented on YARN-2495:
--

Thanks for update. Patch LGTM, +1. will wait and commit in a few days if 
there's no opposite opinions.

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
 YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
 YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, 
 YARN-2495_20141022.1.patch


 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
 using script suggested by [~aw] (YARN-2729) )
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-03-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3212:
-
Attachment: YARN-3212-v3.patch

Updated the patch to address the review comments above, including:
- Properly handling the case of a node (in decommissioning) reconnecting with 
a different port.
- Some refactoring: merged StatusUpdateWhenHealthyTransition and 
StatusUpdateWhenDecommissioningTransition together.

 RMNode State Transition Update with DECOMMISSIONING state
 -

 Key: YARN-3212
 URL: https://issues.apache.org/jira/browse/YARN-3212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Junping Du
Assignee: Junping Du
 Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
 YARN-3212-v2.patch, YARN-3212-v3.patch


 As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
 can transition from “running” state triggered by a new event - 
 “decommissioning”. 
 This new state can be transit to state of “decommissioned” when 
 Resource_Update if no running apps on this NM or NM reconnect after restart. 
 Or it received DECOMMISSIONED event (after timeout from CLI).
 In addition, it can back to “running” if user decides to cancel previous 
 decommission by calling recommission on the same node. The reaction to other 
 events is similar to RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-03-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378354#comment-14378354
 ] 

Karthik Kambatla commented on YARN-3304:


Thought a little more about this. If we choose to go with returning 0 and 
adding boolean methods for availability, I would like to see how the 
corresponding user code would look compared to returning -1.

Do we expect the code to be the following? If so, how do we handle the usage 
being available at the time of calling isAvailable, but not being available at 
the time of calling getUsage? To avoid this issue, we could get the usage on 
the availability call and cache it, and the getUsage call would return this 
cached value. But requiring the availability call now is an even more 
incompatible change, no?
{code}
ResourceCalculatorProcessTree procTree = ...; // obtain the process tree
if (procTree.isMemoryUsageAvailable()) {
  procTree.getMemoryUsage();
}
{code}

And, how is the above user code snippet different from the one below: 
{code}
ResourceCalculatorProcessTree procTree = ...; // obtain the process tree
procTree.getMemoryUsage();
{code}
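
(For comparison, illustrative only: under the -1 convention, a caller that 
does care about availability folds the check into the return value itself, 
with no separate availability method.)
{code}
long memUsage = procTree.getMemoryUsage();  // hypothetical getter, as above
if (memUsage >= 0) {
  // usage is available for this interval; use it
}
{code}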

What is the cost of breaking compat of this previously Private API? I have a 
feeling it would be worth not making the API super-complicated. 

I want to avoid fixing this in a hurry just to unblock the release. I am 
willing to prioritize this, chat offline if need be, and solve it the right 
way. If we think that is too slow, we could always revert YARN-3296 for 2.7.

 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-3304-v2.patch, YARN-3304.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for 
 unavailable case while other resource metrics are return 0 in the same case 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378378#comment-14378378
 ] 

Hadoop QA commented on YARN-3212:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706943/YARN-3212-v3.patch
  against trunk revision 51f1f49.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7092//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7092//console

This message is automatically generated.

 RMNode State Transition Update with DECOMMISSIONING state
 -

 Key: YARN-3212
 URL: https://issues.apache.org/jira/browse/YARN-3212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Junping Du
Assignee: Junping Du
 Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
 YARN-3212-v2.patch, YARN-3212-v3.patch


 As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
 can transition from “running” state triggered by a new event - 
 “decommissioning”. 
 This new state can be transit to state of “decommissioned” when 
 Resource_Update if no running apps on this NM or NM reconnect after restart. 
 Or it received DECOMMISSIONED event (after timeout from CLI).
 In addition, it can back to “running” if user decides to cancel previous 
 decommission by calling recommission on the same node. The reaction to other 
 events is similar to RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378364#comment-14378364
 ] 

Zhijie Shen commented on YARN-3034:
---

bq. so that ATS V1 and V2 are less coupled and removal of SMP once completely 
deprecated is smoother

Exactly.

The last patch looks good to me.

 [Collector wireup] Implement RM starting its timeline collector
 ---

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, 
 YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, 
 YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2015-03-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378391#comment-14378391
 ] 

Bartosz Ługowski commented on YARN-1621:


Thanks [~Naganarasimha].

Done all, apart from:

{quote}
* May be we can leverage the benifit of passing the states to AHS too, this 
will reduce the transfer of data from AHS to the client. ur opinion ?
* If we are incorporating the above point then i feel only only when 
appNotFoundInRM we need to query for all states from AHS if not querying for 
COMPLETE state would be sufficient.
{quote}
Correct me if I'm wrong, but the AHS has only COMPLETE containers, so we need 
to query the AHS only if the states filter is empty (ALL) or contains the 
COMPLETE state.
{quote}
* No test cases for modification of 
GetContainersRequestPBImpl/GetContainersRequestProto
{quote}
There are already tests for this in: 
org.apache.hadoop.yarn.api.TestPBImplRecords#testGetContainersRequestPBImpl ?
{quote}
* there are some test case failures and findbugs issues reported can you take a 
look at it
{quote}
Not related to this patch.

 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Assignee: Bartosz Ługowski
 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
 YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch


 As more applications are moved to YARN, we need generic CLI to list rows of 
 task attempt ID, container ID, host of container, state of container. Today 
 if YARN application running in a container does hang, there is no way to find 
 out more info because a user does not know where each attempt is running in.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId appId [-containerState 
 state of container]
 where containerState is optional filter to list container in given state only.
 container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once e.g. KILLED,FAILED.
 task attempt ID container ID host of container state of container 
 {code}
 CLI should work with running application/completed application. If a 
 container runs many task attempts, all attempts should be shown. That will 
 likely be the case of Tez container-reuse application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2015-03-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bartosz Ługowski updated YARN-1621:
---
Attachment: YARN-1621.6.patch

 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Assignee: Bartosz Ługowski
 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
 YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch


 As more applications are moved to YARN, we need generic CLI to list rows of 
 task attempt ID, container ID, host of container, state of container. Today 
 if YARN application running in a container does hang, there is no way to find 
 out more info because a user does not know where each attempt is running in.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId appId [-containerState 
 state of container]
 where containerState is optional filter to list container in given state only.
 container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once e.g. KILLED,FAILED.
 task attempt ID container ID host of container state of container 
 {code}
 The CLI should work with both running and completed applications. If a 
 container runs many task attempts, all attempts should be shown; that will 
 likely be the case for Tez container-reuse applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377356#comment-14377356
 ] 

Hadoop QA commented on YARN-2495:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12706826/YARN-2495.20150324-1.patch
  against trunk revision 9fae455.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7088//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7088//console

This message is automatically generated.

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
 YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
 YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, 
 YARN-2495_20141022.1.patch


 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
 using script suggested by [~aw] (YARN-2729) )
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377382#comment-14377382
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7413 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7413/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java
* hadoop-yarn-project/CHANGES.txt


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. 
 We should separate them because it's difficult to identify which condition is 
 illegal.
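 As a generic illustration of the cleanup (not the actual test code), splitting a 
 combined assertion makes the failing condition obvious:
 {code:title=generic example, assuming JUnit 4 and the YARN client records}
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertNotNull;
 import static org.junit.Assert.assertTrue;

 import org.apache.hadoop.yarn.api.records.ApplicationReport;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;

 class AssertionStyleExample {
   static void combined(ApplicationReport report) {
     // Hard to debug: a failure only says "expected true".
     assertTrue(report != null
         && report.getYarnApplicationState() == YarnApplicationState.FINISHED);
   }

   static void separated(ApplicationReport report) {
     // Each assertion fails with a message pointing at the broken condition.
     assertNotNull("application report should not be null", report);
     assertEquals("unexpected application state",
         YarnApplicationState.FINISHED, report.getYarnApplicationState());
   }
 }
 {code}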



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-03-24 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377372#comment-14377372
 ] 

Devaraj K commented on YARN-3225:
-

{code:xml}
org.apache.hadoop.yarn.server.resourcemanager.TestRM
{code}

This test failure is not related to the patch.

 New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
 ---

 Key: YARN-3225
 URL: https://issues.apache.org/jira/browse/YARN-3225
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Devaraj K
 Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch


 A new CLI (or an existing CLI with new parameters) should put each node on the 
 decommission list into decommissioning status and track a timeout to terminate 
 the nodes that haven't finished.
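 For illustration only, the graceful flavour could be exposed through the existing 
 rmadmin command roughly as below; the flag name and the seconds-based timeout are 
 assumptions drawn from this proposal, not a committed interface:
 {code:title=hypothetical invocation}
 $ yarn rmadmin -refreshNodes -g 3600
 {code}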



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377378#comment-14377378
 ] 

Harsh J commented on YARN-1880:
---

+1, this still applies. Committing shortly, thanks [~ozawa] (and [~ajisakaa] 
for the earlier review)!



 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. 
 We should separate them because it's difficult to identify which condition is 
 illegal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated YARN-1880:
--
Component/s: test

 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. 
 We should separate them because it's difficult to identify which condition is 
 illegal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated YARN-1880:
--
Affects Version/s: 2.6.0

 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. 
 We should separate them because it's difficult to identify which condition is 
 illegal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377403#comment-14377403
 ] 

Tsuyoshi Ozawa commented on YARN-1880:
--

[~qwertymaniac] [~ajisakaa] thank you for the review!

 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced in YARN-1521 include multiple assertions combined with &&. 
 We should separate them because it's difficult to identify which condition is 
 illegal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377950#comment-14377950
 ] 

Hudson commented on YARN-3393:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: 
rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java


 Getting application(s) goes wrong when app finishes before starting the 
 attempt
 ---

 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3393.1.patch


 When generating app report in ApplicationHistoryManagerOnTimelineStore, it 
 checks if appAttempt == null.
 {code}
 ApplicationAttemptReport appAttempt = 
 getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   
 app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}
 However, {{getApplicationAttempt}} doesn't return null but throws 
 ApplicationAttemptNotFoundException:
 {code}
 if (entity == null) {
   throw new ApplicationAttemptNotFoundException(
       "The entity for application attempt " + appAttemptId +
       " doesn't exist in the timeline store");
 } else {
   return convertToApplicationAttemptReport(entity);
 }
 {code}
 The code isn't coupled well.
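 One way to reconcile the two snippets above is sketched below; this is not 
 necessarily the committed patch:
 {code:title=sketch of handling the missing attempt entity}
 ApplicationAttemptReport appAttempt = null;
 try {
   appAttempt =
       getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
 } catch (ApplicationAttemptNotFoundException e) {
   // The app finished before the attempt entity reached the timeline store;
   // leave the attempt-derived fields (host, RPC port, tracking URLs) unset.
 }
 if (appAttempt != null) {
   app.appReport.setHost(appAttempt.getHost());
   app.appReport.setRpcPort(appAttempt.getRpcPort());
   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
 }
 {code}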



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377948#comment-14377948
 ] 

Hudson commented on YARN-2868:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/])
YARN-2868. FairScheduler: Metric for latency to allocate first container for an 
application. (Ray Chiang via kasha) (kasha: rev 
972f1f1ab94a26ec446a272ad030fe13f03ed442)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between when container allocation starts 
 and when the first container is actually allocated.
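 A minimal sketch of the kind of measurement meant here; the class, field and 
 method names are illustrative, not the ones used in the patch:
 {code:title=illustrative latency tracker}
 // Tracks the delay between when an attempt starts requesting resources and
 // when its first container is allocated; only the first allocation is counted.
 final class FirstContainerLatency {
   private final long requestStartMs;
   private boolean firstAllocationRecorded;

   FirstContainerLatency(long requestStartMs) {
     this.requestStartMs = requestStartMs;
   }

   /** Returns the first-allocation delay in ms, or null after the first call. */
   synchronized Long onContainerAllocated(long nowMs) {
     if (firstAllocationRecorded) {
       return null;
     }
     firstAllocationRecorded = true;
     return nowMs - requestStartMs;
   }
 }
 {code}
 The returned delay would then be fed into the scheduler's queue metrics (or an 
 equivalent sink) as the per-application first-allocation latency.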



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377955#comment-14377955
 ] 

Hudson commented on YARN-1880:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. 
(harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java


 Cleanup TestApplicationClientProtocolOnHA
 -

 Key: YARN-1880
 URL: https://issues.apache.org/jira/browse/YARN-1880
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
 Fix For: 2.8.0

 Attachments: YARN-1880.1.patch


 The tests introduced on YARN-1521 includes multiple assertion with . We 
 should separate them because it's difficult to identify which condition is 
 illegal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377952#comment-14377952
 ] 

Hudson commented on YARN-3336:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/])
YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 
6ca1f12024fd7cec7b01df0f039ca59f3f365dc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 FileSystem memory leak in DelegationTokenRenewer
 

 Key: YARN-3336
 URL: https://issues.apache.org/jira/browse/YARN-3336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
 YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch


 FileSystem memory leak in DelegationTokenRenewer.
 Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
 FileSystem entry is added to FileSystem#CACHE, and it will never be garbage 
 collected.
 This is the implementation of obtainSystemTokensForUser:
 {code}
    protected Token<?>[] obtainSystemTokensForUser(String user,
        final Credentials credentials) throws IOException, InterruptedException {
      // Get new hdfs tokens on behalf of this user
      UserGroupInformation proxyUser =
          UserGroupInformation.createProxyUser(user,
              UserGroupInformation.getLoginUser());
      Token<?>[] newTokens =
          proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
            @Override
            public Token<?>[] run() throws Exception {
              return FileSystem.get(getConfig()).addDelegationTokens(
                  UserGroupInformation.getLoginUser().getUserName(), credentials);
            }
          });
      return newTokens;
    }
 {code}
 The memory leak happens when FileSystem.get(getConfig()) is called with a new 
 proxy user, because createProxyUser always creates a new Subject.
 The calling sequence is 
 FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => 
 FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => 
 FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
 {code}
  public static UserGroupInformation createProxyUser(String user,
      UserGroupInformation realUser) {
    if (user == null || user.isEmpty()) {
      throw new IllegalArgumentException("Null user");
    }
    if (realUser == null) {
      throw new IllegalArgumentException("Null real user");
    }
    Subject subject = new Subject();
    Set<Principal> principals = subject.getPrincipals();
    principals.add(new User(user));
    principals.add(new RealUser(realUser));
    UserGroupInformation result = new UserGroupInformation(subject);
    result.setAuthenticationMethod(AuthenticationMethod.PROXY);
    return result;
  }
 {code}
 FileSystem#Cache#Key.equals will compare the ugi
 {code}
    Key(URI uri, Configuration conf, long unique) throws IOException {
      scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
      authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
      this.unique = unique;
      this.ugi = UserGroupInformation.getCurrentUser();
    }
    public boolean equals(Object obj) {
      if (obj == this) {
        return true;
      }
      if (obj != null && obj instanceof Key) {
        Key that = (Key) obj;
        return isEqual(this.scheme, that.scheme)
            && isEqual(this.authority, that.authority)
            && isEqual(this.ugi, that.ugi)
            && (this.unique == that.unique);
      }
      return false;
    }
 {code}
 UserGroupInformation.equals will compare subject by reference.
 {code}
   public boolean equals(Object o) {
 if (o == this) {
   return true;
 } else if (o == null || getClass() != o.getClass()) {
   return false;
 } else {
   return subject == ((UserGroupInformation) o).subject;
 }
   }
 {code}
 So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
 are called, a new FileSystem will be created and a new entry will be added to 
 FileSystem.CACHE.
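 A sketch of one way to stop the cache growth (not necessarily the committed patch): 
 fetch the tokens as before, then drop the proxy user's cached FileSystem instances 
 with FileSystem.closeAllForUGI, which removes and closes every entry keyed by that 
 UGI:
 {code:title=sketch, assuming the same class and imports as the snippet above}
   protected Token<?>[] obtainSystemTokensForUser(String user,
       final Credentials credentials) throws IOException, InterruptedException {
     // Get new hdfs tokens on behalf of this user
     UserGroupInformation proxyUser =
         UserGroupInformation.createProxyUser(user,
             UserGroupInformation.getLoginUser());
     try {
       return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
         @Override
         public Token<?>[] run() throws Exception {
           return FileSystem.get(getConfig()).addDelegationTokens(
               UserGroupInformation.getLoginUser().getUserName(), credentials);
         }
       });
     } finally {
       // Evict and close all FileSystem instances cached for this proxy UGI so
       // the per-call Subject no longer pins an entry in FileSystem.CACHE.
       FileSystem.closeAllForUGI(proxyUser);
     }
   }
 {code}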



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

