[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358291#comment-14358291 ] zhihai xu commented on YARN-3336: - Hi [~cnauroth], yes, you are right, I forgot that the FileSystem RPC proxy also needs the correct UGI information. I uploaded a new patch, YARN-3336.001.patch, which addresses your comment. Please review it. Many thanks for the review. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
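For illustration only, here is a minimal sketch of how the per-call cache growth could be avoided by closing everything cached for the short-lived proxy UGI once the tokens are obtained. It mirrors the method quoted above but is not claimed to be what YARN-3336.001.patch actually does (the patch also has to keep the RPC proxy's UGI correct, as discussed in the comment):
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  final UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        return FileSystem.get(getConfig()).addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      }
    });
  } finally {
    // Close and evict the FileSystem instances cached under this throwaway
    // Subject, so FileSystem.CACHE does not grow by one entry per call.
    FileSystem.closeAllForUGI(proxyUser);
  }
}
{code}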
[jira] [Created] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
Vinod Kumar Vavilapalli created YARN-3332: - Summary: [Umbrella] Unified Resource Statistics Collection per node Key: YARN-3332 URL: https://issues.apache.org/jira/browse/YARN-3332 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Today in YARN, NodeManager collects statistics like per container resource usage and overall physical resources available on the machine. Currently this is used internally in YARN by the NodeManager for only a limited usage: automatically determining the capacity of resources on node and enforcing memory usage to what is reserved per container. This proposal is to extend the existing architecture and collect statistics for usage beyond the existing usecases. Proposal attached in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3335) Job In Error State Will Lost Jobhistory For Second and Later Attempts
Chang Li created YARN-3335: -- Summary: Job In Error State Will Lost Jobhistory For Second and Later Attempts Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempts' history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355354#comment-14355354 ] Jian He commented on YARN-3273: --- thanks Rohith ! overall sounds good. bq. All active users table wont be rendered I think it's also useful to display all active users. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355395#comment-14355395 ] Bikas Saha commented on YARN-2140: -- This paper may have useful insights into the network sharing issues. http://research.microsoft.com/en-us/um/people/srikanth/data/nsdi11_seawall.pdf Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan Attachments: NetworkAsAResourceDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare subject by reference.
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357015#comment-14357015 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357431#comment-14357431 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703964/YARN-3243.4.patch against trunk revision c3003eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6919//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6919//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6919//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container in a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                    \
A1 (usage=1, max=55)      A2 (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, now we have the {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used) (see the sketch below). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
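As a hedged illustration of the min(qA.headroom, qA.max - qA.used) rule proposed above (not the actual YARN-3243 patch; the surrounding queue plumbing and field names are assumptions), the per-child headroom could be computed with YARN's Resources helper like this:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

/**
 * Headroom a parent queue (qA) would push down to its children:
 * min(qA.headroom, qA.max - qA.used). qA.headroom is assumed to have been
 * set by qA's own parent, so ancestors' limits are enforced transitively.
 */
static Resource childHeadroom(ResourceCalculator rc, Resource clusterResource,
    Resource parentHeadroom, Resource parentMax, Resource parentUsed) {
  return Resources.min(rc, clusterResource,
      parentHeadroom,
      Resources.subtract(parentMax, parentUsed));
}
{code}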
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357111#comment-14357111 ] Junping Du commented on YARN-3039: -- Thanks [~sjlee0]! Providing an end-to-end flow below which could be helpful for your review: - When the AM gets launched, the NM auxiliary service will add a new aggregator service to the aggregatorCollection (per node) for the necessary binding work. The aggregatorCollection also has a client for AggregatorNodeManagerProtocol to notify the NM of the newly registered app aggregator and its detailed address. - When the NM gets notified, it will update its registeredAggregators list (for all local app aggregators) and notify the RM in the next heartbeat. - When the RM receives registeredAggregators from an NM, it will update its aggregators list. - The next time other NMs and the AM heartbeat with the RM, it will provide aggregatorInfo in the heartbeat response (for the AM, this is through the AllocationResponse). - The AM of DS has an AMRMClientAsync which heartbeats with the RM, so it can receive the updated aggregator address periodically. By registering a callback that listens for aggregator address updates, it can update the address of the TimelineClient in a thread-safe way (see the sketch below). - The AM calls timeline operations in a non-blocking way (so it does not hang there in a deadlock); currently this is wrapped in a new thread, but it will be improved later (in another JIRA) to save thread resources. - The TimelineClient (consuming the v2 service) loops in retry logic until it gets the correct address that is set by the AM. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
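A hypothetical sketch, in plain Java, of the "update the aggregator address in a thread-safe way" step described in the flow above; the class and method names here are assumptions for illustration, not code from the YARN-3039 patches:
{code}
import java.util.concurrent.atomic.AtomicReference;

// Holds the app-aggregator address published to the AM via heartbeat responses.
class AggregatorAddressHolder {
  private final AtomicReference<String> aggregatorAddr = new AtomicReference<String>();

  // Invoked from the AMRMClientAsync callback when a heartbeat response
  // carries a new app-aggregator service address.
  void onAggregatorAddressUpdated(String newAddr) {
    aggregatorAddr.set(newAddr);
  }

  // Called from the non-blocking timeline publishing thread; it retries until
  // an address has been set by the heartbeat path, as described above.
  String waitForAddress(long retryIntervalMs) throws InterruptedException {
    String addr;
    while ((addr = aggregatorAddr.get()) == null) {
      Thread.sleep(retryIntervalMs);
    }
    return addr;
  }
}
{code}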
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358390#comment-14358390 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704121/YARN-3336.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6935//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6935//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6935//console This message is automatically generated. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358427#comment-14358427 ] Naganarasimha G R commented on YARN-3034: - Seems like patch is fine, I was able to update the latest code from yarn-2928 branch, apply the patch and build successfully using {{mvn clean install -DskipTests -Dmaven.javadoc.skip=true}}. Not sure why it failed . [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
Ravindra Naik created YARN-3339: --- Summary: TestDockerContainerExecutor should pull a single image and not the entire centos repository Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Naik Priority: Critical TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
[ https://issues.apache.org/jira/browse/YARN-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3339: Description: TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time was: TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully. TestDockerContainerExecutor should pull a single image and not the entire centos repository --- Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Naik Priority: Critical TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358538#comment-14358538 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Yarn-trunk #864 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/864/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java 
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358550#comment-14358550 ] Hadoop QA commented on YARN-3302: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704134/YARN-3302-trunk.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6936//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6936//console This message is automatically generated. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Attachments: YARN-3302-trunk.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
[ https://issues.apache.org/jira/browse/YARN-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358588#comment-14358588 ] Hadoop QA commented on YARN-3339: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704141/YARN-3339-branch-2.6.0.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6937//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6937//console This message is automatically generated. TestDockerContainerExecutor should pull a single image and not the entire centos repository --- Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Kumar Naik Priority: Minor Fix For: 2.6.0 Attachments: YARN-3339-branch-2.6.0.001.patch, YARN-3339-trunk.001.patch TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358755#comment-14358755 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/121/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358710#comment-14358710 ] Hadoop QA commented on YARN-41: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704146/YARN-41-4.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6938//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6938//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6938//console This message is automatically generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3335) Job In Error State Will Lost Jobhistory Of Second and Later Attempts
[ https://issues.apache.org/jira/browse/YARN-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3335: --- Description: Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring the rest of that job's later attempts' history files in the intermediate dir. (was: Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempts' history files in the intermediate dir.) Job In Error State Will Lost Jobhistory Of Second and Later Attempts Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3335.1.patch Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring the rest of that job's later attempts' history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358737#comment-14358737 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2062 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2062/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java 
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni
[ https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358768#comment-14358768 ] Allen Wittenauer commented on YARN-3331: We really can't read this in via the core-site.xml file? I could have sworn we populated the system properties with the content of the xml files on startup. It's really less than ideal to set this up on the command line as it is already quite long and there is a risk that we'll overflow the buffer, preventing things from running. That issue aside: a) We need to avoid using daemon-specific and/or project-specific environment variables. This is the #1 reason why the branch-2 shell code is just utter chaos. This should be HADOOP_ something. b) Additionally, this should be set for everything, not just the nodemanager in order to keep this consistent if/when other stuff uses this functionality. The add_param should probably happen in finalize_hadoop_opts c) The default should be set with an actual value in hadoop_basic_init. $(pwd) is a *terrible* default value for this, because there is no guarantee what the pwd actually is. d) There needs to be a stanza added to hadoop-env.sh that actually explains what your new environment variable does, how to set it, etc. All/most of this is covered on http://wiki.apache.org/hadoop/UnixShellScriptProgrammingGuide . I'm now going to go check to make sure I didn't miss anything on that page and add it if I did. :) NodeManager should use directory other than tmp for extracting and loading leveldbjni - Key: YARN-3331 URL: https://issues.apache.org/jira/browse/YARN-3331 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3331.001.patch /tmp can be required to be noexec in many environments. This causes a problem when nodemanager tries to load the leveldbjni library which can get unpacked and executed from /tmp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358708#comment-14358708 ] Nathan Roberts commented on YARN-3298: -- I agree. Let's not change anything for the time being. If YARN-2113 requires some tweaking in this area, we can do it at that time. User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for queue capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1) The user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)

user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
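To make the computation above concrete, here is a hedged, single-resource (memory-only) Java sketch of the formula and of the proposed hard check; the real CapacityScheduler works on Resource objects with a ResourceCalculator, so this is only an illustration, not the patch:
{code}
/**
 * Returns true if the user may receive another container under the proposed
 * hard user-limit rule user.usage + required <= user-limit.
 * All values are memory in MB; this is a simplification for illustration.
 */
static boolean userCanAllocate(long queueUsed, long queueCapacity, long nowRequired,
    int activeUsers, int userLimitPercent, float userLimitFactor,
    long userUsage, long required) {
  // current-capacity per the formula quoted above.
  long currentCapacity = queueUsed > queueCapacity
      ? queueUsed + nowRequired : queueCapacity;
  long userLimit = Math.min(
      Math.max(currentCapacity / activeUsers,
               currentCapacity * userLimitPercent / 100L),
      (long) (queueCapacity * userLimitFactor));
  return userUsage + required <= userLimit;
}
{code}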
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3347: Description: Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly was: Right now, we could specify applicationId node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3347) Improve YARN log command to get AMContainer logs
Xuan Gong created YARN-3347: --- Summary: Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3347: -- Issue Type: Sub-task (was: Test) Parent: YARN-431 Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359969#comment-14359969 ] Hadoop QA commented on YARN-3284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704326/YARN-3284.1.patch against trunk revision 8212877. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 9 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6949//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6949//console This message is automatically generated. Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch Current, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359893#comment-14359893 ] Rohith commented on YARN-3305: -- Updated the patch as per the comment. Kindly review the latest patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit. The reason is that AM-Used is updated with the actual ResourceRequest made by the user while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359881#comment-14359881 ] Rohith commented on YARN-3305: -- bq. call normalize the am request in RMAppManager after validateAndCreateResourceRequest to make sure the am request stored in RMAppImpl is also correct. Agreed. I will change it. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit. The reason is that AM-Used is updated with the actual ResourceRequest made by the user while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
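The fix direction discussed above is to normalize the AM ResourceRequest before it is stored, so that the value accounted against the leaf queue matches what the scheduler actually allocates. Below is a minimal sketch of that normalization idea, assuming a memory-only view; the real patch would go through the scheduler's own normalization utilities and Resource objects rather than a hand-rolled helper.
{code}
// Minimal sketch (not the actual patch): round an AM memory request up to a
// multiple of the scheduler's minimum allocation, so the value accounted for
// the queue matches what will really be allocated.
public final class AmRequestNormalizer {

  static int normalizeMemory(int requestedMb, int minimumMb) {
    if (requestedMb <= minimumMb) {
      return minimumMb;
    }
    // Round up to the next multiple of minimumMb.
    return ((requestedMb + minimumMb - 1) / minimumMb) * minimumMb;
  }

  public static void main(String[] args) {
    // An AM asking for 512 MB with a 1024 MB minimum is accounted as 1024 MB,
    // which is what the leaf queue's AM-Used resource should reflect.
    System.out.println(normalizeMemory(512, 1024));   // 1024
    System.out.println(normalizeMemory(1500, 1024));  // 2048
  }
}
{code}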
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359957#comment-14359957 ] Junping Du commented on YARN-3039: -- Also, thanks [~zjshen] for the comments! bq. Sorry for raising the question so late, but I'd like to think out loud about the first step. No worries. A good idea never comes late. bq. Nowadays, the app-level aggregator is started by the callback handler listening to the container start event of the NM. Given we are going to support stand-alone and container mode, this approach may not work. As we're going to have an IPC channel between the aggregator and the NM, should we use an IPC call to add an app-level aggregator? That would make the NM the client and the aggregatorCollection the server. I think in our design for the multiple-aggregator model (shown in YARN-3033), the relationship between an NM and aggregatorCollections could in future be a 1-to-n mapping. In that case, making the NM the server could be better; otherwise the NM would have to maintain different clients to talk to different aggregatorCollections. It is also more aligned with the design of YARN-3332, which sounds like the NM will have more information to share out as a server. [~vinodkv], please correct me if I missed something here. Making the NM the server can still work for stand-alone and container mode; a couple of ways I can think of now: 1. Add a heartbeat between the NM and the aggregatorCollection: given it is IPC and the number of aggregatorCollections is 1 or a small number, the heartbeat interval can be much less than 1 second. Also, richer info (like health status, etc.) could be carried in the request and response rather than just the applicationID and service address. 2. If we don't want any delay, we can have the NM hold on to the RPC response for a while and return either when a new AM launch happens or when an interval is reached; the aggregator will then send the RPC request again immediately even if no new aggregator got bound. Thoughts? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
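Option 2 above is essentially a long poll: the server side of the RPC holds the response until there is something new to report or a timeout elapses. The following is an illustrative sketch of that mechanism only; the class and method names are hypothetical and are not taken from any YARN-3039 patch.
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side helper: the NM records newly bound app-level
// aggregators and lets a caller (the aggregatorCollection) long-poll for them.
public class AggregatorBindingTracker {

  private final List<String> newBindings = new ArrayList<String>();

  // Called when a new AM launches and its app aggregator gets bound.
  public synchronized void addBinding(String appIdAndAddress) {
    newBindings.add(appIdAndAddress);
    notifyAll(); // wake up any waiting poll
  }

  // Long poll: return as soon as there is a new binding, or after timeoutMs.
  public synchronized List<String> pollNewBindings(long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (newBindings.isEmpty()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        break; // interval reached with nothing new; the caller polls again
      }
      wait(remaining);
    }
    List<String> result = new ArrayList<String>(newBindings);
    newBindings.clear();
    return result;
  }
}
{code}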
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359959#comment-14359959 ] Naganarasimha G R commented on YARN-3034: - Thanks for reviewing the patch, [~gtCarrera9]. I have not yet added the implementation part, wherein I am planning to expose public methods similar to SystemMetricsPublisher (appCreated, appFinished, appACLsUpdated, appAttemptRegistered, appAttemptFinished) and more based on need. All these methods require ResourceManager project classes like RMAppAttempt, RMApp, RMAppAttemptState, etc. to be passed in; hence I have kept this package in the ResourceManager project itself. I am also planning to move SystemMetricsEvent and its subclasses (related to App and AppAttempt) to this package. Please provide your opinion on the same. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0001-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For given any ResourceRequest, {{CS#allocate}} normalizes request to minimumAllocation if requested memory is less than minimumAllocation. But AM-used resource is updated with actual ResourceRequest made by user. This results in AM container allocation more than Max ApplicationMaster Resource. This is because AM-Used is updated with actual ResourceRequest made by user while activating the applications. But during allocation of container, ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359936#comment-14359936 ] Junping Du commented on YARN-3039: -- [~sjlee0], Thanks for the comments here! In my understanding, if timelineServiceAddress is not null, pollTimelineServiceAddress() returns quickly, as below. Doesn't it? Do you think we still need a null check here?
{code}
private int pollTimelineServiceAddress(int retries) {
  while (timelineServiceAddress == null && retries > 0) {
    try {
      Thread.sleep(this.serviceRetryInterval);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    timelineServiceAddress = getTimelineServiceAddress();
    retries--;
  }
  return retries;
}
{code}
[Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359953#comment-14359953 ] Hadoop QA commented on YARN-3305: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704352/0001-YARN-3305.patch against trunk revision a852910. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestAppManager org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6950//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6950//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6950//console This message is automatically generated. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For given any ResourceRequest, {{CS#allocate}} normalizes request to minimumAllocation if requested memory is less than minimumAllocation. But AM-used resource is updated with actual ResourceRequest made by user. This results in AM container allocation more than Max ApplicationMaster Resource. This is because AM-Used is updated with actual ResourceRequest made by user while activating the applications. But during allocation of container, ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: (was: apache-yarn-3294.0.patch) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.0.patch Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.4.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from the timelineserver, the limit is applied on the entities fetched from leveldb, and the ACL filters are applied after this (TimelineDataManager.java::getEntities). This could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
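The problem described above is that the record limit is applied before the ACL filter, so readable entities beyond the cut-off are lost. A minimal sketch of the general idea behind a fix is shown below; the helper class and its names are hypothetical and are not the actual TimelineDataManager code.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: apply the ACL check while scanning and stop only once
// 'limit' readable entities have been collected, instead of truncating to
// 'limit' first and filtering afterwards.
public class AclAwareLimiter<E> {

  /** Caller-supplied access check, e.g. timeline ACLs for the remote user. */
  public interface AccessCheck<T> {
    boolean canRead(T entity);
  }

  public List<E> firstReadable(Iterator<E> candidates, AccessCheck<E> acl, int limit) {
    List<E> result = new ArrayList<E>();
    while (candidates.hasNext() && result.size() < limit) {
      E entity = candidates.next();
      if (acl.canRead(entity)) {
        result.add(entity);
      }
    }
    return result;
  }
}
{code}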
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: (was: YARN-3267.4.patch) Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358886#comment-14358886 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2080 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2080/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
[ https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358792#comment-14358792 ] Allen Wittenauer commented on YARN-3330: This shouldn't be YARN-specific. I'd HIGHLY recommend bumping this up to a full issue in the HADOOP project so that it: a) goes into common, and b) has higher visibility. Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols --- Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: pdiff_patch.py Per YARN-3292, we may want to start the YARN rolling upgrade compatibility verification tooling with a simple script to check protobuf compatibility. The script would work on incoming patch files, check whether there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance of false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
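To illustrate the conservative check described in the issue, here is a rough sketch that scans a unified diff for removed lines inside .proto files and flags them for review. It is only an assumption of how such a checker might look (the attached pdiff_patch.py may work quite differently), written in Java to match the rest of the examples here.
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical, conservative patch checker: any removed line in a .proto file
// is reported as potentially incompatible (false positives are acceptable).
public class ProtoPatchCheck {
  public static void main(String[] args) throws IOException {
    boolean inProtoFile = false;
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.startsWith("--- ") || line.startsWith("+++ ")) {
          // File header of the unified diff: track whether we are in a .proto file.
          inProtoFile = line.contains(".proto");
        } else if (inProtoFile && line.startsWith("-")) {
          System.out.println("Potentially incompatible change: " + line);
        }
      }
    }
  }
}
{code}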
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358798#comment-14358798 ] Mit Desai commented on YARN-2890: - Verified that the test failures are not due to my patch. Some of them pass with my patch on my local machine, and some always fail for me. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling the timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
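As a rough sketch of the intended behaviour (not the attached patch itself), the mini cluster would consult the standard timeline-service switch before starting the service:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: gate the timeline service on yarn.timeline-service.enabled
// instead of starting it unconditionally in the mini cluster.
public class TimelineStartCheck {
  static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}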
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.4.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: Screen Shot 2015-03-12 at 8.51.25 PM.png Uploaded screenshot with the changes. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.0.patch Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358823#comment-14358823 ] Chang Li commented on YARN-3267: Thanks [~jeagles] for further review, updated patch according to those suggestions. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358821#comment-14358821 ] Varun Vasudev commented on YARN-3294: - The uploaded patch will dump debug logs for the capacity scheduler to yarn-capacity-scheduler-debug.log in the logs directory. Older logs will be overwritten and not rotated. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
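A minimal sketch of the mechanism described above, using plain log4j 1.x (which YARN uses): temporarily attach a DEBUG-level file appender to a scheduler logger and remove it after a fixed period. The class and file name below are illustrative, not the attached patch.
{code}
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Sketch: dump DEBUG logs of one logger to a separate file for 'millis' ms,
// then restore the previous level and detach the appender (file is overwritten,
// not rotated, matching the behaviour described in the comment).
public class SchedulerDebugDump {
  public static void dumpFor(long millis, String loggerName, String file)
      throws IOException {
    final Logger logger = Logger.getLogger(loggerName);
    final Level previous = logger.getLevel();
    final FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} %p %c: %m%n"), file, false);
    logger.addAppender(appender);
    logger.setLevel(Level.DEBUG);
    new Timer(true).schedule(new TimerTask() {
      @Override
      public void run() {
        logger.setLevel(previous);
        logger.removeAppender(appender);
        appender.close();
      }
    }, millis);
  }
}
{code}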
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358833#comment-14358833 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #130 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/130/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358856#comment-14358856 ] Abin Shahab commented on YARN-3291: --- [~sidharta-s] [~raviprak] [~aw] [~vinodkv] [~vvasudev] Please review. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch Currently DockerContainerExecutor runs the container process as root (inside the container), while outside the container it runs as yarn. Inside the container, the process could instead run as a non-root user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
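For illustration only, the usual way to achieve this with the docker CLI is its --user flag; the snippet below is a hedged sketch of composing such a command and is not taken from the attached patch.
{code}
// Hypothetical sketch: build a docker run command that starts the container
// process as the submitting user instead of root. Only the --rm/--user flags
// are real docker CLI options; the image name, script path and method are made up.
public class DockerRunCommandSketch {
  static String buildRunCommand(String image, String runAsUser, String launchCmd) {
    return String.format("docker run --rm --user %s %s %s",
        runAsUser, image, launchCmd);
  }

  public static void main(String[] args) {
    System.out.println(
        buildRunCommand("some-hadoop-image", "nobody", "bash /launch_container.sh"));
  }
}
{code}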
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358858#comment-14358858 ] Abin Shahab commented on YARN-3291: --- I'll fix the findbugs error. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch Currently DockerContainerExecutor runs the container process as root (inside the container), while outside the container it runs as yarn. Inside the container, the process could instead run as a non-root user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358977#comment-14358977 ] Xuan Gong commented on YARN-3338: - +1. lgtm. will commit Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359012#comment-14359012 ] Sangjin Lee commented on YARN-3034: --- The hadoop jenkins would try to apply it against the trunk, so it wouldn't pass anyway. I took a look at the patch, and it looks good for the most part. One minor comment: - resourcemanager/pom.xml -- you should be able to remove the version as it is specified in the parent pom -- nit: reduce spaces to 2 Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359020#comment-14359020 ] Xuan Gong commented on YARN-3338: - Committed to trunk/branch-2/branch-2.7. Thanks, zhijie Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359032#comment-14359032 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-trunk-Commit #7309 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7309/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-project/pom.xml * hadoop-yarn-project/CHANGES.txt Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358963#comment-14358963 ] Chris Douglas commented on YARN-3338: - +1 lgtm Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358986#comment-14358986 ] Varun Vasudev commented on YARN-3326: - [~Naganarasimha] - can we change the endpoint from /get-labels-to-Nodes to a more RESTful name, something like /labels/nodes?labels=nodes? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST support to retrieve the LabelsToNodes mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
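Purely as an illustration of the kind of RESTful shape being suggested (the path, parameter and return type below are hypothetical, not the eventual YARN API), a JAX-RS resource for a labels-to-nodes query could look like this:
{code}
import java.util.Map;
import java.util.Set;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// Hypothetical sketch of a RESTful labels-to-nodes endpoint; the real RM web
// services class and DAO types are not shown here.
@Path("/ws/v1/cluster")
public class LabelMappingResource {

  @GET
  @Path("/label-mappings")
  @Produces(MediaType.APPLICATION_JSON)
  public Map<String, Set<String>> getLabelsToNodes(
      @QueryParam("labels") Set<String> labels) {
    // Delegate to the RM's node-label manager, filtering by 'labels' when given.
    return lookupLabelsToNodes(labels);
  }

  private Map<String, Set<String>> lookupLabelsToNodes(Set<String> labels) {
    throw new UnsupportedOperationException("backend lookup omitted in this sketch");
  }
}
{code}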
[jira] [Updated] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3243: - Attachment: YARN-3243.5.patch CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch CapacityScheduler currently has some issues with making sure a ParentQueue always obeys its capacity limits, for example: 1) When allocating a container under a parent queue, it only checks parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                   \
  A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue only tells its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it does not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} object in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This makes sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to stay within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
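The core of the proposal is the min(qA.headroom, qA.max - qA.used) rule. Below is a minimal sketch of that rule with memory-only numbers; the actual patch works on Resource objects via a ResourceCalculator, so this is only illustrative.
{code}
// Sketch of the headroom rule: a child may never allocate more than
// min(parent.headroom, parent.max - parent.used), which also enforces every
// ancestor's limit because parent.headroom was set the same way by its parent.
public class HeadroomRuleExample {

  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, Math.max(0, parentMax - parentUsed));
  }

  public static void main(String[] args) {
    // From the example above: A has max=55 and usage=54, so even though A2's
    // own max is 53, A2 may only allocate 1 more unit.
    System.out.println(childHeadroom(Long.MAX_VALUE, 55, 54)); // 1
  }
}
{code}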
[jira] [Updated] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on
[ https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2792: - Priority: Major (was: Blocker) Have a public Test-only API for creating important records that ecosystem projects can depend on Key: YARN-2792 URL: https://issues.apache.org/jira/browse/YARN-2792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli From YARN-2789, {quote} Sigh. Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. I am inclined to have a separate test-only public util API that keeps compatibility for tests. Rather than opening unwanted APIs up. I'll file a separate ticket for this, we need all YARN apps/frameworks to move to that API instead of these private unstable APIs. For now, I am okay keeping a private compat for the APIs changed in YARN-2698. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3341) Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource
zhihai xu created YARN-3341: --- Summary: Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource Key: YARN-3341 URL: https://issues.apache.org/jira/browse/YARN-3341 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource The warning message is: {code} Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.reserveResource(SchedulerApplicationAttempt, Priority, RMContainer) {code} The code which causes the warning is: {code} this.reservedAppSchedulable = (FSAppAttempt) application; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
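The usual remedy for BC_UNCONFIRMED_CAST is to confirm the runtime type before casting. Whether the eventual patch does exactly this is an assumption; the sketch below uses stand-in stub classes so it compiles on its own.
{code}
// Stand-in stubs for the real scheduler classes, just to keep the sketch
// self-contained; only the instanceof-then-cast pattern is the point here.
class SchedulerApplicationAttempt { }
class FSAppAttempt extends SchedulerApplicationAttempt { }

public class ConfirmedCastExample {
  private FSAppAttempt reservedAppSchedulable;

  void reserveResource(SchedulerApplicationAttempt application) {
    if (!(application instanceof FSAppAttempt)) {
      throw new IllegalArgumentException(
          "FSSchedulerNode expects an FSAppAttempt, got "
              + application.getClass().getName());
    }
    this.reservedAppSchedulable = (FSAppAttempt) application; // cast is now confirmed
  }
}
{code}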
[jira] [Created] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
Xuan Gong created YARN-3343: --- Summary: TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk Key: YARN-3343 URL: https://issues.apache.org/jira/browse/YARN-3343 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359460#comment-14359460 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704207/YARN-3243.5.patch against trunk revision b49c3a1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6945//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6945//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6945//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch CapacityScheduler currently has some issues with making sure a ParentQueue always obeys its capacity limits, for example: 1) When allocating a container under a parent queue, it only checks parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                   \
  A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue only tells its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it does not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} object in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This makes sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to stay within their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
[ https://issues.apache.org/jira/browse/YARN-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3343: Description: Error Message test timed out after 3 milliseconds Stacktrace java.lang.Exception: test timed out after 3 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getAllByName0(InetAddress.java:1246) at java.net.InetAddress.getAllByName(InetAddress.java:1162) at java.net.InetAddress.getAllByName(InetAddress.java:1098) at java.net.InetAddress.getByName(InetAddress.java:1048) at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147) at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk --- Key: YARN-3343 URL: https://issues.apache.org/jira/browse/YARN-3343 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor Error Message test timed out after 3 milliseconds Stacktrace java.lang.Exception: test timed out after 3 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getAllByName0(InetAddress.java:1246) at java.net.InetAddress.getAllByName(InetAddress.java:1162) at java.net.InetAddress.getAllByName(InetAddress.java:1098) at java.net.InetAddress.getByName(InetAddress.java:1048) at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147) at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359367#comment-14359367 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-trunk-Commit #7312 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7312/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359364#comment-14359364 ] Li Lu commented on YARN-3034: - Hi [~Naganarasimha], thanks for updating the patch! Just a quick question: are there any special rationales behind putting RMTimelineCollector into a separate place in RM, instead of putting it into the aggregator (or, collector) directory in our own timeline server module? (Especially when we're introducing dependency to the RM anyways... ) Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.5.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359438#comment-14359438 ] Hadoop QA commented on YARN-1942: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704246/YARN-1942.2.patch against trunk revision 863079b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1372 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6946//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6946//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6946//console This message is automatically generated. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359441#comment-14359441 ] Zhijie Shen commented on YARN-3039: --- bq. When AM get launched, NM auxiliary service will add a new aggregator service to aggregatorCollection (per Node) for necessary binding work. aggregatorCollection also has a client for AggregatorNodeManagerProtocol to notify NM on new app aggregator registered and detailed address. Hi Junping, thanks for creating the new patch. Sorry for raising the question late, but I'd like to think out loud about the first step. Currently, the app-level aggregator is started by the callback handler listening to the NM's container start event. Given we are going to support stand-alone and container mode, this approach may not work. Since we're going to have an IPC channel between the aggregator and the NM, should we use an IPC call to add an app-level aggregator? The protocol would be that the NM sends a request to the aggregator collection to start an app-level aggregator, and the collection responds with the aggregator address. However, in this case, it may not be AggregatorNodemanagerProtocol, but NodemanagerAggregatorProtocol instead. The benefit is to unify the way of starting app-level aggregators inside the node-level aggregator (at least it seems we need to do something similar in YARN-3033), and to further reduce the dependency/assumption on the aux service. [~vinodkv] and [~sjlee0], what do you think about it? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
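To make the proposed call direction easier to picture, a purely hypothetical sketch follows; none of these names come from the attached patches, they only illustrate an NM-to-collection request that returns the aggregator address:
{code}
// Hypothetical sketch only: the NM asks the per-node aggregator collection to
// start (or look up) the app-level aggregator and gets its address back.
public interface NodemanagerAggregatorProtocol {

  final class StartAppAggregatorRequest {
    final org.apache.hadoop.yarn.api.records.ApplicationId applicationId;

    StartAppAggregatorRequest(org.apache.hadoop.yarn.api.records.ApplicationId applicationId) {
      this.applicationId = applicationId;
    }
  }

  final class StartAppAggregatorResponse {
    final String aggregatorAddress; // host:port the NM should advertise for the app

    StartAppAggregatorResponse(String aggregatorAddress) {
      this.aggregatorAddress = aggregatorAddress;
    }
  }

  // NM -> collection: start the aggregator for this application and return its address.
  StartAppAggregatorResponse startAppAggregator(StartAppAggregatorRequest request)
      throws java.io.IOException;
}
{code}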
[jira] [Reopened] (YARN-2871) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reopened YARN-2871: - I saw the testcase failure with the same issue. rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) Stacktrace org.mockito.exceptions.verification.TooLittleActualInvocations: rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk - Key: YARN-2871 URL: https://issues.apache.org/jira/browse/YARN-2871 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor From trunk build #746 (https://builds.apache.org/job/Hadoop-Yarn-trunk/746): {code} Failed tests: TestRMRestart.testRMRestartGetApplicationList:957 rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
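For readers unfamiliar with this Mockito failure mode, a minimal self-contained illustration (not the actual TestRMRestart code) of how an exact-count verification produces TooLittleActualInvocations when an expected call never happens:
{code}
import static org.mockito.Mockito.*;

import java.util.List;

public class VerifyCountExample {
  public static void main(String[] args) {
    @SuppressWarnings("unchecked")
    List<String> mock = mock(List.class);

    mock.add("app-1");
    mock.add("app-2");
    // The third add(...) never happens, e.g. because an asynchronous event was
    // not processed before the verification ran.

    // Fails with TooLittleActualInvocations: "Wanted 3 times ... But was 2 times".
    verify(mock, times(3)).add(anyString());
  }
}
{code}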
[jira] [Created] (YARN-3342) TestAllocationFileLoaderService.testGetAllocationFileFromClasspath sometime fails in trunk
Xuan Gong created YARN-3342: --- Summary: TestAllocationFileLoaderService.testGetAllocationFileFromClasspath sometime fails in trunk Key: YARN-3342 URL: https://issues.apache.org/jira/browse/YARN-3342 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService.testGetAllocationFileFromClasspath Failing for the past 1 build (Since Failed#6924 ) Took 25 ms. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService.testGetAllocationFileFromClasspath(TestAllocationFileLoaderService.java:66) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359356#comment-14359356 ] Jian He commented on YARN-3136: --- looks good to me. test failures not related? getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
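One common way to remove that kind of contention, sketched below purely as an illustration (not the attached patches), is to keep the transferred-container bookkeeping in a concurrent structure so AM registration can read it without taking the scheduler lock:
{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration only: lock-free reads for per-attempt transferred containers.
public class TransferredContainersSketch<K, C> {
  private final Map<K, List<C>> transferred = new ConcurrentHashMap<K, List<C>>();

  // Called by the scheduler when containers are carried over to a new attempt.
  public void add(K attemptId, C container) {
    List<C> list = transferred.get(attemptId);
    if (list == null) {
      transferred.putIfAbsent(attemptId, new CopyOnWriteArrayList<C>());
      list = transferred.get(attemptId);
    }
    list.add(container);
  }

  // Called during AM registration; no global scheduler lock is required.
  public List<C> get(K attemptId) {
    List<C> list = transferred.get(attemptId);
    return list == null ? Collections.<C>emptyList() : list;
  }
}
{code}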
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Attachment: YARN-1942.2.patch Attached ver.2. Offline discussed with Vinod, moved most methods of ConverterUtils to *Id, and marked the previous method to deprecated. This also merged changes in YARN-3340. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
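As a rough sketch of the direction described above (method names assumed; the final patch may differ), the parsing entry points move onto the record classes themselves while the old ConverterUtils helpers remain as deprecated wrappers:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class IdParsingSketch {
  public static ContainerId parseOld(String containerIdStr) {
    // Today: downstream AMs depend on this helper even though it is not a
    // supported public API.
    return ConverterUtils.toContainerId(containerIdStr);
  }

  public static ContainerId parseNew(String containerIdStr) {
    // Sketch of the replacement the patch moves toward (exact name assumed):
    // a public factory on the record class itself.
    return ContainerId.fromString(containerIdStr);
  }
}
{code}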
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359224#comment-14359224 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704208/YARN-3336.001.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestTestTests org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorizatTestTests org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheTests org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMReTests org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCRespoTests org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourcTests org.apache.hadoop.yarn.server.resourcemanager.TestResourceManTests org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6941//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6941//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6941//console This message is automatically generated. 
FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens =
[jira] [Commented] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
[ https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359254#comment-14359254 ] Li Lu commented on YARN-3330: - Hi [~aw], thanks for the suggestion! I will definitely bump this up after I finished the two issues raised above. Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols --- Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: pdiff_patch.py Per YARN-3292, we may want to start YARN rolling upgrade test compatibility verification tool by a simple script to check protobuf compatibility. The script may work on incoming patch files, check if there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc,.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance to have false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359230#comment-14359230 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704207/YARN-3243.5.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6942//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6942//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6942//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. 
If a leaf queue allocates a container with size < (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means *the maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
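A small sketch of the proposed computation, using only the public Resource record (illustrative; the committed change may compute this elsewhere):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class HeadroomSketch {
  // min(qA.headroom, qA.max - qA.used), evaluated per resource dimension: the
  // parent hands each child the tighter of its own inherited headroom and the
  // room left under its own maximum, so every ancestor's limit is respected.
  public static Resource childHeadroom(Resource parentHeadroom, Resource parentMax,
      Resource parentUsed) {
    int memory = Math.min(parentHeadroom.getMemory(),
        parentMax.getMemory() - parentUsed.getMemory());
    int vcores = Math.min(parentHeadroom.getVirtualCores(),
        parentMax.getVirtualCores() - parentUsed.getVirtualCores());
    return Resource.newInstance(memory, vcores);
  }
}
{code}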
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359290#comment-14359290 ] Markus Weimer commented on YARN-3337: - Excellent idea! We do this over in REEF within the AM as well as the containers (via reef-poison), but that doesn't allow us to create failures at a low enough level. Having the RM kill our containers would be awesome for tests. Provide YARN chaos monkey - Key: YARN-3337 URL: https://issues.apache.org/jira/browse/YARN-3337 Project: Hadoop YARN Issue Type: New Feature Components: test Affects Versions: 2.7.0 Reporter: Steve Loughran To test failure resilience today you either need custom scripts or implement Chaos Monkey-like logic in your application (SLIDER-202). Killing AMs and containers on a schedule probability is the core activity here, one that could be handled by a CLI App/client lib that does this. # entry point to have a startup delay before acting # frequency of chaos wakeup/polling # probability to AM failure generation (0-100) # probability of non-AM container kill # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
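A minimal sketch of the polling core those bullet points describe; the class name and the two kill actions are hypothetical placeholders, not part of any existing YARN client API:
{code}
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ChaosMonkeySketch {
  private final Random random = new Random();
  private final int amKillPercent;         // 0-100: chance of an AM failure per wakeup
  private final int containerKillPercent;  // 0-100: chance of a non-AM container kill

  public ChaosMonkeySketch(int amKillPercent, int containerKillPercent) {
    this.amKillPercent = amKillPercent;
    this.containerKillPercent = containerKillPercent;
  }

  public void start(long startupDelaySeconds, long wakeupIntervalSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        if (random.nextInt(100) < amKillPercent) {
          killRandomApplicationMaster();   // placeholder for the real client call
        }
        if (random.nextInt(100) < containerKillPercent) {
          killRandomNonAmContainer();      // placeholder for the real client call
        }
      }
    }, startupDelaySeconds, wakeupIntervalSeconds, TimeUnit.SECONDS);
  }

  private void killRandomApplicationMaster() { /* hypothetical */ }
  private void killRandomNonAmContainer() { /* hypothetical */ }
}
{code}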
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359239#comment-14359239 ] Junping Du commented on YARN-3225: -- Nice discussion, [~devaraj.k]! bq. If there are some long running containers in the NM and RMAdmin CLI gets terminated before issuing forceful decommission then the NM could in the “DECOMMISSIONING” state irrespective of timeout. AM I missing anything? If users terminate the blocking/pending CLI, it only means they want to track the timeout themselves, or they want to adjust the timeout value to be earlier or later. In this case, the decommissioning nodes either get decommissioned when the app finishes (a clean quit), or wait for the user to decommission them again later. We can add some alert messages later if some nodes stay in the decommissioning stage for a really long time. The basic idea is that we agree not to track the timeout on the RM side for each individual node. bq. If we don't pass timeout to RM then how are we going to achieve this? You mean this will be handled later, once the basic things are done. You are right that the timeout value could be useful to pass down to the AM for preempting containers (however, it has no effect on terminating nodes). Let's keep it here and we can leverage it later when we are notifying the AM. bq. For making timeout longer, if we use new CLI then there is a chance of forceful decommission happening with the old CLI timeout. Is there any constraint like this needs to be done with the same CLI? I don't quite understand the case described here. Users should terminate the current CLI and launch a new CLI with adjusted timeout values if they want to wait a shorter or longer time. If the previous timeout has already passed, the current CLI should already have quit with all nodes decommissioned. Am I missing something here? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or the existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359275#comment-14359275 ] Wangda Tan commented on YARN-3243: -- Seems like Jenkins issue, rekicked. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. If leaf queue allocated a container.size (parentQueue.max - parentQueue.usage), parent queue can excess its max resource limit, as following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate container since its usage max, but if we do that, A's usage can excess A.max. 2) When doing continous reservation check, parent queue will only tell children you need unreserve *some* resource, so that I will less than my maximum resource, but it will not tell how many resource need to be unreserved. This may lead to parent queue excesses configured maximum capacity as well. With YARN-3099/YARN-3124, now we have {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means, *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary, instead, children can get how much resource need to be unreserved to keep its parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359332#comment-14359332 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- bq. The testcase failures are not related. And They can pass locally. Can you point any JIRAs tracking them. The patch looks good, checking it in. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-1942: Assignee: Wangda Tan Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Target Version/s: 2.7.0 (was: 3.0.0, 2.4.1) Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359091#comment-14359091 ] Wangda Tan commented on YARN-1942: -- Working on this, will post a patch soon. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
Wangda Tan created YARN-3340: Summary: Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently, setters of ApplicationId/ApplicationAttemptId/ContainerId are all private, which is not correct -- users' applications need to set such ids to query status, submit applications, etc. We need to mark such setters as public to avoid downstream applications encountering compilation errors when changes are made to such setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
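For context, downstream code typically builds these records through the public static factories sketched below (shown only as an illustration); those factories internally call the setters this issue proposes to mark @Public, so signature changes would at least surface as compile errors:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class RecordFactorySketch {
  // How a downstream application or test typically constructs these ids.
  public static ContainerId exampleContainerId() {
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    return ContainerId.newContainerId(attemptId, 1L);
  }
}
{code}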
[jira] [Commented] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
[ https://issues.apache.org/jira/browse/YARN-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359188#comment-14359188 ] Hadoop QA commented on YARN-3340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704212/YARN-3340.1.patch against trunk revision 6dae6d1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6944//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6944//console This message is automatically generated. Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. -- Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3340.1.patch Currently, setters of ApplicaitonId/ApplicationAttemptId/ContainerId are all private, that's not correct -- user's applications need to set such ids to do query status / submit application, etc. We need mark such setters to be public avoiding downstream applications encounters compilation error when changes made on such setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359199#comment-14359199 ] Xuan Gong commented on YARN-3154: - The testcase failures are not related. And They can pass locally. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359089#comment-14359089 ] Wangda Tan commented on YARN-3243: -- Hi [~jianhe], I've done all updates, thanks for your comments! bq. CapacityScheduler, why following code is moved ? This is because CS will call updateClusterResource after the a new node is added, which will set resourceLimits of queues. To set queue's resourceLimits, it needs to how much resource in each partition, so it need to call labelManager.activeNode before setting queue's resourceLimits. Attaching new patch (ver.5) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. If leaf queue allocated a container.size (parentQueue.max - parentQueue.usage), parent queue can excess its max resource limit, as following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate container since its usage max, but if we do that, A's usage can excess A.max. 2) When doing continous reservation check, parent queue will only tell children you need unreserve *some* resource, so that I will less than my maximum resource, but it will not tell how many resource need to be unreserved. This may lead to parent queue excesses configured maximum capacity as well. With YARN-3099/YARN-3124, now we have {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means, *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary, instead, children can get how much resource need to be unreserved to keep its parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359169#comment-14359169 ] Hadoop QA commented on YARN-1942: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704209/YARN-1942.1.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6943//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6943//console This message is automatically generated. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359186#comment-14359186 ] Jian He commented on YARN-3305: --- [~rohithsharma], thanks for working on this! I think we can normalize the AM request in RMAppManager after validateAndCreateResourceRequest, to make sure the AM request stored in RMAppImpl is also correct. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation exceeding the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the application, but during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
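To illustrate the mismatch being described, a self-contained sketch of the rounding the scheduler performs (plain arithmetic, not the RM code); tracking AM-Used with the normalized value keeps it consistent with what is actually allocated:
{code}
public class NormalizeSketch {
  // Round a memory request up to the next multiple of the scheduler minimum.
  static int normalizeMemoryMb(int requestedMb, int minimumAllocationMb) {
    if (requestedMb <= 0) {
      return minimumAllocationMb;
    }
    int multiples = (requestedMb + minimumAllocationMb - 1) / minimumAllocationMb;
    return multiples * minimumAllocationMb;
  }

  public static void main(String[] args) {
    // A 600 MB AM request with a 1024 MB minimum is allocated as 1024 MB, so
    // AM-Used should also be tracked as 1024 MB rather than 600 MB.
    System.out.println(normalizeMemoryMb(600, 1024)); // 1024
  }
}
{code}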
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359181#comment-14359181 ] zhihai xu commented on YARN-3336: - The findbugs warning is not related to my change. I just created YARN-3341 to fix this findbugs warning. The TestRM failure is also not related to my change, which is a timeout failure. It passed at my local latest build with the following message. {code} --- T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.TestRM Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 166.002 sec - in org.apache.hadoop.yarn.server.resourcemanager.TestRM Results : Tests run: 22, Failures: 0, Errors: 0, Skipped: 0 {code} FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens = proxyUser.doAs(new PrivilegedExceptionActionToken?[]() { @Override public Token?[] run() throws Exception { return FileSystem.get(getConfig()).addDelegationTokens( UserGroupInformation.getLoginUser().getUserName(), credentials); } }); return newTokens; } {code} The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig())=FileSystem.get(getDefaultUri(conf), conf)=FileSystem.CACHE.get(uri, conf)=FileSystem.CACHE.getInternal(uri, conf, key)=FileSystem.CACHE.map.get(key)=createFileSystem(uri, conf) {code} public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) { if (user == null || user.isEmpty()) { throw new IllegalArgumentException(Null user); } if (realUser == null) { throw new IllegalArgumentException(Null real user); } Subject subject = new Subject(); SetPrincipal principals = subject.getPrincipals(); principals.add(new User(user)); principals.add(new RealUser(realUser)); UserGroupInformation result =new UserGroupInformation(subject); result.setAuthenticationMethod(AuthenticationMethod.PROXY); return result; } {code} FileSystem#Cache#Key.equals will compare the ugi {code} Key(URI uri, Configuration conf, long unique) throws IOException { scheme = uri.getScheme()==null?:uri.getScheme().toLowerCase(); authority = uri.getAuthority()==null?:uri.getAuthority().toLowerCase(); this.unique = unique; this.ugi = UserGroupInformation.getCurrentUser(); } public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) isEqual(this.authority, that.authority) isEqual(this.ugi, that.ugi) (this.unique == that.unique); } return false; } {code} UserGroupInformation.equals will compare subject by reference. 
{code} public boolean equals(Object o) { if (o == this) { return true; } else if (o == null || getClass() != o.getClass()) { return false; } else { return subject == ((UserGroupInformation) o).subject; } } {code} So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
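For context on how such a leak is typically avoided, a hedged sketch follows (this is one possible shape, not necessarily what the attached patch does): obtain the tokens under the proxy UGI as before, then close the FileSystem instances cached for that throwaway UGI.
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class ObtainTokensSketch {
  static Token<?>[] obtainSystemTokensForUser(final Configuration conf, String user,
      final Credentials credentials) throws IOException, InterruptedException {
    UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    try {
      return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(conf).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
    } finally {
      // Drop the FileSystem.CACHE entries keyed by this short-lived proxy UGI;
      // without this, every call leaves behind an uncollectable cache entry.
      FileSystem.closeAllForUGI(proxyUser);
    }
  }
}
{code}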
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: (was: YARN-3336.001.patch) FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens = proxyUser.doAs(new PrivilegedExceptionActionToken?[]() { @Override public Token?[] run() throws Exception { return FileSystem.get(getConfig()).addDelegationTokens( UserGroupInformation.getLoginUser().getUserName(), credentials); } }); return newTokens; } {code} The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig())=FileSystem.get(getDefaultUri(conf), conf)=FileSystem.CACHE.get(uri, conf)=FileSystem.CACHE.getInternal(uri, conf, key)=FileSystem.CACHE.map.get(key)=createFileSystem(uri, conf) {code} public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) { if (user == null || user.isEmpty()) { throw new IllegalArgumentException(Null user); } if (realUser == null) { throw new IllegalArgumentException(Null real user); } Subject subject = new Subject(); SetPrincipal principals = subject.getPrincipals(); principals.add(new User(user)); principals.add(new RealUser(realUser)); UserGroupInformation result =new UserGroupInformation(subject); result.setAuthenticationMethod(AuthenticationMethod.PROXY); return result; } {code} FileSystem#Cache#Key.equals will compare the ugi {code} Key(URI uri, Configuration conf, long unique) throws IOException { scheme = uri.getScheme()==null?:uri.getScheme().toLowerCase(); authority = uri.getAuthority()==null?:uri.getAuthority().toLowerCase(); this.unique = unique; this.ugi = UserGroupInformation.getCurrentUser(); } public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) isEqual(this.authority, that.authority) isEqual(this.ugi, that.ugi) (this.unique == that.unique); } return false; } {code} UserGroupInformation.equals will compare subject by reference. {code} public boolean equals(Object o) { if (o == this) { return true; } else if (o == null || getClass() != o.getClass()) { return false; } else { return subject == ((UserGroupInformation) o).subject; } } {code} So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Attachment: YARN-1942.1.patch Attached ver.1 for review, also removed {code} - public static MapString, String convertToString( - MapCharSequence, CharSequence env) { {code} It seems so YARN-related, and nobody is using it in Hadoop. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on
[ https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359119#comment-14359119 ] Wangda Tan commented on YARN-2792: -- Updated priority from blocker to major, this is no longer a blocker after YARN-1942 and YARN-3340. Have a public Test-only API for creating important records that ecosystem projects can depend on Key: YARN-2792 URL: https://issues.apache.org/jira/browse/YARN-2792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli From YARN-2789, {quote} Sigh. Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. I am inclined to have a separate test-only public util API that keeps compatibility for tests. Rather than opening unwanted APIs up. I'll file a separate ticket for this, we need all YARN apps/frameworks to move to that API instead of these private unstable APIs. For now, I am okay keeping a private compat for the APIs changed in YARN-2698. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359158#comment-14359158 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704176/YARN-3267.4.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator org.apache.hadoop.mapred.TestClusterMRNotification The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapred.TestClusterMapReduceTestCase Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6940//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6940//console This message is automatically generated. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359174#comment-14359174 ] Vinod Kumar Vavilapalli commented on YARN-3034: --- How about we keep the same config {{system-metrics-publisher.enabled}} for enabling this functionality but add a new config which lets us choose the version of YTS? [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
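A rough sketch of how that could be wired on the RM side; the version property name yarn.timeline-service.version below is hypothetical and would be settled on this JIRA:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

Configuration conf = new YarnConfiguration();
// Existing switch stays as-is.
boolean publisherEnabled = conf.getBoolean(
    YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED,
    YarnConfiguration.DEFAULT_SYSTEM_METRICS_PUBLISHER_ENABLED);
// Hypothetical new property selecting which timeline service version to publish to.
float timelineVersion = conf.getFloat("yarn.timeline-service.version", 1.0f);
if (publisherEnabled && timelineVersion >= 2.0f) {
  // start the ATS v2 publisher (RM-side timeline writer)
} else if (publisherEnabled) {
  // start the existing v1 SystemMetricsPublisher
}
{code}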
[jira] [Updated] (YARN-3341) Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource
[ https://issues.apache.org/jira/browse/YARN-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3341: Labels: findbugs (was: ) Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource --- Key: YARN-3341 URL: https://issues.apache.org/jira/browse/YARN-3341 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Labels: findbugs Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource The warning message is:
{code}
Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.reserveResource(SchedulerApplicationAttempt, Priority, RMContainer)
{code}
The code which causes the warning is:
{code}
this.reservedAppSchedulable = (FSAppAttempt) application;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
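One common way to address BC_UNCONFIRMED_CAST is to confirm the runtime type before the downcast; a minimal sketch, not necessarily how the eventual patch resolves the warning:
{code}
// Inside FSSchedulerNode.reserveResource(...): guard the downcast so the
// cast is visibly confirmed before it happens.
if (!(application instanceof FSAppAttempt)) {
  throw new IllegalArgumentException(
      "Expected FSAppAttempt but got " + application.getClass().getName());
}
this.reservedAppSchedulable = (FSAppAttempt) application;
{code}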
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
[ https://issues.apache.org/jira/browse/YARN-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3340: - Attachment: YARN-3340.1.patch Attached ver.1 patch for review. Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. -- Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3340.1.patch Currently, the setters of ApplicationId/ApplicationAttemptId/ContainerId are all private; that's not correct -- users' applications need to set such ids to query status, submit applications, etc. We need to mark such setters as public to avoid downstream applications hitting compilation errors when changes are made to these setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
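Illustrative only -- the exact set of methods and stability annotations would be decided in the attached patch; the idea is simply to relax the audience annotation on the id setters, for example:
{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

// Before (downstream code cannot rely on it):
@Private
@Unstable
protected abstract void setId(int id);

// After (sketch of the proposal; the final visibility and stability level
// are settled in the patch):
@Public
@Unstable
public abstract void setId(int id);
{code}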
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359163#comment-14359163 ] Junping Du commented on YARN-3039: -- Thanks [~sjlee0] for the review and many good comments! I addressed most of them in the v5 patch and will post it soon.
bq. I see some methods are marked as Stable (e.g. AggregatorNodemanagerProtocol), but I feel that’s a bit premature. Can we mark these still unstable or evolving? Note that at least at this point even the class names can change.
Nice catch! Let's mark it with evolving instead.
bq. While we’re on the API annotations, I notice that the annotations are on methods rather than classes themselves. I usually set them on the classes with the understanding that the entire class is unstable, for example. Which is a more common practice?
We could have a class-level annotation which is the default publicity and stability for each method. However, each method can have its own annotation that overrides the class-level one. In most cases, the class-level annotation is more public and stable than the individual methods, because it is the first-class contract with end users or other components (otherwise they will have concerns about using it). As an example, if we need to add a new API that is not stable yet to a protocol class marked stable, we shouldn't regress the whole class from stable to evolving or unstable, but we can mark the new method as unstable or evolving. Make sense?
bq. I feel that the default for NM_AGGREGATOR_SERVICE_THREAD_COUNT doesn't have to be as high as 20 as the traffic will be pretty low; 5?
5 sounds good. Will update.
bq. Since createTimelineClient(ApplicationId) was introduced only on this branch, we should be able to just replace it instead of adding a new deprecated method, no?
I didn't check the history of when we added createTimelineClient(ApplicationId). If so, we can remove it. For the other createTimelineClient() methods, we can mark them as deprecated if they showed up in previous releases, because newInstance() is a more standard way of doing factory work.
bq. putObjects() Not sure if I understand the comment “timelineServiceAddress couldn’t have been initialized”; can’t putObjects() be called in a steady state? If so, shouldn’t we check if timelineServiceAddress is not null before proceeding to loop and wait for the value to come in? Otherwise, we would introduce a 1 second latency in every put call even in a steady state?
In steady state, there is no initialization delay because timelineServiceAddress is already there (in timelineClient). The cost only happens for the first event when timelineClient starts to post, or after timelineServiceAddress gets updated (because of a failure or other reasons). We designed it this way so that TimelineClient handles service discovery itself rather than letting the caller figure it out.
bq. maxRetries - retries might be a better variable name?
Updated.
bq. It might be good to create a small helper method for polling for the timelineServiceAddress value
You are right that we can always abstract a helper method to make the logic more concise. Updated it with a pollTimelineServiceAddress() method.
bq. Not sure if we need a while loop for needRetry; it either succeeds (at which point needRetry becomes false and you exit normally) or it doesn’t (in which case we go into the exception handling and we try only once to get the value). Basically I’m not sure whether this retry code is what you meant to do?
That's actually a bug: the try-catch should be inside the while loop, so that we can tolerate post failures for a few retries (the address could be stale and being updated by the RM) within the while loop. Thanks for identifying this.
bq. I think it may be enough to make timelineServiceAddress volatile instead of making getter/setter synchronized.
Agree. Given that only one thread can update it today, volatile should be safe enough.
bq. doPostingObject() has duplicate code with putObjects(); can we consider ways to eliminate code duplication? I know it calls different methods deep inside the implementation, but there should be a way to reduce code duplication.
Agree. In the new patch (v5), I will abstract all the common logic into helper methods.
bq. typo: ILLEAGAL_NUMBER_MESSAGE - ILLEGAL_NUMBER_MESSAGE, Context.java, We might want to update the comment (or method names) a little bit. NodeManager.java, removeRegisteredAggregators() - removeRegisteredAggregator() (should be singular)
Updated for these comments.
bq. We need removeKnownAggregator() as well; otherwise we’d blow the NM memory. Perhaps what we need is more like setKnownAggregators() instead of add? Essentially NM would need to replace its knowledge of the known aggregators every time RM updates it via heartbeat, right?
I was thinking of some shortcut so that registeredAggregators go to knownAggregators directly, so some local aggregators don't
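For reference, the retry shape discussed above -- a volatile timelineServiceAddress, a small polling helper, and the try-catch moved inside the while loop -- could look roughly like the sketch below; pollTimelineServiceAddress is named after the discussion, and doPosting is a hypothetical stand-in for the shared posting logic, not a method from the patch:
{code}
private volatile String timelineServiceAddress;

// Waits until the address has been discovered (e.g. delivered via heartbeat),
// polling once a second for at most 'retries' attempts.
private String pollTimelineServiceAddress(int retries)
    throws InterruptedException {
  while (timelineServiceAddress == null && retries-- > 0) {
    Thread.sleep(1000);
  }
  return timelineServiceAddress;
}

private void putObjects(Object obj, int retries)
    throws IOException, InterruptedException {
  boolean needRetry = true;
  while (needRetry) {
    try {
      // In steady state the address is already known, so there is no delay here.
      String address = pollTimelineServiceAddress(retries);
      doPosting(address, obj);   // hypothetical shared helper with doPostingObject()
      needRetry = false;
    } catch (IOException e) {
      // The cached address may be stale (the aggregator moved); forget it so the
      // next iteration re-discovers it, and give up once retries are exhausted.
      timelineServiceAddress = null;
      if (--retries <= 0) {
        throw e;
      }
    }
  }
}
{code}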
[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Attachment: YARN-3039-v5.patch Updated the v5 patch, incorporating [~sjlee0]'s comments. [Aggregator wireup] Implement ATS app-aggregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359656#comment-14359656 ] zhihai xu commented on YARN-3336: - Hi [~cnauroth], I implemented a unit test in the new patch YARN-3336.002.patch. Without the fix, you can see the test fail with the following message:
{code}
java.lang.AssertionError: expected:<1> but was:<4>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testFSLeakInObtainSystemTokensForUser(TestDelegationTokenRenewer.java:1041)
{code}
I called obtainSystemTokensForUser three times; without the fix, the FileSystem cache size grows from 1 to 4. The unit test demonstrates the memory leak. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch -- This message was sent by Atlassian JIRA
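The leak can be avoided by making sure the FileSystem instances cached for the throwaway proxy UGI are released again; a hedged sketch of one such approach, reusing the obtainSystemTokensForUser shape from the issue description and not necessarily what YARN-3336.002.patch does:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        return FileSystem.get(getConfig()).addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      }
    });
  } finally {
    // Drop the FileSystem instances cached under this throwaway proxy UGI so
    // FileSystem.CACHE does not grow on every call.
    FileSystem.closeAllForUGI(proxyUser);
  }
}
{code}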
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359662#comment-14359662 ] Jonathan Eagles commented on YARN-3267: --- +1. This patch is a nice addition, [~lichangleo]. Will commit to trunk, branch-2, and branch-2.7 in a few hours. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, and the ACL filters are applied after this (TimelineDataManager.java::getEntities). This could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
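The gist of the fix is to apply the ACL check while scanning and let only visible entities count toward the limit; a rough sketch of that ordering, where scanEntities and canAccess are hypothetical stand-ins for the store iterator and the TimelineACLsManager check rather than the patch's actual methods:
{code}
// Sketch only: filter per entity while scanning, then stop once 'limit'
// visible entities have been collected.
List<TimelineEntity> visible = new ArrayList<TimelineEntity>();
for (TimelineEntity entity : scanEntities(entityType)) {   // hypothetical iterator
  if (!canAccess(callerUGI, entity)) {                     // hypothetical ACL check
    continue;   // filtered entities no longer consume the limit
  }
  visible.add(entity);
  if (visible.size() >= limit) {
    break;      // the limit now counts only entities the caller may see
  }
}
{code}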
[jira] [Resolved] (YARN-3346) Deadlock in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-3346. -- Resolution: Implemented Fix Version/s: 2.6.1, 2.7.0 This issue is already resolved: YARN-3251 for the 2.6.1 fix, and YARN-3265 for the 2.7.0 fix. Deadlock in Capacity Scheduler -- Key: YARN-3346 URL: https://issues.apache.org/jira/browse/YARN-3346 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Suma Shivaprasad Fix For: 2.7.0, 2.6.1 Attachments: rm.deadlock_jstack
{noformat}
Found one Java-level deadlock:
==============================
2144051991@qtp-383501499-6:
  waiting to lock monitor 0x7fa700eec8e8 (object 0x0004589fec18, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp),
  which is held by ResourceManager Event Processor
ResourceManager Event Processor:
  waiting to lock monitor 0x7fa700aadf88 (object 0x000441c05ec8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 0 on 54311
IPC Server handler 0 on 54311:
  waiting to lock monitor 0x7fa700e20798 (object 0x000441d867f8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by ResourceManager Event Processor
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1453) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/YARN-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359702#comment-14359702 ] Kai Zheng commented on YARN-1453: - Hello [~ajisakaa], We need to look at the JDK8 support. I pinged [~apurtell] offline, and was told I can work on it. I'm wondering if you have already started to work on this; if not, I'd like to help. Thanks! [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: YARN-1453 URL: https://issues.apache.org/jira/browse/YARN-1453 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 1453-branch-2.patch, 1453-branch-2.patch, 1453-trunk.patch, 1453-trunk.patch, YARN-1453-02.patch Javadoc is stricter by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8, all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
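For context, the cleanups are of this flavor; a made-up before/after example of a doc comment that JDK8's doclint rejects:
{code}
// Rejected by JDK8 javadoc (doclint): bare angle brackets and a self-closing tag.
/**
 * Returns the Set<String> of node labels on this node.
 * <p/>
 * May be empty.
 */

// Accepted: wrap generics in {@code ...} and use the plain HTML form of <p>.
/**
 * Returns the {@code Set<String>} of node labels on this node.
 * <p>
 * May be empty.
 */
{code}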