[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358291#comment-14358291 ] zhihai xu commented on YARN-3336: - Hi [~cnauroth], yes, you are right, I forgot that the FileSystem RPC proxy also needs the correct UGI information. I uploaded a new patch, YARN-3336.001.patch, which addresses your comment. Please review it. Many thanks for the review. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
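For illustration only, here is a minimal sketch of how the per-call cache growth could be avoided by closing everything cached for the short-lived proxy UGI once the tokens are obtained. It mirrors the method quoted above but is not claimed to be what YARN-3336.001.patch actually does (the patch also has to keep the RPC proxy's UGI correct, as discussed in the comment):
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  final UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        return FileSystem.get(getConfig()).addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      }
    });
  } finally {
    // Close and evict the FileSystem instances cached under this throwaway
    // Subject, so FileSystem.CACHE does not grow by one entry per call.
    FileSystem.closeAllForUGI(proxyUser);
  }
}
{code}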
[jira] [Created] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
Vinod Kumar Vavilapalli created YARN-3332: - Summary: [Umbrella] Unified Resource Statistics Collection per node Key: YARN-3332 URL: https://issues.apache.org/jira/browse/YARN-3332 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Today in YARN, NodeManager collects statistics like per container resource usage and overall physical resources available on the machine. Currently this is used internally in YARN by the NodeManager for only a limited usage: automatically determining the capacity of resources on node and enforcing memory usage to what is reserved per container. This proposal is to extend the existing architecture and collect statistics for usage beyond the existing usecases. Proposal attached in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3335) Job In Error State Will Lost Jobhistory For Second and Later Attempts
Chang Li created YARN-3335: -- Summary: Job In Error State Will Lost Jobhistory For Second and Later Attempts Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempts' history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355354#comment-14355354 ] Jian He commented on YARN-3273: --- thanks Rohith ! overall sounds good. bq. All active users table wont be rendered I think it's also useful to display all active users. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355395#comment-14355395 ] Bikas Saha commented on YARN-2140: -- This paper may have useful insights into the network sharing issues. http://research.microsoft.com/en-us/um/people/srikanth/data/nsdi11_seawall.pdf Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan Attachments: NetworkAsAResourceDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare subject by reference.
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357015#comment-14357015 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357431#comment-14357431 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703964/YARN-3243.4.patch against trunk revision c3003eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6919//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6919//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6919//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container in a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                    \
A1 (usage=1, max=55)      A2 (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, now we have the {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used) (see the sketch below). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
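As a hedged illustration of the min(qA.headroom, qA.max - qA.used) rule proposed above (not the actual YARN-3243 patch; the surrounding queue plumbing and field names are assumptions), the per-child headroom could be computed with YARN's Resources helper like this:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

/**
 * Headroom a parent queue (qA) would push down to its children:
 * min(qA.headroom, qA.max - qA.used). qA.headroom is assumed to have been
 * set by qA's own parent, so ancestors' limits are enforced transitively.
 */
static Resource childHeadroom(ResourceCalculator rc, Resource clusterResource,
    Resource parentHeadroom, Resource parentMax, Resource parentUsed) {
  return Resources.min(rc, clusterResource,
      parentHeadroom,
      Resources.subtract(parentMax, parentUsed));
}
{code}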
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357111#comment-14357111 ] Junping Du commented on YARN-3039: -- Thanks [~sjlee0]! Providing an end-to-end flow below which could be helpful for your review: - When the AM gets launched, the NM auxiliary service will add a new aggregator service to the aggregatorCollection (per node) for the necessary binding work. The aggregatorCollection also has a client for AggregatorNodeManagerProtocol to notify the NM of the newly registered app aggregator and its detailed address. - When the NM gets notified, it will update its registeredAggregators list (for all local app aggregators) and notify the RM in the next heartbeat. - When the RM receives registeredAggregators from an NM, it will update its aggregators list. - The next time other NMs and the AM heartbeat with the RM, it will provide aggregatorInfo in the heartbeat response (for the AM, this is through the AllocationResponse). - The AM of DS has an AMRMClientAsync which heartbeats with the RM, so it can receive the updated aggregator address periodically. By registering a callback that listens for aggregator address updates, it can update the address of the TimelineClient in a thread-safe way (see the sketch below). - The AM calls timeline operations in a non-blocking way (so it does not hang there in a deadlock); currently this is wrapped in a new thread, but it will be improved later (in another JIRA) to save thread resources. - The TimelineClient (consuming the v2 service) loops in retry logic until it gets the correct address that is set by the AM. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
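A hypothetical sketch, in plain Java, of the "update the aggregator address in a thread-safe way" step described in the flow above; the class and method names here are assumptions for illustration, not code from the YARN-3039 patches:
{code}
import java.util.concurrent.atomic.AtomicReference;

// Holds the app-aggregator address published to the AM via heartbeat responses.
class AggregatorAddressHolder {
  private final AtomicReference<String> aggregatorAddr = new AtomicReference<String>();

  // Invoked from the AMRMClientAsync callback when a heartbeat response
  // carries a new app-aggregator service address.
  void onAggregatorAddressUpdated(String newAddr) {
    aggregatorAddr.set(newAddr);
  }

  // Called from the non-blocking timeline publishing thread; it retries until
  // an address has been set by the heartbeat path, as described above.
  String waitForAddress(long retryIntervalMs) throws InterruptedException {
    String addr;
    while ((addr = aggregatorAddr.get()) == null) {
      Thread.sleep(retryIntervalMs);
    }
    return addr;
  }
}
{code}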
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358390#comment-14358390 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704121/YARN-3336.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6935//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6935//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6935//console This message is automatically generated. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358427#comment-14358427 ] Naganarasimha G R commented on YARN-3034: - Seems like patch is fine, I was able to update the latest code from yarn-2928 branch, apply the patch and build successfully using {{mvn clean install -DskipTests -Dmaven.javadoc.skip=true}}. Not sure why it failed . [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
Ravindra Naik created YARN-3339: --- Summary: TestDockerContainerExecutor should pull a single image and not the entire centos repository Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Naik Priority: Critical TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
[ https://issues.apache.org/jira/browse/YARN-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3339: Description: TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time was: TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully. TestDockerContainerExecutor should pull a single image and not the entire centos repository --- Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Naik Priority: Critical TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358538#comment-14358538 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Yarn-trunk #864 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/864/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java 
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358550#comment-14358550 ] Hadoop QA commented on YARN-3302: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704134/YARN-3302-trunk.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6936//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6936//console This message is automatically generated. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Attachments: YARN-3302-trunk.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3339) TestDockerContainerExecutor should pull a single image and not the entire centos repository
[ https://issues.apache.org/jira/browse/YARN-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358588#comment-14358588 ] Hadoop QA commented on YARN-3339: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704141/YARN-3339-branch-2.6.0.001.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6937//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6937//console This message is automatically generated. TestDockerContainerExecutor should pull a single image and not the entire centos repository --- Key: YARN-3339 URL: https://issues.apache.org/jira/browse/YARN-3339 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Environment: Linux Reporter: Ravindra Kumar Naik Priority: Minor Fix For: 2.6.0 Attachments: YARN-3339-branch-2.6.0.001.patch, YARN-3339-trunk.001.patch TestDockerContainerExecutor test pulls the entire centos repository which is time consuming. Pulling a specific image (e.g. centos7) will be sufficient to run the test successfully and will save time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358755#comment-14358755 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/121/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358710#comment-14358710 ] Hadoop QA commented on YARN-41: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704146/YARN-41-4.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6938//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6938//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6938//console This message is automatically generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3335) Job In Error State Will Lost Jobhistory Of Second and Later Attempts
[ https://issues.apache.org/jira/browse/YARN-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3335: --- Description: Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring the rest of that job's later attempts' history files in the intermediate dir. (was: Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempts' history files in the intermediate dir.) Job In Error State Will Lost Jobhistory Of Second and Later Attempts Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3335.1.patch Related to the fixed issue MAPREDUCE-6230, which can cause a job to get into the error state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the error state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The JobHistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring the rest of that job's later attempts' history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358737#comment-14358737 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2062 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2062/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java 
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni
[ https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358768#comment-14358768 ] Allen Wittenauer commented on YARN-3331: We really can't read this in via the core-site.xml file? I could have sworn we populated the system properties with the content of the xml files on startup. It's really less than ideal to set this up on the command line as it is already quite long and there is a risk that we'll overflow the buffer, preventing things from running. That issue aside: a) We need to avoid using daemon-specific and/or project-specific environment variables. This is the #1 reason why the branch-2 shell code is just utter chaos. This should be HADOOP_ something. b) Additionally, this should be set for everything, not just the nodemanager in order to keep this consistent if/when other stuff uses this functionality. The add_param should probably happen in finalize_hadoop_opts c) The default should be set with an actual value in hadoop_basic_init. $(pwd) is a *terrible* default value for this, because there is no guarantee what the pwd actually is. d) There needs to be a stanza added to hadoop-env.sh that actually explains what your new environment variable does, how to set it, etc. All/most of this is covered on http://wiki.apache.org/hadoop/UnixShellScriptProgrammingGuide . I'm now going to go check to make sure I didn't miss anything on that page and add it if I did. :) NodeManager should use directory other than tmp for extracting and loading leveldbjni - Key: YARN-3331 URL: https://issues.apache.org/jira/browse/YARN-3331 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3331.001.patch /tmp can be required to be noexec in many environments. This causes a problem when nodemanager tries to load the leveldbjni library which can get unpacked and executed from /tmp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358708#comment-14358708 ] Nathan Roberts commented on YARN-3298: -- I agree. Let's not change anything for the time being. If YARN-2113 requires some tweaking in this area, we can do it at that time. User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for queue capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1) The user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)

user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
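To make the computation above concrete, here is a hedged, single-resource (memory-only) Java sketch of the formula and of the proposed hard check; the real CapacityScheduler works on Resource objects with a ResourceCalculator, so this is only an illustration, not the patch:
{code}
/**
 * Returns true if the user may receive another container under the proposed
 * hard user-limit rule user.usage + required <= user-limit.
 * All values are memory in MB; this is a simplification for illustration.
 */
static boolean userCanAllocate(long queueUsed, long queueCapacity, long nowRequired,
    int activeUsers, int userLimitPercent, float userLimitFactor,
    long userUsage, long required) {
  // current-capacity per the formula quoted above.
  long currentCapacity = queueUsed > queueCapacity
      ? queueUsed + nowRequired : queueCapacity;
  long userLimit = Math.min(
      Math.max(currentCapacity / activeUsers,
               currentCapacity * userLimitPercent / 100L),
      (long) (queueCapacity * userLimitFactor));
  return userUsage + required <= userLimit;
}
{code}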
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3347: Description: Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly was: Right now, we could specify applicationId node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3347) Improve YARN log command to get AMContainer logs
Xuan Gong created YARN-3347: --- Summary: Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3347: -- Issue Type: Sub-task (was: Test) Parent: YARN-431 Improve YARN log command to get AMContainer logs Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Right now, we could specify applicationId, node http address and container ID to get the specific container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359969#comment-14359969 ] Hadoop QA commented on YARN-3284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704326/YARN-3284.1.patch against trunk revision 8212877. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 9 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6949//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6949//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6949//console This message is automatically generated. Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch Current, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359893#comment-14359893 ] Rohith commented on YARN-3305: -- Updated the patch as per the comment. Kindly review the latest patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit. The reason is that AM-Used is updated with the actual ResourceRequest made by the user while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359881#comment-14359881 ] Rohith commented on YARN-3305: -- bq. call normalize the am request in RMAppManager after validateAndCreateResourceRequest to make sure the am request stored in RMAppImpl is also correct. Agreed. I will change it. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit. The reason is that AM-Used is updated with the actual ResourceRequest made by the user while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
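The fix direction discussed above is to normalize the AM ResourceRequest before it is stored, so that the value accounted against the leaf queue matches what the scheduler actually allocates. Below is a minimal sketch of that normalization idea, assuming a memory-only view; the real patch would go through the scheduler's own normalization utilities and Resource objects rather than a hand-rolled helper.
{code}
// Minimal sketch (not the actual patch): round an AM memory request up to a
// multiple of the scheduler's minimum allocation, so the value accounted for
// the queue matches what will really be allocated.
public final class AmRequestNormalizer {

  static int normalizeMemory(int requestedMb, int minimumMb) {
    if (requestedMb <= minimumMb) {
      return minimumMb;
    }
    // Round up to the next multiple of minimumMb.
    return ((requestedMb + minimumMb - 1) / minimumMb) * minimumMb;
  }

  public static void main(String[] args) {
    // An AM asking for 512 MB with a 1024 MB minimum is accounted as 1024 MB,
    // which is what the leaf queue's AM-Used resource should reflect.
    System.out.println(normalizeMemory(512, 1024));   // 1024
    System.out.println(normalizeMemory(1500, 1024));  // 2048
  }
}
{code}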
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359957#comment-14359957 ] Junping Du commented on YARN-3039: -- Also, thanks [~zjshen] for the comments! bq. Sorry for raising the question so late, but I'd like to think out loud about the first step. No worries. A good idea never comes late. bq. Nowadays, the app-level aggregator is started by the callback handler listening to the container start event of the NM. Given we are going to support stand-alone and container mode, this approach may not work. As we're going to have an IPC channel between the aggregator and the NM, should we use an IPC call to add an app-level aggregator? That would make the NM the client and the aggregatorCollection the server. I think in our design for the multiple-aggregator model (shown in YARN-3033), the relationship between an NM and aggregatorCollections could in future be a 1-to-n mapping. In that case, making the NM the server could be better; otherwise the NM would have to maintain different clients to talk to different aggregatorCollections. It is also more aligned with the design of YARN-3332, which sounds like the NM will have more information to share out as a server. [~vinodkv], please correct me if I missed something here. Making the NM the server can still work for stand-alone and container mode; a couple of ways I can think of now: 1. Add a heartbeat between the NM and the aggregatorCollection: given it is IPC and the number of aggregatorCollections is 1 or a small number, the heartbeat interval can be much less than 1 second. Also, richer info (like health status, etc.) could be carried in the request and response rather than just the applicationID and service address. 2. If we don't want any delay, we can have the NM hold on to the RPC response for a while and return either when a new AM launch happens or when an interval is reached; the aggregator will then send the RPC request again immediately even if no new aggregator got bound. Thoughts? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
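Option 2 above is essentially a long poll: the server side of the RPC holds the response until there is something new to report or a timeout elapses. The following is an illustrative sketch of that mechanism only; the class and method names are hypothetical and are not taken from any YARN-3039 patch.
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side helper: the NM records newly bound app-level
// aggregators and lets a caller (the aggregatorCollection) long-poll for them.
public class AggregatorBindingTracker {

  private final List<String> newBindings = new ArrayList<String>();

  // Called when a new AM launches and its app aggregator gets bound.
  public synchronized void addBinding(String appIdAndAddress) {
    newBindings.add(appIdAndAddress);
    notifyAll(); // wake up any waiting poll
  }

  // Long poll: return as soon as there is a new binding, or after timeoutMs.
  public synchronized List<String> pollNewBindings(long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (newBindings.isEmpty()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        break; // interval reached with nothing new; the caller polls again
      }
      wait(remaining);
    }
    List<String> result = new ArrayList<String>(newBindings);
    newBindings.clear();
    return result;
  }
}
{code}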
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359959#comment-14359959 ] Naganarasimha G R commented on YARN-3034: - Thanks for reviewing the patch, [~gtCarrera9]. I have not yet added the implementation part, wherein I am planning to expose public methods similar to SystemMetricsPublisher (appCreated, appFinished, appACLsUpdated, appAttemptRegistered, appAttemptFinished) and more based on need. All these methods require ResourceManager project classes like RMAppAttempt, RMApp, RMAppAttemptState, etc. to be passed in; hence I have kept this package in the ResourceManager project itself. I am also planning to move SystemMetricsEvent and its subclasses (related to App and AppAttempt) to this package. Please provide your opinion on the same. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0001-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For given any ResourceRequest, {{CS#allocate}} normalizes request to minimumAllocation if requested memory is less than minimumAllocation. But AM-used resource is updated with actual ResourceRequest made by user. This results in AM container allocation more than Max ApplicationMaster Resource. This is because AM-Used is updated with actual ResourceRequest made by user while activating the applications. But during allocation of container, ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359936#comment-14359936 ] Junping Du commented on YARN-3039: -- [~sjlee0], Thanks for the comments here! In my understanding, if timelineServiceAddress is not null, pollTimelineServiceAddress() returns quickly, as below. Doesn't it? Do you think we still need a null check here?
{code}
private int pollTimelineServiceAddress(int retries) {
  while (timelineServiceAddress == null && retries > 0) {
    try {
      Thread.sleep(this.serviceRetryInterval);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    timelineServiceAddress = getTimelineServiceAddress();
    retries--;
  }
  return retries;
}
{code}
[Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359953#comment-14359953 ] Hadoop QA commented on YARN-3305: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704352/0001-YARN-3305.patch against trunk revision a852910. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestAppManager org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6950//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6950//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6950//console This message is automatically generated. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch For given any ResourceRequest, {{CS#allocate}} normalizes request to minimumAllocation if requested memory is less than minimumAllocation. But AM-used resource is updated with actual ResourceRequest made by user. This results in AM container allocation more than Max ApplicationMaster Resource. This is because AM-Used is updated with actual ResourceRequest made by user while activating the applications. But during allocation of container, ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: (was: apache-yarn-3294.0.patch) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.0.patch Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.4.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from the timelineserver, the limit is applied on the entities fetched from leveldb, and the ACL filters are applied after this (TimelineDataManager.java::getEntities). This could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
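The problem described above is that the record limit is applied before the ACL filter, so readable entities beyond the cut-off are lost. A minimal sketch of the general idea behind a fix is shown below; the helper class and its names are hypothetical and are not the actual TimelineDataManager code.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: apply the ACL check while scanning and stop only once
// 'limit' readable entities have been collected, instead of truncating to
// 'limit' first and filtering afterwards.
public class AclAwareLimiter<E> {

  /** Caller-supplied access check, e.g. timeline ACLs for the remote user. */
  public interface AccessCheck<T> {
    boolean canRead(T entity);
  }

  public List<E> firstReadable(Iterator<E> candidates, AccessCheck<E> acl, int limit) {
    List<E> result = new ArrayList<E>();
    while (candidates.hasNext() && result.size() < limit) {
      E entity = candidates.next();
      if (acl.canRead(entity)) {
        result.add(entity);
      }
    }
    return result;
  }
}
{code}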
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: (was: YARN-3267.4.patch) Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358886#comment-14358886 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2080 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2080/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
[ https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358792#comment-14358792 ] Allen Wittenauer commented on YARN-3330: This shouldn't be YARN-specific. I'd HIGHLY recommend bumping this up to a full issue in the HADOOP project so that it: a) goes into common, and b) has higher visibility. Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols --- Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: pdiff_patch.py Per YARN-3292, we may want to start the YARN rolling upgrade compatibility verification tooling with a simple script to check protobuf compatibility. The script would work on incoming patch files, check whether there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance of false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
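To illustrate the conservative check described in the issue, here is a rough sketch that scans a unified diff for removed lines inside .proto files and flags them for review. It is only an assumption of how such a checker might look (the attached pdiff_patch.py may work quite differently), written in Java to match the rest of the examples here.
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical, conservative patch checker: any removed line in a .proto file
// is reported as potentially incompatible (false positives are acceptable).
public class ProtoPatchCheck {
  public static void main(String[] args) throws IOException {
    boolean inProtoFile = false;
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.startsWith("--- ") || line.startsWith("+++ ")) {
          // File header of the unified diff: track whether we are in a .proto file.
          inProtoFile = line.contains(".proto");
        } else if (inProtoFile && line.startsWith("-")) {
          System.out.println("Potentially incompatible change: " + line);
        }
      }
    }
  }
}
{code}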
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358798#comment-14358798 ] Mit Desai commented on YARN-2890: - Verified that the test failures are not due to my patch. Some of them pass with my patch on my local machine, and some always fail for me. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling the timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
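As a rough sketch of the intended behaviour (not the attached patch itself), the mini cluster would consult the standard timeline-service switch before starting the service:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: gate the timeline service on yarn.timeline-service.enabled
// instead of starting it unconditionally in the mini cluster.
public class TimelineStartCheck {
  static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}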
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.4.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: Screen Shot 2015-03-12 at 8.51.25 PM.png Uploaded screenshot with the changes. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.0.patch Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358823#comment-14358823 ] Chang Li commented on YARN-3267: Thanks [~jeagles] for further review, updated patch according to those suggestions. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358821#comment-14358821 ] Varun Vasudev commented on YARN-3294: - The uploaded patch will dump debug logs for the capacity scheduler to yarn-capacity-scheduler-debug.log in the logs directory. Older logs will be overwritten and not rotated. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
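A minimal sketch of the mechanism described above, using plain log4j 1.x (which YARN uses): temporarily attach a DEBUG-level file appender to a scheduler logger and remove it after a fixed period. The class and file name below are illustrative, not the attached patch.
{code}
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Sketch: dump DEBUG logs of one logger to a separate file for 'millis' ms,
// then restore the previous level and detach the appender (file is overwritten,
// not rotated, matching the behaviour described in the comment).
public class SchedulerDebugDump {
  public static void dumpFor(long millis, String loggerName, String file)
      throws IOException {
    final Logger logger = Logger.getLogger(loggerName);
    final Level previous = logger.getLevel();
    final FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} %p %c: %m%n"), file, false);
    logger.addAppender(appender);
    logger.setLevel(Level.DEBUG);
    new Timer(true).schedule(new TimerTask() {
      @Override
      public void run() {
        logger.setLevel(previous);
        logger.removeAppender(appender);
        appender.close();
      }
    }, millis);
  }
}
{code}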
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358833#comment-14358833 ] Hudson commented on YARN-1884: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #130 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/130/]) YARN-1884. Added nodeHttpAddress into ContainerReport and fixed the link to NM web page. Contributed by Xuan Gong. (zjshen: rev 85f6d67fa78511f255fcfa810afc9a156a7b483b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/ContainerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358856#comment-14358856 ] Abin Shahab commented on YARN-3291: --- [~sidharta-s] [~raviprak] [~aw] [~vinodkv] [~vvasudev] Please review. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch Currently DockerContainerExecutor runs the container process as root (inside the container), while outside the container it runs as yarn. Inside the container, the process could instead run as a non-root user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
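For illustration only, the usual way to achieve this with the docker CLI is its --user flag; the snippet below is a hedged sketch of composing such a command and is not taken from the attached patch.
{code}
// Hypothetical sketch: build a docker run command that starts the container
// process as the submitting user instead of root. Only the --rm/--user flags
// are real docker CLI options; the image name, script path and method are made up.
public class DockerRunCommandSketch {
  static String buildRunCommand(String image, String runAsUser, String launchCmd) {
    return String.format("docker run --rm --user %s %s %s",
        runAsUser, image, launchCmd);
  }

  public static void main(String[] args) {
    System.out.println(
        buildRunCommand("some-hadoop-image", "nobody", "bash /launch_container.sh"));
  }
}
{code}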
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358858#comment-14358858 ] Abin Shahab commented on YARN-3291: --- I'll fix the findbugs error. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch Currently DockerContainerExecutor runs the container process as root (inside the container), while outside the container it runs as yarn. Inside the container, the process could instead run as a non-root user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358977#comment-14358977 ] Xuan Gong commented on YARN-3338: - +1. lgtm. will commit Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359012#comment-14359012 ] Sangjin Lee commented on YARN-3034: --- The hadoop jenkins would try to apply it against the trunk, so it wouldn't pass anyway. I took a look at the patch, and it looks good for the most part. One minor comment: - resourcemanager/pom.xml -- you should be able to remove the version as it is specified in the parent pom -- nit: reduce spaces to 2 Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359020#comment-14359020 ] Xuan Gong commented on YARN-3338: - Committed to trunk/branch-2/branch-2.7. Thanks, zhijie Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359032#comment-14359032 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-trunk-Commit #7309 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7309/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-project/pom.xml * hadoop-yarn-project/CHANGES.txt Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358963#comment-14358963 ] Chris Douglas commented on YARN-3338: - +1 lgtm Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358986#comment-14358986 ] Varun Vasudev commented on YARN-3326: - [~Naganarasimha] - can we change the endpoint from /get-labels-to-Nodes to a more RESTful name, something like /labels/nodes?labels=nodes? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST support to retrieve the LabelsToNodes mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
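Purely as an illustration of the kind of RESTful shape being suggested (the path, parameter and return type below are hypothetical, not the eventual YARN API), a JAX-RS resource for a labels-to-nodes query could look like this:
{code}
import java.util.Map;
import java.util.Set;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// Hypothetical sketch of a RESTful labels-to-nodes endpoint; the real RM web
// services class and DAO types are not shown here.
@Path("/ws/v1/cluster")
public class LabelMappingResource {

  @GET
  @Path("/label-mappings")
  @Produces(MediaType.APPLICATION_JSON)
  public Map<String, Set<String>> getLabelsToNodes(
      @QueryParam("labels") Set<String> labels) {
    // Delegate to the RM's node-label manager, filtering by 'labels' when given.
    return lookupLabelsToNodes(labels);
  }

  private Map<String, Set<String>> lookupLabelsToNodes(Set<String> labels) {
    throw new UnsupportedOperationException("backend lookup omitted in this sketch");
  }
}
{code}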
[jira] [Updated] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3243: - Attachment: YARN-3243.5.patch CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch CapacityScheduler currently has some issues with making sure a ParentQueue always obeys its capacity limits, for example: 1) When allocating a container under a parent queue, it only checks parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                   \
  A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue only tells its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it does not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} object in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This makes sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to stay within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
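The core of the proposal is the min(qA.headroom, qA.max - qA.used) rule. Below is a minimal sketch of that rule with memory-only numbers; the actual patch works on Resource objects via a ResourceCalculator, so this is only illustrative.
{code}
// Sketch of the headroom rule: a child may never allocate more than
// min(parent.headroom, parent.max - parent.used), which also enforces every
// ancestor's limit because parent.headroom was set the same way by its parent.
public class HeadroomRuleExample {

  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, Math.max(0, parentMax - parentUsed));
  }

  public static void main(String[] args) {
    // From the example above: A has max=55 and usage=54, so even though A2's
    // own max is 53, A2 may only allocate 1 more unit.
    System.out.println(childHeadroom(Long.MAX_VALUE, 55, 54)); // 1
  }
}
{code}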
[jira] [Updated] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on
[ https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2792: - Priority: Major (was: Blocker) Have a public Test-only API for creating important records that ecosystem projects can depend on Key: YARN-2792 URL: https://issues.apache.org/jira/browse/YARN-2792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli From YARN-2789, {quote} Sigh. Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. I am inclined to have a separate test-only public util API that keeps compatibility for tests. Rather than opening unwanted APIs up. I'll file a separate ticket for this, we need all YARN apps/frameworks to move to that API instead of these private unstable APIs. For now, I am okay keeping a private compat for the APIs changed in YARN-2698. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3341) Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource
zhihai xu created YARN-3341: --- Summary: Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource Key: YARN-3341 URL: https://issues.apache.org/jira/browse/YARN-3341 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix findbugs warning:BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource The warning message is: {code} Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.reserveResource(SchedulerApplicationAttempt, Priority, RMContainer) {code} The code which causes the warning is: {code} this.reservedAppSchedulable = (FSAppAttempt) application; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
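The usual remedy for BC_UNCONFIRMED_CAST is to confirm the runtime type before casting. Whether the eventual patch does exactly this is an assumption; the sketch below uses stand-in stub classes so it compiles on its own.
{code}
// Stand-in stubs for the real scheduler classes, just to keep the sketch
// self-contained; only the instanceof-then-cast pattern is the point here.
class SchedulerApplicationAttempt { }
class FSAppAttempt extends SchedulerApplicationAttempt { }

public class ConfirmedCastExample {
  private FSAppAttempt reservedAppSchedulable;

  void reserveResource(SchedulerApplicationAttempt application) {
    if (!(application instanceof FSAppAttempt)) {
      throw new IllegalArgumentException(
          "FSSchedulerNode expects an FSAppAttempt, got "
              + application.getClass().getName());
    }
    this.reservedAppSchedulable = (FSAppAttempt) application; // cast is now confirmed
  }
}
{code}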
[jira] [Created] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
Xuan Gong created YARN-3343: --- Summary: TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk Key: YARN-3343 URL: https://issues.apache.org/jira/browse/YARN-3343 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359460#comment-14359460 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704207/YARN-3243.5.patch against trunk revision b49c3a1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6945//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6945//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6945//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch CapacityScheduler currently has some issues with making sure a ParentQueue always obeys its capacity limits, for example: 1) When allocating a container under a parent queue, it only checks parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /                   \
  A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue only tells its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it does not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} object in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This makes sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to stay within their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
[ https://issues.apache.org/jira/browse/YARN-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3343: Description: Error Message test timed out after 3 milliseconds Stacktrace java.lang.Exception: test timed out after 3 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getAllByName0(InetAddress.java:1246) at java.net.InetAddress.getAllByName(InetAddress.java:1162) at java.net.InetAddress.getAllByName(InetAddress.java:1098) at java.net.InetAddress.getByName(InetAddress.java:1048) at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147) at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk --- Key: YARN-3343 URL: https://issues.apache.org/jira/browse/YARN-3343 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor Error Message test timed out after 3 milliseconds Stacktrace java.lang.Exception: test timed out after 3 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getAllByName0(InetAddress.java:1246) at java.net.InetAddress.getAllByName(InetAddress.java:1162) at java.net.InetAddress.getAllByName(InetAddress.java:1098) at java.net.InetAddress.getByName(InetAddress.java:1048) at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147) at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178) at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359367#comment-14359367 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-trunk-Commit #7312 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7312/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359364#comment-14359364 ] Li Lu commented on YARN-3034: - Hi [~Naganarasimha], thanks for updating the patch! Just a quick question: are there any special rationales behind putting RMTimelineCollector into a separate place in RM, instead of putting it into the aggregator (or, collector) directory in our own timeline server module? (Especially when we're introducing dependency to the RM anyways... ) Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.5.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359438#comment-14359438 ] Hadoop QA commented on YARN-1942: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704246/YARN-1942.2.patch against trunk revision 863079b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1372 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6946//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6946//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6946//console This message is automatically generated. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359441#comment-14359441 ] Zhijie Shen commented on YARN-3039: --- bq. When AM get launched, NM auxiliary service will add a new aggregator service to aggregatorCollection (per Node) for necessary binding work. aggregatorCollection also has a client for AggregatorNodeManagerProtocol to notify NM on new app aggregator registered and detailed address. Hi Junping, thanks for creating the new patch. Sorry for raising the question late, but I'd like to think out loud about the first step. Currently, the app-level aggregator is started by the callback handler listening to the NM's container start event. Given we are going to support stand-alone and container mode, this approach may not work. Since we're going to have an IPC channel between the aggregator and the NM, should we use an IPC call to add an app-level aggregator? The protocol would be that the NM sends a request to the aggregator collection to start an app-level aggregator, and the collection responds with the aggregator address. However, in this case, it may not be AggregatorNodemanagerProtocol, but NodemanagerAggregatorProtocol instead. The benefit is to unify the way of starting app-level aggregators inside the node-level aggregator (at least it seems we need to do something similar in YARN-3033), and to further reduce the dependency/assumption on the aux service. [~vinodkv] and [~sjlee0], what do you think about it? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
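To make the proposed call direction easier to picture, a purely hypothetical sketch follows; none of these names come from the attached patches, they only illustrate an NM-to-collection request that returns the aggregator address:
{code}
// Hypothetical sketch only: the NM asks the per-node aggregator collection to
// start (or look up) the app-level aggregator and gets its address back.
public interface NodemanagerAggregatorProtocol {

  final class StartAppAggregatorRequest {
    final org.apache.hadoop.yarn.api.records.ApplicationId applicationId;

    StartAppAggregatorRequest(org.apache.hadoop.yarn.api.records.ApplicationId applicationId) {
      this.applicationId = applicationId;
    }
  }

  final class StartAppAggregatorResponse {
    final String aggregatorAddress; // host:port the NM should advertise for the app

    StartAppAggregatorResponse(String aggregatorAddress) {
      this.aggregatorAddress = aggregatorAddress;
    }
  }

  // NM -> collection: start the aggregator for this application and return its address.
  StartAppAggregatorResponse startAppAggregator(StartAppAggregatorRequest request)
      throws java.io.IOException;
}
{code}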
[jira] [Reopened] (YARN-2871) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reopened YARN-2871: - I saw the testcase failure with the same issue. rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) Stacktrace org.mockito.exceptions.verification.TooLittleActualInvocations: rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:969) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk - Key: YARN-2871 URL: https://issues.apache.org/jira/browse/YARN-2871 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor From trunk build #746 (https://builds.apache.org/job/Hadoop-Yarn-trunk/746): {code} Failed tests: TestRMRestart.testRMRestartGetApplicationList:957 rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957) But was 2 times: - at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
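For readers unfamiliar with this Mockito failure mode, a minimal self-contained illustration (not the actual TestRMRestart code) of how an exact-count verification produces TooLittleActualInvocations when an expected call never happens:
{code}
import static org.mockito.Mockito.*;

import java.util.List;

public class VerifyCountExample {
  public static void main(String[] args) {
    @SuppressWarnings("unchecked")
    List<String> mock = mock(List.class);

    mock.add("app-1");
    mock.add("app-2");
    // The third add(...) never happens, e.g. because an asynchronous event was
    // not processed before the verification ran.

    // Fails with TooLittleActualInvocations: "Wanted 3 times ... But was 2 times".
    verify(mock, times(3)).add(anyString());
  }
}
{code}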
[jira] [Created] (YARN-3342) TestAllocationFileLoaderService.testGetAllocationFileFromClasspath sometime fails in trunk
Xuan Gong created YARN-3342: --- Summary: TestAllocationFileLoaderService.testGetAllocationFileFromClasspath sometime fails in trunk Key: YARN-3342 URL: https://issues.apache.org/jira/browse/YARN-3342 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Priority: Minor org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService.testGetAllocationFileFromClasspath Failing for the past 1 build (Since Failed#6924 ) Took 25 ms. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService.testGetAllocationFileFromClasspath(TestAllocationFileLoaderService.java:66) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359356#comment-14359356 ] Jian He commented on YARN-3136: --- looks good to me. test failures not related? getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
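One common way to remove that kind of contention, sketched below purely as an illustration (not the attached patches), is to keep the transferred-container bookkeeping in a concurrent structure so AM registration can read it without taking the scheduler lock:
{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration only: lock-free reads for per-attempt transferred containers.
public class TransferredContainersSketch<K, C> {
  private final Map<K, List<C>> transferred = new ConcurrentHashMap<K, List<C>>();

  // Called by the scheduler when containers are carried over to a new attempt.
  public void add(K attemptId, C container) {
    List<C> list = transferred.get(attemptId);
    if (list == null) {
      transferred.putIfAbsent(attemptId, new CopyOnWriteArrayList<C>());
      list = transferred.get(attemptId);
    }
    list.add(container);
  }

  // Called during AM registration; no global scheduler lock is required.
  public List<C> get(K attemptId) {
    List<C> list = transferred.get(attemptId);
    return list == null ? Collections.<C>emptyList() : list;
  }
}
{code}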
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Attachment: YARN-1942.2.patch Attached ver.2. Offline discussed with Vinod, moved most methods of ConverterUtils to *Id, and marked the previous method to deprecated. This also merged changes in YARN-3340. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
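As a rough sketch of the direction described above (method names assumed; the final patch may differ), the parsing entry points move onto the record classes themselves while the old ConverterUtils helpers remain as deprecated wrappers:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class IdParsingSketch {
  public static ContainerId parseOld(String containerIdStr) {
    // Today: downstream AMs depend on this helper even though it is not a
    // supported public API.
    return ConverterUtils.toContainerId(containerIdStr);
  }

  public static ContainerId parseNew(String containerIdStr) {
    // Sketch of the replacement the patch moves toward (exact name assumed):
    // a public factory on the record class itself.
    return ContainerId.fromString(containerIdStr);
  }
}
{code}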
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359224#comment-14359224 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704208/YARN-3336.001.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestTestTests org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorizatTestTests org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheTests org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMReTests org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCRespoTests org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourcTests org.apache.hadoop.yarn.server.resourcemanager.TestResourceManTests org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6941//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6941//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6941//console This message is automatically generated. 
FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens =
[jira] [Commented] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
[ https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359254#comment-14359254 ] Li Lu commented on YARN-3330: - Hi [~aw], thanks for the suggestion! I will definitely bump this up after I finished the two issues raised above. Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols --- Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: pdiff_patch.py Per YARN-3292, we may want to start YARN rolling upgrade test compatibility verification tool by a simple script to check protobuf compatibility. The script may work on incoming patch files, check if there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc,.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance to have false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359230#comment-14359230 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704207/YARN-3243.5.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6942//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6942//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6942//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. 
If a leaf queue allocates a container with size < (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means *the maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
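A small sketch of the proposed computation, using only the public Resource record (illustrative; the committed change may compute this elsewhere):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class HeadroomSketch {
  // min(qA.headroom, qA.max - qA.used), evaluated per resource dimension: the
  // parent hands each child the tighter of its own inherited headroom and the
  // room left under its own maximum, so every ancestor's limit is respected.
  public static Resource childHeadroom(Resource parentHeadroom, Resource parentMax,
      Resource parentUsed) {
    int memory = Math.min(parentHeadroom.getMemory(),
        parentMax.getMemory() - parentUsed.getMemory());
    int vcores = Math.min(parentHeadroom.getVirtualCores(),
        parentMax.getVirtualCores() - parentUsed.getVirtualCores());
    return Resource.newInstance(memory, vcores);
  }
}
{code}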
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359290#comment-14359290 ] Markus Weimer commented on YARN-3337: - Excellent idea! We do this over in REEF within the AM as well as the containers (via reef-poison), but that doesn't allow us to create failures at a low enough level. Having the RM kill our containers would be awesome for tests. Provide YARN chaos monkey - Key: YARN-3337 URL: https://issues.apache.org/jira/browse/YARN-3337 Project: Hadoop YARN Issue Type: New Feature Components: test Affects Versions: 2.7.0 Reporter: Steve Loughran To test failure resilience today you either need custom scripts or implement Chaos Monkey-like logic in your application (SLIDER-202). Killing AMs and containers on a schedule probability is the core activity here, one that could be handled by a CLI App/client lib that does this. # entry point to have a startup delay before acting # frequency of chaos wakeup/polling # probability to AM failure generation (0-100) # probability of non-AM container kill # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
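A minimal sketch of the polling core those bullet points describe; the class name and the two kill actions are hypothetical placeholders, not part of any existing YARN client API:
{code}
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ChaosMonkeySketch {
  private final Random random = new Random();
  private final int amKillPercent;         // 0-100: chance of an AM failure per wakeup
  private final int containerKillPercent;  // 0-100: chance of a non-AM container kill

  public ChaosMonkeySketch(int amKillPercent, int containerKillPercent) {
    this.amKillPercent = amKillPercent;
    this.containerKillPercent = containerKillPercent;
  }

  public void start(long startupDelaySeconds, long wakeupIntervalSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        if (random.nextInt(100) < amKillPercent) {
          killRandomApplicationMaster();   // placeholder for the real client call
        }
        if (random.nextInt(100) < containerKillPercent) {
          killRandomNonAmContainer();      // placeholder for the real client call
        }
      }
    }, startupDelaySeconds, wakeupIntervalSeconds, TimeUnit.SECONDS);
  }

  private void killRandomApplicationMaster() { /* hypothetical */ }
  private void killRandomNonAmContainer() { /* hypothetical */ }
}
{code}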
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359239#comment-14359239 ] Junping Du commented on YARN-3225: -- Nice discussion, [~devaraj.k]! bq. If there are some long running containers in the NM and RMAdmin CLI gets terminated before issuing forceful decommission then the NM could in the “DECOMMISSIONING” state irrespective of timeout. AM I missing anything? If users terminate the blocking/pending CLI, it only means they want to track the timeout themselves, or they want to adjust the timeout value to be earlier or later. In this case, the decommissioning nodes either get decommissioned when the app finishes (a clean quit), or wait for the user to decommission them again later. We can add some alert messages later if some nodes stay in the decommissioning stage for a really long time. The basic idea is that we agree not to track the timeout on the RM side for each individual node. bq. If we don't pass timeout to RM then how are we going to achieve this? You mean this will be handled later, once the basic things are done. You are right that the timeout value could be useful to pass down to the AM for preempting containers (however, it has no effect on terminating nodes). Let's keep it here and we can leverage it later when we are notifying the AM. bq. For making timeout longer, if we use new CLI then there is a chance of forceful decommission happening with the old CLI timeout. Is there any constraint like this needs to be done with the same CLI? I don't quite understand the case described here. Users should terminate the current CLI and launch a new CLI with adjusted timeout values if they want to wait a shorter or longer time. If the previous timeout has already passed, the current CLI should already have quit with all nodes decommissioned. Am I missing something here? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or the existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359275#comment-14359275 ] Wangda Tan commented on YARN-3243: -- Seems like Jenkins issue, rekicked. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. If leaf queue allocated a container.size (parentQueue.max - parentQueue.usage), parent queue can excess its max resource limit, as following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate container since its usage max, but if we do that, A's usage can excess A.max. 2) When doing continous reservation check, parent queue will only tell children you need unreserve *some* resource, so that I will less than my maximum resource, but it will not tell how many resource need to be unreserved. This may lead to parent queue excesses configured maximum capacity as well. With YARN-3099/YARN-3124, now we have {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means, *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary, instead, children can get how much resource need to be unreserved to keep its parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359332#comment-14359332 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- bq. The testcase failures are not related. And They can pass locally. Can you point any JIRAs tracking them. The patch looks good, checking it in. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-1942: Assignee: Wangda Tan Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Target Version/s: 2.7.0 (was: 3.0.0, 2.4.1) Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359091#comment-14359091 ] Wangda Tan commented on YARN-1942: -- Working on this, will post a patch soon. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
Wangda Tan created YARN-3340: Summary: Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently, setters of ApplicationId/ApplicationAttemptId/ContainerId are all private, which is not correct -- users' applications need to set such ids to query status, submit applications, etc. We need to mark such setters as public to avoid downstream applications encountering compilation errors when changes are made to such setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
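For context, downstream code typically builds these records through the public static factories sketched below (shown only as an illustration); those factories internally call the setters this issue proposes to mark @Public, so signature changes would at least surface as compile errors:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class RecordFactorySketch {
  // How a downstream application or test typically constructs these ids.
  public static ContainerId exampleContainerId() {
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    return ContainerId.newContainerId(attemptId, 1L);
  }
}
{code}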
[jira] [Commented] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
[ https://issues.apache.org/jira/browse/YARN-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359188#comment-14359188 ] Hadoop QA commented on YARN-3340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704212/YARN-3340.1.patch against trunk revision 6dae6d1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6944//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6944//console This message is automatically generated. Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. -- Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3340.1.patch Currently, setters of ApplicaitonId/ApplicationAttemptId/ContainerId are all private, that's not correct -- user's applications need to set such ids to do query status / submit application, etc. We need mark such setters to be public avoiding downstream applications encounters compilation error when changes made on such setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359199#comment-14359199 ] Xuan Gong commented on YARN-3154: - The testcase failures are not related. And They can pass locally. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359089#comment-14359089 ] Wangda Tan commented on YARN-3243: -- Hi [~jianhe], I've done all updates, thanks for your comments! bq. CapacityScheduler, why following code is moved ? This is because CS will call updateClusterResource after the a new node is added, which will set resourceLimits of queues. To set queue's resourceLimits, it needs to how much resource in each partition, so it need to call labelManager.activeNode before setting queue's resourceLimits. Attaching new patch (ver.5) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating container of a parent queue, it will only check parentQueue.usage parentQueue.max. If leaf queue allocated a container.size (parentQueue.max - parentQueue.usage), parent queue can excess its max resource limit, as following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate container since its usage max, but if we do that, A's usage can excess A.max. 2) When doing continous reservation check, parent queue will only tell children you need unreserve *some* resource, so that I will less than my maximum resource, but it will not tell how many resource need to be unreserved. This may lead to parent queue excesses configured maximum capacity as well. With YARN-3099/YARN-3124, now we have {{ResourceUsage}} class in each class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means, *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary, instead, children can get how much resource need to be unreserved to keep its parent's resource limit. - More over, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359169#comment-14359169 ] Hadoop QA commented on YARN-1942: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704209/YARN-1942.1.patch against trunk revision 06ce1d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6943//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6943//console This message is automatically generated. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359186#comment-14359186 ] Jian He commented on YARN-3305: --- [~rohithsharma], thanks for working on this! I think we can normalize the AM request in RMAppManager after validateAndCreateResourceRequest, to make sure the AM request stored in RMAppImpl is also correct. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation exceeding the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the application, but during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
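To illustrate the mismatch being described, a self-contained sketch of the rounding the scheduler performs (plain arithmetic, not the RM code); tracking AM-Used with the normalized value keeps it consistent with what is actually allocated:
{code}
public class NormalizeSketch {
  // Round a memory request up to the next multiple of the scheduler minimum.
  static int normalizeMemoryMb(int requestedMb, int minimumAllocationMb) {
    if (requestedMb <= 0) {
      return minimumAllocationMb;
    }
    int multiples = (requestedMb + minimumAllocationMb - 1) / minimumAllocationMb;
    return multiples * minimumAllocationMb;
  }

  public static void main(String[] args) {
    // A 600 MB AM request with a 1024 MB minimum is allocated as 1024 MB, so
    // AM-Used should also be tracked as 1024 MB rather than 600 MB.
    System.out.println(normalizeMemoryMb(600, 1024)); // 1024
  }
}
{code}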
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359181#comment-14359181 ] zhihai xu commented on YARN-3336: - The findbugs warning is not related to my change. I just created YARN-3341 to fix this findbugs warning. The TestRM failure is also not related to my change, which is a timeout failure. It passed at my local latest build with the following message. {code} --- T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.TestRM Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 166.002 sec - in org.apache.hadoop.yarn.server.resourcemanager.TestRM Results : Tests run: 22, Failures: 0, Errors: 0, Skipped: 0 {code} FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens = proxyUser.doAs(new PrivilegedExceptionActionToken?[]() { @Override public Token?[] run() throws Exception { return FileSystem.get(getConfig()).addDelegationTokens( UserGroupInformation.getLoginUser().getUserName(), credentials); } }); return newTokens; } {code} The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig())=FileSystem.get(getDefaultUri(conf), conf)=FileSystem.CACHE.get(uri, conf)=FileSystem.CACHE.getInternal(uri, conf, key)=FileSystem.CACHE.map.get(key)=createFileSystem(uri, conf) {code} public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) { if (user == null || user.isEmpty()) { throw new IllegalArgumentException(Null user); } if (realUser == null) { throw new IllegalArgumentException(Null real user); } Subject subject = new Subject(); SetPrincipal principals = subject.getPrincipals(); principals.add(new User(user)); principals.add(new RealUser(realUser)); UserGroupInformation result =new UserGroupInformation(subject); result.setAuthenticationMethod(AuthenticationMethod.PROXY); return result; } {code} FileSystem#Cache#Key.equals will compare the ugi {code} Key(URI uri, Configuration conf, long unique) throws IOException { scheme = uri.getScheme()==null?:uri.getScheme().toLowerCase(); authority = uri.getAuthority()==null?:uri.getAuthority().toLowerCase(); this.unique = unique; this.ugi = UserGroupInformation.getCurrentUser(); } public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) isEqual(this.authority, that.authority) isEqual(this.ugi, that.ugi) (this.unique == that.unique); } return false; } {code} UserGroupInformation.equals will compare subject by reference. 
{code} public boolean equals(Object o) { if (o == this) { return true; } else if (o == null || getClass() != o.getClass()) { return false; } else { return subject == ((UserGroupInformation) o).subject; } } {code} So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
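For context on how such a leak is typically avoided, a hedged sketch follows (this is one possible shape, not necessarily what the attached patch does): obtain the tokens under the proxy UGI as before, then close the FileSystem instances cached for that throwaway UGI.
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class ObtainTokensSketch {
  static Token<?>[] obtainSystemTokensForUser(final Configuration conf, String user,
      final Credentials credentials) throws IOException, InterruptedException {
    UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    try {
      return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(conf).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
    } finally {
      // Drop the FileSystem.CACHE entries keyed by this short-lived proxy UGI;
      // without this, every call leaves behind an uncollectable cache entry.
      FileSystem.closeAllForUGI(proxyUser);
    }
  }
}
{code}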
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: (was: YARN-3336.001.patch) FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token?[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token?[] newTokens = proxyUser.doAs(new PrivilegedExceptionActionToken?[]() { @Override public Token?[] run() throws Exception { return FileSystem.get(getConfig()).addDelegationTokens( UserGroupInformation.getLoginUser().getUserName(), credentials); } }); return newTokens; } {code} The memory leak happened when FileSystem.get(getConfig()) is called with a new proxy user. Because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig())=FileSystem.get(getDefaultUri(conf), conf)=FileSystem.CACHE.get(uri, conf)=FileSystem.CACHE.getInternal(uri, conf, key)=FileSystem.CACHE.map.get(key)=createFileSystem(uri, conf) {code} public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) { if (user == null || user.isEmpty()) { throw new IllegalArgumentException(Null user); } if (realUser == null) { throw new IllegalArgumentException(Null real user); } Subject subject = new Subject(); SetPrincipal principals = subject.getPrincipals(); principals.add(new User(user)); principals.add(new RealUser(realUser)); UserGroupInformation result =new UserGroupInformation(subject); result.setAuthenticationMethod(AuthenticationMethod.PROXY); return result; } {code} FileSystem#Cache#Key.equals will compare the ugi {code} Key(URI uri, Configuration conf, long unique) throws IOException { scheme = uri.getScheme()==null?:uri.getScheme().toLowerCase(); authority = uri.getAuthority()==null?:uri.getAuthority().toLowerCase(); this.unique = unique; this.ugi = UserGroupInformation.getCurrentUser(); } public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) isEqual(this.authority, that.authority) isEqual(this.ugi, that.ugi) (this.unique == that.unique); } return false; } {code} UserGroupInformation.equals will compare subject by reference. {code} public boolean equals(Object o) { if (o == this) { return true; } else if (o == null || getClass() != o.getClass()) { return false; } else { return subject == ((UserGroupInformation) o).subject; } } {code} So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1942: - Attachment: YARN-1942.1.patch Attached ver.1 for review, also removed {code} - public static MapString, String convertToString( - MapCharSequence, CharSequence env) { {code} It seems so YARN-related, and nobody is using it in Hadoop. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on
[ https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359119#comment-14359119 ] Wangda Tan commented on YARN-2792: -- Updated priority from blocker to major, this is no longer a blocker after YARN-1942 and YARN-3340. Have a public Test-only API for creating important records that ecosystem projects can depend on Key: YARN-2792 URL: https://issues.apache.org/jira/browse/YARN-2792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli From YARN-2789, {quote} Sigh. Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. I am inclined to have a separate test-only public util API that keeps compatibility for tests. Rather than opening unwanted APIs up. I'll file a separate ticket for this, we need all YARN apps/frameworks to move to that API instead of these private unstable APIs. For now, I am okay keeping a private compat for the APIs changed in YARN-2698. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359158#comment-14359158 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704176/YARN-3267.4.patch against trunk revision ff83ae7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator org.apache.hadoop.mapred.TestClusterMRNotification The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapred.TestClusterMapReduceTestCase Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6940//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6940//console This message is automatically generated. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359174#comment-14359174 ] Vinod Kumar Vavilapalli commented on YARN-3034: --- How about we keep the same config {{system-metrics-publisher.enabled}} for enabling this functionality but add a new config which lets us choose the version of YTS? [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
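A rough sketch of how that could be wired on the RM side; the version property name yarn.timeline-service.version below is hypothetical and would be settled on this JIRA:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

Configuration conf = new YarnConfiguration();
// Existing switch stays as-is.
boolean publisherEnabled = conf.getBoolean(
    YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED,
    YarnConfiguration.DEFAULT_SYSTEM_METRICS_PUBLISHER_ENABLED);
// Hypothetical new property selecting which timeline service version to publish to.
float timelineVersion = conf.getFloat("yarn.timeline-service.version", 1.0f);
if (publisherEnabled && timelineVersion >= 2.0f) {
  // start the ATS v2 publisher (RM-side timeline writer)
} else if (publisherEnabled) {
  // start the existing v1 SystemMetricsPublisher
}
{code}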
[jira] [Updated] (YARN-3341) Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource
[ https://issues.apache.org/jira/browse/YARN-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3341: Labels: findbugs (was: ) Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource --- Key: YARN-3341 URL: https://issues.apache.org/jira/browse/YARN-3341 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Labels: findbugs Fix findbugs warning: BC_UNCONFIRMED_CAST at FSSchedulerNode.reserveResource The warning message is:
{code}
Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.reserveResource(SchedulerApplicationAttempt, Priority, RMContainer)
{code}
The code which causes the warning is:
{code}
this.reservedAppSchedulable = (FSAppAttempt) application;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
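One common way to address BC_UNCONFIRMED_CAST is to confirm the runtime type before the downcast; a minimal sketch, not necessarily how the eventual patch resolves the warning:
{code}
// Inside FSSchedulerNode.reserveResource(...): guard the downcast so the
// cast is visibly confirmed before it happens.
if (!(application instanceof FSAppAttempt)) {
  throw new IllegalArgumentException(
      "Expected FSAppAttempt but got " + application.getClass().getName());
}
this.reservedAppSchedulable = (FSAppAttempt) application;
{code}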
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.001.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.
[ https://issues.apache.org/jira/browse/YARN-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3340: - Attachment: YARN-3340.1.patch Attached ver.1 patch for review. Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId. -- Key: YARN-3340 URL: https://issues.apache.org/jira/browse/YARN-3340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3340.1.patch Currently, the setters of ApplicationId/ApplicationAttemptId/ContainerId are all private; that's not correct -- users' applications need to set such ids to query status, submit applications, etc. We need to mark such setters as public to avoid downstream applications hitting compilation errors when changes are made to these setters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
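Illustrative only -- the exact set of methods and stability annotations would be decided in the attached patch; the idea is simply to relax the audience annotation on the id setters, for example:
{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

// Before (downstream code cannot rely on it):
@Private
@Unstable
protected abstract void setId(int id);

// After (sketch of the proposal; the final visibility and stability level
// are settled in the patch):
@Public
@Unstable
public abstract void setId(int id);
{code}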
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359163#comment-14359163 ] Junping Du commented on YARN-3039: -- Thanks [~sjlee0] for the review and many good comments! I addressed most of them in the v5 patch and will post it soon.
bq. I see some methods are marked as Stable (e.g. AggregatorNodemanagerProtocol), but I feel that’s a bit premature. Can we mark these still unstable or evolving? Note that at least at this point even the class names can change.
Nice catch! Let's mark it with evolving instead.
bq. While we’re on the API annotations, I notice that the annotations are on methods rather than classes themselves. I usually set them on the classes with the understanding that the entire class is unstable, for example. Which is a more common practice?
We could have a class-level annotation which is the default publicity and stability for each method. However, each method can have its own annotation that overrides the class-level one. In most cases, the class-level annotation is more public and stable than the individual methods, because it is the first-class contract with end users or other components (otherwise they will have concerns about using it). As an example, if we need to add a new API that is not stable yet to a protocol class marked stable, we shouldn't regress the whole class from stable to evolving or unstable, but we can mark the new method as unstable or evolving. Make sense?
bq. I feel that the default for NM_AGGREGATOR_SERVICE_THREAD_COUNT doesn't have to be as high as 20 as the traffic will be pretty low; 5?
5 sounds good. Will update.
bq. Since createTimelineClient(ApplicationId) was introduced only on this branch, we should be able to just replace it instead of adding a new deprecated method, no?
I didn't check the history of when we added createTimelineClient(ApplicationId). If so, we can remove it. For the other createTimelineClient() methods, we can mark them as deprecated if they showed up in previous releases, because newInstance() is a more standard way of doing factory work.
bq. putObjects() Not sure if I understand the comment “timelineServiceAddress couldn’t have been initialized”; can’t putObjects() be called in a steady state? If so, shouldn’t we check if timelineServiceAddress is not null before proceeding to loop and wait for the value to come in? Otherwise, we would introduce a 1 second latency in every put call even in a steady state?
In steady state, there is no initialization delay because timelineServiceAddress is already there (in timelineClient). The cost only happens for the first event when timelineClient starts to post, or after timelineServiceAddress gets updated (because of a failure or other reasons). We designed it this way so that TimelineClient handles service discovery itself rather than letting the caller figure it out.
bq. maxRetries - retries might be a better variable name?
Updated.
bq. It might be good to create a small helper method for polling for the timelineServiceAddress value
You are right that we can always abstract a helper method to make the logic more concise. Updated it with a pollTimelineServiceAddress() method.
bq. Not sure if we need a while loop for needRetry; it either succeeds (at which point needRetry becomes false and you exit normally) or it doesn’t (in which case we go into the exception handling and we try only once to get the value). Basically I’m not sure whether this retry code is what you meant to do?
That's actually a bug: the try-catch should be inside the while loop, so that we can tolerate post failures for a few retries (the address could be stale and being updated by the RM) within the while loop. Thanks for identifying this.
bq. I think it may be enough to make timelineServiceAddress volatile instead of making getter/setter synchronized.
Agree. Given that only one thread can update it today, volatile should be safe enough.
bq. doPostingObject() has duplicate code with putObjects(); can we consider ways to eliminate code duplication? I know it calls different methods deep inside the implementation, but there should be a way to reduce code duplication.
Agree. In the new patch (v5), I will abstract all the common logic into helper methods.
bq. typo: ILLEAGAL_NUMBER_MESSAGE - ILLEGAL_NUMBER_MESSAGE, Context.java, We might want to update the comment (or method names) a little bit. NodeManager.java, removeRegisteredAggregators() - removeRegisteredAggregator() (should be singular)
Updated for these comments.
bq. We need removeKnownAggregator() as well; otherwise we’d blow the NM memory. Perhaps what we need is more like setKnownAggregators() instead of add? Essentially NM would need to replace its knowledge of the known aggregators every time RM updates it via heartbeat, right?
I was thinking of some shortcut so that registeredAggregators go to knownAggregators directly, so some local aggregators don't
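For reference, the retry shape discussed above -- a volatile timelineServiceAddress, a small polling helper, and the try-catch moved inside the while loop -- could look roughly like the sketch below; pollTimelineServiceAddress is named after the discussion, and doPosting is a hypothetical stand-in for the shared posting logic, not a method from the patch:
{code}
private volatile String timelineServiceAddress;

// Waits until the address has been discovered (e.g. delivered via heartbeat),
// polling once a second for at most 'retries' attempts.
private String pollTimelineServiceAddress(int retries)
    throws InterruptedException {
  while (timelineServiceAddress == null && retries-- > 0) {
    Thread.sleep(1000);
  }
  return timelineServiceAddress;
}

private void putObjects(Object obj, int retries)
    throws IOException, InterruptedException {
  boolean needRetry = true;
  while (needRetry) {
    try {
      // In steady state the address is already known, so there is no delay here.
      String address = pollTimelineServiceAddress(retries);
      doPosting(address, obj);   // hypothetical shared helper with doPostingObject()
      needRetry = false;
    } catch (IOException e) {
      // The cached address may be stale (the aggregator moved); forget it so the
      // next iteration re-discovers it, and give up once retries are exhausted.
      timelineServiceAddress = null;
      if (--retries <= 0) {
        throw e;
      }
    }
  }
}
{code}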
[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Attachment: YARN-3039-v5.patch Updated the v5 patch, incorporating [~sjlee0]'s comments. [Aggregator wireup] Implement ATS app-aggregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359656#comment-14359656 ] zhihai xu commented on YARN-3336: - Hi [~cnauroth], I implemented a unit test in the new patch YARN-3336.002.patch. Without the fix, you can see the test fail with the following message:
{code}
java.lang.AssertionError: expected:<1> but was:<4>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testFSLeakInObtainSystemTokensForUser(TestDelegationTokenRenewer.java:1041)
{code}
I called obtainSystemTokensForUser three times; without the fix, the FileSystem cache size grows from 1 to 4. The unit test demonstrates the memory leak. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch -- This message was sent by Atlassian JIRA
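The leak can be avoided by making sure the FileSystem instances cached for the throwaway proxy UGI are released again; a hedged sketch of one such approach, reusing the obtainSystemTokensForUser shape from the issue description and not necessarily what YARN-3336.002.patch does:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        return FileSystem.get(getConfig()).addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      }
    });
  } finally {
    // Drop the FileSystem instances cached under this throwaway proxy UGI so
    // FileSystem.CACHE does not grow on every call.
    FileSystem.closeAllForUGI(proxyUser);
  }
}
{code}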
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359662#comment-14359662 ] Jonathan Eagles commented on YARN-3267: --- +1. This patch is a nice addition, [~lichangleo]. Will commit to trunk, branch-2, and branch-2.7 in a few hours. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, and the ACL filters are applied after this (TimelineDataManager.java::getEntities). This could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
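The gist of the fix is to apply the ACL check while scanning and let only visible entities count toward the limit; a rough sketch of that ordering, where scanEntities and canAccess are hypothetical stand-ins for the store iterator and the TimelineACLsManager check rather than the patch's actual methods:
{code}
// Sketch only: filter per entity while scanning, then stop once 'limit'
// visible entities have been collected.
List<TimelineEntity> visible = new ArrayList<TimelineEntity>();
for (TimelineEntity entity : scanEntities(entityType)) {   // hypothetical iterator
  if (!canAccess(callerUGI, entity)) {                     // hypothetical ACL check
    continue;   // filtered entities no longer consume the limit
  }
  visible.add(entity);
  if (visible.size() >= limit) {
    break;      // the limit now counts only entities the caller may see
  }
}
{code}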
[jira] [Resolved] (YARN-3346) Deadlock in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-3346. -- Resolution: Implemented Fix Version/s: 2.6.1, 2.7.0 This issue is already resolved: YARN-3251 for the 2.6.1 fix, and YARN-3265 for the 2.7.0 fix. Deadlock in Capacity Scheduler -- Key: YARN-3346 URL: https://issues.apache.org/jira/browse/YARN-3346 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Suma Shivaprasad Fix For: 2.7.0, 2.6.1 Attachments: rm.deadlock_jstack
{noformat}
Found one Java-level deadlock:
==============================
2144051991@qtp-383501499-6:
  waiting to lock monitor 0x7fa700eec8e8 (object 0x0004589fec18, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp),
  which is held by ResourceManager Event Processor
ResourceManager Event Processor:
  waiting to lock monitor 0x7fa700aadf88 (object 0x000441c05ec8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 0 on 54311
IPC Server handler 0 on 54311:
  waiting to lock monitor 0x7fa700e20798 (object 0x000441d867f8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by ResourceManager Event Processor
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1453) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/YARN-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359702#comment-14359702 ] Kai Zheng commented on YARN-1453: - Hello [~ajisakaa], We need to look at the JDK8 support. I pinged [~apurtell] offline, and was told I can work on it. I'm wondering if you have already started to work on this; if not, I'd like to help. Thanks! [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: YARN-1453 URL: https://issues.apache.org/jira/browse/YARN-1453 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 1453-branch-2.patch, 1453-branch-2.patch, 1453-trunk.patch, 1453-trunk.patch, YARN-1453-02.patch Javadoc is stricter by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8, all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
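For context, the cleanups are of this flavor; a made-up before/after example of a doc comment that JDK8's doclint rejects:
{code}
// Rejected by JDK8 javadoc (doclint): bare angle brackets and a self-closing tag.
/**
 * Returns the Set<String> of node labels on this node.
 * <p/>
 * May be empty.
 */

// Accepted: wrap generics in {@code ...} and use the plain HTML form of <p>.
/**
 * Returns the {@code Set<String>} of node labels on this node.
 * <p>
 * May be empty.
 */
{code}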