[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377403#comment-14377403 ]

Tsuyoshi Ozawa commented on YARN-1880:
--------------------------------------

[~qwertymaniac] [~ajisakaa] thank you for the review!

> Cleanup TestApplicationClientProtocolOnHA
> -----------------------------------------
>
>                 Key: YARN-1880
>                 URL: https://issues.apache.org/jira/browse/YARN-1880
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: Tsuyoshi Ozawa
>            Assignee: Tsuyoshi Ozawa
>            Priority: Trivial
>             Fix For: 2.8.0
>
>         Attachments: YARN-1880.1.patch
>
>
> The tests introduced in YARN-1521 include multiple assertions joined with &&.
> We should separate them, because with a compound assertion it is difficult to
> identify which condition failed.
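For illustration, here is the kind of cleanup being reviewed, as a minimal JUnit sketch; the variable names are invented for the example and not taken from TestApplicationClientProtocolOnHA:

{code}
// Before: a compound assertion; on failure JUnit only reports
// "expected true", with no hint which of the two conditions was violated.
Assert.assertTrue(report.getHost().equals(expectedHost)
    && report.getRpcPort() == expectedRpcPort);

// After: separate assertions, each failing with a precise message.
Assert.assertEquals(expectedHost, report.getHost());
Assert.assertEquals(expectedRpcPort, report.getRpcPort());
{code}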
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377382#comment-14377382 ]

Hudson commented on YARN-1880:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7413 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7413/])
YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated YARN-1880:
--------------------------
    Component/s: test
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated YARN-1880:
--------------------------
    Affects Version/s: 2.6.0
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377378#comment-14377378 ]

Harsh J commented on YARN-1880:
-------------------------------

+1, this still applies. Committing shortly, thanks [~ozawa] (and [~ajisakaa] for the earlier review)!
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377372#comment-14377372 ]

Devaraj K commented on YARN-3225:
---------------------------------

{code}
org.apache.hadoop.yarn.server.resourcemanager.TestRM
{code}
This test failure is not related to the patch.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ------------------------------------------------------------------------
>
>                 Key: YARN-3225
>                 URL: https://issues.apache.org/jira/browse/YARN-3225
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Junping Du
>            Assignee: Devaraj K
>         Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch
>
>
> A new CLI (or an existing CLI with new parameters) should put each node on the
> decommission list into decommissioning status and track a timeout, terminating
> the nodes that haven't finished by then.
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377356#comment-14377356 ]

Hadoop QA commented on YARN-2495:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706826/YARN-2495.20150324-1.patch
against trunk revision 9fae455.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7088//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7088//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> --------------------------------------------------------------------
>
>                 Key: YARN-2495
>                 URL: https://issues.apache.org/jira/browse/YARN-2495
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Naganarasimha G R
>         Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script as suggested by [~aw] (YARN-2729))
> - The NM will send its labels to the RM via the ResourceTracker API
> - The RM will set the labels in NodeLabelManager when the NM registers/updates labels
[jira] [Commented] (YARN-3394) WebApplication proxy documentation is incomplete
[ https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377320#comment-14377320 ]

Tsuyoshi Ozawa commented on YARN-3394:
--------------------------------------

+1 for having the document.

> WebApplication proxy documentation is incomplete
> -------------------------------------------------
>
>                 Key: YARN-3394
>                 URL: https://issues.apache.org/jira/browse/YARN-3394
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Bibin A Chundatt
>            Assignee: Naganarasimha G R
>            Priority: Minor
>
>
> The web proxy documentation (hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html) is incomplete. Missing topics:
> 1. Configuration to run the proxy as a separate server (service start/stop)
> 2. Steps to start it as a daemon service
> 3. Secure mode for the web proxy
[jira] [Created] (YARN-3394) WebApplication proxy documentation is incomplete
Bibin A Chundatt created YARN-3394:
--------------------------------------

             Summary: WebApplication proxy documentation is incomplete
                 Key: YARN-3394
                 URL: https://issues.apache.org/jira/browse/YARN-3394
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.6.0
            Reporter: Bibin A Chundatt
            Assignee: Naganarasimha G R
            Priority: Minor

The web proxy documentation (hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html) is incomplete. Missing topics:
1. Configuration to run the proxy as a separate server (service start/stop)
2. Steps to start it as a daemon service
3. Secure mode for the web proxy
[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377265#comment-14377265 ]

Hadoop QA commented on YARN-3347:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706818/YARN-3347.2.rebase.patch
against trunk revision 9fae455.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7087//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7087//console

This message is automatically generated.

> Improve YARN log command to get AMContainer logs
> -------------------------------------------------
>
>                 Key: YARN-3347
>                 URL: https://issues.apache.org/jira/browse/YARN-3347
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-3347.1.patch, YARN-3347.1.rebase.patch, YARN-3347.2.patch, YARN-3347.2.rebase.patch
>
>
> Right now, we can specify an applicationId, node HTTP address, and container ID
> to get a specific container's log, or specify only an applicationId to get all
> the container logs. It is very hard for users to get the AM container's logs,
> even though those logs carry the most useful information, because users need to
> know the AM container's ID and the related node HTTP address.
> We could improve the YARN log command to let users fetch AM container logs directly.
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377258#comment-14377258 ]

zhihai xu commented on YARN-3336:
---------------------------------

[~cnauroth], Not a problem, thanks for the notification.

> FileSystem memory leak in DelegationTokenRenewer
> -------------------------------------------------
>
>                 Key: YARN-3336
>                 URL: https://issues.apache.org/jira/browse/YARN-3336
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>             Fix For: 2.7.0
>
>         Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE and is never garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
> protected Token<?>[] obtainSystemTokensForUser(String user,
>     final Credentials credentials) throws IOException, InterruptedException {
>   // Get new hdfs tokens on behalf of this user
>   UserGroupInformation proxyUser =
>       UserGroupInformation.createProxyUser(user,
>           UserGroupInformation.getLoginUser());
>   Token<?>[] newTokens =
>       proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>         @Override
>         public Token<?>[] run() throws Exception {
>           return FileSystem.get(getConfig()).addDelegationTokens(
>               UserGroupInformation.getLoginUser().getUserName(), credentials);
>         }
>       });
>   return newTokens;
> }
> {code}
> The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject.
> The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>     UserGroupInformation realUser) {
>   if (user == null || user.isEmpty()) {
>     throw new IllegalArgumentException("Null user");
>   }
>   if (realUser == null) {
>     throw new IllegalArgumentException("Null real user");
>   }
>   Subject subject = new Subject();
>   Set<Principal> principals = subject.getPrincipals();
>   principals.add(new User(user));
>   principals.add(new RealUser(realUser));
>   UserGroupInformation result = new UserGroupInformation(subject);
>   result.setAuthenticationMethod(AuthenticationMethod.PROXY);
>   return result;
> }
> {code}
> FileSystem#Cache#Key.equals compares the ugi:
> {code}
> Key(URI uri, Configuration conf, long unique) throws IOException {
>   scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
>   authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
>   this.unique = unique;
>   this.ugi = UserGroupInformation.getCurrentUser();
> }
>
> public boolean equals(Object obj) {
>   if (obj == this) {
>     return true;
>   }
>   if (obj != null && obj instanceof Key) {
>     Key that = (Key) obj;
>     return isEqual(this.scheme, that.scheme)
>         && isEqual(this.authority, that.authority)
>         && isEqual(this.ugi, that.ugi)
>         && (this.unique == that.unique);
>   }
>   return false;
> }
> {code}
> UserGroupInformation.equals compares the subject by reference:
> {code}
> public boolean equals(Object o) {
>   if (o == this) {
>     return true;
>   } else if (o == null || getClass() != o.getClass()) {
>     return false;
>   } else {
>     return subject == ((UserGroupInformation) o).subject;
>   }
> }
> {code}
> So every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE.
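One way to avoid the leak, as a sketch (not necessarily the committed YARN-3336 patch): create an uncached FileSystem with FileSystem.newInstance() and close it when done, so nothing stays pinned in FileSystem.CACHE under the throw-away proxy UGI:

{code}
Token<?>[] newTokens =
    proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        // newInstance() bypasses FileSystem.CACHE, so this instance can
        // be closed explicitly instead of lingering in the cache keyed
        // by the newly created proxy-user Subject.
        FileSystem fs = FileSystem.newInstance(getConfig());
        try {
          return fs.addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          fs.close();
        }
      }
    });
{code}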
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377252#comment-14377252 ]

Naganarasimha G R commented on YARN-2495:
-----------------------------------------

Hi [~leftnoteasy], oops, a mistake on my side; I have uploaded the patch with the correction. Please check.
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-2495:
------------------------------------
    Attachment: YARN-2495.20150324-1.patch
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377216#comment-14377216 ]

Hudson commented on YARN-3393:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7409 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7409/])
YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java

> Getting application(s) goes wrong when app finishes before starting the attempt
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3393
>                 URL: https://issues.apache.org/jira/browse/YARN-3393
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>            Priority: Critical
>             Fix For: 2.7.0
>
>         Attachments: YARN-3393.1.patch
>
>
> When generating the app report in ApplicationHistoryManagerOnTimelineStore, the code checks whether appAttempt == null:
> {code}
> ApplicationAttemptReport appAttempt =
>     getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
> if (appAttempt != null) {
>   app.appReport.setHost(appAttempt.getHost());
>   app.appReport.setRpcPort(appAttempt.getRpcPort());
>   app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
>   app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
> }
> {code}
> However, {{getApplicationAttempt}} doesn't return null; it throws ApplicationAttemptNotFoundException:
> {code}
> if (entity == null) {
>   throw new ApplicationAttemptNotFoundException(
>       "The entity for application attempt " + appAttemptId +
>       " doesn't exist in the timeline store");
> } else {
>   return convertToApplicationAttemptReport(entity);
> }
> {code}
> The two pieces of code aren't coupled well.
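A plausible shape for the fix, sketched from the two snippets above (not necessarily the committed YARN-3393 patch): treat ApplicationAttemptNotFoundException as "no attempt yet" instead of letting it escape while generating the report.

{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(
      app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before its first attempt reached the timeline
  // store; leave host/port/tracking-URL at their defaults.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}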
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-3347:
----------------------------
    Attachment: YARN-3347.2.rebase.patch
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377211#comment-14377211 ]

Xuan Gong commented on YARN-3393:
---------------------------------

Committed into trunk/branch-2/branch-2.7. Thanks, zhijie.
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377204#comment-14377204 ]

Xuan Gong commented on YARN-3393:
---------------------------------

+1 LGTM. Will commit.
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377201#comment-14377201 ]

Hadoop QA commented on YARN-3021:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706795/YARN-3021.006.patch
against trunk revision 2c238ae.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
    org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7085//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7085//console

This message is automatically generated.

> YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3021
>                 URL: https://issues.apache.org/jira/browse/YARN-3021
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Harsh J
>            Assignee: Yongjun Zhang
>         Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON and B trusts COMMON (one-way trusts in both cases), and both A and B run HDFS + YARN clusters.
> Now if one logs in with a COMMON credential and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before adding it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts rather than failing the job immediately.
> We should change the logic such that we attempt the renewal but go easy on the failure: skip the scheduling alone rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained.
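A hedged sketch of the tolerant-renewal behavior the description proposes; renewToken() and setTimerForTokenRenewal() are illustrative stand-ins, not the actual DelegationTokenRenewer method names:

{code}
try {
  renewToken(dttr);               // eager validation of the token
  setTimerForTokenRenewal(dttr);  // schedule automatic renewal
} catch (IOException e) {
  // e.g. the B realm refuses to let A's RM principal renew the token.
  // Log and skip scheduling rather than bubbling the error back to the
  // client and failing app submission, matching the 1.x JobTracker.
  LOG.warn("Unable to renew token, skipping scheduled renewals: " + dttr, e);
}
{code}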
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377196#comment-14377196 ]

Naganarasimha G R commented on YARN-3034:
-----------------------------------------

Hi [~zjshen],

bq. According to this comments, it seems that you want to create a separate stack to put entities into RMTimelineCollector, right? If so, the current design makes sense.

Yes, I wanted to create a separate stack similar to SystemMetricsPublisher, so that ATS v1 and v2 are less coupled and the removal of SMP, once it is completely deprecated, is smoother.

bq. yarn.resourcemanager.system-metrics-publisher.enabled for v1 SystemMetricsPublisher. For v2, both RM and NM reads yarn.system-metrics-publisher.enabled? No need to have v1/v2 flag?

On second thought, I feel this approach is better: once we deprecate SMP, a separate version-type flag would just be unnecessary extra configuration. If everyone is fine with this, I will move back to the approach Zhijie mentioned. A configuration sketch follows below.

> [Collector wireup] Implement RM starting its timeline collector
> -----------------------------------------------------------------
>
>                 Key: YARN-3034
>                 URL: https://issues.apache.org/jira/browse/YARN-3034
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS writers.
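For concreteness, the two switches under discussion would look like this in yarn-site.xml; the v1 key already exists, while the bare v2 key is the proposal in this thread, not a released property:

{code:xml}
<!-- v1: read only by the RM, controls the existing SystemMetricsPublisher -->
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>

<!-- v2 (proposed): read by both RM and NM; no separate v1/v2 version flag -->
<property>
  <name>yarn.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
{code}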
[jira] [Commented] (YARN-3244) Add user specified information for clean-up container in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377175#comment-14377175 ]

Hadoop QA commented on YARN-3244:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706802/YARN-3244.2.patch
against trunk revision 2c238ae.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7086//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7086//console

This message is automatically generated.

> Add user specified information for clean-up container in ApplicationSubmissionContext
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3244
>                 URL: https://issues.apache.org/jira/browse/YARN-3244
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-3244.1.patch, YARN-3244.2.patch
>
>
> To launch a user-specified clean-up container, users need to provide proper information to YARN. It should at least have the following properties:
> * A flag indicating whether the clean-up container needs to be launched
> * A time-out period indicating how long the clean-up container can run
> * maxRetry times
> * The containerLaunchContext for the clean-up container
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377133#comment-14377133 ]

Hadoop QA commented on YARN-3393:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706792/YARN-3393.1.patch
against trunk revision 2c238ae.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7084//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7084//console

This message is automatically generated.
[jira] [Commented] (YARN-3244) Add user specified information for clean-up container in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377098#comment-14377098 ]

Xuan Gong commented on YARN-3244:
---------------------------------

Created a new object named CleanupContainer which includes the launch-context for the clean-up container and maxCleanupContainerAttempts. Also added two global YARN configurations: RM_CLEAN_UP_CONTAINER_TIMEOUT_MS and RM_CLEAN_UP_CONTAINER_MAX_ATTEMPTS.
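To make the shape of the proposal concrete, a purely hypothetical usage sketch extrapolated from the names in this comment; CleanupContainer and its setters are proposed in this patch, not an existing YARN API, and the property strings are guesses for illustration:

{code}
// Hypothetical API sketch based only on the names above.
ContainerLaunchContext cleanupCtx = createCleanupLaunchContext();
CleanupContainer cleanup = CleanupContainer.newInstance(
    cleanupCtx,  // launch-context for the clean-up container
    3);          // maxCleanupContainerAttempts
applicationSubmissionContext.setCleanupContainer(cleanup);

// Cluster-wide limits would come from the two new configuration constants:
//   RM_CLEAN_UP_CONTAINER_TIMEOUT_MS   -> e.g. yarn.resourcemanager.cleanup-container.timeout-ms
//   RM_CLEAN_UP_CONTAINER_MAX_ATTEMPTS -> e.g. yarn.resourcemanager.cleanup-container.max-attempts
{code}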
[jira] [Updated] (YARN-3244) Add user specified information for clean-up container in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-3244:
----------------------------
    Attachment: YARN-3244.2.patch

Addressed all the latest comments.
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated YARN-3021:
--------------------------------
    Attachment: YARN-3021.006.patch

The test failure seems to be unrelated; uploading the same patch 06 to trigger another Jenkins run.
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated YARN-3021:
--------------------------------
    Attachment: (was: YARN-3021.006.patch)
[jira] [Updated] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3393:
------------------------------
    Attachment: YARN-3393.1.patch

Created the patch to fix the problem.
[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377040#comment-14377040 ]

Wangda Tan commented on YARN-2901:
----------------------------------

Hi [~vvasudev],
I spent some time taking a look at the Log4jMetricsAppender implementation (will cover the other modified components in the next round).

1) Log4jMetricsAppender
1.1 Better to place it in yarn-server-common?
1.2 If you agree with the above, how about putting it into package o.a.h.y.server.metrics (or utils)?
1.3 Rename it to Log4jWarnErrorMetricsAppender?
1.4 Comments about the implementation:
I think the current cleanup implementation can be improved. The cutoff process for messages/counts basically loops over all stored items, which could be inefficient (imagine the number of stored messages exceeding the threshold), and the existing logic in the patch could leave lots of messages stored in the meantime (tons of messages can be generated in 5 minutes, which is the purge task's run interval).
If you make the data structure SortedMap<String, SortedMap<Long, Integer>> for errors (and warnings), where the outer map is sorted by value (the SortedMap with the smallest timestamp goes first) and the inner map is sorted by key (smallest timestamp first), the purge can happen whenever we add an event: it takes at most log(N=500) time, and no extra timer task is needed.
To make a SortedMap sort by value, one approach is described in the first answer at http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java. Here, value = SortedMap<Long, Integer>, and we can sort the SortedMaps according to the smallest key in each.
One corner case to consider: a single message can accumulate lots of different timestamps, so we need to purge the inner SortedMap too. For better readability, you could wrap the SortedMap in an inner class like MessageInfo. A sketch of the idea follows below.

> Add errors and warning stats to RM, NM web UI
> ----------------------------------------------
>
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open to suggestions on alternate mechanisms for implementing this.)
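One way to realize the purge-on-add idea Wangda describes, as a sketch; the class and field names are invented for the example, not taken from the patch, and it assumes distinct millisecond timestamps for the eviction index (a real implementation would break ties and also bound the inner maps):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class BoundedErrorStore {
  private static final int MAX_MESSAGES = 500;

  // message -> (timestamp -> count), inner map sorted by timestamp
  private final Map<String, TreeMap<Long, Integer>> errors =
      new HashMap<String, TreeMap<Long, Integer>>();
  // oldest timestamp of each message -> message, the eviction index
  private final TreeMap<Long, String> byOldest = new TreeMap<Long, String>();

  public synchronized void add(String message, long timestamp) {
    TreeMap<Long, Integer> times = errors.get(message);
    if (times == null) {
      times = new TreeMap<Long, Integer>();
      errors.put(message, times);
    } else {
      // The index entry keyed by the old firstKey may move; drop it first.
      byOldest.remove(times.firstKey());
    }
    Integer count = times.get(timestamp);
    times.put(timestamp, count == null ? 1 : count + 1);
    byOldest.put(times.firstKey(), message);
    if (errors.size() > MAX_MESSAGES) {
      // Evict the message whose oldest occurrence is stalest: O(log n),
      // done on every add, so no separate purge timer task is needed.
      Map.Entry<Long, String> victim = byOldest.pollFirstEntry();
      errors.remove(victim.getValue());
    }
  }
}
{code}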
[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377031#comment-14377031 ]

Hadoop QA commented on YARN-3347:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706770/YARN-3347.2.patch
against trunk revision 2c238ae.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7083//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7083//console

This message is automatically generated.
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377019#comment-14377019 ]

sandflee commented on YARN-3387:
--------------------------------

yes

> container complete message couldn't pass to am if am restarted and rm changed
> -------------------------------------------------------------------------------
>
>                 Key: YARN-3387
>                 URL: https://issues.apache.org/jira/browse/YARN-3387
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: sandflee
>            Priority: Critical
>
>
> Suppose AM work-preserving restart and RM HA are enabled.
> A container-complete message is recorded in appAttempt.justFinishedContainers in the RM. Normally, all attempts of one app share the same justFinishedContainers, but after an RM failover every attempt has its own justFinishedContainers. So in the following situation the container-complete message cannot reach the AM:
> 1. the AM restarts
> 2. the RM changes (failover)
> 3. a container launched by the first AM completes
> The container-complete message is passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (the current app attempt).
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377008#comment-14377008 ]

Hadoop QA commented on YARN-3304:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706767/YARN-3304-v2.patch
against trunk revision 2c238ae.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7082//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7082//console

This message is automatically generated.

> ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3304
>                 URL: https://issues.apache.org/jira/browse/YARN-3304
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Junping Du
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>
>         Attachments: YARN-3304-v2.patch, YARN-3304.patch
>
>
> Per discussions in YARN-3296, getCpuUsagePercent() returns -1 in the unavailable case while the other resource metrics return 0 in the same case, which is inconsistent.
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-3347:
----------------------------
    Attachment: YARN-3347.2.patch

Fixed the -1 on findbugs.
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376973#comment-14376973 ]

Karthik Kambatla commented on YARN-3304:
----------------------------------------

bq. That's an incompatible change which sounds not necessary for now.

In previous releases, we have never called these APIs Public even if they were intended to be sub-classed. In my mind, this is the last opportunity to decide what the API should do, and I think consistent and reasonable return values should be given a higher priority than compatibility.

bq. May be we don't have to leverage "-1" in resource usage to distinguish unavailable case? e.g. we can have some boolean value to identify the resource is available or not which sounds more correct than using odd value like Karthik Kambatla mentioned before.

I am okay with adding boolean methods to capture unavailability, but that seems a little overboard. Using -1 in the ResourceCalculatorProcessTree is okay by me; my concern was with logging this -1 value in the metrics. In either case, I would like the container usage metrics to check whether the usage is available before logging it.

bq. So I propose to go patch here (after fixing a minor test failure) in 2.7 given this is a blocker and we can fix YARN-3392 later in 2.8. Thoughts?

Since it is not too much work or risk, I would prefer we fix both in 2.7. This is solely wearing my Apache hat; my Cloudera hat doesn't really mind it being in 2.8 vs 2.7.
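A small sketch of the availability check being asked for, assuming the -1 sentinel lands as a constant (say, ResourceCalculatorProcessTree.UNAVAILABLE == -1, the convention under discussion here) and with containerMetrics.recordCpuUsage() as an illustrative metrics call rather than a confirmed API:

{code}
float cpuUsagePercent = pTree.getCpuUsagePercent();
// Skip the gauge entirely when the plugin could not measure usage,
// so -1 never leaks into the recorded metrics.
if (cpuUsagePercent != ResourceCalculatorProcessTree.UNAVAILABLE) {
  containerMetrics.recordCpuUsage((int) cpuUsagePercent);
}
{code}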
[jira] [Updated] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-3304:
-----------------------------
    Attachment: YARN-3304-v2.patch

Updated the patch to v2 to fix the test failure in the 1st patch.
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376944#comment-14376944 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706735/YARN-3021.006.patch against trunk revision 972f1f1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7081//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7081//console This message is automatically generated. > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, > YARN-3021.006.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (a one-way trust in both cases), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails because the B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously, > and once the renewal attempt failed we simply ceased to schedule any further > renewal attempts, rather than failing the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip only the scheduling, rather than bubbling an error back > to the client and failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376920#comment-14376920 ] Junping Du commented on YARN-3304: -- Thanks [~adhoot] for the comments! bq. If we use a default of zero we cannot distinguish when it's unavailable versus zero usage. That will make the future "track the improvement to handle unavailable case later" nearly impossible to do. Maybe we don't have to use "-1" in resource usage to mark the unavailable case? E.g. we could have a boolean value to indicate whether the resource is available, which sounds more correct than using an odd value, as [~ka...@cloudera.com] mentioned before. bq. I propose we make all the defaults consistently -1. That's an incompatible change, which seems unnecessary for now. bq. I can fix the metrics as well to use this to implement tracking of the unavailable case. I opened YARN-3392 for that. Agreed that we should fix the metrics side later. But even then, changing all default values to -1 is still behavior that is incompatible with already-released versions. So I propose we go with the patch here (after fixing a minor test failure) in 2.7, given this is a blocker, and fix YARN-3392 later in 2.8. Thoughts? > ResourceCalculatorProcessTree#getCpuUsagePercent default return value is > inconsistent with other getters > > > Key: YARN-3304 > URL: https://issues.apache.org/jira/browse/YARN-3304 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-3304.patch > > > Per discussions in YARN-3296, getCpuUsagePercent() returns -1 for the > unavailable case while other resource metrics return 0 in the same case, > which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376914#comment-14376914 ] Chris Nauroth commented on YARN-3336: - [~zxu], I apologize, but I missed entering your name on the git commit message: {code} commit 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1 Author: cnauroth Date: Mon Mar 23 10:45:50 2015 -0700 YARN-3336. FileSystem memory leak in DelegationTokenRenewer. {code} Unfortunately, this isn't something we can change, because it could mess up the git history. You're still there in CHANGES.txt though, so you get the proper credit for the patch: {code} YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (Zhihai Xu via cnauroth) {code} > FileSystem memory leak in DelegationTokenRenewer > > > Key: YARN-3336 > URL: https://issues.apache.org/jira/browse/YARN-3336 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-3336.000.patch, YARN-3336.001.patch, > YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch > > > FileSystem memory leak in DelegationTokenRenewer. > Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new > FileSystem entry will be added to FileSystem#CACHE which will never be > garbage collected. > This is the implementation of obtainSystemTokensForUser: > {code} > protected Token<?>[] obtainSystemTokensForUser(String user, > final Credentials credentials) throws IOException, InterruptedException > { > // Get new hdfs tokens on behalf of this user > UserGroupInformation proxyUser = > UserGroupInformation.createProxyUser(user, > UserGroupInformation.getLoginUser()); > Token<?>[] newTokens = > proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() { > @Override > public Token<?>[] run() throws Exception { > return FileSystem.get(getConfig()).addDelegationTokens( > UserGroupInformation.getLoginUser().getUserName(), credentials); > } > }); > return newTokens; > } > {code} > The memory leak happens when FileSystem.get(getConfig()) is called with a > new proxy user, because createProxyUser always creates a new Subject. > The calling sequence is > FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), > conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, > conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf) > {code} > public static UserGroupInformation createProxyUser(String user, > UserGroupInformation realUser) { > if (user == null || user.isEmpty()) { > throw new IllegalArgumentException("Null user"); > } > if (realUser == null) { > throw new IllegalArgumentException("Null real user"); > } > Subject subject = new Subject(); > Set<Principal> principals = subject.getPrincipals(); > principals.add(new User(user)); > principals.add(new RealUser(realUser)); > UserGroupInformation result = new UserGroupInformation(subject); > result.setAuthenticationMethod(AuthenticationMethod.PROXY); > return result; > } > {code} > FileSystem#Cache#Key.equals will compare the ugi: > {code} > Key(URI uri, Configuration conf, long unique) throws IOException { > scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase(); > authority = > uri.getAuthority()==null?"":uri.getAuthority().toLowerCase(); > this.unique = unique; > this.ugi = UserGroupInformation.getCurrentUser(); > } > public boolean equals(Object obj) { > if (obj == this) { > return true; > } > if (obj != null && obj instanceof Key) { > Key that = (Key)obj; > return isEqual(this.scheme, that.scheme) > && isEqual(this.authority, that.authority) > && isEqual(this.ugi, that.ugi) > && (this.unique == that.unique); > } > return false; > } > {code} > UserGroupInformation.equals compares the subject by reference: > {code} > public boolean equals(Object o) { > if (o == this) { > return true; > } else if (o == null || getClass() != o.getClass()) { > return false; > } else { > return subject == ((UserGroupInformation) o).subject; > } > } > {code} > So in this case, every time createProxyUser and FileSystem.get(getConfig()) > are called, a new FileSystem will be created and a new entry will be added to > FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
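One common way to plug this kind of leak, sketched under the assumption that the FileSystem is only needed for this one call (the committed patch may do it differently):
{code}
Token<?>[] newTokens =
    proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        // FileSystem.newInstance() bypasses FileSystem.CACHE, so the
        // instance can be closed here instead of staying in the cache
        // forever, keyed by the freshly created proxy UGI.
        FileSystem fs = FileSystem.newInstance(getConfig());
        try {
          return fs.addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          fs.close();
        }
      }
    });
{code}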
[jira] [Commented] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails
[ https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376830#comment-14376830 ] Hadoop QA commented on YARN-3383: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706717/YARN-3383-032315.patch against trunk revision 972f1f1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7080//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7080//console This message is automatically generated. > AdminService should use "warn" instead of "info" to log exception when > operation fails > -- > > Key: YARN-3383 > URL: https://issues.apache.org/jira/browse/YARN-3383 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Li Lu > Attachments: YARN-3383-032015.patch, YARN-3383-032315.patch > > > Now it uses info: > {code} > private YarnException logAndWrapException(IOException ioe, String user, > String argName, String msg) throws YarnException { > LOG.info("Exception " + msg, ioe); > {code} > But it should use warn instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376765#comment-14376765 ] Yongjun Zhang commented on YARN-3021: - Hi [~jianhe], Thanks a lot for the clarification. I did a new rev (06) to address your latest comment, and also tested it against real clusters. Would you please take a further look? Thanks. > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, > YARN-3021.006.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (a one-way trust in both cases), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails because the B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously, > and once the renewal attempt failed we simply ceased to schedule any further > renewal attempts, rather than failing the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip only the scheduling, rather than bubbling an error back > to the client and failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.006.patch > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, > YARN-3021.006.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (a one-way trust in both cases), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails because the B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously, > and once the renewal attempt failed we simply ceased to schedule any further > renewal attempts, rather than failing the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip only the scheduling, rather than bubbling an error back > to the client and failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3391: -- Description: To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Should the flow run id be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in the case of nested levels of flows? was: To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Should the flow run id be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Should the flow run id be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in the case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3391: -- Description: To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Should the flow run id be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) was:To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Should the flow run id be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376749#comment-14376749 ] Sangjin Lee commented on YARN-3040: --- Thanks [~zjshen] for the updated patch! I am comfortable with continuing to work on the flow-related items in the separate JIRA. I'll jot down the key points in that JIRA shortly. I went over the latest patch, and overall it looks good. I do have a few comments: (AppLevelTimelineCollector.java) {code} protected void serviceInit(Configuration conf) throws Exception { context.setClusterId(conf.get(YarnConfiguration.RM_CLUSTER_ID, YarnConfiguration.DEFAULT_RM_CLUSTER_ID)); context.setUserId(UserGroupInformation.getCurrentUser().getShortUserName()); context.setFlowId(TimelineUtils.generateDefaultFlowIdBasedOnAppId(appId)); context.setFlowRunId("0"); context.setAppId(appId.toString()); {code} I'm not sure about these set calls. Are they here just to initialize the context to default values? For example, UGI.getCurrentUser().getShortUserName() would return the user under which the daemon was started (whether it is the NM or a standalone daemon) in the case of a per-node daemon, which is highly likely to be incorrect. Do we need to bother setting default values if they are going to be incorrect anyway, for example, for the user? At minimum, it would be helpful to have a comment here explaining why this is being done. (AMLauncher.java) - Do we need to be case-insensitive here? I think we can be strict about the tag names? - You might want to be a bit defensive about the tag not carrying any value (e.g. "TIMELINE_FLOW_ID_TAG:"). If the value is empty, tag.substring() would throw an IndexOutOfBoundsException (a defensive version is sketched below). > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
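A sketch of the defensive tag parsing being asked for (the tag name follows the comment above; the helper and constant are hypothetical):
{code}
// Hypothetical constant; the real tag name is defined by the YARN-3040 patch.
static final String FLOW_ID_TAG_PREFIX = "TIMELINE_FLOW_ID_TAG:";

static String parseFlowIdTag(String tag) {
  // Be strict about case and defensive about a missing value, so an empty
  // "TIMELINE_FLOW_ID_TAG:" cannot trigger IndexOutOfBoundsException.
  if (tag.startsWith(FLOW_ID_TAG_PREFIX)
      && tag.length() > FLOW_ID_TAG_PREFIX.length()) {
    return tag.substring(FLOW_ID_TAG_PREFIX.length());
  }
  return null; // not a flow ID tag, or the value is empty
}
{code}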
[jira] [Commented] (YARN-3241) FairScheduler handles "invalid" queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376737#comment-14376737 ] zhihai xu commented on YARN-3241: - Thanks [~kasha] for the valuable feedback and for committing the patch! > FairScheduler handles "invalid" queue names inconsistently > -- > > Key: YARN-3241 > URL: https://issues.apache.org/jira/browse/YARN-3241 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.8.0 > > Attachments: YARN-3241.000.patch, YARN-3241.001.patch, > YARN-3241.002.patch > > > Leading spaces, trailing spaces and empty sub queue names may cause a > MetricsException ("Metrics source XXX already exists!") when adding an application to > the FairScheduler. > The reason is that QueueMetrics parses the queue name differently from the > QueueManager. > QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and > trailing spaces in sub queue names, and it also removes empty sub queue > names. > {code} > static final Splitter Q_SPLITTER = > Splitter.on('.').omitEmptyStrings().trimResults(); > {code} > But QueueManager won't remove leading spaces, trailing spaces or empty sub > queue names. > This causes FSQueue and FSQueueMetrics to fall out of sync: > QueueManager considers the two queue names different, so it tries to > create a new queue, > while FSQueueMetrics treats them as the same queue, which raises the > "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
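A quick illustration of the mismatch described above (standard Guava Splitter behaviour; the queue names are illustrative):
{code}
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;

public class QueueNameMismatch {
  public static void main(String[] args) {
    Splitter q = Splitter.on('.').omitEmptyStrings().trimResults();
    // Both spellings collapse to the same segments for metrics purposes...
    System.out.println(Lists.newArrayList(q.split("root. queueA"))); // [root, queueA]
    System.out.println(Lists.newArrayList(q.split("root.queueA")));  // [root, queueA]
    // ...but QueueManager compares the raw strings and sees two distinct
    // queues, so both end up registering the metrics source "root.queueA",
    // triggering "Metrics source XXX already exists!".
  }
}
{code}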
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376719#comment-14376719 ] Li Lu commented on YARN-3047: - Hi [~varun_saxena], thanks for the new patch. Could you please elaborate on exactly which comments will be addressed in YARN-3051? Thanks! BTW, in the 003 patch I can still see TimelineEvents.java. Do we still need that? > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3047.001.patch, YARN-3047.003.patch, > YARN-3047.02.patch > > > Per design in YARN-2938, set up the ATS reader and implement its basic > structure as a service, including lifecycle management, request serving, and > so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
Zhijie Shen created YARN-3393: - Summary: Getting application(s) goes wrong when app finishes before starting the attempt Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical When generating the app report in ApplicationHistoryManagerOnTimelineStore, the code checks whether appAttempt == null: {code} ApplicationAttemptReport appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId()); if (appAttempt != null) { app.appReport.setHost(appAttempt.getHost()); app.appReport.setRpcPort(appAttempt.getRpcPort()); app.appReport.setTrackingUrl(appAttempt.getTrackingUrl()); app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl()); } {code} However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException: {code} if (entity == null) { throw new ApplicationAttemptNotFoundException( "The entity for application attempt " + appAttemptId + " doesn't exist in the timeline store"); } else { return convertToApplicationAttemptReport(entity); } {code} The two pieces of code aren't coupled well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
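A minimal sketch of reconciling the two sides, assuming the fix goes in the direction of catching the exception rather than expecting null:
{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(
      app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before any attempt reached the timeline store;
  // leave the attempt-derived fields of the report unset.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}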
[jira] [Assigned] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2605: --- Assignee: Anubhav Dhoot > [RM HA] Rest api endpoints doing redirect incorrectly > - > > Key: YARN-2605 > URL: https://issues.apache.org/jira/browse/YARN-2605 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: bc Wong >Assignee: Anubhav Dhoot > Labels: newbie > > The standby RM's webui tries to do a redirect via meta-refresh. That is fine > for pages designed to be viewed by web browsers, but the API endpoints > shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd > suggest HTTP 303, or returning a well-defined error message (JSON or XML) > stating the standby status with a link to the active RM. > The standby RM is returning this today: > {noformat} > $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Content-Type: text/plain; charset=UTF-8 > Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > Content-Length: 117 > Server: Jetty(6.1.26) > This is standby RM. Redirecting to the current active RM: > http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
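A sketch of what the REST endpoints could do instead (illustrative servlet-level code, not the actual RM webapp wiring):
{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// Answer API clients on the standby RM with a real HTTP 303 redirect
// instead of an HTML meta-refresh, so programmatic clients (e.g.
// curl -L, HttpClient) can follow it automatically.
void redirectToActiveRM(HttpServletResponse response, String activeUrl)
    throws IOException {
  response.setStatus(HttpServletResponse.SC_SEE_OTHER); // 303
  response.setHeader("Location", activeUrl);
}
{code}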
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376695#comment-14376695 ] Anubhav Dhoot commented on YARN-3304: - Hi [~djp] [~vinodkv], If we use a default of zero we cannot distinguish when it's unavailable versus zero usage. That will make the future "track the improvement to handle unavailable case later" nearly impossible to do. I propose we make all the defaults consistently -1. I can fix the metrics as well to use this to implement tracking of the unavailable case. I opened YARN-3392 for that. > ResourceCalculatorProcessTree#getCpuUsagePercent default return value is > inconsistent with other getters > > > Key: YARN-3304 > URL: https://issues.apache.org/jira/browse/YARN-3304 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-3304.patch > > > Per discussions in YARN-3296, getCpuUsagePercent() returns -1 for the > unavailable case while other resource metrics return 0 in the same case, > which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
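What "consistently -1" could look like on the base class, sketched only as an illustration (the shared constant is hypothetical here):
{code}
public abstract class ResourceCalculatorProcessTree {
  // One shared sentinel for "could not be measured", instead of some
  // getters defaulting to 0 and others to -1.
  public static final int UNAVAILABLE = -1;

  public long getCumulativeRssmem() {
    return UNAVAILABLE;
  }

  public float getCpuUsagePercent() {
    return UNAVAILABLE;
  }
}
{code}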
[jira] [Created] (YARN-3392) Change NodeManager metrics to not populate resource usage metrics if they are unavailable
Anubhav Dhoot created YARN-3392: --- Summary: Change NodeManager metrics to not populate resource usage metrics if they are unavailable Key: YARN-3392 URL: https://issues.apache.org/jira/browse/YARN-3392 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376684#comment-14376684 ] Zhijie Shen commented on YARN-3390: --- It shouldn't. Storage layer implementations only depend on the writer interface, which is covered in YARN-3040. > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entities > have been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376672#comment-14376672 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-trunk-Commit #7407 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7407/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/CHANGES.txt > FairScheduler: Metric for latency to allocate first container for an > application > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Fix For: 2.8.0 > > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, > YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, > YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, > YARN-2868.012.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376670#comment-14376670 ] Li Lu commented on YARN-3390: - Hi [~zjshen], could you please confirm whether this JIRA will also block all storage layer implementations, or whether we can proceed after YARN-3040 is in? Thanks! > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entities > have been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3040: -- Attachment: YARN-3040.3.patch Uploaded a new patch to address the comments so far. The notable changes in this patch are removing the timestamp suffix and adding a default for RM_CLUSTER_ID, so that the ID won't change across RM restarts or failover. > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3386) Cgroups feature should work with default hierarchy settings of CentOS 7
[ https://issues.apache.org/jira/browse/YARN-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376645#comment-14376645 ] Karthik Kambatla commented on YARN-3386: YARN-2194 seems to imply there are more changes required for cgroups to work with RHEL/CentOS 7. Should this be marked a duplicate of the other? > Cgroups feature should work with default hierarchy settings of CentOS 7 > --- > > Key: YARN-3386 > URL: https://issues.apache.org/jira/browse/YARN-3386 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > > The path found by CgroupsLCEResourcesHandler#parseMtab contains a comma and > results in failure of the container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails
[ https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3383: Attachment: YARN-3383-032315.patch Rebased the patch against the latest trunk. > AdminService should use "warn" instead of "info" to log exception when > operation fails > -- > > Key: YARN-3383 > URL: https://issues.apache.org/jira/browse/YARN-3383 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Li Lu > Attachments: YARN-3383-032015.patch, YARN-3383-032315.patch > > > Now it uses info: > {code} > private YarnException logAndWrapException(IOException ioe, String user, > String argName, String msg) throws YarnException { > LOG.info("Exception " + msg, ioe); > {code} > But it should use warn instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
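The one-word fix being discussed, sketched in place (the method body beyond the log line is elided in the JIRA description, so the return shown here is illustrative):
{code}
private YarnException logAndWrapException(IOException ioe, String user,
    String argName, String msg) throws YarnException {
  // A failed admin operation deserves WARN, not routine INFO.
  LOG.warn("Exception " + msg, ioe);
  return RPCUtil.getRemoteException(ioe);
}
{code}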
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376640#comment-14376640 ] Karthik Kambatla commented on YARN-3387: Does this imply our work-preserving AM restart is broken on an RM failover? > container complete message couldn't pass to am if am restarted and rm changed > - > > Key: YARN-3387 > URL: https://issues.apache.org/jira/browse/YARN-3387 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: sandflee >Priority: Critical > > Suppose work-preserving AM restart and RM HA are enabled. > The container-complete message is passed to appAttempt.justFinishedContainers in > the RM. In the normal situation, all attempts of one app share the same > justFinishedContainers, but when the RM changes, every attempt gets its own > justFinishedContainers. So in the situation below, the container-complete message > can't be passed to the AM: > 1. the AM restarts > 2. the RM changes > 3. a container launched by the first AM completes > The container-complete message will be passed to appAttempt1, not appAttempt2, but > the AM pulls finished containers from appAttempt2 (the currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
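One possible direction, purely as an illustration (not a committed fix): after failover, let the current attempt absorb completed-container records recorded against earlier attempts, so the AM's allocate() heartbeat still receives them.
{code}
// Illustrative sketch only: merge completed-container records left on
// earlier attempts into the current attempt after RM failover, since the
// AM polls only its current attempt for finished containers.
RMAppAttempt current = app.getCurrentAppAttempt();
for (RMAppAttempt attempt : app.getAppAttempts().values()) {
  if (attempt != current) {
    current.getJustFinishedContainers()
        .addAll(attempt.getJustFinishedContainers());
    attempt.getJustFinishedContainers().clear();
  }
}
{code}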
[jira] [Updated] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3387: --- Priority: Critical (was: Major) Target Version/s: 2.7.0 > container complete message couldn't pass to am if am restarted and rm changed > - > > Key: YARN-3387 > URL: https://issues.apache.org/jira/browse/YARN-3387 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: sandflee >Priority: Critical > > Suppose work-preserving AM restart and RM HA are enabled. > The container-complete message is passed to appAttempt.justFinishedContainers in > the RM. In the normal situation, all attempts of one app share the same > justFinishedContainers, but when the RM changes, every attempt gets its own > justFinishedContainers. So in the situation below, the container-complete message > can't be passed to the AM: > 1. the AM restarts > 2. the RM changes > 3. a container launched by the first AM completes > The container-complete message will be passed to appAttempt1, not appAttempt2, but > the AM pulls finished containers from appAttempt2 (the currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2868: --- Summary: FairScheduler: Metric for latency to allocate first container for an application (was: Add metric for initial container launch time to FairScheduler) > FairScheduler: Metric for latency to allocate first container for an > application > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, > YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, > YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, > YARN-2868.012.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376633#comment-14376633 ] Karthik Kambatla commented on YARN-2868: +1, checking this in. > Add metric for initial container launch time to FairScheduler > - > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, > YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, > YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, > YARN-2868.012.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails
[ https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376629#comment-14376629 ] Hadoop QA commented on YARN-3383: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706096/YARN-3383-032015.patch against trunk revision 2bc097c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7079//console This message is automatically generated. > AdminService should use "warn" instead of "info" to log exception when > operation fails > -- > > Key: YARN-3383 > URL: https://issues.apache.org/jira/browse/YARN-3383 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Li Lu > Attachments: YARN-3383-032015.patch > > > Now it uses info: > {code} > private YarnException logAndWrapException(IOException ioe, String user, > String argName, String msg) throws YarnException { > LOG.info("Exception " + msg, ioe); > {code} > But it should use warn instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376605#comment-14376605 ] Wangda Tan commented on YARN-2495: -- Hmm.. {{StringArrayProto.stringElement -> elements}} is still not changed in the latest patch; could you take a look again? I meant to remove the "string" prefix, since StringArrayProto already indicates that. Beyond that, the patch LGTM. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, > YARN-2495.20150321-1.patch, YARN-2495_20141022.1.patch > > > The target of this JIRA is to allow admins to specify labels on each NM. This covers: > - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or > using the script suggested by [~aw] (YARN-2729)) > - The NM will send labels to the RM via the ResourceTracker API > - The RM will set labels in the NodeLabelManager when NMs register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376602#comment-14376602 ] Junping Du commented on YARN-3040: -- Hi [~zjshen], thanks for the patch! I am still reviewing it but have some quick comments so far: {code} + public static String generateDefaultClusterIdBasedOnAppId( + ApplicationId appId) { +return "cluster_" + appId.getClusterTimestamp(); + } {code} It seems the appId's ClusterTimestamp comes from the RM and changes every time the RM restarts. I think we need a ClusterID here that stays consistent across RM restarts, don't we? Otherwise applications submitted to the same cluster could get different ClusterIDs just because the RM failed over, which is not what users would expect. I suggest adding a configuration for the user to supply a specific ClusterID, with the generated (and variable) value only as a default for test purposes. {code} + rpc getTimelienCollectorContext (GetTimelineCollectorContextRequestProto) returns (GetTimelineCollectorContextResponseProto); {code} One typo here and in other places: "Timelien" should be "Timeline". {code} -import java.util.ArrayList; -import java.util.HashMap; -import java.util.List; -import java.util.Map; -import java.util.Vector; +import java.util.*; {code} We shouldn't do this; it could load unnecessary classes. {code} + * The aggregator needs to get the context information including user, flow {code} aggregator => collector > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
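A sketch of the configurable ClusterID being suggested (using the existing {{yarn.resourcemanager.cluster-id}} key as an example; the actual patch may choose a different key):
{code}
// Prefer an admin-configured, restart-stable cluster ID; fall back to the
// appId's cluster timestamp (which changes on RM restart) only as a default.
public static String getClusterId(Configuration conf, ApplicationId appId) {
  String clusterId = conf.get(YarnConfiguration.RM_CLUSTER_ID);
  if (clusterId != null && !clusterId.isEmpty()) {
    return clusterId;
  }
  return "cluster_" + appId.getClusterTimestamp();
}
{code}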
[jira] [Commented] (YARN-3241) FairScheduler handles "invalid" queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376577#comment-14376577 ] Hudson commented on YARN-3241: -- FAILURE: Integrated in Hadoop-trunk-Commit #7406 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7406/]) YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java > FairScheduler handles "invalid" queue names inconsistently > -- > > Key: YARN-3241 > URL: https://issues.apache.org/jira/browse/YARN-3241 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3241.000.patch, YARN-3241.001.patch, > YARN-3241.002.patch > > > Leading spaces, trailing spaces and empty sub queue names may cause a > MetricsException ("Metrics source XXX already exists!") when adding an application to > the FairScheduler. > The reason is that QueueMetrics parses the queue name differently from the > QueueManager. > QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and > trailing spaces in sub queue names, and it also removes empty sub queue > names. > {code} > static final Splitter Q_SPLITTER = > Splitter.on('.').omitEmptyStrings().trimResults(); > {code} > But QueueManager won't remove leading spaces, trailing spaces or empty sub > queue names. > This causes FSQueue and FSQueueMetrics to fall out of sync: > QueueManager considers the two queue names different, so it tries to > create a new queue, > while FSQueueMetrics treats them as the same queue, which raises the > "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376575#comment-14376575 ] Zhijie Shen commented on YARN-3034: --- bq. the only way RMTimelineCollector can be invoked is through SystemMetricsPublisher's (SMP) public methods Oh, I probably misunderstood your intention. I used to think this was the way you wanted to put the data into RMTimelineCollector. In that case, we could put RMTimelineCollector inside SystemMetricsPublisher, and wherever we invoke the timeline client, we call RMTimelineCollector for v2. According to this comment, it seems that you want to create a separate stack to put entities into RMTimelineCollector, right? If so, the current design makes sense. bq. So we need a configuration on the NM side too, but we cannot use the existing one I meant we keep {{yarn.resourcemanager.system-metrics-publisher.enabled}} for the v1 SystemMetricsPublisher. For v2, both the RM and the NM read {{yarn.system-metrics-publisher.enabled}}? Then there is no need for a v1/v2 flag? > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) FairScheduler handles "invalid" queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3241: --- Summary: FairScheduler handles "invalid" queue names inconsistently (was: Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler) > FairScheduler handles "invalid" queue names inconsistently > -- > > Key: YARN-3241 > URL: https://issues.apache.org/jira/browse/YARN-3241 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3241.000.patch, YARN-3241.001.patch, > YARN-3241.002.patch > > > Leading spaces, trailing spaces and empty sub queue names may cause a > MetricsException ("Metrics source XXX already exists!") when adding an application to > the FairScheduler. > The reason is that QueueMetrics parses the queue name differently from the > QueueManager. > QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and > trailing spaces in sub queue names, and it also removes empty sub queue > names. > {code} > static final Splitter Q_SPLITTER = > Splitter.on('.').omitEmptyStrings().trimResults(); > {code} > But QueueManager won't remove leading spaces, trailing spaces or empty sub > queue names. > This causes FSQueue and FSQueueMetrics to fall out of sync: > QueueManager considers the two queue names different, so it tries to > create a new queue, > while FSQueueMetrics treats them as the same queue, which raises the > "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376563#comment-14376563 ] Naganarasimha G R commented on YARN-3362: - Thanks for the feedback [~leftnoteasy], bq. different labels under same queue can have different user-limit/capacity/maximum-capacity/max-am-resource, etc. If this is the case then the approach you specified makes sense, but by "can" do you mean it is not there currently and can come in the future? Beyond the repeated info, the other drawback I can see is this: suppose the user limit is not reached for a particular label, but the user has reached their limit at the overall queue level; it will be difficult for the user to go through all the labels and find out whether they have reached the queue limit. Correct me if my understanding of this is wrong. > Add node label usage in RM CapacityScheduler web UI > --- > > Key: YARN-3362 > URL: https://issues.apache.org/jira/browse/YARN-3362 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, webapp >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > We don't have node label usage in the RM CapacityScheduler web UI now; without > this, it is hard for users to understand what happened to nodes that have > labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376559#comment-14376559 ] Karthik Kambatla commented on YARN-3241: +1. Checking this in. > Leading space, trailing space and empty sub queue name may cause > MetricsException for fair scheduler > > > Key: YARN-3241 > URL: https://issues.apache.org/jira/browse/YARN-3241 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3241.000.patch, YARN-3241.001.patch, > YARN-3241.002.patch > > > Leading spaces, trailing spaces and empty sub queue names may cause a > MetricsException ("Metrics source XXX already exists!") when adding an application to > the FairScheduler. > The reason is that QueueMetrics parses the queue name differently from the > QueueManager. > QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and > trailing spaces in sub queue names, and it also removes empty sub queue > names. > {code} > static final Splitter Q_SPLITTER = > Splitter.on('.').omitEmptyStrings().trimResults(); > {code} > But QueueManager won't remove leading spaces, trailing spaces or empty sub > queue names. > This causes FSQueue and FSQueueMetrics to fall out of sync: > QueueManager considers the two queue names different, so it tries to > create a new queue, > while FSQueueMetrics treats them as the same queue, which raises the > "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376556#comment-14376556 ] Karthik Kambatla commented on YARN-3024: [~chengbing.liu] - thanks for the clarifications. Makes sense. For the TODOs, it would be nice to have follow-up JIRAs. If it is not too much trouble, can you create them so interested contributors can follow up? > LocalizerRunner should give DIE action when all resources are localized > --- > > Key: YARN-3024 > URL: https://issues.apache.org/jira/browse/YARN-3024 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Fix For: 2.7.0 > > Attachments: YARN-3024.01.patch, YARN-3024.02.patch, > YARN-3024.03.patch, YARN-3024.04.patch > > > We have observed that {{LocalizerRunner}} always gives a LIVE action at the > end of the localization process. > The problem is that {{findNextResource()}} can return null even when {{pending}} > was not empty prior to the call. This method removes localized resources from > {{pending}}, therefore we should check the return value and give a DIE action > when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
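A simplified sketch of the heartbeat-response logic the description calls for (class and enum names from the NM localization code; the surrounding structure is illustrative):
{code}
// findNextResource() removes resources that turn out to be already
// localized from 'pending', so a null return can mean "everything is
// done" rather than "nothing was pending".
LocalizerResourceRequestEvent next = findNextResource();
if (next != null) {
  response.setLocalizerAction(LocalizerAction.LIVE);
} else if (pending.isEmpty()) {
  // All resources are localized; no reason to keep the localizer alive.
  response.setLocalizerAction(LocalizerAction.DIE);
}
{code}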
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376498#comment-14376498 ] Naganarasimha G R commented on YARN-3034: - Thanks for the comments [~zjshen], bq. and in this approach, I don't think we should couple RMTimelineCollector and SystemMetricsPublisher. Keeping SystemMetricsPublisher separate, we can easily deprecate and even remove it from the code base later. Maybe I am missing something here. If the RM or the RM context is not aware of it, the only way RMTimelineCollector can be invoked is through SystemMetricsPublisher's (SMP) public methods such as appCreated, appFinished and appAttemptRegistered; alternatively, RMTimelineCollector can have its own event handler, and during initialization SMP can select either the event handler in its own class or RMTimelineCollector's. But there will still be a dependency on the event source calling SMP's public methods. So I feel it will not be any smoother to deprecate and remove SystemMetricsPublisher, as it will still hold the code for creating RMTimelineCollector and for sending events to RMTimelineCollector to publish to ATS v2. bq. Moreover, we can keep the existing config as it is now, and create a new config to control starting the v2 RM data-writing stack. IMHO the current config is better: in ATS v2, container events are planned to move to the NM side (YARN-3045), so we also require a configuration on the NM side, but we cannot reuse the existing {{"yarn.resourcemanager.system-metrics-publisher.enabled"}} as it reads like an RM-side configuration only. The approach in the patch uses a single config for both the NM and the RM. > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376466#comment-14376466 ] Hadoop QA commented on YARN-3304: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706680/YARN-3304.patch against trunk revision 6ca1f12. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7078//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7078//console This message is automatically generated. > ResourceCalculatorProcessTree#getCpuUsagePercent default return value is > inconsistent with other getters > > > Key: YARN-3304 > URL: https://issues.apache.org/jira/browse/YARN-3304 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-3304.patch > > > Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for > the unavailable case while other resource metrics return 0 in the same case, > which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376447#comment-14376447 ] zhihai xu commented on YARN-3336: - Thanks [~cnauroth] for valuable feedback and committing the patch! Greatly appreciated. > FileSystem memory leak in DelegationTokenRenewer > > > Key: YARN-3336 > URL: https://issues.apache.org/jira/browse/YARN-3336 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-3336.000.patch, YARN-3336.001.patch, > YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch > > > FileSystem memory leak in DelegationTokenRenewer. > Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new > FileSystem entry will be added to FileSystem#CACHE which will never be > garbage collected. > This is the implementation of obtainSystemTokensForUser: > {code} > protected Token<?>[] obtainSystemTokensForUser(String user, > final Credentials credentials) throws IOException, InterruptedException > { > // Get new hdfs tokens on behalf of this user > UserGroupInformation proxyUser = > UserGroupInformation.createProxyUser(user, > UserGroupInformation.getLoginUser()); > Token<?>[] newTokens = > proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() { > @Override > public Token<?>[] run() throws Exception { > return FileSystem.get(getConfig()).addDelegationTokens( > UserGroupInformation.getLoginUser().getUserName(), credentials); > } > }); > return newTokens; > } > {code} > The memory leak happened when FileSystem.get(getConfig()) is called with a > new proxy user. > Because createProxyUser will always create a new Subject. > The calling sequence is > FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), > conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, > conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf) > {code} > public static UserGroupInformation createProxyUser(String user, > UserGroupInformation realUser) { > if (user == null || user.isEmpty()) { > throw new IllegalArgumentException("Null user"); > } > if (realUser == null) { > throw new IllegalArgumentException("Null real user"); > } > Subject subject = new Subject(); > Set<Principal> principals = subject.getPrincipals(); > principals.add(new User(user)); > principals.add(new RealUser(realUser)); > UserGroupInformation result = new UserGroupInformation(subject); > result.setAuthenticationMethod(AuthenticationMethod.PROXY); > return result; > } > {code} > FileSystem#Cache#Key.equals will compare the ugi > {code} > Key(URI uri, Configuration conf, long unique) throws IOException { > scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase(); > authority = > uri.getAuthority()==null?"":uri.getAuthority().toLowerCase(); > this.unique = unique; > this.ugi = UserGroupInformation.getCurrentUser(); > } > public boolean equals(Object obj) { > if (obj == this) { > return true; > } > if (obj != null && obj instanceof Key) { > Key that = (Key)obj; > return isEqual(this.scheme, that.scheme) > && isEqual(this.authority, that.authority) > && isEqual(this.ugi, that.ugi) > && (this.unique == that.unique); > } > return false; > } > {code} > UserGroupInformation.equals will compare subject by reference. 
> {code} > public boolean equals(Object o) { > if (o == this) { > return true; > } else if (o == null || getClass() != o.getClass()) { > return false; > } else { > return subject == ((UserGroupInformation) o).subject; > } > } > {code} > So in this case, every time createProxyUser and FileSystem.get(getConfig()) > are called, a new FileSystem will be created and a new entry will be added to > FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
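A sketch of one way to plug the leak (not necessarily the committed patch): evict the FileSystem instances cached under the one-off proxy UGI once the tokens have been obtained, so the CACHE entry keyed by the fresh Subject cannot accumulate. FileSystem.closeAllForUGI is an existing Hadoop API; the surrounding method mirrors the snippet quoted above.
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
    @Override
    public Token<?>[] run() throws Exception {
      FileSystem fs = FileSystem.get(getConfig());
      try {
        return fs.addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      } finally {
        // Close (and evict from FileSystem.CACHE) everything cached for
        // this throwaway proxy UGI, instead of leaking one entry per call.
        FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser());
      }
    }
  });
}
{code}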
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376446#comment-14376446 ] Zhijie Shen commented on YARN-3034: --- bq. so I think it's not an incompatible change. Please provide your opinion on the same. Sorry, I missed that piece. bq. IIUC, the SystemMetricsPublisher.publish*Event methods can determine which version of ATS to publish to and can post accordingly? I meant that in the current approach SystemMetricsPublisher can be self-contained. RMTimelineCollector can be a private member of SystemMetricsPublisher, constructed and started there. It's not necessary for it to be visible in the RM and its context objects. bq. we might not require much of SystemMetricsPublisher's functionality and it would just be delegating the calls to RMTimelineCollector. I'm not sure if there has been previous discussion about the way for the RM to put entities, but this approach sounds cleaner, and in this approach, I don't think we should couple RMTimelineCollector and SystemMetricsPublisher. Keeping SystemMetricsPublisher separate, we can easily deprecate and even remove it from the code base later. Moreover, we can keep the existing config as it is now, and create a new config to control starting the v2 RM data-writing stack. > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376444#comment-14376444 ] Hadoop QA commented on YARN-3225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706513/YARN-3225-1.patch against trunk revision 7e6f384. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7077//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7077//console This message is automatically generated. > New parameter or CLI for decommissioning node gracefully in RMAdmin CLI > --- > > Key: YARN-3225 > URL: https://issues.apache.org/jira/browse/YARN-3225 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Devaraj K > Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch > > > A new CLI (or an existing CLI with parameters) should put each node on the > decommission list into decommissioning status, and track a timeout to terminate > the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376415#comment-14376415 ] Naganarasimha G R commented on YARN-3034: - Also [~zjshen], the earlier thought process behind exposing RMTimelineCollector to the RM and its context was to gradually replace SystemMetricsPublisher with RMTimelineCollector, as I felt that once we deprecate & completely remove ATS v1, we might not require much of SystemMetricsPublisher's functionality and it would just be delegating the calls to RMTimelineCollector. Your thoughts? > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376402#comment-14376402 ] Naganarasimha G R commented on YARN-3044: - Hi [~zjshen] & [~sjlee0], as part of this jira I am planning to capture the following basic App and AppAttempt lifecycle events in {{RMTimelineCollector}}: * ApplicationCreated * ApplicationFinished * ApplicationACLsUpdated * AppAttemptRegistered * AppAttemptFinished Apart from these, are there any other events you have thought about capturing (as I remember, somewhere Sangjin had mentioned capturing all the lifecycle events/states)? > [Event producers] Implement RM writing app lifecycle events to ATS > -- > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376400#comment-14376400 ] Zhijie Shen commented on YARN-3040: --- [~sjlee0], thanks for the additional comments, but would you mind continuing the flow attributes discussion in YARN-3391 to unblock this jira? In this jira, how about focusing on the data flow for passing this context info to the collector? Whatever the flow info should specifically be, this patch works out the path to collect it from the user via the application submission context and pass it to the RM, the NM, and finally the collector. If we're okay with this approach, it is easy for us to add new flow info or correct existing flow info later on. I filed YARN-3391 to fork off the flow-related discussion. > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376380#comment-14376380 ] Hadoop QA commented on YARN-3136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706581/0008-YARN-3136.patch against trunk revision 36af4a9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7076//console This message is automatically generated. > getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, > 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Zhijie Shen created YARN-3391: - Summary: Clearly define flow ID/ flow run / flow version in API and storage Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3304: - Attachment: YARN-3304.patch Delivering a quick patch to fix it, given this is a blocker for the release. > ResourceCalculatorProcessTree#getCpuUsagePercent default return value is > inconsistent with other getters > > > Key: YARN-3304 > URL: https://issues.apache.org/jira/browse/YARN-3304 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-3304.patch > > > Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for > the unavailable case while other resource metrics return 0 in the same case, > which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
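For reference, a minimal sketch of the consistency fix under discussion, assuming the getters converge on one shared "cannot measure" sentinel (the class and constant names here are illustrative, not necessarily what the patch uses):
{code}
public abstract class ProcessTreeSketch {
  // One shared sentinel instead of a mix of 0 and -1 across getters.
  public static final int UNAVAILABLE = -1;

  /** Memory getter: returns UNAVAILABLE rather than 0 when unmeasurable. */
  public long getRssMemorySize() {
    return UNAVAILABLE;
  }

  /** CPU getter: keeps its negative sentinel, now via the same constant. */
  public float getCpuUsagePercent() {
    return UNAVAILABLE;
  }
}
{code}
Callers can then test against the one constant instead of special-casing 0 for some metrics and -1 for others.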
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376349#comment-14376349 ] Naganarasimha G R commented on YARN-3034: - Thanks for your comments [~zjshen] bq. RM_SYSTEM_METRICS_PUBLISHER_ENABLED -> SYSTEM_METRICS_PUBLISHER_ENABLED is an incompatible change. : I incorporated this based on [~vinodkv]'s [comment|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14360797&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14360797], and I have also added the old keys as part of {{addDeprecatedKeys}}, so I think it's not an incompatible change. Please provide your opinion on the same. bq. RMTimelineCollector doesn't need to be exposed to RM and its context. It seems to be enough to construct it inside SystemMetricsPublisher only. IIUC, the SystemMetricsPublisher.publish*Event methods can determine which version of ATS to publish to and can post accordingly? > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
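The {{addDeprecatedKeys}} mechanism referred to above rests on Hadoop's Configuration key-deprecation support. A hedged sketch of the registration (the new key name below is an assumption for illustration, not the one in the patch):
{code}
import org.apache.hadoop.conf.Configuration;

public final class MetricsPublisherKeys {
  public static final String OLD_KEY =
      "yarn.resourcemanager.system-metrics-publisher.enabled";
  // Assumed new, RM/NM-neutral key; the patch may use a different name.
  public static final String NEW_KEY =
      "yarn.system-metrics-publisher.enabled";

  static {
    // Reads and writes of the old key are transparently redirected to the
    // new one, which is why the rename need not be an incompatible change.
    Configuration.addDeprecation(OLD_KEY, NEW_KEY);
  }

  private MetricsPublisherKeys() {}
}
{code}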
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376323#comment-14376323 ] Naganarasimha G R commented on YARN-3390: - Hi [~zjshen], shall I work on this jira, since I can utilize the same in YARN-3044? > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376300#comment-14376300 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-trunk-Commit #7405 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7405/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java > FileSystem memory leak in DelegationTokenRenewer > > > Key: YARN-3336 > URL: https://issues.apache.org/jira/browse/YARN-3336 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-3336.000.patch, YARN-3336.001.patch, > YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch > > > FileSystem memory leak in DelegationTokenRenewer. > Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new > FileSystem entry will be added to FileSystem#CACHE which will never be > garbage collected. > This is the implementation of obtainSystemTokensForUser: > {code} > protected Token<?>[] obtainSystemTokensForUser(String user, > final Credentials credentials) throws IOException, InterruptedException > { > // Get new hdfs tokens on behalf of this user > UserGroupInformation proxyUser = > UserGroupInformation.createProxyUser(user, > UserGroupInformation.getLoginUser()); > Token<?>[] newTokens = > proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() { > @Override > public Token<?>[] run() throws Exception { > return FileSystem.get(getConfig()).addDelegationTokens( > UserGroupInformation.getLoginUser().getUserName(), credentials); > } > }); > return newTokens; > } > {code} > The memory leak happened when FileSystem.get(getConfig()) is called with a > new proxy user. > Because createProxyUser will always create a new Subject. 
> The calling sequence is > FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), > conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, > conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf) > {code} > public static UserGroupInformation createProxyUser(String user, > UserGroupInformation realUser) { > if (user == null || user.isEmpty()) { > throw new IllegalArgumentException("Null user"); > } > if (realUser == null) { > throw new IllegalArgumentException("Null real user"); > } > Subject subject = new Subject(); > Set<Principal> principals = subject.getPrincipals(); > principals.add(new User(user)); > principals.add(new RealUser(realUser)); > UserGroupInformation result = new UserGroupInformation(subject); > result.setAuthenticationMethod(AuthenticationMethod.PROXY); > return result; > } > {code} > FileSystem#Cache#Key.equals will compare the ugi > {code} > Key(URI uri, Configuration conf, long unique) throws IOException { > scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase(); > authority = > uri.getAuthority()==null?"":uri.getAuthority().toLowerCase(); > this.unique = unique; > this.ugi = UserGroupInformation.getCurrentUser(); > } > public boolean equals(Object obj) { > if (obj == this) { > return true; > } > if (obj != null && obj instanceof Key) { > Key that = (Key)obj; > return isEqual(this.scheme, that.scheme) > && isEqual(this.authority, that.authority) > && isEqual(this.ugi, that.ugi) > && (this.unique == that.unique); > } > return false; > } > {code} > UserGroupInformation.equals will compare subject by reference. > {code} > public boolean equals(Object o) { > if (o == this) { > return true; > } else if (o == null || getClass() != o.getClass()) { > return false; > } else { > return subject == ((UserGroupInformation) o).subject; > } > } > {code} > So in this case, every time createProxyUser and FileSystem.get(getConfig()) > are called, a new FileSystem will be created and a new entry will be added to > FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376285#comment-14376285 ] Sangjin Lee commented on YARN-3040: --- {quote} I can understand this particular case described above. Like my prior comment about flow run ID, my concern is whether flow/version/run's explicit hierarchy is general enough to capture most use cases. IMHO, by nature, the hierarchy is the tree of flows, and a flow can be the flow of flows or the flow of apps. However, if other users just want to use one level of flow, version/run info seems to be redundant. On the other side, if we use the recursive flow structure, it's flexible to have anywhere from one to many flow levels. We can treat the first level as the flow, the second as the version, and the third as the run. I don't have expert knowledge about workflow systems such as Oozie, but just want to think my concern out loud. That said, if flow/version/run is the general description of a flow, I agree we should pass in these three env vars together and separately. {quote} Agreed that we need to consider both use cases (single level and multi-level). I just want to clarify that even with one level of flows, it is possible (and in fact it is more common) that there are multiple runs for a given flow version, and multiple versions for a given flow name; e.g. "foo.pig"/"v.1"/1, "foo.pig"/"v.1"/2, ..., "foo.pig"/"v.2"/10, "foo.pig"/"v.2"/11, ... Also, my mental model is that flow id/version/run-id is not a hierarchy. It's just a group of 3 attributes (although there is some implied contains relationship). Also, when we store these 3 attributes in the storage, I suspect schemas like HBase/phoenix will probably make only the flow id (name) and the flow run id part of the primary/row key, and store the flow version in a separate table. > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376255#comment-14376255 ] Sangjin Lee commented on YARN-3040: --- bq. I can see the benefit. For example, if it represents the timestamp, we can filter the flow runs and say give me the runs in the last 5 mins. But my concern is whether it's the general way to let the user describe a run. The design doc says the flow runs for a given flow must have "unique and totally ordered run identifiers". We obviously had numbers in mind when we had that (mostly coming from the ease of sorting and ordering in the storage). And that's the convention we will push frameworks to use. I think it is important that we make it a number (long). However, there is a difference between having numbers as run id's and having timestamps as run id's. I don't think we need to go so far as requiring timestamps as run id's. As long as they are numbers, I think it would be fine. I can imagine some flows using run id's like "1", "2", ... We could allow any arbitrary scheme to generate the run id's, but the challenge is it might seriously hamper the ability to store and sort them efficiently. And, in most cases, the timestamp of the flow start is quite a natural scheme, and I would think most frameworks will just adopt that scheme. What do you think? On a related note, we should also generate the default run id if it is missing. I realize this could be a bit tricky. If the flow id is also missing, then we're treating this single YARN app as a flow in and of itself. Then we can do flow/version/run id = (yarn app name)/("1")/(app submission timestamp). This is also mentioned in the design doc. However, if the flow id is provided but not the flow run id, it can be tricky as there can be multiple YARN apps for the given flow run. One obvious solution might be to reject app submission if the flow client (not the timeline client) sets the flow id but not the flow run id. For that we'd need some kind of a common layer for checks. Thoughts? > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
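A sketch of the defaulting/validation rule proposed in the comment above (the class and method names are illustrative, not a YARN API): a missing flow id makes the app its own flow, while a flow id without a run id is rejected at the "common layer for checks".
{code}
final class FlowContextSketch {
  final String flowId;
  final String flowVersion;
  final long flowRunId;

  private FlowContextSketch(String id, String version, long runId) {
    this.flowId = id;
    this.flowVersion = version;
    this.flowRunId = runId;
  }

  static FlowContextSketch forApp(String appName, long submitTimeMillis,
      String flowId, String flowVersion, Long flowRunId) {
    if (flowId == null) {
      // The single YARN app is a flow in and of itself:
      // (yarn app name) / "1" / (app submission timestamp)
      return new FlowContextSketch(appName, "1", submitTimeMillis);
    }
    if (flowRunId == null) {
      // Ambiguous: several apps may belong to one flow run, so reject
      // rather than guess.
      throw new IllegalArgumentException("flow id set without a flow run id");
    }
    return new FlowContextSketch(flowId,
        flowVersion == null ? "1" : flowVersion, flowRunId);
  }
}
{code}
Keeping the run id a plain long preserves the cheap total ordering the design doc asks for, whatever scheme (timestamps or counters) a framework picks.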
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376256#comment-14376256 ] Zhijie Shen commented on YARN-3034: --- Some comments about the patch: 1. RM_SYSTEM_METRICS_PUBLISHER_ENABLED -> SYSTEM_METRICS_PUBLISHER_ENABLED is an incompatible change. 2. RMTimelineCollector doesn't need to be exposed to RM and its context. It seems to be enough to construct it inside SystemMetricsPublisher only. bq. I would prefer the former, as it would be simpler to review. Please provide your opinion I filed a separate Jira: YARN-3390 > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3390) RMTimelineCollector should have the context info of each app
Zhijie Shen created YARN-3390: - Summary: RMTimelineCollector should have the context info of each app Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376229#comment-14376229 ] Naganarasimha G R commented on YARN-3034: - Thanks [~sjlee0] & [~djp] for the reviews. {{"so I still suggest adding some check and warning here."}}: well, currently I log a warning message, {{"RMTimelineCollector has not been configured to publish System Metrics in ATS V2"}}, if it is not configured to publish system metrics for ATS v2. Will that suffice? bq. Zhijie Shen, can we put that work on your patch in YARN-3040? Or do you suggest something else? We can do it in 2 ways: * as Zhijie suggested [earlier|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14372342&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14372342], we can handle it in a separate jira * we can handle it as part of YARN-3044 (which I am working on) I would prefer the former, as it would be simpler to review. Please provide your opinion > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376228#comment-14376228 ] Zhijie Shen commented on YARN-3034: --- Let me elaborate on my previous comments. In YARN-3040, I'm working on making the context info available in the app-level collector, such that when we use the timeline client to put an entity inside the AM and the NM, the entity will be automatically associated with this context. This jira is to create the RM collector. To achieve the similar thing, the RM collector should have the context info available too. The RM has all this information (it should be inside RMApp), so the RM collector needs to make sure this information is available in some way when putting an entity. I'm okay if you want to exclude this work here, and I'll file a separate jira for it. However, I want to exclude it from YARN-3040 to prevent the patch there from growing even bigger. That one is required to unblock the frameworks writing their specific data, and I wish it could get in asap. > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0008-YARN-3136.patch > getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, > 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376200#comment-14376200 ] Junping Du commented on YARN-3034: -- Thanks [~Naganarasimha] for updating the patch! bq. Also, we should add a warning message log if the user puts something illegal here, or it just goes silent without any warning. This I feel is not required, as we don't do this for any other configuration, and we have also clearly captured the possible values in yarn-default.xml. Most configurations get loaded as a boolean value or an int. Some String configurations are for loading classes, so a ClassNotFound will get thrown immediately if the name is wrong. This is a different case, so I still suggest adding some check and warning here. For the context info, [~zjshen], can we put that work on your patch in YARN-3040? Or do you suggest something else? > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3389) Two attempts might operate on same data structures concurrently
[ https://issues.apache.org/jira/browse/YARN-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376180#comment-14376180 ] Hadoop QA commented on YARN-3389: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706549/YARN-3389.01.patch against trunk revision 0b9f12c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7075//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7075//console This message is automatically generated. > Two attempts might operate on same data structures concurrently > --- > > Key: YARN-3389 > URL: https://issues.apache.org/jira/browse/YARN-3389 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3389.01.patch > > > In AttemptFailedTransition, the new attempt will get references to > the failed attempt's state ('justFinishedContainers' and 'finishedContainersSentToAM'). > Then the two attempts might operate on these two > variables concurrently, e.g. they might both update 'justFinishedContainers' > when they are each handling a CONTAINER_FINISHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
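A toy model of the hazard described in the report (the field name mirrors the JIRA; everything else is illustrative, not the RMAppAttempt code): aliasing the failed attempt's collection lets two attempts mutate it concurrently, whereas copying on transfer keeps each attempt's state private.
{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class AttemptSketch {
  // Thread-safe and, crucially, owned by this attempt alone.
  private final List<String> justFinishedContainers =
      new CopyOnWriteArrayList<>();

  void transferStateFrom(AttemptSketch failedAttempt) {
    // Snapshot instead of sharing the reference, so a late
    // CONTAINER_FINISHED event on the old attempt cannot race with us.
    justFinishedContainers.addAll(failedAttempt.justFinishedContainers);
  }

  void onContainerFinished(String containerId) {
    justFinishedContainers.add(containerId);
  }
}
{code}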
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376169#comment-14376169 ] Zhijie Shen commented on YARN-3040: --- bq. It sounds not quite scalable if we have one client for each app in the RM... In RM/NM, I think we can and we should implement a wrapper layer, which may contain multiple applications, to have a delegator write the data for multiple applications. bq. One most significant advantage to have run ids as integers is we can easily sort all existing runs for one flow in ascending or descending order. This might be a solid use case in general? I can see the benefit. For example, if it represents the timestamp, we can filter the flow runs and say give me the runs in the last 5 mins. But my concern is whether it's the general way to let the user describe a run. bq. Hmm, I didn't think the version as part of the flow id. I can understand this particular case described above. Like my prior comment about flow run ID, my concern is whether flow/version/run's explicit hierarchy is general enough to capture most use cases. IMHO, by nature, the hierarchy is the tree of flows, and a flow can be the flow of flows or the flow of apps. However, if other users just want to use one level of flow, version/run info seems to be redundant. On the other side, if we use the recursive flow structure, it's flexible to have anywhere from one to many flow levels. We can treat the first level as the flow, the second as the version, and the third as the run. I don't have expert knowledge about workflow systems such as Oozie, but just want to think my concern out loud. That said, if flow/version/run is the general description of a flow, I agree we should pass in these three env vars together and separately. bq. Mostly fine, but I have some concerns about rolling upgrades. bq. I'm still not sure why it would make sense to have different logical cluster id's every time the RM/cluster restarts. I meant the admin can configure a cluster ID explicitly, which won't be appended with the timestamp. I added it for the default value to distinguish the clusters that are started by you and me, but thinking about it again, the RM restart problem makes sense. I'll change the default not to append the timestamp. > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch, YARN-3040.2.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376150#comment-14376150 ] Sangjin Lee commented on YARN-3034: --- LGTM. Let's wait to hear from Zhijie. > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, > YARN-3034.20150320-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376109#comment-14376109 ] Zhijie Shen commented on YARN-3047: --- bq. We can probably move it to yarn-api. I prefer keeping it in the server module, unless it's supposed to be public to users. bq. This has to be discussed though as Zhijie Shen thinks we can use the same v1 config. My opinion is that the collector should bind to a random port, which will be reported to the timeline client. The reader, as a single daemon, should start on a configured port, and users know it from the config. bq. TimelineReaderWebServer If you'd like to keep "reader", I'm fine with it, but let's still say TimelineReaderServer. Meanwhile, TimelineReaderWebService -> TimelineReaderWebService*s*. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3047.001.patch, YARN-3047.003.patch, > YARN-3047.02.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
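The port policy being proposed, in a small self-contained sketch (plain sockets stand in for the actual collector/reader web servers, and the system property name is made up): binding to port 0 gives the collector an OS-assigned ephemeral port that can then be reported back, while the reader uses a fixed, configured port.
{code}
import java.net.ServerSocket;

public class PortPolicySketch {
  public static void main(String[] args) throws Exception {
    // Collector: port 0 = let the OS pick; the chosen port is then
    // discovered and reported to the timeline client.
    try (ServerSocket collector = new ServerSocket(0)) {
      System.out.println("collector bound to " + collector.getLocalPort());
    }

    // Reader: a fixed port that users learn from configuration.
    int readerPort = Integer.getInteger("reader.port", 8188);
    try (ServerSocket reader = new ServerSocket(readerPort)) {
      System.out.println("reader bound to " + reader.getLocalPort());
    }
  }
}
{code}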
[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376086#comment-14376086 ] Hadoop QA commented on YARN-3111: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706530/YARN-3111.v2.patch against trunk revision 0b9f12c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7074//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7074//console This message is automatically generated. > Fix ratio problem on FairScheduler page > --- > > Key: YARN-3111 > URL: https://issues.apache.org/jira/browse/YARN-3111 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Minor > Attachments: YARN-3111.1.patch, YARN-3111.png, YARN-3111.v2.patch, > parenttooltip.png > > > Found 3 problems on the FairScheduler page: > 1. Only memory is computed for the ratio, even when the queue schedulingPolicy is DRF. > 2. When min resources are configured larger than real resources, the steady > fair share ratio is so large that it runs off the page. > 3. When cluster resources are 0 (no nodemanager started), the ratio is displayed as > "NaN% used". > The attached image shows a snapshot of the above problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376067#comment-14376067 ] Hudson commented on YARN-3384: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7402/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > TestLogAggregationService.verifyContainerLogs fails after YARN-2777 > --- > > Key: YARN-3384 > URL: https://issues.apache.org/jira/browse/YARN-3384 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Labels: test-fail > Fix For: 2.7.0 > > Attachments: YARN-3384.20150321-1.patch > > > The following test cases of TestLogAggregationService are failing: > testMultipleAppsLogAggregation > testLogAggregationServiceWithRetention > testLogAggregationServiceWithInterval > testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376066#comment-14376066 ] Hudson commented on YARN-2777: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7402/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Fix For: 2.7.0 > > Attachments: YARN-2777.001.patch, YARN-2777.02.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3388) userlimit isn't playing well with DRF calculator
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376060#comment-14376060 ] Nathan Roberts commented on YARN-3388: --
Example (lots of things going on in this algorithm; I simplified to just the key pieces for clarity). Tuples are resources: [memory] or [memory,cpu].
just memory:
- Queue Capacity is [100]
- 2 active users, both request [10] at a time
- User1 is at [45], User2 is at [40]
- Limit is calculated to be 100/2=50, so both users can allocate
- User2 goes to [50]; used Capacity is now 45+50=95, Limit is still 50
- User1 goes to [55]; used Capacity is now 50+55=105, Limit is now 105/2
- User2 goes to [60]; used Capacity is now 60+55=115, Limit is now 115/2
- So on and so forth until maxCapacity is hit. Notice how the users essentially leapfrog one another, allowing the Limit to continually move higher.
memory and cpu:
- Queue Capacity is [100,100]
- 2 active users: User1 asks for [10,20] at a time, User2 asks for [20,10]
- User1 is at [35,45], User2 is at [45,35]
- Limit is calculated to be [100/2=50, 100/2=50], so both users can allocate
- User2 goes to [65,45]; used Capacity is now [65+35=100, 45+45=90], Limit is still [50,50]
- User1 goes to [45,65]; used Capacity is now [65+45=110, 45+65=110], Limit is now [110/2=55, 110/2=55]
- User1 and User2 are now both considered over limit and neither can allocate: User1 is over on cpu, User2 is over on memory.
Open to suggestions on simple ways to fix this. I'm currently thinking a reasonable (simple, effective, computationally cheap, mostly fair) approach might be to give some small percentage of additional leeway for userLimit.
> userlimit isn't playing well with DRF calculator > > > Key: YARN-3388 > URL: https://issues.apache.org/jira/browse/YARN-3388 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > >
> When there are multiple active users in a queue, it should be possible for those users to make use of capacity up to max_capacity (or close). The resources should be distributed fairly among the active users in the queue. This works pretty well when there is a single resource being scheduled. However, when there are multiple resources the situation gets more complex and the current algorithm tends to get stuck at Capacity. An example is illustrated in the subsequent comment.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
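The leapfrog-then-stall behavior described in the comment can be reproduced with a short, self-contained simulation. This is a deliberately simplified model of the user-limit computation (limit = max(queue capacity, used capacity) / active users, with a user blocked once it exceeds the limit on any resource), not CapacityScheduler code; with one resource the users alternate past the rising limit, while with two resources each user is dominant on a different resource and both stall at the same limit:
{code}
public class UserLimitLeapfrog {

  // A user may allocate only if it is at or below the limit on every resource.
  static boolean underLimit(int[] used, double limit) {
    for (int u : used) {
      if (u > limit) return false;
    }
    return true;
  }

  static void simulate(int[] u1, int[] ask1, int[] u2, int[] ask2,
                       int queueCapacity, int maxCapacity) {
    int dims = u1.length;
    for (int round = 0; round < 20; round++) {
      int usedCap = 0;
      for (int d = 0; d < dims; d++) {
        usedCap = Math.max(usedCap, u1[d] + u2[d]);
      }
      if (usedCap >= maxCapacity) {
        System.out.println("reached maxCapacity");
        return;
      }
      // The limit grows with consumption, which is what lets users leapfrog.
      double limit = Math.max(queueCapacity, usedCap) / 2.0;
      boolean progress = false;
      if (underLimit(u1, limit)) {
        for (int d = 0; d < dims; d++) u1[d] += ask1[d];
        progress = true;
      }
      if (underLimit(u2, limit)) {
        for (int d = 0; d < dims; d++) u2[d] += ask2[d];
        progress = true;
      }
      if (!progress) {
        System.out.println("stuck: both users over limit " + limit);
        return;
      }
    }
  }

  public static void main(String[] args) {
    // Single resource: leapfrogs all the way to maxCapacity.
    simulate(new int[]{45}, new int[]{10},
             new int[]{40}, new int[]{10}, 100, 200);
    // Two resources: deadlocks with both users over limit 55, as in the
    // comment (User1 over on cpu, User2 over on memory).
    simulate(new int[]{35, 45}, new int[]{10, 20},
             new int[]{45, 35}, new int[]{20, 10}, 100, 200);
  }
}
{code}
One way to model the leeway idea from the comment would be to relax the check to something like used <= limit * 1.1, which would let whichever user is less far over the limit keep making progress instead of both stalling.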
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376059#comment-14376059 ] Naganarasimha G R commented on YARN-3384: - Thanks [~ozawa] for reviewing and committing the patch :) > TestLogAggregationService.verifyContainerLogs fails after YARN-2777 > --- > > Key: YARN-3384 > URL: https://issues.apache.org/jira/browse/YARN-3384 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Labels: test-fail > Fix For: 2.7.0 > > Attachments: YARN-3384.20150321-1.patch > > > The following test cases of TestLogAggregationService are failing: > testMultipleAppsLogAggregation > testLogAggregationServiceWithRetention > testLogAggregationServiceWithInterval > testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3384: - Summary: TestLogAggregationService.verifyContainerLogs fails after YARN-2777 (was: Test failures since TestLogAggregationService.verifyContainerLogs fails after YARN-2777) > TestLogAggregationService.verifyContainerLogs fails after YARN-2777 > --- > > Key: YARN-3384 > URL: https://issues.apache.org/jira/browse/YARN-3384 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Labels: test-fail > Attachments: YARN-3384.20150321-1.patch > > > The following test cases of TestLogAggregationService are failing: > testMultipleAppsLogAggregation > testLogAggregationServiceWithRetention > testLogAggregationServiceWithInterval > testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)