[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377655#comment-14377655 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java Getting application(s) goes wrong when app finishes before starting the attempt --- Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Fix For: 2.7.0 Attachments: YARN-3393.1.patch When generating the app report in ApplicationHistoryManagerOnTimelineStore, it checks whether appAttempt == null. {code} ApplicationAttemptReport appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId()); if (appAttempt != null) { app.appReport.setHost(appAttempt.getHost()); app.appReport.setRpcPort(appAttempt.getRpcPort()); app.appReport.setTrackingUrl(appAttempt.getTrackingUrl()); app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl()); } {code} However, {{getApplicationAttempt}} doesn't return null; it throws ApplicationAttemptNotFoundException instead: {code} if (entity == null) { throw new ApplicationAttemptNotFoundException( "The entity for application attempt " + appAttemptId + " doesn't exist in the timeline store"); } else { return convertToApplicationAttemptReport(entity); } {code} The two pieces of code aren't well coupled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
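One way the report-generation path could be reconciled with {{getApplicationAttempt}} throwing instead of returning null is sketched below. This is only an illustrative sketch, not necessarily the change made in YARN-3393.1.patch; it reuses the field and method names from the snippets above.
{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before its attempt started; keep the host, RPC port and
  // tracking URLs at their default values instead of failing the whole report.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}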
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377656#comment-14377656 ] Hudson commented on YARN-2777: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Fix For: 2.7.0 Attachments: YARN-2777.001.patch, YARN-2777.02.patch Below is a snippet of an aggregated log showing the hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting the next, e.g. with a line such as: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
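To illustrate the request, here is a minimal sketch of a writer that appends one daemon's log to an aggregated stream and terminates it with an explicit end marker. The class name, method, and use of plain java.io are illustrative assumptions, not the NodeManager's actual AggregatedLogFormat code.
{code}
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Date;

class LogAppender {
  // Sketch only: append one log file to the aggregated stream and close it
  // with an "End of LogType" marker so readers can tell where it stops.
  static void appendLog(File logFile, DataOutputStream out) throws IOException {
    out.writeBytes("LogType: " + logFile.getName() + "\n");
    out.writeBytes("LogUploadTime: " + new Date() + "\n");
    out.writeBytes("LogLength: " + logFile.length() + "\n");
    out.writeBytes("Log Contents:\n");
    Files.copy(logFile.toPath(), out);  // stream the raw log bytes
    out.writeBytes("\nEnd of LogType: " + logFile.getName() + "\n\n");
  }
}
{code}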
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377659#comment-14377659 ] Hudson commented on YARN-3384: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt TestLogAggregationService.verifyContainerLogs fails after YARN-2777 --- Key: YARN-3384 URL: https://issues.apache.org/jira/browse/YARN-3384 Project: Hadoop YARN Issue Type: Bug Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Labels: test-fail Fix For: 2.7.0 Attachments: YARN-3384.20150321-1.patch The following test cases of TestLogAggregationService are failing: testMultipleAppsLogAggregation, testLogAggregationServiceWithRetention, testLogAggregationServiceWithInterval, testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377649#comment-14377649 ] Hudson commented on YARN-3241: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/InvalidQueueNameException.java FairScheduler handles invalid queue names inconsistently -- Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3241.000.patch, YARN-3241.001.patch, YARN-3241.002.patch A leading space, a trailing space, or an empty sub-queue name may cause a MetricsException ("Metrics source XXX already exists!") when adding an application to the FairScheduler. The reason is that QueueMetrics parses the queue name differently from QueueManager. QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and trailing spaces in sub-queue names and also drops empty sub-queue names. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove leading spaces, trailing spaces, or empty sub-queue names, so FSQueue and FSQueueMetrics fall out of sync. QueueManager will consider two such queue names different and try to create a new queue, but FSQueueMetrics will treat them as the same queue, which triggers the "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
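A minimal sketch of the kind of up-front check that keeps QueueManager consistent with what Q_SPLITTER produces: reject queue names containing empty sub-queue components or components with leading/trailing whitespace. The method name is an illustrative assumption, not necessarily the check added by the YARN-3241 patch.
{code}
// Sketch: mirror Q_SPLITTER's trimming/omitting behavior as a validity check,
// so a name that QueueMetrics would collapse is rejected instead of creating
// a second FSQueue that collides on the same metrics source.
static boolean isQueueNameValid(String queueName) {
  if (queueName == null || queueName.isEmpty()) {
    return false;
  }
  // limit -1 keeps trailing empty strings, so "root.a." is caught as invalid
  for (String component : queueName.split("\\.", -1)) {
    if (component.isEmpty() || !component.equals(component.trim())) {
      return false;
    }
  }
  return true;
}
{code}
With such a check in place, names like "root. a" or "root..a" fail fast instead of triggering the "Metrics source XXX already exists!" MetricsException later.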
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377657#comment-14377657 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.0 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE, and it is never garbage collected. This is the implementation of obtainSystemTokensForUser: {code} protected Token<?>[] obtainSystemTokensForUser(String user, final Credentials credentials) throws IOException, InterruptedException { // Get new hdfs tokens on behalf of this user UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser()); Token<?>[] newTokens = proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() { @Override public Token<?>[] run() throws Exception { return FileSystem.get(getConfig()).addDelegationTokens( UserGroupInformation.getLoginUser().getUserName(), credentials); } }); return newTokens; } {code} The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject.
The calling sequence is FileSystem.get(getConfig()) -> FileSystem.get(getDefaultUri(conf), conf) -> FileSystem.CACHE.get(uri, conf) -> FileSystem.CACHE.getInternal(uri, conf, key) -> FileSystem.CACHE.map.get(key) -> createFileSystem(uri, conf). {code} public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) { if (user == null || user.isEmpty()) { throw new IllegalArgumentException("Null user"); } if (realUser == null) { throw new IllegalArgumentException("Null real user"); } Subject subject = new Subject(); Set<Principal> principals = subject.getPrincipals(); principals.add(new User(user)); principals.add(new RealUser(realUser)); UserGroupInformation result = new UserGroupInformation(subject); result.setAuthenticationMethod(AuthenticationMethod.PROXY); return result; } {code} FileSystem#Cache#Key.equals will compare the ugi: {code} Key(URI uri, Configuration conf, long unique) throws IOException { scheme = uri.getScheme()==null ? "" : uri.getScheme().toLowerCase(); authority = uri.getAuthority()==null ? "" : uri.getAuthority().toLowerCase(); this.unique = unique; this.ugi = UserGroupInformation.getCurrentUser(); } public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null && obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) && isEqual(this.authority, that.authority) && isEqual(this.ugi, that.ugi) && (this.unique == that.unique); } return false; } {code} UserGroupInformation.equals compares the subject by reference. {code} public boolean equals(Object o) { if (o == this) { return true; } else if (o == null || getClass() != o.getClass()) { return false; } else { return subject == ((UserGroupInformation) o).subject; } } {code} So every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
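One common way to avoid leaking per-proxy-user cache entries, sketched below under the assumption that an uncached FileSystem is acceptable here (the committed YARN-3336 patch may differ), is to bypass FileSystem.CACHE inside the doAs block and close the instance once the tokens have been obtained:
{code}
public Token<?>[] run() throws Exception {
  // FileSystem.newInstance bypasses FileSystem.CACHE, so nothing keyed on the
  // throwaway proxy-user Subject is left behind in the cache.
  FileSystem fs = FileSystem.newInstance(getConfig());
  try {
    return fs.addDelegationTokens(
        UserGroupInformation.getLoginUser().getUserName(), credentials);
  } finally {
    fs.close();  // release the per-call FileSystem explicitly
  }
}
{code}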
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377653#comment-14377653 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FairScheduler: Metric for latency to allocate first container for an application Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, YARN-2868.012.patch Add a metric to measure the latency between the start of container allocation and the first container actually being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
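As an illustration of the metric being asked for, the sketch below tracks the elapsed time from when an application attempt enters the scheduler until its first container is allocated and feeds it into a MutableRate. The class and method names are assumptions for illustration, not the committed QueueMetrics/SchedulerApplicationAttempt changes.
{code}
import org.apache.hadoop.metrics2.lib.MutableRate;

// Sketch: per-attempt tracker for time-to-first-container latency.
class FirstAllocationLatencyTracker {
  private final long attemptStartMs = System.currentTimeMillis();
  private boolean firstContainerAllocated = false;

  // Invoke whenever a container is allocated to this attempt; only the first
  // allocation contributes to the latency metric.
  synchronized void onContainerAllocated(MutableRate firstAllocationLatency) {
    if (!firstContainerAllocated) {
      firstContainerAllocated = true;
      firstAllocationLatency.add(System.currentTimeMillis() - attemptStartMs);
    }
  }
}
{code}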
[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377924#comment-14377924 ] Hudson commented on YARN-3241: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377931#comment-14377931 ] Hudson commented on YARN-2777: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377932#comment-14377932 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377934#comment-14377934 ] Hudson commented on YARN-3384: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377930#comment-14377930 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377928#comment-14377928 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377920#comment-14377920 ] Junping Du commented on YARN-3034: -- Thanks [~Naganarasimha] for updating the patch! Latest patch LGTM. [~zjshen] and [~sjlee0], do you have further comments? If not, I will go ahead and commit it today. [Collector wireup] Implement RM starting its timeline collector --- Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377874#comment-14377874 ] Hudson commented on YARN-2777: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377875#comment-14377875 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377869#comment-14377869 ] Hudson commented on YARN-3384: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377862#comment-14377862 ] Hudson commented on YARN-3241: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377868#comment-14377868 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377867#comment-14377867 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377876#comment-14377876 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced in YARN-1521 include multiple conditions combined into single assertions with {{&&}}. We should separate them because it's difficult to identify which condition fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
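For illustration, splitting a compound assertion looks like the following (using org.junit.Assert); the concrete condition is an assumed example, not the actual test body in TestApplicationClientProtocolOnHA.
{code}
// Before: a single failure message cannot say which condition was violated.
Assert.assertTrue(reports != null && !reports.isEmpty());

// After: each condition fails with its own, unambiguous assertion.
Assert.assertNotNull(reports);
Assert.assertFalse(reports.isEmpty());
{code}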
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377886#comment-14377886 ] Hadoop QA commented on YARN-1902: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638931/YARN-1902.v3.patch against trunk revision 3ca5bd1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7090//console This message is automatically generated. Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl. Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called, and at least one of the z allocated containers is started, then if another addContainerRequest call is made, followed by an allocate call to the RM, (z+1) containers will be allocated, where only 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario. Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
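The workaround mentioned in the description (releasing the excess containers received) could look roughly like the sketch below. The surrounding class and the bookkeeping of how many containers are still needed are assumptions for illustration; only AMRMClient#releaseAssignedContainer is the actual client API being exercised.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

class ExcessContainerHandler {
  // Sketch: keep only as many allocated containers as are still needed and
  // hand the surplus back to the RM.
  static void handleAllocated(AMRMClient<?> amrmClient, List<Container> allocated,
      int stillNeeded) {
    for (Container container : allocated) {
      if (stillNeeded > 0) {
        stillNeeded--;  // keep this container and launch on it
      } else {
        amrmClient.releaseAssignedContainer(container.getId());  // give back the excess
      }
    }
  }
}
{code}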
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377935#comment-14377935 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377589#comment-14377589 ] Hadoop QA commented on YARN-3034: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706868/YARN-3024.20150324-1.patch against trunk revision c6c396f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7089//console This message is automatically generated. [Collector wireup] Implement RM starting its timeline collector --- Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377604#comment-14377604 ] Naganarasimha G R commented on YARN-3034: - [~zjshen], I have uploaded the patch with the changes which you mentioned for the configuration. Please review. [Collector wireup] Implement RM starting its timeline collector --- Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377637#comment-14377637 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) FairScheduler handles invalid queue names inconsistently
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377633#comment-14377633 ] Hudson commented on YARN-3241: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-3241. FairScheduler handles invalid queue names inconsistently. (Zhihai Xu via kasha) (kasha: rev 2bc097cd14692e6ceb06bff959f28531534eb307) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377643#comment-14377643 ] Hudson commented on YARN-3384: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java TestLogAggregationService.verifyContainerLogs fails after YARN-2777 --- Key: YARN-3384 URL: https://issues.apache.org/jira/browse/YARN-3384 Project: Hadoop YARN Issue Type: Bug Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Labels: test-fail Fix For: 2.7.0 Attachments: YARN-3384.20150321-1.patch The following test cases of TestLogAggregationService are failing: testMultipleAppsLogAggregation, testLogAggregationServiceWithRetention, testLogAggregationServiceWithInterval, testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377639#comment-14377639 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java Getting application(s) goes wrong when app finishes before starting the attempt --- Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Fix For: 2.7.0 Attachments: YARN-3393.1.patch When generating the app report in ApplicationHistoryManagerOnTimelineStore, it checks if appAttempt == null.
{code}
ApplicationAttemptReport appAttempt = getApplicationAttempt(
    app.appReport.getCurrentApplicationAttemptId());
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}
However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException:
{code}
if (entity == null) {
  throw new ApplicationAttemptNotFoundException(
      "The entity for application attempt " + appAttemptId +
      " doesn't exist in the timeline store");
} else {
  return convertToApplicationAttemptReport(entity);
}
{code}
The code isn't coupled well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
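A minimal sketch of how the caller could reconcile the two methods, assuming it simply treats the exception as "no attempt recorded yet" (an illustration only, not necessarily the committed YARN-3393 patch):
{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(
      app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before any attempt reached the timeline store;
  // leave the host/port/tracking-URL fields at their defaults.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}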
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377641#comment-14377641 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.0 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE, which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject.
The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
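One hedged way to keep the cache from growing, sketched against the obtainSystemTokensForUser snippet quoted above (this may or may not match the committed YARN-3336 fix), is to close the FileSystem instances cached for the throwaway proxy UGI once the tokens have been fetched:
{code}
Token<?>[] newTokens = proxyUser.doAs(
    new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        try {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          // Evict the FileSystem.CACHE entry keyed by this short-lived proxy UGI,
          // so each call no longer leaves behind an uncollectable FileSystem.
          FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser());
        }
      }
    });
{code}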
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377644#comment-14377644 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced in YARN-1521 include multiple assertions combined with {{&&}}. We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377640#comment-14377640 ] Hudson commented on YARN-2777: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Fix For: 2.7.0 Attachments: YARN-2777.001.patch, YARN-2777.02.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377660#comment-14377660 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/876/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced in YARN-1521 include multiple assertions combined with {{&&}}. We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3034: Attachment: YARN-3024.20150324-1.patch [Collector wireup] Implement RM starting its timeline collector --- Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377753#comment-14377753 ] Junping Du commented on YARN-3304: -- Thanks for the comments, [~kasha]! bq. In previous releases, we have never called these APIs Public even if they were intended to be sub-classed. In my mind, this is the last opportunity to decide on what the API should do? I think consistent and reasonable return values should be given a higher priority over compatibility. Agree on the priority here. However, having consistent and reasonable return values doesn't have to break compatibility (or consistent behavior) - just like the way I proposed above, we can consistently return a resource value of 0 when it is unavailable and have an additional flag to mark whether the resource is available or not. bq. I am okay with adding boolean methods to capture unavailability, but that seems a little overboard. Using -1 in the ResourceCalculatorProcessTree is okay by me. My concern was with logging this -1 value in the metrics. In either case, I would like for the container usage metrics to see if the usage is available before logging the same. I agree both ways can work. However, I think adding a boolean method sounds better. More importantly, it doesn't break any consistent behavior of previous releases. We don't need to break it if we don't have to, do we? bq. Since it is not too much work or risk, I would prefer we fix both in 2.7. This is solely wearing my Apache hat. My Cloudera hat doesn't really mind it being in 2.8 vs 2.7. My idea is simple here: a fast-moving, regular and predictable release train benefits our community and ecosystem in many ways. I also have other wish-list items that cannot make 2.7. When this patch gets in, I am not sure whether YARN-3392 will still be a blocker for 2.7, and I would also prefer a fix rather than having a pending JIRA there delay the release unnecessarily. [~vinodkv], [~kasha] and [~adhoot], what do you think? ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-3304-v2.patch, YARN-3304.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
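To make the "return 0 plus an availability flag" option concrete, here is a hedged, self-contained sketch; the class and method names are illustrative and are not taken from any patch on this issue:
{code}
// Illustration only: keeps the historical "0 until known" getter behaviour and
// adds a boolean so metrics code can skip logging values that are not yet real.
class ProcessTreeCpuSketch {
  private float cpuUsagePercent = 0f;       // stays 0 until the first sample
  private boolean cpuUsageAvailable = false;

  void onCpuSample(float percent) {
    cpuUsagePercent = percent;
    cpuUsageAvailable = true;
  }

  float getCpuUsagePercent() {
    return cpuUsagePercent;
  }

  boolean isCpuUsageAvailable() {
    return cpuUsageAvailable;
  }
}
{code}
A caller such as the container usage metrics would then check {{isCpuUsageAvailable()}} before recording the value, instead of special-casing a -1 sentinel.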
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3040: -- Attachment: YARN-3040.4.patch [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3395: -- Summary: [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name. (was: Handle the user name correctly when submit application and use user name as default queue name.) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name. Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3395: Attachment: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. --- Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378879#comment-14378879 ] zhihai xu commented on YARN-3395: - I uploaded a patch, YARN-3395.000.patch, for review. I added two test cases in TestFairScheduler. Without the change, both tests will fail. Handle the user name correctly when submit application and use user name as default queue name. --- Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
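As a hedged illustration of the validation this issue describes (the method name and message are made up; this is not YARN-3395.000.patch itself), the submission path could reject bad user names and trim before deriving the default queue name:
{code}
// Sketch only: reject empty/whitespace-only user names at submission time and
// trim the name before it is used as the FairScheduler default queue name,
// so QueueManager's name validation does not throw InvalidQueueNameException.
static String defaultQueueNameForUser(String user) {
  if (user == null || user.trim().isEmpty()) {
    throw new IllegalArgumentException(
        "Application submitted with an empty or whitespace-only user name");
  }
  return user.trim();
}
{code}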
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378946#comment-14378946 ] Sangjin Lee commented on YARN-3040: --- bq. It will check if the tag starts with "TIMELINE_FLOW_ID_TAG:", and then if the value is empty, {{"TIMELINE_FLOW_ID_TAG:".substring(TIMELINE_FLOW_ID_TAG.length() + 1)}} will return an empty value. It shouldn't throw IndexOutOfBoundsException. But it seems there's no need to add an empty env, I'll change the code accordingly. Ack. I was thrown off because the code was like
{code}
if (tag.startsWith(TAG + ":")) {
  String value = tag.substring(TAG.length() + 1);
}
{code}
It works because the +1 is really for the colon. LGTM overall. [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
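A hedged, standalone illustration of the tag parsing being discussed here; the constant name and tag values are stand-ins, not the actual YARN-3040 constants:
{code}
public class FlowTagParseDemo {
  static final String TIMELINE_FLOW_ID_TAG = "TIMELINE_FLOW_ID";

  static String parseFlowId(String tag) {
    if (tag.startsWith(TIMELINE_FLOW_ID_TAG + ":")) {
      // The "+ 1" skips the ':' separator, so a bare "TIMELINE_FLOW_ID:" tag
      // yields an empty string rather than an IndexOutOfBoundsException.
      return tag.substring(TIMELINE_FLOW_ID_TAG.length() + 1);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(parseFlowId("TIMELINE_FLOW_ID:my_flow"));  // my_flow
    System.out.println(parseFlowId("TIMELINE_FLOW_ID:"));         // (empty string)
  }
}
{code}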
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378666#comment-14378666 ] Zhijie Shen commented on YARN-3040: --- Thanks for the review, Sangjin and Junping! I've updated the patch accordingly. bq. I am comfortable with continuing to work on the flow-related items in the separate JIRA. Thanks. This sounds good. bq. I'm not sure of these set calls. Are these here just to initialize the context to default values? Yes, these are the defaults. In fact, the user is sure to be updated by the RPC call that gets the context info (unless there's a bug in the RPC). The current user used for initialization is usually not correct, but I kept a value to ensure we have something to pass to the storage and prevent a possible NPE that would crash the process. Instead, we can easily debug/inspect the storage to verify the user if a bug occurs. I added some code comments for the initialization. bq. I would prefer something like yarn.cluster.id because this id is for identifying YARN cluster rather than ResourceManager. I also agree yarn.cluster.id sounds better, but yarn.resourcemanager.cluster-id is the legacy name, which has been used by RM HA for a while. As it doesn't sound so bad, how about keeping it, so that we don't need to deprecate the config or break compatibility? bq. Can we add a test case that without specifying flow_id and flow_run_id and v2 timeline service still can work? Added the test case in the new patch. bq. Do we need to be case-insensitive here? I think we can be strict about the tag names? This is because the tag text has case-sensitive and case-insensitive modes. When insensitive, even if the user inputs upper-case strings, they will be normalized to lower-case strings. So we need to take care of this case. bq. You might want to be bit defensive about the tag not carrying any value (e.g. TIMELINE_FLOW_ID_TAG:). It will check if the tag starts with "TIMELINE_FLOW_ID_TAG:", and then if the value is empty, {{"TIMELINE_FLOW_ID_TAG:".substring(TIMELINE_FLOW_ID_TAG.length() + 1)}} will return an empty value. It shouldn't throw IndexOutOfBoundsException. But it seems there's no need to add an empty env, I'll change the code accordingly. In addition, I fixed a couple of test failures in the new patch. [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378780#comment-14378780 ] Hadoop QA commented on YARN-3365: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707006/YARN-3365.001.patch against trunk revision a16bfff. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7095//console This message is automatically generated. Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378805#comment-14378805 ] Hadoop QA commented on YARN-1621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706974/YARN-1621.6.patch against trunk revision a16bfff. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.security.TestJHSSecurity Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7093//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7093//console This message is automatically generated. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3395) Handle the user name correctly when submit application and use user name as default queue name.
zhihai xu created YARN-3395: --- Summary: Handle the user name correctly when submit application and use user name as default queue name. Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378942#comment-14378942 ] Vinod Kumar Vavilapalli commented on YARN-2654: --- WON'T FIX is perhaps the right resolution.. Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3394) WebApplication proxy documentation is incomplete
[ https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3394: Attachment: YARN-3394.20150324-1.patch WebApplicationProxy.html Attaching the sample html and the patch to fix the issue WebApplication proxy documentation is incomplete - Key: YARN-3394 URL: https://issues.apache.org/jira/browse/YARN-3394 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Attachments: WebApplicationProxy.html, YARN-3394.20150324-1.patch Webproxy documentation is incomplete hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html 1.Configuration of service start/stop as separate server 2.Steps to start as daemon service 3.Secure mode for Web proxy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377984#comment-14377984 ] Junping Du commented on YARN-3040: -- Some additional comments:
{code}
 <property>
+  <description>The YARN cluster ID.</description>
+  <name>yarn.resourcemanager.cluster-id</name>
+  <value>yarn_cluster</value>
+ </property>
{code}
I would prefer something like yarn.cluster.id because this id is for identifying the YARN cluster rather than the ResourceManager. It should stay consistent across RMs (active and standby) when they switch over. Also, for other names like RM_CLUSTER_ID and DEFAULT_RM_CLUSTER_ID, we should use YARN_CLUSTER_ID instead.
{code}
@@ -208,7 +211,11 @@ public void testDSShell(boolean haveDomain, String timelineVersion)
     if (timelineVersion.equalsIgnoreCase("v2")) {
       String[] timelineArgs = {
           "--timeline_service_version",
-          "v2"
+          "v2",
+          "--flow",
+          "test_flow_id",
+          "--flow_run",
+          "12345678"
       };
{code}
Can we add a test case showing that, without specifying flow_id and flow_run_id, the v2 timeline service still works? In my understanding, this info will still be optional for applications, so we should make sure it is nullable when launching applications and in other subsequent flows. [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377999#comment-14377999 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt Getting application(s) goes wrong when app finishes before starting the attempt --- Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Fix For: 2.7.0 Attachments: YARN-3393.1.patch When generating the app report in ApplicationHistoryManagerOnTimelineStore, it checks if appAttempt == null.
{code}
ApplicationAttemptReport appAttempt = getApplicationAttempt(
    app.appReport.getCurrentApplicationAttemptId());
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}
However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException:
{code}
if (entity == null) {
  throw new ApplicationAttemptNotFoundException(
      "The entity for application attempt " + appAttemptId +
      " doesn't exist in the timeline store");
} else {
  return convertToApplicationAttemptReport(entity);
}
{code}
The code isn't coupled well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3384) TestLogAggregationService.verifyContainerLogs fails after YARN-2777
[ https://issues.apache.org/jira/browse/YARN-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378000#comment-14378000 ] Hudson commented on YARN-3384: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/]) YARN-3384. TestLogAggregationService.verifyContainerLogs fails after YARN-2777. Contributed by Naganarasimha G R. (ozawa: rev 82eda771e05cf2b31788ee1582551e65f1c0f9aa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt TestLogAggregationService.verifyContainerLogs fails after YARN-2777 --- Key: YARN-3384 URL: https://issues.apache.org/jira/browse/YARN-3384 Project: Hadoop YARN Issue Type: Bug Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Labels: test-fail Fix For: 2.7.0 Attachments: YARN-3384.20150321-1.patch The following test cases of TestLogAggregationService are failing: testMultipleAppsLogAggregation, testLogAggregationServiceWithRetention, testLogAggregationServiceWithInterval, testLogAggregationServiceWithPatterns -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3365: Attachment: YARN-3365.002.patch Re-created the patch against trunk - ensuring a change that is only in trunk isn't undone. Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch, YARN-3365.002.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379034#comment-14379034 ] Hadoop QA commented on YARN-3395: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707050/YARN-3395.000.patch against trunk revision 53a28af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7096//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7096//console This message is automatically generated. [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name. Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3214) Add non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379092#comment-14379092 ] Wangda Tan commented on YARN-3214: -- Hi [~lohit], Thanks for reviewing the doc. For your comments: bq. If so, it would become too restrictive. Labels on nodes can be seen in multiple dimension (from app's resource, machine resource and also usecase resouce, eg backfill jobs are placed on specific set of nodes). In those cases we should have ability to have multiple labels on node Yes, for now we only support one label for each node (partition). The reason we temporarily support only one label per node is that, if we had multiple labels on each node, it would be hard to do resource planning (as we do today, we can say queue-A can use 40% of label-X and queue-B can use 60% of label-X). Assume a node with label-X and label-Y whose resource is 10G; it is hard to say whether the node has 10G of (X+Y) resource, or 10G of X and 10G of Y. This also makes preemption hard to do. A tradeoff is that, if we don't plan resource share (or capacity) on node labels, some resource could be wasted and queues can be starved while they are still under their configured capacity. Multiple labels on a node (we call this a constraint) are in the design stage; we have some thoughts about it, and will push it to the community once it gets into better shape -- it should not take too long. bq. Also, in the documents there is mention of scheduling apps without any labels being scheduled on labeled nodes if resources are idle. Does that also cover apps which could have different label other than A/B, but still have a label be placed on these nodes when there is free resources available? No, it will only try to allocate non-labeled requests to labeled nodes. If a resource request explicitly asks for a node label, we will only allocate the corresponding labeled resource for it. Add non-exclusive node labels -- Key: YARN-3214 URL: https://issues.apache.org/jira/browse/YARN-3214 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Non-exclusive-Node-Partition-Design.pdf Currently node labels partition the cluster to some sub-clusters so resources cannot be shared between partitioned cluster. With the current implementation of node labels we cannot use the cluster optimally and the throughput of the cluster will suffer. We are proposing adding non-exclusive node labels: 1. Labeled apps get the preference on Labeled nodes 2. If there is no ask for labeled resources we can assign those nodes to non labeled apps 3. If there is any future ask for those resources , we will preempt the non labeled apps and give them back to labeled apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379169#comment-14379169 ] Chengbing Liu commented on YARN-3024: - [~kasha], I created YARN-3396 to track the URISyntaxException issue. For multiple downloads per ContainerLocalizer, I found YARN-665 already created. As for the other TODO, i.e. synchronization, I don't see any need for this. I think we can safely remove this one. LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of localization process. The problem is {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}, therefore we should check the return value, and gives DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379195#comment-14379195 ] Hadoop QA commented on YARN-3365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707106/YARN-3365.002.patch against trunk revision 53a28af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7097//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7097//console This message is automatically generated. Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch, YARN-3365.002.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-3396: -- Assignee: Brahma Reddy Battula Handle URISyntaxException in ResourceLocalizationService Key: YARN-3396 URL: https://issues.apache.org/jira/browse/YARN-3396 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Brahma Reddy Battula There are two occurrences of the following code snippet: {code} //TODO fail? Already translated several times... {code} It should be handled correctly in case that the resource URI is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379240#comment-14379240 ] Zhijie Shen commented on YARN-3334: --- Junping, thanks for the patch. Here are my comments: 1. Do you want to change initialized to started?
{code}
// only put initialized client
{code}
2. The following method seems unnecessary, because there's {{getTimelineClient(ApplicationId id)}}.
{code}
public Map<ApplicationId, TimelineClient> getTimelineClients() {
  return this.timelineClients;
}
{code}
3. It seems there's no need to maintain rmKnownCollectors. We can blindly put the service addr into the timeline client; it won't affect anything if the address is not changed. Or we can do a simple check {{client.getAddr != newServiceAddr}} to avoid a trivial set.
4. IMHO, the better description is to use ContainerEntity whose ID is this container ID.
{code}
TimelineEntity entity = new TimelineEntity();
entity.setType(NMEntity.NM_CONTAINER_METRICS.toString());
entity.setId(containerId.toString());
{code}
5. We need a flag to control whether the NM emits the timeline data or not.
6. Unnecessary empty string.
{code}
"" + cpuUsageTotalCoresPercentage);
{code}
7. You probably want to use addTimeSeriesData to add a single key/value pair.
{code}
memoryMetric.setTimeSeries(timeSeries);
{code}
8. The NM needs to remove the timelineClient of a finished app. Otherwise, timelineClients will eat increasingly more resources as the NM keeps running, even though the clients are no longer used. The difficulty is knowing whether an application has already finished. We need to think about it.
{code}
private ConcurrentHashMap<ApplicationId, TimelineClient> timelineClients =
    new ConcurrentHashMap<ApplicationId, TimelineClient>();
{code}
[Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
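For comment 8 above, a hedged sketch of per-app client cleanup; the method name and the event hook are assumptions, and the actual NM wiring may differ:
{code}
private final ConcurrentHashMap<ApplicationId, TimelineClient> timelineClients =
    new ConcurrentHashMap<ApplicationId, TimelineClient>();

// Assumed hook: called from the NM's existing application-finished handling.
void removeTimelineClient(ApplicationId appId) {
  TimelineClient client = timelineClients.remove(appId);
  if (client != null) {
    // Stop the per-app client so the map does not grow for the NM's lifetime.
    client.stop();
  }
}
{code}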
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379247#comment-14379247 ] Sidharta Seethana commented on YARN-3365: - Summary of changes included in patch : Additional tests, fixes, cleanup of TestLinuxContainerExecutor ( by [~vvasudev] ) container-executor - changes to support superuser execution of ‘tc’ in batch mode ( by [~sidharta-s] ) container-executor - refactored main.c to make it easier to read/maintain ( by [~sidharta-s] ) Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch, YARN-3365.002.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379031#comment-14379031 ] Zhijie Shen commented on YARN-3047: --- [~varun_saxena], any luck to take a look at the latest comments? Thanks! - Zhijie [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2740: - Summary: ResourceManager side should properly handle node label modifications when distributed node label configuration enabled (was: RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch According to YARN-2495, labels of nodes will be specified when NM do heartbeat. We shouldn't allow admin modify labels on nodes when distributed node label configuration enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2740: - Description: According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - RMNodeLabelsManager shouldn't persistent labels on nodes when NM do heartbeat. was:According to YARN-2495, labels of nodes will be specified when NM do heartbeat. We shouldn't allow admin modify labels on nodes when distributed node label configuration enabled. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - RMNodeLabelsManager shouldn't persistent labels on nodes when NM do heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3214) Add non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379141#comment-14379141 ] Vinod Kumar Vavilapalli commented on YARN-3214: --- May be we should start calling out partitions and attributes/constraints (when we have a JIRA) everywhere for clarity. Add non-exclusive node labels -- Key: YARN-3214 URL: https://issues.apache.org/jira/browse/YARN-3214 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Non-exclusive-Node-Partition-Design.pdf Currently node labels partition the cluster to some sub-clusters so resources cannot be shared between partitioned cluster. With the current implementation of node labels we cannot use the cluster optimally and the throughput of the cluster will suffer. We are proposing adding non-exclusive node labels: 1. Labeled apps get the preference on Labeled nodes 2. If there is no ask for labeled resources we can assign those nodes to non labeled apps 3. If there is any future ask for those resources , we will preempt the non labeled apps and give them back to labeled apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379186#comment-14379186 ] Brahma Reddy Battula commented on YARN-3396: {quote}There are two occurrences of the following code snippet:{quote} Actually there are three occurrences, at line numbers 951, 974 and 1014: {code} } catch (URISyntaxException e) { // TODO fail? Already translated several times... } {code} Handle URISyntaxException in ResourceLocalizationService Key: YARN-3396 URL: https://issues.apache.org/jira/browse/YARN-3396 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Brahma Reddy Battula There are two occurrences of the following code snippet: {code} //TODO fail? Already translated several times... {code} It should be handled correctly in case that the resource URI is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379191#comment-14379191 ] Chengbing Liu commented on YARN-3396: - Can you check if you are using the latest code? Handle URISyntaxException in ResourceLocalizationService Key: YARN-3396 URL: https://issues.apache.org/jira/browse/YARN-3396 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Brahma Reddy Battula There are two occurrences of the following code snippet: {code} //TODO fail? Already translated several times... {code} It should be handled correctly in case that the resource URI is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService
Chengbing Liu created YARN-3396: --- Summary: Handle URISyntaxException in ResourceLocalizationService Key: YARN-3396 URL: https://issues.apache.org/jira/browse/YARN-3396 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu There are two occurrences of the following code snippet: {code} //TODO fail? Already translated several times... {code} It should be handled correctly in case that the resource URI is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
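The description above asks for the swallowed URISyntaxException to be handled rather than ignored. A minimal, self-contained sketch of one way to surface it follows; the helper class below is hypothetical and is not the eventual ResourceLocalizationService patch.
{code}
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch: turn a malformed resource path into a clear failure instead of an
 *  empty "// TODO fail?" catch block. */
public final class ResourceUriCheck {
  private ResourceUriCheck() {}

  static URI toVerifiedUri(String rawPath) {
    try {
      return new URI(rawPath);
    } catch (URISyntaxException e) {
      // Wrap and rethrow so the caller can fail the localization of this
      // resource with a meaningful message rather than silently continuing.
      throw new IllegalArgumentException("Invalid resource URI: " + rawPath, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toVerifiedUri("hdfs://nn:8020/apps/app_01/job.jar"));
  }
}
{code}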
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379218#comment-14379218 ] Varun Saxena commented on YARN-3047: None actually. Was mistaken. TimelineEvents is required because we will continue with three of the v1 APIs, one of which requires TimelineEvents. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379249#comment-14379249 ] Yongjun Zhang commented on YARN-3021: - Restarted my VM (the same one on which I reported the trace stack in my last update), and rerun the failed test TestCapacitySchedulerNodeLabelUpdate, and it is successful. There is some flakiness with this test but not related to this jira. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
[ https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3330: Attachment: pdiff_patch.py Update the script to handle the case on file creation and removals. Reorganize the code a little bit to dispatch state transition functions dynamically. Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols --- Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: pdiff_patch.py, pdiff_patch.py Per YARN-3292, we may want to start YARN rolling upgrade test compatibility verification tool by a simple script to check protobuf compatibility. The script may work on incoming patch files, check if there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc,.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance to have false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379225#comment-14379225 ] Varun Saxena commented on YARN-3047: OK, will upload a patch. I meant that, for the reader, you think we can use the same config as v1? Anyway, I am continuing with a separate config for the reader as of now. Let me know if you have a different opinion owing to ease of migration. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047.04.patch [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3214) Add non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379309#comment-14379309 ] Wangda Tan commented on YARN-3214: -- Hi [~lohit], The problems of multiple labels on the same node and of at most one label per node are quite different: with at most one label per node, the cluster becomes several disjoint sub-clusters, and any scheduling algorithm (capacity/fair/fifo) can simply run on each sub-cluster. If you want to divide resources among queues by label (as in the example above, queue-A can use 40% of label-X and queue-B can use 60% of label-X) and we support multiple labels (say X and Y) on the same node (say node1), the sub-clusters become overlapping, which makes scheduling very hard: when qA can access X and qB can access Y, how much of node1's resource do you plan to allocate to qA/qB? A more complex example: node1 has X,Y; node2 has X only; node3 has X,Z. This is a very tough problem and, as far as I know (please let me know if I missed anything), no platform has solved it perfectly. That is why separating partitions from attributes/constraints becomes important. A partition is a way to divide the cluster; each sub-cluster has properties similar to a general cluster resource setting (such as how it is shared among queues), which is useful when a set of nodes is contributed to and shared by only a subset of the queues in the cluster. An attribute is just a way to allocate containers; a simple way to implement attributes/constraints is FCFS (first come, first served), with no quota assigned to each attribute. Mesos is different here: it doesn't do anything with node attributes on the scheduling side; all node attributes are passed directly to the framework, and the framework decides whether to accept or reject an offer according to the node's attributes. It does not take care of balancing framework shares across attributes. Add non-exclusive node labels -- Key: YARN-3214 URL: https://issues.apache.org/jira/browse/YARN-3214 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Non-exclusive-Node-Partition-Design.pdf Currently node labels partition the cluster to some sub-clusters so resources cannot be shared between partitioned cluster. With the current implementation of node labels we cannot use the cluster optimally and the throughput of the cluster will suffer. We are proposing adding non-exclusive node labels: 1. Labeled apps get the preference on Labeled nodes 2. If there is no ask for labeled resources we can assign those nodes to non labeled apps 3. If there is any future ask for those resources , we will preempt the non labeled apps and give them back to labeled apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3395) [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3395: --- Component/s: (was: scheduler) fairscheduler [Fair Scheduler] Handle the user name correctly when submit application and use user name as default queue name. Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when submit application and use user name as default queue name. We should reject the application with an empty or whitespace only user name. because it doesn't make sense to have an empty or whitespace only user name. We should remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think this change make sense, because it will be compatible with queue name convention and also we already did similar thing for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
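As a rough illustration of the two rules in the YARN-3395 description above (reject empty or whitespace-only user names, and trim the name before using it as the default queue), here is a hedged sketch; the helper and its '.' substitution are illustrative, not the actual FairScheduler patch.
{code}
/** Sketch of the user-name handling described above (hypothetical helper). */
public final class UserQueueNames {
  private UserQueueNames() {}

  /** Reject null, empty, or whitespace-only user names at submission time. */
  static void validateUser(String user) {
    if (user == null || user.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "Application submitted with an empty or whitespace-only user name");
    }
  }

  /** Trim the user name before using it as the default queue name; the '.'
   *  substitution is only illustrative of the existing special handling. */
  static String defaultQueueFor(String user) {
    validateUser(user);
    return user.trim().replace('.', '_');
  }

  public static void main(String[] args) {
    System.out.println(defaultQueueFor("  alice.smith "));  // alice_smith
  }
}
{code}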
[jira] [Updated] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2213: --- Attachment: YARN-2213.02.patch Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
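The change itself is small: the message stays the same but moves from WARN to DEBUG. A minimal sketch, assuming the commons-logging API Hadoop used in this era; the class name is hypothetical and this is not the attached patch.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/** Sketch: demote the noisy proxy-user-cookie message from WARN to DEBUG. */
public class AmIpFilterLogSketch {
  private static final Log LOG = LogFactory.getLog(AmIpFilterLogSketch.class);

  void onMissingProxyUserCookie() {
    // Guarded DEBUG instead of WARN, so long-running AMs do not fill the log.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Could not find proxy-user cookie, so user will not be set");
    }
  }
}
{code}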
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379264#comment-14379264 ] Hadoop QA commented on YARN-3047: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707130/YARN-3047.04.patch against trunk revision 53a28af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7098//console This message is automatically generated. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3214) Add non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379263#comment-14379263 ] Lohit Vijayarenu commented on YARN-3214: Thanks [~wangda] for reply. I feel partitions and constraints as two separate entities will cause more confusion. If allocation is challenge (as you described in example for multiple labels), then it is something which should be solved in scheduler, no? This is same problem one would have even without labels. For a given node which advertises 10G of memory, and app/queue with X and Y, how would you divide resource among X and Y? PS: Mesos Scheduler for example uses term called constraints which is similar to labels. In that sense I agree with [~vinodkv] that we should probably call this feature as partition or something related? Add non-exclusive node labels -- Key: YARN-3214 URL: https://issues.apache.org/jira/browse/YARN-3214 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Non-exclusive-Node-Partition-Design.pdf Currently node labels partition the cluster to some sub-clusters so resources cannot be shared between partitioned cluster. With the current implementation of node labels we cannot use the cluster optimally and the throughput of the cluster will suffer. We are proposing adding non-exclusive node labels: 1. Labeled apps get the preference on Labeled nodes 2. If there is no ask for labeled resources we can assign those nodes to non labeled apps 3. If there is any future ask for those resources , we will preempt the non labeled apps and give them back to labeled apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379292#comment-14379292 ] Hadoop QA commented on YARN-2213: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707135/YARN-2213.02.patch against trunk revision 53a28af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7099//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7099//console This message is automatically generated. Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3396) Handle URISyntaxException in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379345#comment-14379345 ] Brahma Reddy Battula commented on YARN-3396: I was referring to the 2.6 code. Yes, in the latest code it is present in only two places. Will upload a patch soon. Handle URISyntaxException in ResourceLocalizationService Key: YARN-3396 URL: https://issues.apache.org/jira/browse/YARN-3396 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Brahma Reddy Battula There are two occurrences of the following code snippet: {code} //TODO fail? Already translated several times... {code} It should be handled correctly in case that the resource URI is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378200#comment-14378200 ] Wangda Tan commented on YARN-3362: -- bq. If this is the case then the approach which you specified makes sense but by can you mean currently its not there and in future it can come in ? Some of them already exist, like user-limit, and some of them are coming, like am-resource-percent. Sorry, I may not understand your question: user-limit and queue-limit are just two different limits regardless of node labels; sometimes the user-limit is higher and sometimes the queue-limit is higher. Could you explain your question (maybe with an example)? Thanks, Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R We don't have node label usage in the RM CapacityScheduler web UI now; without it, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3392) Change NodeManager metrics to not populate resource usage metrics if they are unavailable
[ https://issues.apache.org/jira/browse/YARN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3392: Attachment: YARN-3392.prelim.patch Demonstrates how returning a negative value to indicate unavailable usage helps track usage metrics correctly. Change NodeManager metrics to not populate resource usage metrics if they are unavailable -- Key: YARN-3392 URL: https://issues.apache.org/jira/browse/YARN-3392 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3392.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
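The idea in the preliminary patch is that a negative sentinel marks a reading as unavailable so the metrics layer can simply skip it. A hedged sketch of that pattern, with hypothetical names rather than the actual NodeManager metrics code:
{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch: skip publishing a gauge when the underlying value is unknown. */
public final class NodeUsageMetricsSketch {
  /** Negative sentinel meaning "could not be determined on this platform". */
  public static final long UNAVAILABLE = -1L;

  private NodeUsageMetricsSketch() {}

  static void maybeRecordMemoryUsage(long physicalMemoryBytes, List<Long> sink) {
    if (physicalMemoryBytes != UNAVAILABLE && physicalMemoryBytes >= 0) {
      sink.add(physicalMemoryBytes);
    }
  }

  public static void main(String[] args) {
    List<Long> sink = new ArrayList<>();
    maybeRecordMemoryUsage(UNAVAILABLE, sink);        // skipped
    maybeRecordMemoryUsage(512L * 1024 * 1024, sink); // recorded
    System.out.println(sink);                         // [536870912]
  }
}
{code}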
[jira] [Commented] (YARN-3383) AdminService should use warn instead of info to log exception when operation fails
[ https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378242#comment-14378242 ] Hudson commented on YARN-3383: -- FAILURE: Integrated in Hadoop-trunk-Commit #7420 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7420/]) YARN-3383. AdminService should use warn instead of info to log exception when operation fails. (Li Lu via wangda) (wangda: rev 97a7277a2d696474b5c8e2d814c8291d4bde246e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/CHANGES.txt AdminService should use warn instead of info to log exception when operation fails -- Key: YARN-3383 URL: https://issues.apache.org/jira/browse/YARN-3383 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Wangda Tan Assignee: Li Lu Fix For: 2.8.0 Attachments: YARN-3383-032015.patch, YARN-3383-032315.patch Now it uses info: {code} private YarnException logAndWrapException(IOException ioe, String user, String argName, String msg) throws YarnException { LOG.info(Exception + msg, ioe); {code} But it should use warn instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378303#comment-14378303 ] Anubhav Dhoot commented on YARN-3304: - I have a patch available now for YARN-3392 that shows how returning -1 would help implement it. If we take the other approach it would be good to validate that its still possible by ensuring those changes are done in this jira. Specifically in this example, we should add the boolean options here to ensure we can still do YARN-3392. We can then compare the two approaches to see which one is better. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-3304-v2.patch, YARN-3304.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for unavailable case while other resource metrics are return 0 in the same case which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378310#comment-14378310 ] Hadoop QA commented on YARN-3136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706936/0009-YARN-3136.patch against trunk revision 6413d34. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7091//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7091//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7091//console This message is automatically generated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0009-YARN-3136.patch Hi [~jlowe] and [~jianhe] I used ConcurrentMap for 'applications'. But findbugs warnings are coming for non-synchronized access on this map. Hope that is acceptable, pls share your opinion. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
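For context, the approach mentioned above replaces lock-protected access to the applications map with a concurrent map, so the read in getTransferredContainers does not need the scheduler lock. A simplified sketch under that assumption; the types and names are illustrative, not the YARN-3136 patch itself.
{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: lock-free read path for transferred containers. */
public class ConcurrentAppsSketch {
  // ConcurrentHashMap gives thread-safe get/put without the scheduler lock.
  private final Map<String, List<String>> applications = new ConcurrentHashMap<>();

  public List<String> getTransferredContainers(String appAttemptId) {
    List<String> containers = applications.get(appAttemptId);
    return containers == null ? Collections.<String>emptyList() : containers;
  }
}
{code}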
[jira] [Commented] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378134#comment-14378134 ] Tsuyoshi Ozawa commented on YARN-3127: -- [~Naganarasimha] Thank you for taking this issue! The policy of fix looks good to me. Could you add a test case to TestRMRestart to cover the case? Also, can we preserve following test cases? {code} -verify(writer).applicationStarted(any(RMApp.class)); -verify(publisher).appCreated(any(RMApp.class), anyLong()); {code} Apphistory url crashes when RM switches with ATS enabled Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Attachments: YARN-3127.20150213-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications are finished sucessfully start timeline server 3.Now failover HA form active to standby 4.Access timeline server URL IP:PORT/applicationhistory Result: Application history URL fails with below info {quote} 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) ... Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_01 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 51 more 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) {quote} Behaviour with AHS with file based history store -Apphistory url is working -No attempt entries are shown for each application. Based on inital analysis when RM switches ,application attempts from state store are not replayed but only applications are. 
So when the /applicationhistory URL is accessed, it queries every attempt ID and fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378203#comment-14378203 ] Wangda Tan commented on YARN-2495: -- Thanks for update. Patch LGTM, +1. will wait and commit in a few days if there's no opposite opinions. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3212: - Attachment: YARN-3212-v3.patch Update patch to address review comments above, include: - Properly handling in case of node (in decommissioning) reconnection with a different port. - Some refactor work, include: merge StatusUpdateWhenHealthyTransition and StatusUpdateWhenDecommissioningTransition together. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from “running” state triggered by a new event - “decommissioning”. This new state can be transit to state of “decommissioned” when Resource_Update if no running apps on this NM or NM reconnect after restart. Or it received DECOMMISSIONED event (after timeout from CLI). In addition, it can back to “running” if user decides to cancel previous decommission by calling recommission on the same node. The reaction to other events is similar to RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
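The description above enumerates the transitions into and out of the new DECOMMISSIONING state. A compact sketch of that transition table follows, using plain enums and hypothetical event names rather than the actual RMNodeImpl state machine:
{code}
/** Node states relevant to the sketch (simplified). */
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

final class DecommissioningSketch {
  static NodeState onEvent(NodeState current, String event, boolean hasRunningApps) {
    switch (event) {
      case "GRACEFUL_DECOMMISSION":   // admin starts a graceful decommission
        return current == NodeState.RUNNING ? NodeState.DECOMMISSIONING : current;
      case "RECOMMISSION":            // admin cancels the pending decommission
        return current == NodeState.DECOMMISSIONING ? NodeState.RUNNING : current;
      case "RESOURCE_UPDATE":         // no running apps left
      case "RECONNECT_AFTER_RESTART": // or the NM reconnects after restart
        return (current == NodeState.DECOMMISSIONING && !hasRunningApps)
            ? NodeState.DECOMMISSIONED : current;
      case "DECOMMISSION_TIMEOUT":    // CLI timeout forces completion
        return current == NodeState.DECOMMISSIONING ? NodeState.DECOMMISSIONED : current;
      default:                        // other events behave as in RUNNING
        return current;
    }
  }

  public static void main(String[] args) {
    System.out.println(onEvent(NodeState.RUNNING, "GRACEFUL_DECOMMISSION", true));
  }
}
{code}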
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378354#comment-14378354 ] Karthik Kambatla commented on YARN-3304: Thought a little more about this. If we choose to go with returning 0 and adding boolean methods for availability, I would like to see how the corresponding user code will look like compared to returning -1. Do we expect the code to be the following? If so, how do we handle the usage being available at the time of calling isAvailable, and not being available at the time of calling getUsage? To avoid this issue, we could get the usage on the availability call and cache it, and the getUsage call would return this cached value? But, requiring the availability call now is an even more incompatible change, no? {code} ResourceTrackerProcessTee procTree = new (); if (procTree.isMemoryUsageAvailable()) { procTree.getMemoryUsage(); } {code} And, how is the above user code snippet different from the one below: {code} ResourceTrackerProcessTee procTree = new (); procTree.getMemoryUsage(); {code} What is the cost of breaking compat of this previously Private API? I have a feeling it would be worth not making the API super-complicated. I want to avoid fixing this in a hurry just to unblock the release. I am willing to prioritize this, chat offline if need be, and solve it the right way. If we think that is too slow, we could always revert YARN-3296 for 2.7. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-3304-v2.patch, YARN-3304.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for unavailable case while other resource metrics are return 0 in the same case which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
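One of the open questions above is how an isAvailable()/getUsage() pair avoids the value changing between the two calls. A small sketch of the caching idea mentioned in the comment (sample on the availability check, return the cached sample from the getter); the names are hypothetical:
{code}
/** Sketch: the availability check samples and caches the value it observed. */
class CachedUsageSketch {
  static final long UNAVAILABLE = -1L;
  private long cachedMemoryUsage = UNAVAILABLE;

  /** Stand-in for a real sampler that would read /proc or similar. */
  private long sampleMemoryUsage() {
    return UNAVAILABLE;
  }

  boolean isMemoryUsageAvailable() {
    cachedMemoryUsage = sampleMemoryUsage();
    return cachedMemoryUsage != UNAVAILABLE;
  }

  long getMemoryUsage() {
    // Returns exactly what the last availability check saw, so the value
    // cannot become unavailable between isMemoryUsageAvailable() and here.
    return cachedMemoryUsage;
  }
}
{code}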
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378378#comment-14378378 ] Hadoop QA commented on YARN-3212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706943/YARN-3212-v3.patch against trunk revision 51f1f49. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7092//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7092//console This message is automatically generated. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from “running” state triggered by a new event - “decommissioning”. This new state can be transit to state of “decommissioned” when Resource_Update if no running apps on this NM or NM reconnect after restart. Or it received DECOMMISSIONED event (after timeout from CLI). In addition, it can back to “running” if user decides to cancel previous decommission by calling recommission on the same node. The reaction to other events is similar to RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378364#comment-14378364 ] Zhijie Shen commented on YARN-3034: --- bq. so that ATS V1 and V2 are less coupled and removal of SMP once completely deprecated is smoother Exactly. The last patch looks good to me. [Collector wireup] Implement RM starting its timeline collector --- Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3024.20150324-1.patch, YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, YARN-3034.20150320-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378391#comment-14378391 ] Bartosz Ługowski commented on YARN-1621: Thanks [~Naganarasimha]. Done all, apart of: {quote} * May be we can leverage the benifit of passing the states to AHS too, this will reduce the transfer of data from AHS to the client. ur opinion ? * If we are incorporating the above point then i feel only only when appNotFoundInRM we need to query for all states from AHS if not querying for COMPLETE state would be sufficient. {quote} Correct me if I'm wrong, but AHS has only COMPLETE containers, so we need to query AHS only if states filter is empty(ALL) or contains COMPLETE state. {quote} * No test cases for modification of GetContainersRequestPBImpl/GetContainersRequestProto {quote} There are already tests for this in: org.apache.hadoop.yarn.api.TestPBImplRecords#testGetContainersRequestPBImpl ? {quote} * there are some test case failures and findbugs issues reported can you take a look at it {quote} Not related with this patch. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
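The filtering rule discussed above (the AHS holds only COMPLETE containers, so it is queried only when the state filter is empty or includes COMPLETE) can be written down compactly. A hedged sketch with simplified types, not the actual CLI patch:
{code}
import java.util.EnumSet;

/** Container states as seen by the sketch (simplified). */
enum ContainerUiState { RUNNING, COMPLETE }

final class AhsQueryRule {
  /** Query the AHS only when COMPLETE containers could match the filter. */
  static boolean shouldQueryAhs(EnumSet<ContainerUiState> requested) {
    return requested == null || requested.isEmpty()
        || requested.contains(ContainerUiState.COMPLETE);
  }

  public static void main(String[] args) {
    System.out.println(shouldQueryAhs(EnumSet.of(ContainerUiState.RUNNING)));  // false
    System.out.println(shouldQueryAhs(EnumSet.of(ContainerUiState.COMPLETE))); // true
  }
}
{code}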
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bartosz Ługowski updated YARN-1621: --- Attachment: YARN-1621.6.patch Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377356#comment-14377356 ] Hadoop QA commented on YARN-2495: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706826/YARN-2495.20150324-1.patch against trunk revision 9fae455. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7088//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7088//console This message is automatically generated. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377382#comment-14377382 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-trunk-Commit #7413 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7413/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java * hadoop-yarn-project/CHANGES.txt Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced on YARN-1521 includes multiple assertion with . We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377372#comment-14377372 ] Devaraj K commented on YARN-3225: - {code:xml} org.apache.hadoop.yarn.server.resourcemanager.TestRM {code} This test failure is not related to the patch. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377378#comment-14377378 ] Harsh J commented on YARN-1880: --- +1, this still applies. Committing shortly, thanks [~ozawa] (and [~ajisakaa] for the earlier review)! Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Attachments: YARN-1880.1.patch The tests introduced on YARN-1521 includes multiple assertion with . We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880: -- Component/s: test Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced on YARN-1521 includes multiple assertion with . We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880: -- Affects Version/s: 2.6.0 Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced on YARN-1521 includes multiple assertion with . We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377403#comment-14377403 ] Tsuyoshi Ozawa commented on YARN-1880: -- [~qwertymaniac] [~ajisakaa] thank you for the review! Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced on YARN-1521 includes multiple assertion with . We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377950#comment-14377950 ] Hudson commented on YARN-3393: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/]) YARN-3393. Getting application(s) goes wrong when app finishes before (xgong: rev 9fae455e26e0230107e1c6db58a49a5b6b296cf4) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java Getting application(s) goes wrong when app finishes before starting the attempt --- Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Fix For: 2.7.0 Attachments: YARN-3393.1.patch When generating app report in ApplicationHistoryManagerOnTimelineStore, it checks if appAttempt == null. {code} ApplicationAttemptReport appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId()); if (appAttempt != null) { app.appReport.setHost(appAttempt.getHost()); app.appReport.setRpcPort(appAttempt.getRpcPort()); app.appReport.setTrackingUrl(appAttempt.getTrackingUrl()); app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl()); } {code} However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException: {code} if (entity == null) { throw new ApplicationAttemptNotFoundException( The entity for application attempt + appAttemptId + doesn't exist in the timeline store); } else { return convertToApplicationAttemptReport(entity); } {code} They code isn't coupled well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377948#comment-14377948 ] Hudson commented on YARN-2868: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/]) YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FairScheduler: Metric for latency to allocate first container for an application Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, YARN-2868.012.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
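The measurement itself is simple: record a timestamp when the scheduler starts allocating for an attempt and take the delta when the first container arrives. The sketch below illustrates the idea with invented class and field names; the actual patch wires this into QueueMetrics and SchedulerApplicationAttempt, which is not reproduced here.
{code}
// Illustrative sketch of a "time to first container" measurement; the names
// below are hypothetical and not taken from the YARN-2868 patch.
public class FirstAllocationLatencyTracker {

  private final long attemptStartTimeMs;
  private long firstAllocationLatencyMs = -1;

  public FirstAllocationLatencyTracker() {
    // Captured when the scheduler starts trying to allocate for the attempt.
    this.attemptStartTimeMs = System.currentTimeMillis();
  }

  /** Call whenever a container is allocated; only the first call records latency. */
  public synchronized void onContainerAllocated() {
    if (firstAllocationLatencyMs < 0) {
      firstAllocationLatencyMs =
          System.currentTimeMillis() - attemptStartTimeMs;
    }
  }

  /** @return latency in milliseconds, or -1 if nothing has been allocated yet. */
  public synchronized long getFirstAllocationLatencyMs() {
    return firstAllocationLatencyMs;
  }
}
{code}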
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377955#comment-14377955 ] Hudson commented on YARN-1880: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/]) YARN-1880. Cleanup TestApplicationClientProtocolOnHA. Contributed by ozawa. (harsh: rev fbceb3b41834d6899c4353fb24f12ba3ecf67faf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java Cleanup TestApplicationClientProtocolOnHA - Key: YARN-1880 URL: https://issues.apache.org/jira/browse/YARN-1880 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Trivial Fix For: 2.8.0 Attachments: YARN-1880.1.patch The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377952#comment-14377952 ] Hudson commented on YARN-3336: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/]) YARN-3336. FileSystem memory leak in DelegationTokenRenewer. (cnauroth: rev 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.0 Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE and is never garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf).
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals compares the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals compares the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
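To make the consequence of the cache behaviour above concrete, one possible way for the caller to avoid accumulating per-proxy-user entries is to close the FileSystem once the tokens have been fetched, since closing a cached instance removes it from FileSystem.CACHE. The sketch below mirrors the quoted obtainSystemTokensForUser body and is only an illustration of that idea, not necessarily the fix committed for YARN-3336.
{code}
// Sketch only: close the FileSystem obtained for the proxy UGI so its entry
// is removed from FileSystem.CACHE; not necessarily the committed fix.
Token<?>[] newTokens = proxyUser.doAs(
    new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        FileSystem fs = FileSystem.get(getConfig());
        try {
          return fs.addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          // close() evicts this instance from the cache, so the proxy user's
          // Subject no longer pins a FileSystem object in memory.
          fs.close();
        }
      }
    });
{code}
An alternative with a similar effect is to call FileSystem.closeAllForUGI(proxyUser) after the doAs block, which closes and evicts every FileSystem cached for that UGI.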