[jira] [Updated] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated YARN-1418:
-------------------------
    Assignee: Masatake Iwasaki  (was: Yi Liu)

> Add Tracing to YARN
> -------------------
>
> Key: YARN-1418
> URL: https://issues.apache.org/jira/browse/YARN-1418
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, nodemanager, resourcemanager
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
>
> Add tracing using HTrace, in the same way as HBASE-6449 and HDFS-5274.
> Most of the groundwork changes, such as those for RPC, seem to be almost
> ready in HDFS-5274.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated YARN-3055:
-------------------------
    Assignee: Daryn Sharp  (was: Yi Liu)

The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
------------------------------------------------------------------------------------------

Key: YARN-3055
URL: https://issues.apache.org/jira/browse/YARN-3055
Project: Hadoop YARN
Issue Type: Bug
Components: security
Reporter: Yi Liu
Assignee: Daryn Sharp
Priority: Blocker
Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch

After YARN-2964, there is only one timer to renew a token that is shared by jobs.
In {{removeApplicationFromRenewal}}, when we remove a token that is still shared by other jobs, we do not cancel the token. In that case we should also not cancel the _timerTask_, and we should not remove the token from {{allTokens}}. Otherwise, the already-submitted applications that share this token will no longer get it renewed, and for newly submitted applications that share this token, the token will be renewed immediately.

For example, suppose three applications, app1, app2 and app3, share token1. Consider the following scenario:
*1).* app1 is submitted first, then app2, and then app3. There is only one renewal timer for token1, and it is scheduled when app1 is submitted.
*2).* app1 finishes, and the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem.
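The app1/app2/app3 scenario in the description can be illustrated with a small, self-contained sketch. This is not the {{DelegationTokenRenewer}} code and not the attached patch: the class {{SharedTokenRenewalSketch}}, its methods, and the two maps are hypothetical names chosen for illustration. The sketch only shows the rule the description asks for, namely that the single renewal timer of a shared token stays scheduled until the last application referencing the token is removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch; all names here are assumptions, not Hadoop APIs.
class SharedTokenRenewalSketch {
    // token id -> applications that still reference the token
    private final Map<String, Set<String>> appsByToken = new HashMap<>();
    // token id -> the single renewal task scheduled for that token
    private final Map<String, TimerTask> timerByToken = new HashMap<>();
    // daemon timer so the JVM can exit while tasks are still scheduled
    private final Timer timer = new Timer("renewal-sketch", true);

    synchronized void addApplication(String appId, String tokenId) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
        // Only the first application sharing the token schedules a timer.
        if (!timerByToken.containsKey(tokenId)) {
            TimerTask task = new TimerTask() {
                @Override
                public void run() {
                    // placeholder for the real renewal call
                }
            };
            timer.schedule(task, 60_000L, 60_000L);
            timerByToken.put(tokenId, task);
        }
    }

    synchronized void removeApplication(String appId, String tokenId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null) {
            return;
        }
        apps.remove(appId);
        // Cancel the timer and forget the token only when no application is left.
        if (apps.isEmpty()) {
            appsByToken.remove(tokenId);
            TimerTask task = timerByToken.remove(tokenId);
            if (task != null) {
                task.cancel();
            }
        }
    }

    synchronized boolean hasRenewalTimer(String tokenId) {
        return timerByToken.containsKey(tokenId);
    }

    public static void main(String[] args) {
        SharedTokenRenewalSketch renewer = new SharedTokenRenewalSketch();
        renewer.addApplication("app1", "token1");
        renewer.addApplication("app2", "token1");
        renewer.addApplication("app3", "token1");
        renewer.removeApplication("app1", "token1");
        System.out.println("timer after app1 finishes: " + renewer.hasRenewalTimer("token1"));
        renewer.removeApplication("app2", "token1");
        renewer.removeApplication("app3", "token1");
        System.out.println("timer after all apps finish: " + renewer.hasRenewalTimer("token1"));
    }
}
```

Tracking the set of referencing applications per token is only one possible design; a plain reference count would express the same rule.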
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335779#comment-14335779 ]

Yi Liu commented on YARN-2467:
------------------------------

[~iwasakims], I have assigned the JIRA to you; feel free to work on it.

Add SpanReceiverHost to YARN daemons
------------------------------------

Key: YARN-2467
URL: https://issues.apache.org/jira/browse/YARN-2467
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
[jira] [Updated] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2467: - Assignee: Masatake Iwasaki (was: Yi Liu) Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3070) TestRMAdminCLI#testHelp fails for transitionToActive command
[ https://issues.apache.org/jira/browse/YARN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282046#comment-14282046 ]

Yi Liu commented on YARN-3070:
------------------------------

+1. The other test failures are not related.

TestRMAdminCLI#testHelp fails for transitionToActive command
------------------------------------------------------------

Key: YARN-3070
URL: https://issues.apache.org/jira/browse/YARN-3070
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Junping Du
Priority: Minor
Attachments: YARN-3070-v2.patch, YARN-3070.patch

{code}
testError(new String[] { "-help", "-transitionToActive" },
    "Usage: yarn rmadmin [-transitionToActive serviceId" +
    " [--forceactive]]", dataErr, 0);
{code}
fails with:
{code}
java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testError(TestRMAdminCLI.java:547)
  at org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:335)
{code}
[jira] [Assigned] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned YARN-1418: Assignee: Yi Liu Add Tracing to YARN --- Key: YARN-1418 URL: https://issues.apache.org/jira/browse/YARN-1418 Project: Hadoop YARN Issue Type: Improvement Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274. The most part of changes needed for basis such as RPC seems to be almost ready in HDFS-5274. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned YARN-2467: Assignee: Yi Liu Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276275#comment-14276275 ]

Yi Liu commented on YARN-3055:
------------------------------

Is it possible for the launcher job to finish first while sub-jobs are still running? If so, the issue exists. If not, the issue is invalid.

The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
------------------------------------------------------------------------------------------

Key: YARN-3055
URL: https://issues.apache.org/jira/browse/YARN-3055
Project: Hadoop YARN
Issue Type: Bug
Components: security
Reporter: Yi Liu
Assignee: Yi Liu
Attachments: YARN-3055.001.patch, YARN-3055.002.patch

After YARN-2964, there is only one timer to renew a token that is shared by jobs.
In {{removeApplicationFromRenewal}}, when we remove a token that is still shared by other jobs, we do not cancel the token. In that case we should also not cancel the _timerTask_, and we should not remove the token from {{allTokens}}. Otherwise, the already-submitted applications that share this token will no longer get it renewed, and for newly submitted applications that share this token, the token will be renewed immediately.

For example, suppose three applications, app1, app2 and app3, share token1. Consider the following scenario:
*1).* app1 is submitted first, then app2, and then app3. There is only one renewal timer for token1, and it is scheduled when app1 is submitted.
*2).* app1 finishes, and the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem.
[jira] [Created] (YARN-3058) Fix error msg of tokens activation delay configuration
Yi Liu created YARN-3058:
----------------------------

Summary: Fix error msg of tokens activation delay configuration
Key: YARN-3058
URL: https://issues.apache.org/jira/browse/YARN-3058
Project: Hadoop YARN
Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor

{code}
this.rollingInterval = conf.getLong(
    YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS,
    YarnConfiguration.DEFAULT_RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS) * 1000;
...
this.activationDelay =
    (long) (conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS) * 1.5);
...
if (rollingInterval <= activationDelay * 2) {
  throw new IllegalArgumentException(
      YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
          + " should be more than 2 X "
          + YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS);
}
{code}
The error message names the rolling-interval key on both sides of the comparison. It should be:
{code}
YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
    + " should be more than 3 X "
    + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS);
{code}
Also, the factor is {{3 X}} rather than {{2 X}}: {{activationDelay}} is the NM expiry interval multiplied by *1.5*, so {{activationDelay * 2}} is three times the expiry interval.
A few other places have the same issue.
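The arithmetic behind the proposed message can be checked with a short standalone sketch. It does not use {{YarnConfiguration}}: the two string constants below are placeholders that merely reuse the constant names from the issue, and {{ActivationDelayCheckSketch}} and {{validate}} are hypothetical names. The comparison is assumed to be {{<=}}, as in the snippet above.

```java
// Hypothetical sketch; does not depend on Hadoop.
class ActivationDelayCheckSketch {
    // Placeholders standing in for the real configuration key strings.
    static final String ROLLING_KEY = "RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS";
    static final String EXPIRY_KEY = "RM_NM_EXPIRY_INTERVAL_MS";

    static void validate(long rollingIntervalSecs, long nmExpiryIntervalMs) {
        long rollingIntervalMs = rollingIntervalSecs * 1000;
        // activationDelay = 1.5 x expiry interval
        long activationDelayMs = (long) (nmExpiryIntervalMs * 1.5);
        // 2 x activationDelay = 3 x expiry interval
        if (rollingIntervalMs <= activationDelayMs * 2) {
            throw new IllegalArgumentException(
                ROLLING_KEY + " should be more than 3 X " + EXPIRY_KEY);
        }
    }

    public static void main(String[] args) {
        long expiryMs = 600_000L; // illustrative value only
        validate(1_801L, expiryMs); // 1,801,000 ms > 3 x 600,000 ms, accepted
        try {
            validate(1_800L, expiryMs); // exactly 3 x expiry, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the rolling interval is configured in seconds and the expiry interval in milliseconds, so the comparison is made after converting the former to milliseconds.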
[jira] [Updated] (YARN-3058) Fix error msg of tokens activation delay configuration
[ https://issues.apache.org/jira/browse/YARN-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3058: - Attachment: YARN-3058.001.patch Fix error msg of tokens activation delay configuration -- Key: YARN-3058 URL: https://issues.apache.org/jira/browse/YARN-3058 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3058.001.patch {code} this.rollingInterval = conf.getLong( YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS, YarnConfiguration.DEFAULT_RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS) * 1000; ... this.activationDelay = (long) (conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS, YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS) * 1.5); ... if (rollingInterval = activationDelay * 2) { throw new IllegalArgumentException( YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS + should be more than 2 X + YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS); } {code} The error msg should be {code} YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS + should be more than 3 X + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS); {code} Also It's {{3 X}} instead of {{2 X}}, since it's multiplied by *1.5*. There are few other places having same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
Yi Liu created YARN-3055: Summary: Fix allTokens issue in DelegationTokenRenewer Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Attachment: YARN-3055.001.patch [~jianhe], [~kasha] and [~jlowe], can you help to take a look? Fix allTokens issue in DelegationTokenRenewer - Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Attachment: YARN-3055.002.patch Fix allTokens issue in DelegationTokenRenewer - Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Attachment: YARN-3055.002.patch Fix allTokens issue in DelegationTokenRenewer - Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Attachment: (was: YARN-3055.002.patch) Fix allTokens issue in DelegationTokenRenewer - Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275172#comment-14275172 ] Yi Liu commented on YARN-3055: -- Upload a new patch. Fix allTokens issue in DelegationTokenRenewer - Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275115#comment-14275115 ]

Yi Liu commented on YARN-3055:
------------------------------

The token is still not renewed; I will update the patch later.

Fix allTokens issue in DelegationTokenRenewer
---------------------------------------------

Key: YARN-3055
URL: https://issues.apache.org/jira/browse/YARN-3055
Project: Hadoop YARN
Issue Type: Bug
Components: security
Reporter: Yi Liu
Assignee: Yi Liu
Attachments: YARN-3055.001.patch

In {{removeApplicationFromRenewal}}, when we remove a token that is still shared by other jobs, we do not cancel the token. In that case we should also not cancel the _timerTask_, and we should not remove the token from {{allTokens}}. Otherwise, the already-submitted applications that share this token will no longer get it renewed, and for newly submitted applications that share this token, the token will be renewed immediately.

For example, suppose three applications, app1, app2 and app3, share token1. Consider the following scenario:
*1).* app1 is submitted first, then app2, and then app3. There is only one renewal timer for token1, and it is scheduled when app1 is submitted.
*2).* app1 finishes, and the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem.
[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Summary: The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer (was: Fix allTokens issue in DelegationTokenRenewer) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275183#comment-14275183 ]

Yi Liu commented on YARN-2964:
------------------------------

It seems this JIRA causes the token not to be renewed properly when it is shared by jobs (oozie). I filed YARN-3055; please take a look.

RM prematurely cancels tokens for jobs that submit jobs (oozie)
---------------------------------------------------------------

Key: YARN-2964
URL: https://issues.apache.org/jira/browse/YARN-2964
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
Fix For: 2.7.0
Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch

The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job.

As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals.

The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched more than 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed.
[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3055: - Description: After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. was: In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. 
The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270274#comment-14270274 ]

Yi Liu commented on YARN-2996:
------------------------------

Thanks [~zjshen] for the review and commit.

Refine fs operations in FileSystemRMStateStore and few fixes
------------------------------------------------------------

Key: YARN-2996
URL: https://issues.apache.org/jira/browse/YARN-2996
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
Fix For: 2.7.0
Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch

In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance:
*1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge the two calls to save one RPC call.
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is also inefficient: it writes the file to _output\_file_.tmp, renames it to _output\_file_.new, and then renames that to _output\_file_. We can eliminate one rename operation.

There is also one unnecessary import, which we can remove.
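Point *1.* above can be illustrated without a Hadoop cluster. In Hadoop, {{FileSystem.getFileStatus}} throws {{FileNotFoundException}} for a missing path, so the preceding {{fs.exists}} call is redundant. The sketch below uses {{java.nio.file}} as a stand-in so that it runs on its own; {{SingleLookupSketch}}, {{statTwoLookups}} and {{statOneLookup}} are hypothetical names, and this is not the code from the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch; java.nio.file stands in for the Hadoop FileSystem API.
class SingleLookupSketch {
    // Pattern from the issue: one lookup to test existence, a second to read the status.
    static BasicFileAttributes statTwoLookups(Path path) throws IOException {
        if (Files.exists(path)) {
            return Files.readAttributes(path, BasicFileAttributes.class);
        }
        return null;
    }

    // Merged pattern: read the status directly and treat "not found" as absence.
    static BasicFileAttributes statOneLookup(Path path) throws IOException {
        try {
            return Files.readAttributes(path, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("version-node", ".tmp");
        Path missing = present.resolveSibling("does-not-exist-" + System.nanoTime());
        System.out.println("present found: " + (statOneLookup(present) != null));
        System.out.println("missing found: " + (statOneLookup(missing) != null));
        Files.delete(present);
    }
}
```

Besides saving a round trip, the merged form avoids the window in which the file could disappear between the two calls.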
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.004.patch Good idea Zhijie, update the patch. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
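One way to drop a rename from {{updateFile}}, sketched here with {{java.nio.file}} instead of the Hadoop {{FileSystem}} API, is to write the data straight to the _.new_ file and rename it once onto the destination. This is an assumption about a possible approach, not a description of what the attached patches do; {{SingleRenameUpdateSketch}} is a hypothetical name. Whether an atomic move replaces an existing target is platform-specific according to the JDK documentation; on POSIX file systems it does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; java.nio.file stands in for the Hadoop FileSystem API.
class SingleRenameUpdateSketch {
    static void updateFile(Path outputPath, byte[] data) throws IOException {
        // Write directly to <name>.new, skipping the intermediate .tmp file.
        Path newPath = outputPath.resolveSibling(outputPath.getFileName() + ".new");
        Files.write(newPath, data);
        // A single rename publishes the new content under the final name.
        Files.move(newPath, outputPath, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("update-sketch");
        Path target = dir.resolve("state");
        updateFile(target, "v1".getBytes(StandardCharsets.UTF_8));
        updateFile(target, "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that a crash during the write can leave a partial _.new_ file behind, so readers must never treat _.new_ files as valid state.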
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268587#comment-14268587 ]

Yi Liu commented on YARN-3010:
------------------------------

Thanks [~jianhe] and [~rohithsharma].

Fix recent findbug issue in AbstractYarnScheduler
-------------------------------------------------

Key: YARN-3010
URL: https://issues.apache.org/jira/browse/YARN-3010
Project: Hadoop YARN
Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3010.001.patch, YARN-3010.002.patch

A new FindBugs issue was reported recently in the latest trunk:
{quote}
IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time
{quote}
https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
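For readers unfamiliar with this warning, the sketch below shows the general shape of code that triggers an "inconsistent synchronization" report and one common way to silence it. It is not taken from {{AbstractYarnScheduler}} or from the attached patches: the classes {{Flagged}} and {{Consistent}} and their methods are hypothetical, and synchronizing every access is only one option (making the field {{volatile}} or {{final}} are others).

```java
// Hypothetical sketch; names are assumptions, not Hadoop code.
class ConsistentSyncSketch {
    // Shape that gets flagged: the field is written under the lock
    // but read without it, so access is only "mostly" synchronized.
    static class Flagged {
        private Object rmContext;

        synchronized void setRMContext(Object ctx) {
            this.rmContext = ctx;
        }

        Object getRMContext() {
            return rmContext; // unsynchronized read
        }
    }

    // One possible fix: every access to the field holds the same lock.
    static class Consistent {
        private Object rmContext;

        synchronized void setRMContext(Object ctx) {
            this.rmContext = ctx;
        }

        synchronized Object getRMContext() {
            return rmContext;
        }
    }

    public static void main(String[] args) {
        Consistent scheduler = new Consistent();
        scheduler.setRMContext("context");
        System.out.println(scheduler.getRMContext());
    }
}
```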
[jira] [Updated] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Summary: Refine fs operations in FileSystemRMStateStore and few fixes (was: Refine some fs operations in FileSystemRMStateStore to improve performance) Refine fs operations in FileSystemRMStateStore and few fixes Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265835#comment-14265835 ] Yi Liu commented on YARN-2996: -- Test failure and findbugs are not related. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Description: A new findbug issues reported recently in latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html was: A new findbug issues reported recently in latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Attachment: YARN-3010.002.patch Update patch Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267086#comment-14267086 ] Yi Liu commented on YARN-2637: -- {quote} Findbugs was the result of changing the ratio of sync to unsync accesses which hit the findbugs limits, but not the pattern itself, which looks fine, so added fb exclusion. {quote} Not exactly: in FairScheduler it's a real issue, and we need *synchronized* for _resolveReservationQueueName_. There is already a JIRA, YARN-3010, to fix the findbugs warning. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether an app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number of AMs that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only a max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
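The arithmetic in the example above can be checked with a small standalone sketch. It is plain Java and not scheduler code; the class and method names are made up for illustration, and 1G is taken as 1000M to match the 200M figure in the description. The last helper shows one possible way to bound the count by the actual AM size, which is an assumption and not necessarily what any YARN-2637 patch does.

```java
// Illustrative only: AmLimitExample and its methods are hypothetical names.
class AmLimitExample {
    // max_am_resource = queue_max_capacity * maximum_am_resource_percent
    static long maxAmResourceMb(long queueCapacityMb, double maxAmPercent) {
        return Math.round(queueCapacityMb * maxAmPercent);
    }

    // #max_am_number = max_am_resource / minimum_allocation (the formula from the description)
    static long maxAmNumber(long queueCapacityMb, double maxAmPercent, long minAllocationMb) {
        return maxAmResourceMb(queueCapacityMb, maxAmPercent) / minAllocationMb;
    }

    // Memory consumed when every admitted AM actually asks for amSizeMb
    static long amMemoryUsedMb(long amCount, long amSizeMb) {
        return amCount * amSizeMb;
    }

    // Assumption: bounding by the real AM size instead of minimum_allocation
    static long maxAmNumberBySize(long queueCapacityMb, double maxAmPercent, long amSizeMb) {
        return maxAmResourceMb(queueCapacityMb, maxAmPercent) / amSizeMb;
    }

    public static void main(String[] args) {
        long admitted = maxAmNumber(1000, 0.2, 1);
        System.out.println("AMs admitted by count: " + admitted);                  // 200
        System.out.println("Memory used at 5M each: " + amMemoryUsedMb(admitted, 5)); // 1000, the whole queue
        System.out.println("AMs if bounded by size: " + maxAmNumberBySize(1000, 0.2, 5)); // 40
    }
}
```

With 5M per AM, the count-based limit lets AMs take 1000M of a 1000M queue, while the intended budget was 200M.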
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267065#comment-14267065 ] Yi Liu commented on YARN-2996: -- Yes, Zhijie {quote} Good catch! It seems that MemoryRMStateStore#storeOrUpdateAMRMTokenSecretManagerState needs to be fixed too. {quote} {{.002}} patch already includes this fix. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.003.patch OK, I see, update the patch. Thanks Zhijie. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Description: A new findbug issues reported recently: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html was:A new findbug issues reported recently: https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor A new findbug issues reported recently: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Attachment: YARN-3010.001.patch Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
Yi Liu created YARN-3010: Summary: Fix recent findbug issue in AbstractYarnScheduler Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor A new findbug issues reported recently: https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Description: A new findbug issues reported recently in latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html was: A new findbug issues reported recently: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3004) Fix missed synchronization in MemoryRMStateStore
Yi Liu created YARN-3004: Summary: Fix missed synchronization in MemoryRMStateStore Key: YARN-3004 URL: https://issues.apache.org/jira/browse/YARN-3004 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu In {{MemoryRMStateStore}}, access to the {{state}} variable should be thread-safe, so we need to add _synchronized_ to:
{code}
storeApplicationStateInternal
updateApplicationStateInternal
storeOrUpdateAMRMTokenSecretManagerState
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
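A minimal sketch of the kind of change described above, assuming a store that keeps its state in a plain HashMap. The class below is a hypothetical stand-in, not MemoryRMStateStore itself; its method names are shortened for illustration. Every method that touches the shared map is marked synchronized so that concurrent writers cannot corrupt it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory state store; not the Hadoop class.
class InMemoryStateStore {
    // HashMap is not thread-safe, so all access goes through synchronized methods.
    private final Map<String, String> state = new HashMap<>();

    synchronized void storeApplicationState(String appId, String data) {
        state.put(appId, data);
    }

    synchronized void updateApplicationState(String appId, String data) {
        state.put(appId, data);
    }

    synchronized int size() {
        return state.size();
    }

    // Starts several writer threads against one store and returns the final entry count.
    static int storeConcurrently(int threads, int perThread) {
        InMemoryStateStore store = new InMemoryStateStore();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.storeApplicationState("app_" + id + "_" + i, "data");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(storeConcurrently(4, 1000));
    }
}
```

Without the synchronized keyword, the same run could lose entries or fail inside HashMap, which is the class of problem the JIRA is about.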
[jira] [Updated] (YARN-3004) Fix missed synchronization in MemoryRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3004: - Attachment: YARN-3004.001.patch Fix missed synchronization in MemoryRMStateStore Key: YARN-3004 URL: https://issues.apache.org/jira/browse/YARN-3004 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3004.001.patch In {{MemoryRMStateStore}}, obviously {{state}} variable should be thread-safe, so we need to add _synchronized_ for {code} storeApplicationStateInternal updateApplicationStateInternal storeOrUpdateAMRMTokenSecretManagerState {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265707#comment-14265707 ] Yi Liu commented on YARN-2996: -- Thanks [~zjshen] for the review. You are right: the existing code uses the *.new* and *.tmp* files for some checks. But the incompatibility issue you mentioned is really rare, and it's not a big issue. {{checkAndResumeUpdateOperation}} exists because we write state to a *.tmp* file, then rename it to a *.new* file, and finally rename it to _output\_file_. If we remove the step of renaming to the *.new* file, we can remove this function too. Anyway, I will revert this modification, so the new patch keeps only item #1 from the description. I add two new fixes in the new patch: *1.* We missed *synchronized* for {{updateRMDelegationTokenState}}. *2.* The fix for YARN-3004 is added to this patch, since {{MemoryRMStateStore}} is only used in tests and we can fix it in this patch too. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call.
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is not good either: it writes the file to _output\_file_.tmp, then renames it to _output\_file_.new, then renames it to _output\_file_; we can save one rename operation. Also, there is one unnecessary import that we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
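For readers unfamiliar with the write-then-rename pattern discussed in this comment, here is a local-filesystem analogue using java.nio.file rather than the Hadoop FileSystem API. It shows the single-rename variant (write to a .tmp sibling, then rename onto the target), which is the idea that was proposed and then reverted. The class and method names are made up, and the recovery logic of {{checkAndResumeUpdateOperation}} is not modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative analogue on the local file system; not FileSystemRMStateStore code.
class AtomicUpdateSketch {
    // Write the new content to a sibling ".tmp" file, then rename it over the target.
    // Readers see either the old file or the new one, never a half-written file.
    static void updateFile(Path output, byte[] data) throws IOException {
        Path tmp = output.resolveSibling(output.getFileName() + ".tmp");
        Files.write(tmp, data);
        // With ATOMIC_MOVE, replacing an existing target is implementation specific;
        // it replaces on typical POSIX file systems.
        Files.move(tmp, output, StandardCopyOption.ATOMIC_MOVE);
    }

    // Updates the same file twice in a fresh temp directory and returns the final content.
    static String roundTrip() {
        try {
            Path dir = Files.createTempDirectory("state-store-sketch");
            Path output = dir.resolve("appstate");
            updateFile(output, "v1".getBytes(StandardCharsets.UTF_8));
            updateFile(output, "v2".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

The existing three-step scheme (.tmp, then .new, then the final name) costs one more rename per update but gives the store an intermediate marker it can inspect after a crash, which is why the reviewer's compatibility concern applied.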
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.002.patch Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263743#comment-14263743 ] Yi Liu commented on YARN-2996: -- The 3 tests failures are not related. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
Yi Liu created YARN-2996: Summary: Refine some fs operations in FileSystemRMStateStore to improve performance Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call.
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is not good either: it writes the file to _output\_file_.tmp, then renames it to _output\_file_.new, then renames it to _output\_file_; we can save one rename operation. Also, there is one unnecessary import that we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
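Item 1 above can be illustrated with a small self-contained sketch. It does not use the Hadoop API: CountingFs is a hypothetical stand-in for a remote file system in which every method call models one RPC, and the "status" is reduced to a file length. The point is only that asking for the status directly, and treating FileNotFoundException as "does not exist", needs one round trip instead of two.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: CountingFs is a made-up stand-in, not org.apache.hadoop.fs.FileSystem.
class SingleLookupSketch {
    static class CountingFs {
        private final Map<String, Long> lengths = new HashMap<>();
        int rpcCalls = 0;

        void put(String path, long length) {
            lengths.put(path, length);
        }

        boolean exists(String path) {
            rpcCalls++; // one modeled round trip
            return lengths.containsKey(path);
        }

        long getFileStatus(String path) throws FileNotFoundException {
            rpcCalls++; // one modeled round trip
            Long len = lengths.get(path);
            if (len == null) {
                throw new FileNotFoundException(path);
            }
            return len;
        }
    }

    // Original pattern: exists() followed by getFileStatus(), two round trips on a hit.
    static Long lengthTwoCalls(CountingFs fs, String path) {
        try {
            return fs.exists(path) ? Long.valueOf(fs.getFileStatus(path)) : null;
        } catch (FileNotFoundException e) {
            return null; // the file vanished between the two calls
        }
    }

    // Merged pattern: a single getFileStatus(), with "not found" handled via the exception.
    static Long lengthOneCall(CountingFs fs, String path) {
        try {
            return fs.getFileStatus(path);
        } catch (FileNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.put("/rmstore/version", 42L);
        lengthTwoCalls(fs, "/rmstore/version");
        System.out.println("calls with exists + getFileStatus: " + fs.rpcCalls);
        fs.rpcCalls = 0;
        lengthOneCall(fs, "/rmstore/version");
        System.out.println("calls with getFileStatus only: " + fs.rpcCalls);
    }
}
```

The merged form also closes the small window in which the file could disappear between the two calls.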
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.001.patch Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2993: - Fix Version/s: 2.7.0 Several fixes (missing acl check, error log msg ...) and some refinement in AdminService Key: YARN-2993 URL: https://issues.apache.org/jira/browse/YARN-2993 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.7.0 Attachments: YARN-2993.001.patch This JIRA is to resolve following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}: *1.* There is no ACLs check for {{refreshServiceAcls}} *2.* log message in {{refreshAdminAcls}} is incorrect, it should be ... Can not refresh Admin ACLs. instead of ... Can not refresh user-groups. *3.* some unnecessary header import. *4.* {code} if (!isRMActive()) { RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, ResourceManager is not active. Can not remove labels.); throwStandbyException(); } {code} is common in lots of methods, just the message is different, we should refine it into one common method. *5.* {code} LOG.info(Exception remove labels, ioe); RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, Exception remove label); throw RPCUtil.getRemoteException(ioe); {code} is common in lots of methods, just the message is different, we should refine it into one common method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259776#comment-14259776 ] Yi Liu commented on YARN-2993: -- Thanks [~djp] for the review and commit. Several fixes (missing acl check, error log msg ...) and some refinement in AdminService Key: YARN-2993 URL: https://issues.apache.org/jira/browse/YARN-2993 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.7.0 Attachments: YARN-2993.001.patch This JIRA is to resolve following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}: *1.* There is no ACLs check for {{refreshServiceAcls}} *2.* log message in {{refreshAdminAcls}} is incorrect, it should be ... Can not refresh Admin ACLs. instead of ... Can not refresh user-groups. *3.* some unnecessary header import. *4.* {code} if (!isRMActive()) { RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, ResourceManager is not active. Can not remove labels.); throwStandbyException(); } {code} is common in lots of methods, just the message is different, we should refine it into one common method. *5.* {code} LOG.info(Exception remove labels, ioe); RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, Exception remove label); throw RPCUtil.getRemoteException(ioe); {code} is common in lots of methods, just the message is different, we should refine it into one common method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2993: - Summary: Several fixes (missing acl check, error log msg ...) and some refinement in AdminService (was: Several fixes (missing acl check, error log ...) and some refinement in AdminService) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService Key: YARN-2993 URL: https://issues.apache.org/jira/browse/YARN-2993 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu This JIRA is to resolve following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}: *1.* There is no ACLs check for {{refreshServiceAcls}} *2.* log message in {{refreshAdminAcls}} is incorrect, it should be ... Can not refresh Admin ACLs. instead of ... Can not refresh user-groups. *3.* some unnecessary header import. *4.* {code} if (!isRMActive()) { RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, ResourceManager is not active. Can not remove labels.); throwStandbyException(); } {code} is common in lots of methods, just the message is different, we should refine it into one common method. *5.* {code} LOG.info(Exception remove labels, ioe); RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, Exception remove label); throw RPCUtil.getRemoteException(ioe); {code} is common in lots of methods, just the message is different, we should refine it into one common method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2993) Several fixes (missing acl check, error log ...) and some refinement in AdminService
Yi Liu created YARN-2993: Summary: Several fixes (missing acl check, error log ...) and some refinement in AdminService Key: YARN-2993 URL: https://issues.apache.org/jira/browse/YARN-2993 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu This JIRA is to resolve the following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}: *1.* There is no ACL check for {{refreshServiceAcls}}. *2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can not refresh Admin ACLs." instead of "... Can not refresh user-groups." *3.* Some unnecessary imports. *4.*
{code}
if (!isRMActive()) {
  RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), "AdminService", "ResourceManager is not active. Can not remove labels.");
  throwStandbyException();
}
{code}
is common to many methods, with only the message differing; we should refactor it into one common method. *5.*
{code}
LOG.info("Exception remove labels", ioe);
RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), "AdminService", "Exception remove label");
throw RPCUtil.getRemoteException(ioe);
{code}
is common to many methods, with only the message differing; we should refactor it into one common method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
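The refactoring proposed in item 4 could look roughly like the sketch below. It is standalone Java, not AdminService code: the class, the helper name checkRMStatus, the in-memory audit list, and the use of IllegalStateException in place of a standby exception are all assumptions made so the example runs on its own. The repeated "is the RM active?" block becomes one helper that takes only the varying part of the message.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names and types here are hypothetical, not the Hadoop classes.
class AdminCheckSketch {
    private final boolean active;
    final List<String> auditLog = new ArrayList<>();

    AdminCheckSketch(boolean active) {
        this.active = active;
    }

    // The common helper: one place for the standby check, audit entry, and exception.
    private void checkRMStatus(String user, String argName, String action) {
        if (!active) {
            String msg = "ResourceManager is not active. Can not " + action;
            auditLog.add(user + " " + argName + " FAILURE: " + msg);
            throw new IllegalStateException(msg);
        }
    }

    // Each admin operation now supplies only its own name and message fragment.
    void removeLabels(String user) {
        checkRMStatus(user, "removeLabels", "remove labels.");
        // ... the actual operation would follow here
    }

    void refreshAdminAcls(String user) {
        checkRMStatus(user, "refreshAdminAcls", "refresh Admin ACLs.");
        // ... the actual operation would follow here
    }

    public static void main(String[] args) {
        AdminCheckSketch standby = new AdminCheckSketch(false);
        try {
            standby.refreshAdminAcls("alice");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Centralizing the message also makes mistakes like item 2 (a copy-pasted "user-groups" message in {{refreshAdminAcls}}) harder to introduce. The exception-wrapping block in item 5 could be folded into a second helper in the same way.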
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2993: - Attachment: YARN-2993.001.patch Attach the patch to resolve those issues. Several fixes (missing acl check, error log msg ...) and some refinement in AdminService Key: YARN-2993 URL: https://issues.apache.org/jira/browse/YARN-2993 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2993.001.patch This JIRA is to resolve following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}: *1.* There is no ACLs check for {{refreshServiceAcls}} *2.* log message in {{refreshAdminAcls}} is incorrect, it should be ... Can not refresh Admin ACLs. instead of ... Can not refresh user-groups. *3.* some unnecessary header import. *4.* {code} if (!isRMActive()) { RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, ResourceManager is not active. Can not remove labels.); throwStandbyException(); } {code} is common in lots of methods, just the message is different, we should refine it into one common method. *5.* {code} LOG.info(Exception remove labels, ioe); RMAuditLogger.logFailure(user.getShortUserName(), argName, adminAcl.toString(), AdminService, Exception remove label); throw RPCUtil.getRemoteException(ioe); {code} is common in lots of methods, just the message is different, we should refine it into one common method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258929#comment-14258929 ]

Yi Liu commented on YARN-2993:

The Findbugs failure is *not related*, and the test failure is also *not related* (https://issues.apache.org/jira/browse/YARN-2991). The patch is straightforward, so no test case is needed.
[jira] [Commented] (YARN-2667) Fix the release audit warning caused by hadoop-yarn-registry
[ https://issues.apache.org/jira/browse/YARN-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170291#comment-14170291 ]

Yi Liu commented on YARN-2667:

Thanks [~jlowe] for the review and commit.

Fix the release audit warning caused by hadoop-yarn-registry

    Key: YARN-2667
    URL: https://issues.apache.org/jira/browse/YARN-2667
    Project: Hadoop YARN
    Issue Type: Bug
    Affects Versions: 2.6.0
    Reporter: Yi Liu
    Assignee: Yi Liu
    Priority: Minor
    Fix For: 2.6.0
    Attachments: YARN-2667.001.patch

? /home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/resources/.keep

Lines that start with ? in the release audit report indicate files that do not have an Apache license header.
[jira] [Created] (YARN-2667) Fix the release audit warning caused by hadoop-yarn-registry
Yi Liu created YARN-2667:

    Summary: Fix the release audit warning caused by hadoop-yarn-registry
    Key: YARN-2667
    URL: https://issues.apache.org/jira/browse/YARN-2667
    Project: Hadoop YARN
    Issue Type: Bug
    Reporter: Yi Liu
    Priority: Minor

? /home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/resources/.keep

Lines that start with ? in the release audit report indicate files that do not have an Apache license header.
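As a rough illustration of the rule stated above, the following sketch pulls the flagged paths out of a report by keeping only the lines that begin with ?. The class name, method name, and sample report lines are made up for this example, and the real audit report may contain other line formats that this sketch does not handle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop or of the release audit tooling.
class ReleaseAuditSketch {
    // Returns the paths from report lines that start with '?',
    // the marker the issue describes for files lacking a license header.
    static List<String> unlicensedFiles(List<String> reportLines) {
        return reportLines.stream()
            .map(String::trim)
            .filter(line -> line.startsWith("?"))
            .map(line -> line.substring(1).trim())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines; only the leading '?' convention comes from the issue text.
        List<String> report = Arrays.asList(
            "? example/src/main/resources/.keep",
            "AL example/pom.xml");
        System.out.println(unlicensedFiles(report));
    }
}
```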
[jira] [Updated] (YARN-2667) Fix the release audit warning caused by hadoop-yarn-registry
[ https://issues.apache.org/jira/browse/YARN-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated YARN-2667:

    Attachment: YARN-2667.001.patch