[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045105#comment-16045105 ]

Arpit Agarwal commented on HDFS-11907:
--------------------------------------

+1 for the v6 patch. Thanks Chen.

[~andrew.wang] do you have any comments on the latest patch?

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---------------------------------------------------------------------------
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch,
> HDFS-11907.003.patch, HDFS-11907.004.patch, HDFS-11907.005.patch,
> HDFS-11907.006.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes
> {{NameNode#monitorHealth}}, which ends up invoking
> {{NameNodeResourceChecker#isResourceAvailable}} once per second by
> default. NameNodeResourceChecker#isResourceAvailable in turn invokes
> {{df.getAvailable();}} every time it is called.
> Since the available space should rarely change dramatically from one
> second to the next, a cached value should be sufficient: only fetch an
> updated value when the cached value is too old; otherwise, simply
> return the cached value. This way df.getAvailable() gets invoked less often.
> Thanks [~arpitagarwal] for the offline discussion.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
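The caching idea in the issue description can be sketched roughly as follows. This is an illustration only, not code from any HDFS-11907 patch: the class name `CachedAvailableSpace`, its constructor parameters, and the `getRefreshCount` helper are hypothetical, and `java.io.File#getUsableSpace` is used as a stand-in for `df.getAvailable()`.

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch (hypothetical class, not the HDFS-11907 patch).
 * Returns a cached available-space value and only re-queries the file
 * system when the cached value is older than a staleness threshold.
 */
class CachedAvailableSpace {
  private final File volume;
  private final long staleThresholdNanos;
  private long cachedBytes;
  private long lastRefreshNanos;
  private boolean initialized = false;
  private int refreshCount = 0; // kept only so the demo can observe refreshes

  CachedAvailableSpace(File volume, long staleThresholdMs) {
    this.volume = volume;
    this.staleThresholdNanos = TimeUnit.MILLISECONDS.toNanos(staleThresholdMs);
  }

  /** Returns the cached value, refreshing it first if it is too old. */
  synchronized long getAvailable() {
    long now = System.nanoTime(); // monotonic, unaffected by wall-clock changes
    if (!initialized || now - lastRefreshNanos >= staleThresholdNanos) {
      // Assumption: getUsableSpace() stands in for df.getAvailable().
      cachedBytes = volume.getUsableSpace();
      lastRefreshNanos = now;
      initialized = true;
      refreshCount++;
    }
    return cachedBytes;
  }

  synchronized int getRefreshCount() {
    return refreshCount;
  }
}
```

With a threshold of, say, 5000 ms, a caller that polls once per second would trigger roughly one file-system query per five polls, while the polling interval itself stays unchanged.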
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043450#comment-16043450 ]

Andrew Wang commented on HDFS-11907:
------------------------------------

I would appreciate some diligence investigating the root cause of the problem. I don't think we should commit a behavior change when there isn't a degree of confidence that it'll solve the stated problem.

Based on the discussion, we (Kihwal, myself, Arpit) expect {{df}} to be a cheap call. Given that, there is not a degree of confidence that this change will speed up health checks. I'd like to hear at least a theory as to why {{df}} would be slow in a spurious way before we add complexity.

I'm not going to burn bridges over this patch if someone *really* wants to commit it, but as I said before, I would much rather see a patch to add new metrics or instrumentation to root cause the problem.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043410#comment-16043410 ]

Chen Liang commented on HDFS-11907:
-----------------------------------

Thanks [~arpitagarwal] and [~andrew.wang] for the follow-up. Are you -1 on making the df interval configurable Andrew?
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043365#comment-16043365 ]

Andrew Wang commented on HDFS-11907:
------------------------------------

I'm overall ambivalent to the {{df}} frequency, but I'd rather not add complexity if it's not necessary. I'd like to confirm the problem is the frequency of the {{df}} call before we commit a behavior change.

Repeating my previous comment, are there additional metrics or logs we can add to help debug this for next time? I definitely support that.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043335#comment-16043335 ]

Arpit Agarwal commented on HDFS-11907:
--------------------------------------

Hi [~andrew.wang], you are right that it's expected to be a cheap call, but calling it once per second per volume seems excessive. Do you see any benefit to querying {{df}} once per second? We can make the caching interval configurable and leave the default at 1 second if you prefer.

This is not the same as changing the health check interval as Chen mentioned. Keeping the health check interval at 1 second lets us detect process failure faster and we don't want to change that.

Also the v4 patch has a couple of issues I missed earlier.
# availableSpace and availableSpaceTimeStamp should be members of checkedVolume.
# The test case failure in TestNameNodeResourceChecker needs to be addressed. An easy fix is to check all volumes instead of trying to query a specific one.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039589#comment-16039589 ]

Andrew Wang commented on HDFS-11907:
------------------------------------

Hi, sorry for the slow reply, I was out on Friday and Monday.

If we haven't confirmed the problem, I'd support adding additional logs or metrics for better debugging, but making behavior changes seems premature. Is there a metric for the {{df}} call that we can look at to confirm slowness? Other host-level statistics that we can check?
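One possible shape for the instrumentation asked about above is a thin wrapper that times each available-space query and counts the slow ones. This is a hedged sketch, not an existing Hadoop metric: `TimedSpaceQuery`, its threshold parameter, and its getters are hypothetical names, and the wrapped query is any `LongSupplier` standing in for the {{df}} call.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Illustrative sketch (hypothetical class, not a Hadoop API).
 * Wraps an available-space query, records how long each call takes,
 * and counts calls that exceed a warning threshold.
 */
class TimedSpaceQuery {
  private final LongSupplier query; // stand-in for the df call
  private final long warnThresholdNanos;
  private final AtomicLong calls = new AtomicLong();
  private final AtomicLong slowCalls = new AtomicLong();
  private final AtomicLong maxNanos = new AtomicLong();

  TimedSpaceQuery(LongSupplier query, long warnThresholdMs) {
    this.query = query;
    this.warnThresholdNanos = TimeUnit.MILLISECONDS.toNanos(warnThresholdMs);
  }

  long getAvailable() {
    long start = System.nanoTime();
    try {
      return query.getAsLong();
    } finally {
      long elapsed = System.nanoTime() - start;
      calls.incrementAndGet();
      maxNanos.accumulateAndGet(elapsed, Math::max); // track the worst case
      if (elapsed > warnThresholdNanos) {
        slowCalls.incrementAndGet();
        System.err.println("Slow available-space query: "
            + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
      }
    }
  }

  long getCallCount() { return calls.get(); }
  long getSlowCallCount() { return slowCalls.get(); }
  long getMaxMillis() { return TimeUnit.NANOSECONDS.toMillis(maxNanos.get()); }
}
```

Counters like these would show whether the query is ever slow before any caching behavior is changed; in a real deployment they would presumably feed the existing metrics system rather than standard error.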
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039567#comment-16039567 ]

Arpit Agarwal commented on HDFS-11907:
--------------------------------------

Hi [~vagarychen], I just noticed that TestNameNodeResourceChecker failed in the precommit run. Can you please take a look?
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037506#comment-16037506 ]

Arpit Agarwal commented on HDFS-11907:
--------------------------------------

Hi [~andrew.wang], if you have no objections I will commit Chen's v4 patch by EOD today.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035112#comment-16035112 ]

Chen Liang commented on HDFS-11907:
-----------------------------------

Thanks [~andrew.wang] for the reply! df does seem to be a fairly cheap operation in general, but we've seen cases where we suspect this call was slow under certain conditions, which we are still analyzing.

As for changing the monitorHealth check interval: since we still want the ZKFC process to contact the NameNode frequently enough to detect failures ASAP, we probably don't want to lower the frequency on the caller's side.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034021#comment-16034021 ]

Andrew Wang commented on HDFS-11907:
------------------------------------

Thanks for the reply [~vagarychen]. Just thought I'd mention the other class in case there was potential for code sharing.

Could you expand a little on why caching is necessary here? As Kihwal said, df is normally pretty cheap, so I'm curious why we need to do this. We could also get possibly the same outcome by increasing the monitorHealth check interval from 1s to 5s.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034006#comment-16034006 ]

Hadoop QA commented on HDFS-11907:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 16m 48s | trunk passed |
| +1 | compile | 0m 51s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 51s | trunk passed |
| +1 | javadoc | 0m 44s | trunk passed |
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 37s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 111 unchanged - 0 fixed = 116 total (was 111) |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 0s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
| -1 | unit | 92m 37s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 123m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
| | hadoop.hdfs.server.namenode.TestNameNodeResourceChecker |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11907 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870883/HDFS-11907.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 9229a80e5966 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7101477 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19733/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19733/console |
| Powered by |
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033968#comment-16033968 ]

Arpit Agarwal commented on HDFS-11907:
--------------------------------------

I am +1 on the v4 patch. I will hold off committing, pending Andrew's response.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033960#comment-16033960 ]

Chen Liang commented on HDFS-11907:
-----------------------------------

Thanks [~andrew.wang] for the comments! We prefer not to use it here, though, because:
1. The change in this JIRA is about maintaining the *available space* value, while DFCachingGetSpaceUsed gets the *used space*. So we would have to modify this class further (or create a new one) if we wanted to use it.
2. It seems that each instance of this class uses an extra background thread that periodically updates the value, which seems a bit overkill to me.

But if you do think it is better to use DFCachingGetSpaceUsed, I will try to update with another patch.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033927#comment-16033927 ]

Andrew Wang commented on HDFS-11907:
------------------------------------

Sorry for coming to this late, but is DFCachingGetSpaceUsed useful here? Seems related / similar.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033177#comment-16033177 ]

Kihwal Lee commented on HDFS-11907:
-----------------------------------

The {{statfs()}} call is not expensive, but I agree that the current calling frequency is a bit excessive. Since it is one of the conditions that can trigger a NN failover, any change here will affect the failure detection latency. The increase from max 1 second latency to max 5 seems acceptable.
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032274#comment-16032274 ] Arpit Agarwal commented on HDFS-11907: -- Thanks for updating the patch [~vagarychen]. A few minor comments: # Typo: availabeSpaceTimeStamp --> availableSpaceTimeStamp # Typo: DF_STALE_THREASHOLD_MS --> DF_STALE_THRESHOLD_MS # Looks like the two threshold values were swapped from v2 to v3 patch. I think the v2 values were better. # getAvailabeSpaceTimeStamp can be package private. Also the new constructor. # in the test case, can you add an assert for the {{// first call guarantees an update}} e.g. {code} // first call guarantees an update long val0 = nb.getAvailabeSpaceTimeStamp(); volume.isResourceAvailable(); long val1 = nb.getAvailabeSpaceTimeStamp(); assertEquals(val0 + 5000, val1); timer.advance(2000); {code} Looks good otherwise! > NameNodeResourceChecker should avoid calling df.getAvailable too frequently > --- > > Key: HDFS-11907 > URL: https://issues.apache.org/jira/browse/HDFS-11907 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, > HDFS-11907.003.patch > > > Currently, {{HealthMonitor#doHealthChecks}} invokes > {{NameNode#monitorHealth}} which ends up invoking > {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per > second by default. And NameNodeResourceChecker#isResourceAvailable invokes > {{df.getAvailable();}} every time it is called. > Since available space information should rarely be changing dramatically at > the pace of per second. A cached value should be sufficient. i.e. only try to > get the updated value when the cached value is too old. otherwise simply > return the cached value. This way df.getAvailable() gets invoked less. > Thanks [~arpitagarwal] for the offline discussion. 
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032154#comment-16032154 ]

Arpit Agarwal commented on HDFS-11907:
--

Thanks for this improvement Chen! A few comments:
# We should use Time#monotonicNow instead of System#currentTimeMillis in both files. Time#monotonicNow also returns a millisecond value, but it is guaranteed to be monotonically increasing.
# Instead of initializing availableSpeceTimeStamp to zero, we should initialize it to (Time#monotonicNow - 5000) since 0 can be a valid timestamp returned by nanoTime.
# You can also log the IP address of the client that issued the request to aid debugging. It can be retrieved in an RPC call by calling Server.getIpAddr().
# Typo: availabeSpeceTimeStamp --> availableSpaceTimeStamp.
# Let's replace 5000 and 3000 with static final ints.
# See if you can write an isolated unit test for NameNodeResourceChecker. e.g. the first call to isResourceAvailable should update availableSpaceTimeStamp, subsequent calls immediately should not. Then if you advance the timer (see FakeTimer) and call isResourceAvailable again, availableSpaceTimeStamp should be updated.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes
> {{NameNode#monitorHealth}} which ends up invoking
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at
> the pace of per second. A cached value should be sufficient. i.e. only try to
> get the updated value when the cached value is too old. otherwise simply
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.
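The review points above about a monotonic clock, the initial timestamp and named constants could be combined roughly as in the sketch below. This is an illustration under assumptions, not the code from any of the attached patches. CachedAvailableSpace, STALE_THRESHOLD_MS and withSystemClock are hypothetical names. The first LongSupplier stands in for the df.getAvailable() call, and the second stands in for a monotonic millisecond clock such as Time#monotonicNow.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: caches an expensive "available space" query and
 * refreshes it only when the cached value is older than a threshold.
 */
class CachedAvailableSpace {
  /** Assumed threshold; a named constant instead of a bare 5000. */
  static final long STALE_THRESHOLD_MS = 5000;

  private final LongSupplier spaceSource; // stand-in for df.getAvailable()
  private final LongSupplier clockMs;     // monotonic clock in milliseconds
  private long cachedSpace;
  private long lastUpdateMs;

  CachedAvailableSpace(LongSupplier spaceSource, LongSupplier clockMs) {
    this.spaceSource = spaceSource;
    this.clockMs = clockMs;
    // Start one threshold in the past instead of at zero, because a
    // monotonic clock may legitimately return zero or a negative value.
    this.lastUpdateMs = clockMs.getAsLong() - STALE_THRESHOLD_MS;
  }

  /** Convenience factory that derives milliseconds from System.nanoTime(). */
  static CachedAvailableSpace withSystemClock(LongSupplier spaceSource) {
    return new CachedAvailableSpace(
        spaceSource, () -> System.nanoTime() / 1_000_000L);
  }

  /** Returns the cached value, refreshing it first if it is stale. */
  synchronized long getAvailable() {
    long now = clockMs.getAsLong();
    if (now - lastUpdateMs >= STALE_THRESHOLD_MS) {
      cachedSpace = spaceSource.getAsLong();
      lastUpdateMs = now;
    }
    return cachedSpace;
  }

  /** Exposed so a test can observe when the last refresh happened. */
  synchronized long getLastUpdateMs() {
    return lastUpdateMs;
  }
}
```

A caller might wrap a real disk query with something like `CachedAvailableSpace.withSystemClock(() -> new java.io.File("/").getUsableSpace())`. Passing the clock in as a parameter is a design choice made here so that a test can substitute a fake timer and advance it by hand.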
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032145#comment-16032145 ]

Hadoop QA commented on HDFS-11907:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 49s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
| | hadoop.hdfs.TestBlockStoragePolicy |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11907 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870625/HDFS-11907.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 61a3ffd5fde6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4369690 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19707/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19707/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19707/console |
| Powered by | Apache
[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031863#comment-16031863 ]

Hanisha Koneru commented on HDFS-11907:
---

Thanks for the contribution [~vagarychen]. Patch v02 LGTM.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes
> {{NameNode#monitorHealth}} which ends up invoking
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at
> the pace of per second. A cached value should be sufficient. i.e. only try to
> get the updated value when the cached value is too old. otherwise simply
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.