[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-09 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045105#comment-16045105
 ] 

Arpit Agarwal commented on HDFS-11907:
--

+1 for the v6 patch. Thanks Chen.

[~andrew.wang] do you have any comments on the latest patch?

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch, HDFS-11907.005.patch, 
> HDFS-11907.006.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-08 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043450#comment-16043450
 ] 

Andrew Wang commented on HDFS-11907:


I would appreciate some diligence investigating the root cause of the problem. 
I don't think we should commit a behavior change when there isn't a degree of 
confidence that it'll solve the stated problem.

Based on the discussion, we (Kihwal, myself, Arpit) expect {{df}} to be a cheap 
call. Given that, there is not a degree of confidence that this change will 
speedup health checks. I'd like to hear at least a theory as to why {{df}} 
would be slow in a spurious way before we add complexity.

I'm not going to burn bridges over this patch if someone *really* wants to 
commit it, but as I said before, I would much rather see a patch to add new 
metrics or instrumentation to root cause the problem.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-08 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043410#comment-16043410
 ] 

Chen Liang commented on HDFS-11907:
---

Thanks [~arpitagarwal] and [~andrew.wang] for the follow-up. Are you -1 on 
making the df interval configurable Andrew?

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-08 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043365#comment-16043365
 ] 

Andrew Wang commented on HDFS-11907:


I'm overall ambivalent to the {{df}} frequency, but I'd rather not add 
complexity if it's not necessary. I'd like to confirm the problem is the 
frequency of the {{df}} call before we commit a behavior change.

Repeating my previous comment, are there additional metrics or logs we can add 
to help debug this for next time? I definitely support that.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-08 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043335#comment-16043335
 ] 

Arpit Agarwal commented on HDFS-11907:
--

Hi [~andrew.wang], you are right that it's expected to be a cheap call, but 
calling it once per second per volume seems excessive. Do you see any benefit 
to querying {{df}} once per second? We can make the caching interval 
configurable and leave the default at 1 second if you prefer.

This is not the same as changing the health check interval as Chen mentioned. 
Keeping the health check interval at 1 second lets us detect process failure 
faster and we don't want to change that.

Also the v4 patch has a couple of issues I missed earlier.
# availableSpace and availableSpaceTimeStamp should be members of checkedVolume.
# The test case failure in TestNameNodeResourceChecker needs to be addressed. 
An easy fix is to check all volumes instead of trying to query a specific one.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039589#comment-16039589
 ] 

Andrew Wang commented on HDFS-11907:


Hi, sorry for the slow reply, was out on Friday and Monday,

If we haven't confirmed the problem, I'd support adding additional logs or 
metrics for better debugging, but making behavior changes seems premature. Is 
there a metric for the {{df}} call that we can look at to confirm slowness? 
Other host-level statistics that we can check?

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039567#comment-16039567
 ] 

Arpit Agarwal commented on HDFS-11907:
--

Hi [~vagarychen], I just noticed that TestNameNodeResourceChecker failed in the 
precommit run.

Can you please take a look?

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-05 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037506#comment-16037506
 ] 

Arpit Agarwal commented on HDFS-11907:
--

Hi [~andrew.wang], if you have no objections I will commit Chen's v4 patch by 
EOD today.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-02 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035112#comment-16035112
 ] 

Chen Liang commented on HDFS-11907:
---

Thanks [~andrew.wang] for the reply!

df does seem to be a fairly cheap operation in general, but we've seen cases 
where we suspect it was this call being slow under certain conditions, which we 
are still doing analysis. About changing monitorHealth check interval, since we 
still want ZKFC process to try to contact NameNode frequently enough to detect 
failures ASAP, we probably don't want to lower the frequency from caller's side.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034021#comment-16034021
 ] 

Andrew Wang commented on HDFS-11907:


Thanks for the reply [~vagarychen]. Just thought I'd mention the other class in 
case there was potential for code sharing.

Could you expand a little on why caching is necessary here? As Kihwal said, df 
is normally pretty cheap, so I'm curious why we need to do this. We could also 
get possibly the same outcome by increasing the monitorHealth check interval 
from 1s to 5s.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034006#comment-16034006
 ] 

Hadoop QA commented on HDFS-11907:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 5 new + 111 unchanged - 0 fixed = 116 total (was 111) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 37s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
|   | hadoop.hdfs.server.namenode.TestNameNodeResourceChecker |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11907 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870883/HDFS-11907.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9229a80e5966 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 
16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7101477 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19733/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19733/console |
| Powered by | 

[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033968#comment-16033968
 ] 

Arpit Agarwal commented on HDFS-11907:
--

I am +1 on the v4 patch. I will hold off committing, pending Andrew's response.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033960#comment-16033960
 ] 

Chen Liang commented on HDFS-11907:
---

Thanks [~andrew.wang] for the comments! We prefer not to use it here though 
because:
1. the change of this JIRA is about maintaining *available space* value, while 
DFCachingGetSpaceUsed is to get *used space*. So we will have to make further 
modification to this class (or create new) if we want to use it. 
2. seems that each instance of this class will use an extra background thread 
that periodically updates the value, which seems a bit overkill to me. 

But if you do think it is better to use DFCachingGetSpaceUsed, I will try to 
update with another patch.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033927#comment-16033927
 ] 

Andrew Wang commented on HDFS-11907:


Sorry for coming to this late, but is DFCachingGetSpaceUsed useful here? Seems 
related / similar.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-06-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033177#comment-16033177
 ] 

Kihwal Lee commented on HDFS-11907:
---

The {{statfs()}} call is not expensive, but I agree that the current calling 
frequency is a bit excessive.  Since it is one of conditions that can trigger a 
NN failover, any change here will affect the failure detection latency. The 
increase from max 1 second latency to max 5 seems acceptable.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-05-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032274#comment-16032274
 ] 

Arpit Agarwal commented on HDFS-11907:
--

Thanks for updating the patch [~vagarychen]. A few minor comments:
# Typo: availabeSpaceTimeStamp --> availableSpaceTimeStamp
# Typo: DF_STALE_THREASHOLD_MS --> DF_STALE_THRESHOLD_MS
# Looks like the two threshold values were swapped from v2 to v3 patch. I think 
the v2 values were better.
# getAvailabeSpaceTimeStamp can be package private. Also the new constructor.
# in the test case, can you add an assert for the {{// first call guarantees an 
update}} e.g.
{code}
// first call guarantees an update
long val0 = nb.getAvailabeSpaceTimeStamp();
volume.isResourceAvailable();
long val1 = nb.getAvailabeSpaceTimeStamp();
assertEquals(val0 + 5000, val1);
timer.advance(2000);
{code}

Looks good otherwise!

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-05-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032154#comment-16032154
 ] 

Arpit Agarwal commented on HDFS-11907:
--

Thanks for this improvement Chen! A few comments:
# We should use Time#monotonicNow instead of System#currentTimeMillis in both 
files. Time#monotonicNow also returns a millisecond value, but it is guaranteed 
to be monotonically increasing.
# Instead of initializing availableSpeceTimeStamp to zero, we should initialize 
it to (Time#monotonicNow - 5000) since 0 can be a valid timestamp returned by 
nanoTime.
# You can also log the IP address of the client that issued the request to aid 
debugging. It can be retrieved in an RPC call by calling Server.getIpAddr().
# Typo: availabeSpeceTimeStamp --> availableSpaceTimeStamp.
# Let's Replace 5000 and 3000 with static final ints.
# See if you can write an isolated unit test for NameNodeResourceChecker. e.g. 
the first call to isResourceAvailable should update availableSpaceTimeStamp, 
subsequent calls immediately should not. Then if you advance the timer (see 
FakeTimer) and call isResourceAvailable again, availableSpaceTimeStamp should 
be updated.


> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032145#comment-16032145
 ] 

Hadoop QA commented on HDFS-11907:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11907 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870625/HDFS-11907.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 61a3ffd5fde6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 4369690 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19707/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19707/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19707/console |
| Powered by | Apache 

[jira] [Commented] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-05-31 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031863#comment-16031863
 ] 

Hanisha Koneru commented on HDFS-11907:
---

Thanks for the contribution [~vagarychen]. Patch v02 LGTM.

> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}} which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}}, at the frequency of once per 
> second by default. And NameNodeResourceChecker#isResourceAvailable invokes 
> {{df.getAvailable();}} every time it is called.
> Since available space information should rarely be changing dramatically at 
> the pace of per second. A cached value should be sufficient. i.e. only try to 
> get the updated value when the cached value is too old. otherwise simply 
> return the cached value. This way df.getAvailable() gets invoked less.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org