[jira] [Commented] (HDFS-15410) Add separated config file fedbalance-default.xml for fedbalance tool

2020-06-19 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140960#comment-17140960
 ] 

Yiqun Lin commented on HDFS-15410:
--

Besides [~elgoiri]'s review comment, some more review comments from me:

I don't fully understand why we need to define the implementation class in the 
config and use reflection to get the instance. Currently there is no other 
implementation class, so why not just create a new 
FedBalance/BalanceJournalInfoHDFS instance in the code? From my understanding, 
these two config settings can be removed.
{code:java}
federation.balance.class
hadoop.hdfs.procedure.journal.class

// init journal.
Class<? extends BalanceJournal> clazz = (Class<? extends BalanceJournal>) conf
    .getClass(JOURNAL_CLASS, BalanceJournalInfoHDFS.class);
journal = ReflectionUtils.newInstance(clazz, conf);

Class<? extends Tool> balanceClazz = (Class<? extends Tool>) conf
    .getClass(FEDERATION_BALANCE_CLASS, FedBalance.class);
Tool balancer = ReflectionUtils.newInstance(balanceClazz, conf);
{code}
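
To illustrate, a rough sketch of the direct instantiation alternative (assuming 
both classes take the conf the same way {{ReflectionUtils.newInstance}} would 
inject it):
{code:java}
// Hypothetical simplification: construct the only existing implementations
// directly; reflection only adds value once there is a second implementation
// to choose between. ReflectionUtils.setConf() performs the same conf
// injection that newInstance() would have done.
journal = new BalanceJournalInfoHDFS();
ReflectionUtils.setConf(journal, conf);

Tool balancer = new FedBalance();
ReflectionUtils.setConf(balancer, conf);
{code}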

Can we rename the class from {{DistCpBalanceOptions}} to {{FedBalanceOptions}}? 
That would make it more obvious that these options apply to the fedbalance 
tool.

Can we rename the config prefix from {{hadoop.hdfs.procedure.work.thread.num}} to 
{{hdfs.fedbalance.procedure.work.thread.num}}?

The following description needs to be updated, since the -router option no 
longer requires true or false as a parameter.
{noformat}
  final static Option ROUTER = new Option("router", false,
  "If `true` the command runs in router mode. The source path is "
  + "taken as a mount point. It will disable write by setting the mount"
  + " point readonly. Otherwise the command works in normal federation"
  + " mode. The source path is taken as the full path. It will disable"
  + " write by cancelling all permissions of the source path. The"
  + " default value is `true`.");
{noformat}
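
For example, something along these lines (just a suggested rewording; the exact 
text is up to you):
{noformat}
  final static Option ROUTER = new Option("router", false,
      "Run in router mode. The source path is taken as a mount point and"
          + " write is disabled by setting the mount point readonly. Without"
          + " this option the command works in normal federation mode: the"
          + " source path is taken as the full path and write is disabled by"
          + " cancelling all permissions of the source path.");
{noformat}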

> Add separated config file fedbalance-default.xml for fedbalance tool
> 
>
> Key: HDFS-15410
> URL: https://issues.apache.org/jira/browse/HDFS-15410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch
>
>
> Add a separated config file named fedbalance-default.xml for fedbalance tool 
> configs. It's like the distcp-default.xml for the distcp tool.






[jira] [Updated] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-06-19 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15423:

Component/s: webhdfs

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Priority: Major
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.






[jira] [Created] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-06-19 Thread Chao Sun (Jira)
Chao Sun created HDFS-15423:
---

 Summary: RBF: WebHDFS create shouldn't choose DN from all 
sub-clusters
 Key: HDFS-15423
 URL: https://issues.apache.org/jira/browse/HDFS-15423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: Chao Sun


In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first 
gets all DNs via {{getDatanodeReport}}, and then randomly picks one from the 
list via {{getRandomDatanode}}. This logic doesn't seem correct, as it should 
pick a DN from the specific sub-cluster(s) of the input {{path}}.
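
A rough sketch of the intended behavior (hypothetical helper names; 
{{resolveSubclusters}} and {{getDatanodeReport}} stand in for the router's 
mount-table resolution and per-namespace datanode report calls):
{code:java}
// Hypothetical sketch: choose the DN only from the sub-cluster(s) that the
// path resolves to, rather than from a report covering all sub-clusters.
DatanodeInfo chooseDatanodeForCreate(String path) throws IOException {
  List<DatanodeInfo> candidates = new ArrayList<>();
  for (String nsId : resolveSubclusters(path)) {  // mount-table lookup
    candidates.addAll(getDatanodeReport(nsId));   // DNs of this namespace only
  }
  // Random pick, but only among DNs of the matching sub-cluster(s).
  return candidates.get(
      ThreadLocalRandom.current().nextInt(candidates.size()));
}
{code}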






[jira] [Commented] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-06-19 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140885#comment-17140885
 ] 

Daniel Howard commented on HDFS-13082:
--

I am running into this as well, but the AIX compatibility trick did not help.

For example:

{{0-15:58 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{# No files listed}}
{{0-16:01 djh@c24-03-06 ~> *touch /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{foo packed-hbfs/ raw/ tmp/}}
{{0-16:01 djh@c24-03-06 ~> *rm /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{packed-hbfs/ raw/ tmp/}}

Writing to this directory forced the NFS server to return the correct directory 
contents.

I have a bunch of this in the log:

{{2020-06-19 16:01:35,281 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}
{{2020-06-19 16:01:35,287 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}
{{2020-06-19 16:01:35,454 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}

I am tempted to fiddle with _dfs.namenode.accesstime.precision_ but .. ?!
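
(For context, {{dfs.namenode.accesstime.precision}} defaults to 1 hour and the 
NFS gateway docs say it should not be disabled. A typical hdfs-site.xml entry, 
in case it helps anyone reproducing this; whether it affects the cookieverf 
behaviour here is an open question:)
{code:xml}
<!-- hdfs-site.xml: access time precision in milliseconds.
     Default is 3600000 (1 hour); 0 disables access times entirely. -->
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>
{code}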

> cookieverf mismatch error over NFS gateway on Linux
> ---
>
> Key: HDFS-13082
> URL: https://issues.apache.org/jira/browse/HDFS-13082
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3
>Reporter: Dan Moraru
>Priority: Minor
>
> Running 'ls' on some directories over an HDFS-NFS gateway sometimes fails to 
> list the contents of those directories.  Running 'ls' on those same 
> directories mounted via FUSE works.  The NFS gateway logs errors like the 
> following:
> 2018-01-29 11:53:01,130 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
> cookieverf mismatch. request cookieverf: 1513390944415 dir cookieverf: 
> 1516920857335
> Reviewing 
> hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
>  suggested that  these errors can be avoided by setting 
> nfs.aix.compatibility.mode.enabled=true, and that is indeed the case.  The 
> documentation lists https://issues.apache.org/jira/browse/HDFS-6549 as a 
> known issue, but also goes on to say that "regular, non-AIX clients should 
> NOT enable AIX compatibility mode. The work-arounds implemented by AIX 
> compatibility mode effectively disable safeguards to ensure that listing of 
> directory contents via NFS returns consistent results, and that all data sent 
> to the NFS server can be assured to have been committed."  Server and client 
> in this case are one and the same, running Scientific Linux 7.4.
>  






[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15422:
--
Description: 
When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting 
missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
that had been appended, and the sizes were actually correct on the datanodes. 
Upon further investigation, it was determined that the namenode was queueing 
IBRs with altered information.

Although it sounds bad, I am not making it a blocker.

  was:
When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting 
missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
that had been appended, and the sizes were actually correct on the datanodes. 
Upon further investigation, it was determined that the namenode was queueing 
IBRs with altered information.


> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting 
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
> that had been appended, and the sizes were actually correct on the datanodes. 
> Upon further investigation, it was determined that the namenode was queueing 
> IBRs with altered information.
> Although it sounds bad, I am not making it a blocker.






[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140812#comment-17140812
 ] 

Hadoop QA commented on HDFS-15415:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
5s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 224 unchanged - 0 fixed = 225 total (was 224) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 37s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 7s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29441/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15415 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13006051/HDFS-15415.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 7a051fef76dc 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140779#comment-17140779
 ] 

Stephen O'Donnell commented on HDFS-15415:
--

If a block is RBW or RUR before the snapshot of memory is taken, then it will 
never be part of the in-memory blocks for that pass of the scanner. RBW should 
also be skipped by the disk scan. If a block goes FINALIZED to RBW (due to 
append), then we may record a difference or we may not, depending on the 
sequence of events.

There are always going to be some "false positives" in the comparison, as the 
disk picture will always be changing before we take the lock, even with the 
code as it is now. That is why I believe we can do the processing against the 
memory snapshot without the lock. The price we pay is possibly some more 
differences, which have to be reconciled later.

The faster we can make the scan step, the fewer false positives there will be 
to reconcile later.

As with many of these locking problems, it is hard to be 100% sure this will 
not cause some other problems, but from what I looked at today, I think it 
should be good.

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid);` as above, we have taken a snapshot 
> of what is in memory. The two snapshots are never 100% in sync, as things are 
> always changing as the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk, but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block by block basis.






[jira] [Updated] (HDFS-15417) RBF: Get the datanode report from cache for federation WebHDFS operations

2020-06-19 Thread Ye Ni (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Ni updated HDFS-15417:
-
Description: 
*Why*
 For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, router or 
namenode needs to get the datanodes where the block is located, then redirect 
the request to one of the datanodes.

However, this chooseDatanode action in router is much slower than namenode, 
which directly affects the WebHDFS operations above.

For namenode WebHDFS, it normally takes tens of milliseconds, while router 
always takes more than 2 seconds.

*How*
Cache the datanode report in the router RPC server and actively refresh it at 
a configured interval. Only get the datanode report when necessary in the 
router.

It is a very expensive operation and is where all the time is spent.

This is only needed when we want to exclude some datanodes or find a random 
datanode for CREATE.
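
A minimal sketch of the caching idea (hypothetical class; the real change would 
live in the router RPC server and wrap its existing datanode report call):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Keep the last datanode report in memory and refresh it on a fixed
// interval, so WebHDFS CREATE/OPEN/APPEND/GETFILECHECKSUM never pay for a
// live getDatanodeReport() call.
public class CachedDatanodeReport<R> {
  private final AtomicReference<R> cache = new AtomicReference<>();
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor();

  public CachedDatanodeReport(Supplier<R> fetch, long refreshIntervalMs) {
    cache.set(fetch.get()); // prime the cache once at startup
    refresher.scheduleWithFixedDelay(() -> cache.set(fetch.get()),
        refreshIntervalMs, refreshIntervalMs, TimeUnit.MILLISECONDS);
  }

  public R get() { // always served from memory
    return cache.get();
  }
}
{code}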

  was:
*Why*
 For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, router or 
namenode needs to get the datanodes where the block is located, then redirect 
the request to one of the datanodes.

However, this chooseDatanode action in router is much slower than namenode, 
which directly affects the WebHDFS operations above.

For namenode WebHDFS, it normally takes tens of milliseconds, while router 
always takes more than 2 seconds.

*How*
 Only get the datanode report when necessary in the router. It is a very 
expensive operation and is where all the time is spent.

This is only needed when we want to exclude some datanodes or find a random 
datanode for CREATE.


> RBF: Get the datanode report from cache for federation WebHDFS operations
> -
>
> Key: HDFS-15417
> URL: https://issues.apache.org/jira/browse/HDFS-15417
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation, rbf, webhdfs
>Reporter: Ye Ni
>Assignee: Ye Ni
>Priority: Major
>
> *Why*
>  For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, router or 
> namenode needs to get the datanodes where the block is located, then redirect 
> the request to one of the datanodes.
> However, this chooseDatanode action in router is much slower than namenode, 
> which directly affects the WebHDFS operations above.
> For namenode WebHDFS, it normally takes tens of milliseconds, while router 
> always takes more than 2 seconds.
> *How*
> Cache the datanode report in the router RPC server and actively refresh it 
> at a configured interval. Only get the datanode report when necessary in the 
> router. It is a very expensive operation and is where all the time is spent.
> This is only needed when we want to exclude some datanodes or find a random 
> datanode for CREATE.






[jira] [Updated] (HDFS-15417) RBF: Get the datanode report from cache for federation WebHDFS operations

2020-06-19 Thread Ye Ni (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Ni updated HDFS-15417:
-
Summary: RBF: Get the datanode report from cache for federation WebHDFS 
operations  (was: RBF: Lazy get the datanode report for federation WebHDFS 
operations)

> RBF: Get the datanode report from cache for federation WebHDFS operations
> -
>
> Key: HDFS-15417
> URL: https://issues.apache.org/jira/browse/HDFS-15417
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation, rbf, webhdfs
>Reporter: Ye Ni
>Assignee: Ye Ni
>Priority: Major
>
> *Why*
>  For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, router or 
> namenode needs to get the datanodes where the block is located, then redirect 
> the request to one of the datanodes.
> However, this chooseDatanode action in router is much slower than namenode, 
> which directly affects the WebHDFS operations above.
> For namenode WebHDFS, it normally takes tens of milliseconds, while router 
> always takes more than 2 seconds.
> *How*
>  Only get the datanode report when necessary in the router. It is a very 
> expensive operation and is where all the time is spent.
> This is only needed when we want to exclude some datanodes or find a random 
> datanode for CREATE.






[jira] [Updated] (HDFS-15417) RBF: Lazy get the datanode report for federation WebHDFS operations

2020-06-19 Thread Ye Ni (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Ni updated HDFS-15417:
-
Priority: Major  (was: Minor)

> RBF: Lazy get the datanode report for federation WebHDFS operations
> ---
>
> Key: HDFS-15417
> URL: https://issues.apache.org/jira/browse/HDFS-15417
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation, rbf, webhdfs
>Reporter: Ye Ni
>Assignee: Ye Ni
>Priority: Major
>
> *Why*
>  For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, router or 
> namenode needs to get the datanodes where the block is located, then redirect 
> the request to one of the datanodes.
> However, this chooseDatanode action in router is much slower than namenode, 
> which directly affects the WebHDFS operations above.
> For namenode WebHDFS, it normally takes tens of milliseconds, while router 
> always takes more than 2 seconds.
> *How*
>  Only get the datanode report when necessary in the router. It is a very 
> expensive operation and is where all the time is spent.
> This is only needed when we want to exclude some datanodes or find a random 
> datanode for CREATE.






[jira] [Commented] (HDFS-15416) DataStorage#addStorageLocations() should add more reasonable information verification.

2020-06-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140720#comment-17140720
 ] 

Íñigo Goiri commented on HDFS-15416:


Let's go with the patch here instead of the PR.
Can we add a test?

> DataStorage#addStorageLocations() should add more reasonable information 
> verification.
> --
>
> Key: HDFS-15416
> URL: https://issues.apache.org/jira/browse/HDFS-15416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.1.1
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15416.patch
>
>
> {{successLocations}} is an array; when its length is 0, there is no need to 
> execute loadBlockPoolSliceStorage() again.
> code: 
> {code:java}
> try {
>   final List<StorageLocation> successLocations = loadDataStorage(
>       datanode, nsInfo, dataDirs, startOpt, executor);
>   return loadBlockPoolSliceStorage(
>       datanode, nsInfo, successLocations, startOpt, executor);
> } finally {
>   executor.shutdown();
> }
> {code}






[jira] [Commented] (HDFS-15410) Add separated config file fedbalance-default.xml for fedbalance tool

2020-06-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140718#comment-17140718
 ] 

Íñigo Goiri commented on HDFS-15410:


We can fix the checkstyle.
What about adding hdfs as a prefix for the new config?
It is true that we haven't added the prefix in any of the classes, but HDFS is 
definitely needed here.
BTW, are the current tests covering this change indirectly?

> Add separated config file fedbalance-default.xml for fedbalance tool
> 
>
> Key: HDFS-15410
> URL: https://issues.apache.org/jira/browse/HDFS-15410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch
>
>
> Add a separated config file named fedbalance-default.xml for fedbalance tool 
> configs. It's like the distcp-default.xml for the distcp tool.






[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140699#comment-17140699
 ] 

hemanthboyina commented on HDFS-15415:
--

Thanks [~sodonnell] for your analysis.

After taking the snapshot, if we did not acquire the lock, have you 
considered the scenario where blocks are being converted from RBW to FINALIZED?
{quote}A finalized block could be appended. If that happens both the genstamp 
and length will change
{quote}
Agree with you. Though the replica will be changed from FINALIZED to RBW, 
since we are getting only the finalized blocks it shouldn't be a problem.

 

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid);` as above, we have taken a snapshot 
> of what is in memory. The two snapshots are never 100% in sync, as things are 
> always changing as the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk, but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block by block basis.






[jira] [Updated] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15415:
-
Status: Patch Available  (was: Open)

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid);` as above, we have taken a snapshot 
> of what is in memory. The two snapshots are never 100% in sync, as things are 
> always changing as the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk, but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block by block basis.






[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140618#comment-17140618
 ] 

Kihwal Lee commented on HDFS-15422:
---

The fix is simple. 
{code}
@@ -2578,10 +2578,7 @@ private BlockInfo processReportedBlock(
 // If the block is an out-of-date generation stamp or state,
 // but we're the standby, we shouldn't treat it as corrupt,
 // but instead just queue it for later processing.
-// TODO: Pretty confident this should be s/storedBlock/block below,
-// since we should be postponing the info of the reported block, not
-// the stored block. See HDFS-6289 for more context.
-queueReportedBlock(storageInfo, storedBlock, reportedState,
+queueReportedBlock(storageInfo, block, reportedState,
 QUEUE_REASON_CORRUPT_STATE);
   } else {
 toCorrupt.add(c);
{code}

If the old information in memory ({{storedBlock}}) is used in queueing a 
report, the size may be stale. Unlike GENSTAMP_MISMATCH, this kind of 
corruption cannot be undone even when the NN sees a correct report again, 
i.e. forcing a block report won't fix this condition.

> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting 
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
> that had been appended, and the sizes were actually correct on the datanodes. 
> Upon further investigation, it was determined that the namenode was queueing 
> IBRs with altered information.






[jira] [Created] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-15422:
-

 Summary: Reported IBR is partially replaced with stored info when 
queuing.
 Key: HDFS-15422
 URL: https://issues.apache.org/jira/browse/HDFS-15422
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Kihwal Lee


When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting 
missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
that had been appended, and the sizes were actually correct on the datanodes. 
Upon further investigation, it was determined that the namenode was queueing 
IBRs with altered information.






[jira] [Comment Edited] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140607#comment-17140607
 ] 

Kihwal Lee edited comment on HDFS-15421 at 6/19/20, 2:56 PM:
-

Example of a leak itself: (single replica shown for simplicity)

1) IBRs queued. The file was created, data written to it and closed.  Then it 
was opened for append, additional data written and closed.
{noformat}
2020-06-19 02:38:27,423 [Block report processor] INFO 
blockmanagement.BlockManager: Queueing reported block 
blk_1521788462_1099975774416 in state FINALIZED from datanode 1.2.3.4:1004 for 
later processing because generation stamp is in the future.
2020-06-19 02:38:28,190 [Block report processor] INFO 
blockmanagement.BlockManager: Queueing reported block 
blk_1521788462_1099975774420 in state FINALIZED from datanode 1.2.3.4:1004 for 
later processing because generation stamp is in the future.
{noformat}

2) Processing of queued IBRs as edits are replayed. The IBR with the first gen 
stamp for the initial file is processed. The one from the append is not, as its 
gen stamp is still in the future. It is re-queued.
{noformat}
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774416, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from 
datanode 1.2.3.4:1004 for later processing because generation stamp is in the 
future.
{noformat}

3) When the edit for the append is replayed, the IBR is still identified as 
being from the future and is re-queued. Since there are no more edits regarding 
this file, the IBR is leaked.
{noformat}
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: 
Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from 
datanode 1.2.3.4:1004 for later processing because generation stamp is in the 
future.
{noformat}

With HDFS-14941 reverted, the last IBR is processed as expected and the leak 
does not happen anymore.

Note: The original logging level of the above lines is DEBUG, but it was 
changed to INFO temporarily.


[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-19 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15421:
--
Priority: Blocker  (was: Critical)

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations. This makes the last set of incremental block reports from an 
> append appear to be "from the future", which causes them to be simply 
> re-queued to the pending DN message queue rather than processed to complete 
> the block. The last set of IBRs will leak and never be cleaned up until the 
> NN transitions to active. The size of {{pendingDNMessages}} constantly grows 
> until then.
> If a leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140608#comment-17140608
 ] 

Kihwal Lee commented on HDFS-14941:
---

Filed HDFS-15421 with more details.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, 
> HDFS-14941.006.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode complains 
> about corrupted files/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on the SbN, it is possible that it receives block reports before 
> the corresponding edit tailing has happened, in which case the SbN postpones 
> processing the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically, if the reported block has a future generation stamp, the DN report 
> gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line {{newBlock = createNewBlock();}} would log an edit entry 
> {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the 
> following line {{persistNewBlock(src, pendingFile);}} would log another edit 
> entry {{OP_ADD_BLOCK}} to actually add the block on the Standby.
> Then the race condition is: imagine the Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with the new generation stamp 
> comes in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered a future block, so the guarding logic passes. But actually 
> the block hasn't been added to the blockmap, because the second edit is yet 
> to be tailed. So the block then gets added to the invalidated block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen, though, if both of the edit entries get tailed 
> together, so that no IBR processing can happen in between. But in our case we 
> set the edit tailing interval very low (to allow Standby reads), so under 
> high workload there is a much higher chance that the two entries are tailed 
> separately, causing the issue.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140607#comment-17140607
 ] 

Kihwal Lee commented on HDFS-15421:
---

Example of a leak itself: (single replica shown for simplicity)

1) IBRs queued. The file was created, data written to it and closed.  Then it 
was opened for append, additional data written and closed.
{noformat}
2020-06-19 02:38:27,423 [Block report processor] INFO 
blockmanagement.BlockManager: Queueing reported block 
blk_1521788462_1099975774416 in state FINALIZED from datanode 1.2.3.4:1004 for 
later processing because generation stamp is in the future.
2020-06-19 02:38:28,190 [Block report processor] INFO 
blockmanagement.BlockManager: Queueing reported block 
blk_1521788462_1099975774420 in state FINALIZED from datanode 1.2.3.4:1004 for 
later processing because generation stamp is in the future.
{noformat}

2) Processing of queued IBRs as edits are replayed. The IBR with the first gen 
stamp for the initial file is processed. The one from the append is not, as its 
gen stamp is still in the future. It is re-queued.
{noformat}
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774416, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: 
Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from 
datanode 1.2.3.4:1004 for later processing because generation stamp is in the 
future.
{noformat}

3) When the edit for the append is replayed, the IBR is still identified as 
being from the future and is re-queued. Since there are no more edits regarding 
this file, the IBR is leaked.
{noformat}
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: 
Processing previouly queued message ReportedBlockInfo 
[block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: 
Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from 
datanode 1.2.3.4:1004 for later processing because generation stamp is in the 
future.
{noformat}

With HDFS-14941 reverted, the last IBR is processed as expected and the leak 
does not happen anymore.


> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations. This makes the last set of incremental block reports from an 
> append appear to be "from the future", which causes them to be simply 
> re-queued to the pending DN message queue rather than processed to complete 
> the block. The last set of IBRs will leak and never be cleaned up until the 
> NN transitions to active. The size of {{pendingDNMessages}} constantly grows 
> until then.
> If a leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140597#comment-17140597
 ] 

Kihwal Lee commented on HDFS-15421:
---

This is an example of "stuck safe mode" from one of our small test clusters:
{noformat}
The reported blocks 3045352 needs additional 14058 blocks to reach the 
threshold 1. of total blocks 3059410.
The minimum number of live datanodes is not required. Safe mode will be turned 
off automatically once the thresholds
 have been reached.
2020-06-11 18:35:19,863 [Block report processor] INFO hdfs.StateChange: STATE* 
Safe mode extension entered.
The reported blocks 3059410 has reached the threshold 1. of total blocks 
3059410. The minimum number
 of live datanodes is not required. In safe mode extension. Safe mode will be 
turned off automatically in 30 seconds.
2020-06-11 18:35:25,036 [Edit log tailer] INFO namenode.FSImage:
 Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@259766e0
 expecting start txid #3427451497
2020-06-11 18:35:25,036 [Edit log tailer] INFO namenode.FSImage: Start loading 
edits file xxx
2020-06-11 18:35:25,036 [Edit log tailer] INFO 
namenode.RedundantEditLogInputStream: Fast-forwarding stream 'xxx'
 to transaction ID 3427451497
2020-06-11 18:35:25,060 [Edit log tailer] INFO namenode.FSImage: Loaded 1 edits 
file(s) (the last named
 xxx of total size 19024.0, total edits 124.0, total load time 25.0 ms
2020-06-11 18:35:39,868 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode$SafeModeMonitor@6d4a65c6]
 INFO hdfs.StateChange: STATE* Safe mode ON, in safe mode extension.
The reported blocks 3059416 needs additional 1 blocks to reach the threshold 
1. of total blocks 3059417.
The minimum number of live datanodes is not required. In safe mode extension.
 Safe mode will be turned off automatically in 9 seconds.
2020-06-11 18:35:59,873 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode$SafeModeMonitor@6d4a65c6]
 INFO hdfs.StateChange: STATE* Safe mode ON, thresholds not met.
The reported blocks 3059416 needs additional 1 blocks to reach the threshold 
1. of total blocks 3059417.
The minimum number of live datanodes is not required. In safe mode extension.
 Safe mode will be turned off automatically in -10 seconds.
2020-06-11 18:36:19,880 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode$SafeModeMonitor@6d4a65c6]
 INFO hdfs.StateChange: STATE* Safe mode ON, thresholds not met.
The reported blocks 3059416 needs additional 1 blocks to reach the threshold 
1. of total blocks 3059417.
The minimum number of live datanodes is not required. In safe mode extension.
 Safe mode will be turned off automatically in -30 seconds.
2020-06-11 18:36:39,888 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode$SafeModeMonitor@6d4a65c6]
 INFO hdfs.StateChange: STATE* Safe mode ON, thresholds not met.
The reported blocks 3059416 needs additional 1 blocks to reach the threshold 
1. of total blocks 3059417.
{noformat}

The time remaining in the extension grows negatively without bound, and the 
number of additionally required blocks increases as more IBRs leak. You can 
force it out of safe mode, but the leak continues until an HA transition.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations. This makes the last set of incremental block reports from an 
> append appear to be "from the future", which causes them to be simply 
> re-queued to the pending DN message queue rather than processed to complete 
> the block. The last set of IBRs will leak and never be cleaned up until the 
> NN transitions to active. The size of {{pendingDNMessages}} constantly grows 
> until then.
> If a leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Created] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-19 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-15421:
-

 Summary: IBR leak causes standby NN to be stuck in safe mode
 Key: HDFS-15421
 URL: https://issues.apache.org/jira/browse/HDFS-15421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Kihwal Lee


After HDFS-14941, the update of the global gen stamp is delayed in certain 
situations. This makes the last set of incremental block reports from an append 
appear to be "from the future", which causes them to be simply re-queued to the 
pending DN message queue rather than processed to complete the block. The last 
set of IBRs will leak and never be cleaned up until the NN transitions to 
active. The size of {{pendingDNMessages}} constantly grows until then.

If a leak happens while in startup safe mode, the namenode will never be able 
to come out of safe mode on its own.






[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140477#comment-17140477
 ] 

Stephen O'Donnell commented on HDFS-15415:
--

I have annotated the main loop the DirectoryScanner runs under the datanode 
lock below, with comments starting "SOD:". It's important to keep in mind that 
this compare phase creates a list of differences. These differences are then 
checked again block by block under the datanode lock in the reconcile step. 
Some "incorrect" differences are likely to be recorded even under the lock, as 
scanning the disks will take time. This scan is performed outside of the lock, 
so the DN could be appending, deleting and adding blocks during this time. 
Therefore if some more changes happen when comparing the disk results to the 
in-memory blocks, it is not a big deal. They will get re-checked and resolved 
during the reconcile step.

I have a small concern that if the disk balancer is running and moving blocks 
around it could cause more differences. However I don't see any protection 
against that when scanning the volumes either, so a block could potentially be 
counted on vol1, moved to vol2 and then counted again.

Overall, I feel it is safe to limit the lock to be around the call to 
`dataset.getSortedFinalizedBlocks(bpid)` only.

Please let me know if anyone thinks that is wrong, or I am missing something 
obvious.
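
To make that concrete, a minimal sketch of the narrowed locking (illustrative 
only, using the existing dataset API; the fully annotated current loop 
follows):

{code:java}
// Sketch: hold the dataset lock only while snapshotting the replica map.
final List<ReplicaInfo> bl;
try (AutoCloseableLock lock = dataset.acquireDatasetLock()) {
  bl = dataset.getSortedFinalizedBlocks(bpid);
}
// The disk vs. memory comparison then runs outside the lock, off the
// snapshot; any spurious differences are re-checked under the lock in the
// reconcile step.
{code}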


{code}
// Hold FSDataset lock to prevent further changes to the block map
try (AutoCloseableLock lock = dataset.acquireDatasetLock()) {
  for (final String bpid : blockPoolReport.getBlockPoolIds()) {
    List<ScanInfo> blockpoolReport = blockPoolReport.getScanInfo(bpid);

    Stats statsRecord = new Stats(bpid);
    stats.put(bpid, statsRecord);
    Collection<ScanInfo> diffRecord = new ArrayList<>();

    statsRecord.totalBlocks = blockpoolReport.size();
    // Need to hold a lock here to prevent the replica map changing
    final List<ReplicaInfo> bl = dataset.getSortedFinalizedBlocks(bpid);

    // SOD: After here, we have a "snapshot" of the replicas that were in the
    // replica map. It doesn't really matter if those replicas change or not
    // as we go through the checks, as we are working off the snapshot. The
    // in-memory version will have diverged from the on-disk details as the
    // disk is scanned anyway.

    int d = 0; // index for blockpoolReport
    int m = 0; // index for memReport
    while (m < bl.size() && d < blockpoolReport.size()) {
      ReplicaInfo memBlock = bl.get(m);
      ScanInfo info = blockpoolReport.get(d);
      // SOD: This block is safe to run outside of the lock
      if (info.getBlockId() < memBlock.getBlockId()) {
        // SOD: isDeletingBlock() is a synchronized method, so we don't need
        // a lock to check it.
        if (!dataset.isDeletingBlock(bpid, info.getBlockId())) {
          // Block is missing in memory
          statsRecord.missingMemoryBlocks++;
          addDifference(diffRecord, statsRecord, info);
        }
        d++;
        continue;
      }
      // SOD: This is safe outside the lock
      if (info.getBlockId() > memBlock.getBlockId()) {
        // Block is missing on the disk
        addDifference(diffRecord, statsRecord, memBlock.getBlockId(),
            info.getVolume());
        m++;
        continue;
      }
      // Block file and/or metadata file exists on the disk
      // Block exists in memory
      // SOD: This branch looks safe
      if (info.getVolume().getStorageType() != StorageType.PROVIDED
          && info.getBlockFile() == null) {
        // Block metadata file exists and block file is missing
        addDifference(diffRecord, statsRecord, info);
      // SOD: If we don't have a lock here, an append or truncate could alter
      // the block length or gen stamp. However, these could already have
      // changed as the disk was scanned. Therefore I believe it is safe to do
      // this outside the lock. Worst case we gather some extra differences,
      // but they get handled in the reconcile step.
      } else if (info.getGenStamp() != memBlock.getGenerationStamp()
          || info.getBlockLength() != memBlock.getNumBytes()) {
        // Block metadata file is missing or has wrong generation stamp,
        // or block file length is different than expected
        statsRecord.mismatchBlocks++;
        addDifference(diffRecord, statsRecord, info);
      // SOD: The compareWith method checks the expected locations of the
      // block (ie vol/subdir/subdir/blk_) with what was found on the disk
      // scan. This section is a concern, as the disk balancer could move
      // a block and then this change would log a difference. Again
{code}

[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140397#comment-17140397
 ] 

Stephen O'Donnell commented on HDFS-15415:
--

Uploaded an initial patch to remove the unnecessary sort. In doing this, I 
renamed FsDatasetSpi#getFinalizedBlocks to getSortedFinalizedBlocks and added 
a unit test to ensure it returns a sorted list. This is to ensure that any 
future change to the datanode's internal block map does not break the sorting 
somehow.

I still need to look at the logic performed under the lock to see if we can 
reduce the scope of the lock safely.
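
Roughly, the sort assertion looks like this (a sketch of the idea, not the 
exact test body in the patch):

{code:java}
// Hypothetical test body: verify the returned list is ordered by block ID,
// so callers no longer need an explicit Collections.sort().
List<ReplicaInfo> blocks = dataset.getSortedFinalizedBlocks(bpid);
for (int i = 1; i < blocks.size(); i++) {
  assertTrue(blocks.get(i - 1).getBlockId() < blocks.get(i).getBlockId());
}
{code}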

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid)` as above, we have taken a snapshot 
> of what is in memory. The two snapshots are never 100% in sync, as things are 
> always changing while the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens, both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock, 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block-by-block basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15415:
-
Attachment: HDFS-15415.001.patch

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid)` as above, we have taken a snapshot 
> of what is in memory. The two snapshots are never 100% in sync, as things are 
> always changing while the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens, both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock, 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block-by-block basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread bhji123 (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140300#comment-17140300
 ] 

bhji123 edited comment on HDFS-15419 at 6/19/20, 7:33 AM:
--

Yes, the router is just a proxy, but it's also a server.

Clients can decide whether to wait/retry or not. But not all clients are that 
clever, especially when there is a variety of different clients.

For the less clever clients, this PR is very useful. For the very clever 
clients that don't want the router to retry, it's fine too, because the router 
retry is now configurable.


was (Author: bhji123):
Yes, the router is just a proxy, and it's also a server.

Clients can decide whether to wait/retry or not. But not all clients are that 
clever, especially when there is a variety of different clients.

For the less clever clients, this PR is very useful. For the very clever 
clients that don't want the router to retry, it's fine too, because the router 
retry is now configurable.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread bhji123 (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140300#comment-17140300
 ] 

bhji123 commented on HDFS-15419:


Yes, the router is just a proxy, and it's also a server.

Clients can decide whether to wait/retry or not. But not all clients are that 
clever, especially when there is a variety of different clients.

For the less clever clients, this PR is very useful. For the very clever 
clients that don't want the router to retry, it's fine too, because the router 
retry is now configurable.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15410) Add separated config file fedbalance-default.xml for fedbalance tool

2020-06-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140299#comment-17140299
 ] 

Hadoop QA commented on HDFS-15410:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
35s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-federation-balance: The 
patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
17s{color} | {color:green} hadoop-federation-balance in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29440/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15410 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13006034/HDFS-15410.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle xml |
| uname | Linux 8231b6a1b035 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git 

[jira] [Issue Comment Deleted] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread bhji123 (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bhji123 updated HDFS-15419:
---
Comment: was deleted

(was: Yes, but clients may not be configured appropriately. If the router can 
retry too, it will be more reliable.)

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread bhji123 (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140283#comment-17140283
 ] 

bhji123 commented on HDFS-15419:


Yes, but clients may not be configured appropriately. If the router can retry 
too, it will be more reliable.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140277#comment-17140277
 ] 

Yuxuan Wang commented on HDFS-15419:


[~ayushtkn] Thanks for your reply.
IIRC, the router will currently retry not only when it catches a 
StandbyException, but also on some other exceptions, like 
ConnectionTimeoutException.
IMO, we can at least improve the retry policy in the router.
And I think adding more retries is not a good fit for this jira.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140258#comment-17140258
 ] 

Ayush Saxena edited comment on HDFS-15419 at 6/19/20, 6:48 AM:
---

The present code has failover because the router maintains the active/standby 
state of the namenodes. If there is a change in namenode roles that differs 
from what is stored in the Router, the router will fail over and update the 
state. In that sense the present code seems OK, and removing it isn't 
required: if we removed it, then when a failover happens the router would 
keep rejecting calls based on the old states in its cache until the heartbeat 
updates them. The present retry logic just ensures that if there is an active 
namenode, it gets the call. If the router couldn't find one, it doesn't hold 
the call; the client can then decide whether to retry or not.

I am not sure, but if, as proposed here, the router did a full retry like a 
normal client, then in the worst situations the actual client may time out. 
From the actual client's view it sent just one call, which is stuck at the 
server; it won't be aware that the router is retrying against different 
namenodes.

Well, IIRC we even had logic added in the router recently for the purpose of 
retries: amongst all the exceptions received from the several namespaces, 
only a retriable exception would get propagated, so that the client can 
retry.
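
A sketch of that propagation idea (illustrative only, not the actual router 
code):

{code:java}
// Hypothetical selection logic: among exceptions collected from multiple
// namespaces, prefer one that the client can retry on.
IOException toPropagate = exceptions.get(0);
for (IOException ioe : exceptions) {
  if (ioe instanceof StandbyException) { // retriable from the client's view
    toPropagate = ioe;
    break;
  }
}
throw toPropagate;
{code}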


was (Author: ayushtkn):
The present code has failover because the router maintains the active/standby 
state of the namenodes. If there is a change in namenode roles that differs 
from what is stored in the Router, the router will fail over and update the 
state. In that sense the present code seems OK, and removing it isn't 
required: if we removed it, then when a failover happens the router would 
keep rejecting calls based on the old states in its cache until the heartbeat 
updates them. The present retry logic just ensures that if there is an active 
namenode, it gets the call. If the router couldn't find one, it doesn't hold 
the call; the client can then decide whether to retry or not.

I am not sure, but if, as proposed here, the router did a full retry like a 
normal client, then in the worst situations the actual client may time out. 
From the actual client's view it sent just one call, which is stuck at the 
server; it won't be aware that the router is retrying against different 
namenodes.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140258#comment-17140258
 ] 

Ayush Saxena commented on HDFS-15419:
-

The present code has failover because the router maintains the active/standby 
state of the namenodes. If there is a change in namenode roles that differs 
from what is stored in the Router, the router will fail over and update the 
state. In that sense the present code seems OK, and removing it isn't 
required: if we removed it, then when a failover happens the router would 
keep rejecting calls based on the old states in its cache until the heartbeat 
updates them. The present retry logic just ensures that if there is an active 
namenode, it gets the call. If the router couldn't find one, it doesn't hold 
the call; the client can then decide whether to retry or not.

I am not sure, but if, as proposed here, the router did a full retry like a 
normal client, then in the worst situations the actual client may time out. 
From the actual client's view it sent just one call, which is stuck at the 
server; it won't be aware that the router is retrying against different 
namenodes.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140249#comment-17140249
 ] 

Yuxuan Wang commented on HDFS-15419:


[~bhji123]
Well, I agree more with [~ayushtkn], and I think we should remove the retry 
code currently in the router rather than add more retries to it.
I see [~elgoiri] reviewed the PR. What do you think of Saxena's comment?

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread bhji123 (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140242#comment-17140242
 ] 

bhji123 commented on HDFS-15419:


Hi, Yuxuan.

In this case, if clients time out and the NN is still unavailable, then the 
clients will retry. The difference is that the router will be more reliable, 
especially when clients are not configured appropriately.
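
A sketch of what the configurable retry could look like on the router side 
(the config key names here are hypothetical, nothing committed):

{code:java}
// Hypothetical keys feeding a fixed-sleep retry policy.
int maxRetries = conf.getInt(
    "dfs.federation.router.connect.retry.times", 3);
long intervalMs = conf.getTimeDuration(
    "dfs.federation.router.connect.retry.interval.ms",
    1000, TimeUnit.MILLISECONDS);
RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
    maxRetries, intervalMs, TimeUnit.MILLISECONDS);
{code}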

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds; at the same time, 
> almost all RPC requests to the router fail because the router only retries 
> once without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum retry times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15410) Add separated config file fedbalance-default.xml for fedbalance tool

2020-06-19 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140240#comment-17140240
 ] 

Jinglun commented on HDFS-15410:


Hi [~elgoiri], thanks for your nice comments! Refer to the fedbalance-site.xml 
in the doc (HDFS-15374 PR).

Uploaded v02 fixing the typo: `fedbalance-site.xml` was mistyped as 
`distcp-site.xml`.
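
For context, a sketch of how a separated default file is typically wired in 
(mirroring the distcp-default.xml pattern; an illustration, not necessarily 
the exact patch):

{code:java}
// Defaults shipped with the tool, then site-specific overrides.
Configuration conf = new Configuration();
conf.addResource("fedbalance-default.xml");
conf.addResource("fedbalance-site.xml");
{code}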

> Add separated config file fedbalance-default.xml for fedbalance tool
> 
>
> Key: HDFS-15410
> URL: https://issues.apache.org/jira/browse/HDFS-15410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch
>
>
> Add a separated config file named fedbalance-default.xml for fedbalance tool 
> configs. It's like the distcp-default.xml for the distcp tool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15410) Add separated config file fedbalance-default.xml for fedbalance tool

2020-06-19 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15410:
---
Attachment: HDFS-15410.002.patch

> Add separated config file fedbalance-default.xml for fedbalance tool
> 
>
> Key: HDFS-15410
> URL: https://issues.apache.org/jira/browse/HDFS-15410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch
>
>
> Add a separated config file named fedbalance-default.xml for fedbalance tool 
> configs. It's like the distcp-default.xml for the distcp tool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org