[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049928#comment-17049928
 ] 

Fengnan Li commented on HDFS-15196:
---

[~elgoiri] Without the fix the test will fail. Actually, 
https://issues.apache.org/jira/browse/HDFS-14739 introduced a bug that can make 
ls go into an infinite loop, since the mount table point was added as a 
qualified path with its parent, making startAfter always the smallest string 
across all children listings.

For example, in my test with the ls limit set to 5 on the namenode, the first 
batch returns file-0, file-1, file-2, file-3 and file-4. Without the fix, 
/parent/file-7 would be added to the listing, so the next batch would use 
`/parent/file-7` as startAfter, which sorts before file-0, and the query sent 
to the downstream namenode would return file-[0-4] again. With this fix there 
won't be such an issue.
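
To make that failure mode concrete, here is a minimal, self-contained sketch 
(illustrative only, not Router code; the names and values are made up) of how 
appending an entry that sorts before the real last element corrupts 
startAfter-based paging:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class StartAfterDemo {
  // Return up to 'limit' names that sort strictly after 'startAfter'
  // (assumes 'all' is already sorted, as a namenode listing is).
  static List<String> listBatch(List<String> all, String startAfter, int limit) {
    List<String> batch = new ArrayList<>();
    for (String name : all) {
      if (name.compareTo(startAfter) > 0) {
        batch.add(name);
        if (batch.size() == limit) {
          break;
        }
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    List<String> dir = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      dir.add("file-" + i);
    }
    // First page with limit 5: [file-0 .. file-4].
    List<String> page = listBatch(dir, "", 5);
    // Buggy behaviour: a qualified mount point is appended to the page, so the
    // next startAfter becomes "/parent/file-7", which sorts before "file-0".
    page.add("/parent/file-7");
    String startAfter = page.get(page.size() - 1);
    // The "next" page is identical to the first one, so the listing never advances.
    System.out.println(listBatch(dir, startAfter, 5));
  }
}
{code}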

I guess there is a reason the mount point is appended as a dir, but I haven't 
dug into it much. I will look into that after this one.

 

[~ayushtkn] The result was put into a TreeMap before returning so the order is 
preserved. 
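
For reference, a tiny sketch of that point (illustrative only, not the Router 
code): keying the merged listing by child name in a TreeMap keeps the result 
sorted no matter when the mount entries are added:

{code:java}
import java.util.Arrays;
import java.util.TreeMap;

public class MergeOrderDemo {
  public static void main(String[] args) {
    // A TreeMap keyed by child name keeps the merged listing sorted even if
    // mount entries are appended after the namenode entries.
    TreeMap<String, String> merged = new TreeMap<>();
    for (String name : Arrays.asList("file-0", "file-1", "file-4")) {
      merged.put(name, "namenode entry");
    }
    merged.put("file-2", "mount point");
    merged.put("file-3", "mount point");
    System.out.println(merged.keySet()); // [file-0, file-1, file-2, file-3, file-4]
  }
}
{code}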

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049899#comment-17049899
 ] 

Akira Ajisaka commented on HDFS-15200:
--

Thanks [~ayushtkn] for the report and thanks [~surendrasingh] for pinging me.

I think there is a rare situation where some admin would like to get the data 
of the corrupt replica by accessing the local disk of the DataNode. That way 
the admin can understand what the corrupt replica is and how to fix the 
corrupt data.

Therefore I would like to make the behavior an option and delete the corrupt 
replicas immediately by default.
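
A minimal sketch of what such an option could look like; the key name, default, 
and wiring below are hypothetical and not taken from any attached patch:

{code:java}
// Hypothetical sketch; key name, default and field names are made up.
public static final String CORRUPT_REPLICA_DELETE_IMMEDIATELY_KEY =
    "dfs.namenode.corrupt.block.delete.immediately.enabled";   // hypothetical
public static final boolean CORRUPT_REPLICA_DELETE_IMMEDIATELY_DEFAULT = true;

// Read once at startup (e.g. in the BlockManager constructor).
this.deleteCorruptReplicaImmediately = conf.getBoolean(
    CORRUPT_REPLICA_DELETE_IMMEDIATELY_KEY,
    CORRUPT_REPLICA_DELETE_IMMEDIATELY_DEFAULT);

// In invalidateBlock(..): postpone only when the replica is not corrupt,
// or when the admin disabled immediate deletion ('isCorrupt' is assumed).
if (!(isCorrupt && deleteCorruptReplicaImmediately)
    && nr.replicasOnStaleNodes() > 0) {
  postponeBlock(b.getCorrupted());
  return false;
}
{code}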

> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
> checks whether any block replica is on stale storage; if any replica is on 
> stale storage, it postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> {code}
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't get corrected.
> One outcome of this behavior presently is namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN will mark it as corrupt, mark 
> it for deletion, and remove it from corruptReplicas and excessRedundancyMap.
> If failover happens before the deletion of the replica, the standby Namenode 
> will mark all the storages as stale.
> It will then start processing IBRs; since the replicas would be on stale 
> storage, it will skip the deletion and the removal from corruptReplicas.
> Hence both namenodes will show different numbers and different corrupt 
> replicas.






[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049898#comment-17049898
 ] 

Ayush Saxena commented on HDFS-15196:
-

On a thought: the namenode returns entries in sorted order, so if we put mount 
entries at the end, the end result with mount entries won't be sorted.
Is that acceptable?

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049795#comment-17049795
 ] 

Hadoop QA commented on HDFS-15196:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
33s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15196 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12995376/HDFS-15196.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f0e6576df8c1 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / edc2e9d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28885/testReport/ |
| Max. process+thread count | 3309 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28885/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
>  

[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049766#comment-17049766
 ] 

Íñigo Goiri commented on HDFS-15196:


Thanks [~fengnanli] for the update, that looks better.
Can you verify that the new test fails without the fix in RouterClientProtocol?

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Commented] (HDFS-15111) stopStandbyServices() should log which service state it is transitioning from.

2020-03-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049760#comment-17049760
 ] 

Wei-Chiu Chuang commented on HDFS-15111:


I'm sorry, I pushed it into branch-2.10 and broke the build. This is now 
reverted.

> stopStandbyServices() should log which service state it is transitioning from.
> --
>
> Key: HDFS-15111
> URL: https://issues.apache.org/jira/browse/HDFS-15111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, logging
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie++
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15111.001.patch, HDFS-15111.002.patch, 
> HDFS-15111.003.patch
>
>
> When trying to transition Observer to Standby state, {{stopStandbyServices()}} 
> logs that it is "Stopping services started for standby state". It should be 
> "Stopping services started for observer state".






[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049746#comment-17049746
 ] 

Fengnan Li commented on HDFS-15196:
---

Thanks for the answer, [~elgoiri]. Uploaded [^HDFS-15196.005.patch] to resolve 
those comments.

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Updated] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15196:
--
Attachment: HDFS-15196.005.patch

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Updated] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15196:
--
Attachment: (was: HDFSHDFS-15196.005.patch)

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Updated] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15196:
--
Attachment: HDFSHDFS-15196.005.patch

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFSHDFS-15196.005.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Commented] (HDFS-14977) Quota Usage and Content summary are not same in Truncate with Snapshot

2020-03-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049707#comment-17049707
 ] 

Íñigo Goiri commented on HDFS-14977:


{quote}
Can we remove csSpaceConsumed, qoSpaceConsumed variables and add function call 
in assert like below ?
{quote}
That's fine with me too.

> Quota Usage and Content summary are not same in Truncate with Snapshot 
> ---
>
> Key: HDFS-14977
> URL: https://issues.apache.org/jira/browse/HDFS-14977
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14977.001.patch, HDFS-14977.002.patch
>
>
> steps: hdfs dfs -mkdir /dir
>        hdfs dfs -put file /dir          (file size = 10 bytes)
>        hdfs dfsadmin -allowSnapshot /dir
>        hdfs dfs -createSnapshot /dir s1
> space consumed per Quota usage and Content Summary is 30 bytes
>        hdfs dfs -truncate -w 5 /dir/file
> space consumed per Quota usage and Content Summary is 45 bytes
>        hdfs dfs -deleteSnapshot /dir s1
> space consumed per Quota usage is 45 bytes and Content Summary is 15 bytes






[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2020-03-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049590#comment-17049590
 ] 

Íñigo Goiri commented on HDFS-15196:


Regarding the Namenode overrides, we have addNamenodeOverrides in 
MiniRouterDFSCluster; I think we can leverage that.
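
For illustration, a small sketch of how the test might use it (assuming 
addNamenodeOverrides takes a Configuration of per-Namenode overrides, as its 
name suggests; the exact setup may differ):

{code:java}
// Sketch only: lower the Namenode listing limit so the test exercises
// batched getListing instead of a single-page listing.
Configuration nnConf = new Configuration(false);
nnConf.setInt(DFSConfigKeys.DFS_LIST_LIMIT, 5);   // default is 1000
cluster.addNamenodeOverrides(nnConf);
cluster.startCluster();
{code}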

Regarding the test, we are currently just adding the info about the mount point 
for the last page on the listing.
However, we are not checking for that in the test.

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from destination ns + paths
>  # Append mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 
> value 1k), batch listing is used and startAfter defines the boundary of each 
> batch. However, step 2 adds existing mount points, which messes up the 
> boundary of the batch, making the next batch's startAfter wrong.
> The fix is simply to append the mount points only when no more batch queries 
> are necessary.






[jira] [Commented] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit

2020-03-02 Thread Karthik Palanisamy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049588#comment-17049588
 ] 

Karthik Palanisamy commented on HDFS-15201:
---

Cc: [~arp] [~szetszwo]

> SnapshotCounter hits MaxSnapshotID limit
> 
>
> Key: HDFS-15201
> URL: https://issues.apache.org/jira/browse/HDFS-15201
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>
> Users reported that they are unable to take HDFS snapshots and their 
> snapshotCounter hits MaxSnapshotID limit. MaxSnapshotID limit is 16777215.
> {code:java}
> SnapshotManager.java
> private static final int SNAPSHOT_ID_BIT_WIDTH = 24;
> /**
>  * Returns the maximum allowable snapshot ID based on the bit width of the
>  * snapshot ID.
>  *
>  * @return maximum allowable snapshot ID.
>  */
>  public int getMaxSnapshotID() {
>  return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
> }
> {code}
>  
> I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase 
> SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit 
> (Integer.MAX_VALUE - 1).
>  
> {code:java}
> /**
>  * This id is used to indicate the current state (vs. snapshots)
>  */
> public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;
> {code}






[jira] [Commented] (HDFS-15198) RBF: In Secure Mode, Router can't refresh other router's mountTableEntries

2020-03-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049586#comment-17049586
 ] 

Íñigo Goiri commented on HDFS-15198:


We should do some refactoring instead of just having the same code in multiple 
places.

> RBF: In Secure Mode, Router can't refresh other router's mountTableEntries
> --
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Issue HDFS-13443 updates the mount table cache immediately: the specified 
> router updates its own mount table cache immediately, then updates the others 
> via the rpc protocol refreshMountTableEntries. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}
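
As a side note, one common pattern for this class of "no valid credentials" 
failure in a background thread (not necessarily what the attached patches do) 
is to run the remote refresh under the router's own login identity; a rough 
sketch, assuming the login user holds the router's keytab credentials:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Rough sketch only: perform the remote call as the service's login user so
// the admin RPC has valid Kerberos credentials.
UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
loginUser.checkTGTAndReloginFromKeytab();   // refresh the TGT if it expired
loginUser.doAs((PrivilegedExceptionAction<Void>) () -> {
  // issue the refreshMountTableEntries RPC to the remote router here
  return null;
});
{code}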






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049552#comment-17049552
 ] 

Ayush Saxena commented on HDFS-15200:
-

Thanx [~weichiu]
By default, while transitioning from standby to active, the namenode marks all 
datanodes as stale to prevent any block from being deleted before it receives 
BRs from the datanodes.
This is to prevent deletion of replicas, since replicas on stale storages 
aren't deleted. It is done in {{startActiveServices()}}:
{{blockManager.getDatanodeManager().markAllDatanodesStale();}}


> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
> checks whether any block replica is on stale storage; if any replica is on 
> stale storage, it postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> {code}
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't get corrected.
> One outcome of this behavior presently is namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN will mark it as corrupt, mark 
> it for deletion, and remove it from corruptReplicas and excessRedundancyMap.
> If failover happens before the deletion of the replica, the standby Namenode 
> will mark all the storages as stale.
> It will then start processing IBRs; since the replicas would be on stale 
> storage, it will skip the deletion and the removal from corruptReplicas.
> Hence both namenodes will show different numbers and different corrupt 
> replicas.






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049542#comment-17049542
 ] 

Wei-Chiu Chuang commented on HDFS-15200:


Interesting case ... thanks for digging into this.

bq. The standby Namenode will mark all the storages as stale.
Can you explain why the SBNN marks all storages stale? Is it because the SBNN 
took too much time to transition to active and therefore all block reports 
were lost?


> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
> checks whether any block replica is on stale storage; if any replica is on 
> stale storage, it postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> {code}
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't get corrected.
> One outcome of this behavior presently is namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN will mark it as corrupt, mark 
> it for deletion, and remove it from corruptReplicas and excessRedundancyMap.
> If failover happens before the deletion of the replica, the standby Namenode 
> will mark all the storages as stale.
> It will then start processing IBRs; since the replicas would be on stale 
> storage, it will skip the deletion and the removal from corruptReplicas.
> Hence both namenodes will show different numbers and different corrupt 
> replicas.






[jira] [Updated] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit

2020-03-02 Thread Karthik Palanisamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palanisamy updated HDFS-15201:
--
Description: 
Users reported that they are unable to take HDFS snapshots and their 
snapshotCounter hits MaxSnapshotID limit. MaxSnapshotID limit is 16777215.
{code:java}
SnapshotManager.java

private static final int SNAPSHOT_ID_BIT_WIDTH = 24;

/**
 * Returns the maximum allowable snapshot ID based on the bit width of the
 * snapshot ID.
 *
 * @return maximum allowable snapshot ID.
 */
 public int getMaxSnapshotID() {
 return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
}

{code}
 

I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase 
SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit 
(Integer.MAX_VALUE - 1).

 
{code:java}
/**
 * This id is used to indicate the current state (vs. snapshots)
 */
public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;

{code}

  was:
Users reported that they are unable to take take HDFS snapshots and their 
snapshotCounter hits MaxSnapshotID limit. MaxSnapshotID limit is 16777215.

{code}

SnapshotManager.java

private static final int SNAPSHOT_ID_BIT_WIDTH = 24;

/**
 * Returns the maximum allowable snapshot ID based on the bit width of the
 * snapshot ID.
 *
 * @return maximum allowable snapshot ID.
 */
 public int getMaxSnapshotID() {
 return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
}

{code}

 

I think, SNAPSHOT_ID_BIT_WIDTH is too low. May be good idea to increase 
SNAPSHOT_ID_BIT_WIDTH to 31? to aline with our CURRENT_STATE_ID limit 
(Integer.MAX_VALUE - 1).

 

{code}

/**
 * This id is used to indicate the current state (vs. snapshots)
 */
public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;

{code}


> SnapshotCounter hits MaxSnapshotID limit
> 
>
> Key: HDFS-15201
> URL: https://issues.apache.org/jira/browse/HDFS-15201
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>
> Users reported that they are unable to take HDFS snapshots and their 
> snapshotCounter hits MaxSnapshotID limit. MaxSnapshotID limit is 16777215.
> {code:java}
> SnapshotManager.java
> private static final int SNAPSHOT_ID_BIT_WIDTH = 24;
> /**
>  * Returns the maximum allowable snapshot ID based on the bit width of the
>  * snapshot ID.
>  *
>  * @return maximum allowable snapshot ID.
>  */
>  public int getMaxSnapshotID() {
>  return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
> }
> {code}
>  
> I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase 
> SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit 
> (Integer.MAX_VALUE - 1).
>  
> {code:java}
> /**
>  * This id is used to indicate the current state (vs. snapshots)
>  */
> public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;
> {code}






[jira] [Created] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit

2020-03-02 Thread Karthik Palanisamy (Jira)
Karthik Palanisamy created HDFS-15201:
-

 Summary: SnapshotCounter hits MaxSnapshotID limit
 Key: HDFS-15201
 URL: https://issues.apache.org/jira/browse/HDFS-15201
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Karthik Palanisamy
Assignee: Karthik Palanisamy


Users reported that they are unable to take HDFS snapshots and their 
snapshotCounter hits the MaxSnapshotID limit, which is 16777215.

{code}

SnapshotManager.java

private static final int SNAPSHOT_ID_BIT_WIDTH = 24;

/**
 * Returns the maximum allowable snapshot ID based on the bit width of the
 * snapshot ID.
 *
 * @return maximum allowable snapshot ID.
 */
 public int getMaxSnapshotID() {
 return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
}

{code}

 

I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase 
SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit 
(Integer.MAX_VALUE - 1).

 

{code}

/**
 * This id is used to indicate the current state (vs. snapshots)
 */
public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;

{code}
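
For reference, a small sketch of the arithmetic behind these limits (plain 
Java, only illustrating the numbers above):

{code:java}
// Illustrative only: the numbers implied by the bit widths discussed above.
public class SnapshotIdLimits {
  public static void main(String[] args) {
    int current = (1 << 24) - 1;   // SNAPSHOT_ID_BIT_WIDTH = 24
    int widened = (1 << 31) - 1;   // width 31 wraps to Integer.MAX_VALUE in int math
    System.out.println(current);                 // 16777215
    System.out.println(widened);                 // 2147483647
    System.out.println(Integer.MAX_VALUE - 1);   // CURRENT_STATE_ID = 2147483646
  }
}
{code}

If the width were raised to 31, the maximum snapshot ID would be 
Integer.MAX_VALUE, so presumably the widened range would still need to stay 
clear of CURRENT_STATE_ID.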






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049202#comment-17049202
 ] 

Surendra Singh Lilhore commented on HDFS-15200:
---

I feel we can delete the corrupt replica because there is no chance of it 
getting corrected. The stale storage replica will be reported live in the next 
BR, hopefully :).

[~arp], [~aajisaka], [~weichiu] any thoughts on this?

> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
> checks whether any block replica is on stale storage; if any replica is on 
> stale storage, it postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> {code}
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't get corrected.
> One outcome of this behavior presently is namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN will mark it as corrupt, mark 
> it for deletion, and remove it from corruptReplicas and excessRedundancyMap.
> If failover happens before the deletion of the replica, the standby Namenode 
> will mark all the storages as stale.
> It will then start processing IBRs; since the replicas would be on stale 
> storage, it will skip the deletion and the removal from corruptReplicas.
> Hence both namenodes will show different numbers and different corrupt 
> replicas.






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049161#comment-17049161
 ] 

Ayush Saxena commented on HDFS-15200:
-

[~surendrasingh] [~elgoiri] Thoughts on this?

> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
> checks whether any block replica is on stale storage; if any replica is on 
> stale storage, it postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> {code}
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't get corrected.
> One outcome of this behavior presently is namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN will mark it as corrupt, mark 
> it for deletion, and remove it from corruptReplicas and excessRedundancyMap.
> If failover happens before the deletion of the replica, the standby Namenode 
> will mark all the storages as stale.
> It will then start processing IBRs; since the replicas would be on stale 
> storage, it will skip the deletion and the removal from corruptReplicas.
> Hence both namenodes will show different numbers and different corrupt 
> replicas.






[jira] [Created] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-03-02 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-15200:
---

 Summary: Delete Corrupt Replica Immediately Irrespective of 
Replicas On Stale Storage 
 Key: HDFS-15200
 URL: https://issues.apache.org/jira/browse/HDFS-15200
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Presently {{invalidateBlock(..)}}, before adding a replica into invalidates, 
checks whether any block replica is on stale storage; if any replica is on 
stale storage, it postpones deletion of the replica.
Here:
{code:java}
// Check how many copies we have of the block
if (nr.replicasOnStaleNodes() > 0) {
  blockLog.debug("BLOCK* invalidateBlocks: postponing " +
      "invalidation of {} on {} because {} replica(s) are located on " +
      "nodes with potentially out-of-date block reports", b, dn,
      nr.replicasOnStaleNodes());
  postponeBlock(b.getCorrupted());
  return false;
{code}
In the case of a corrupt replica, we can skip this logic and delete the corrupt 
replica immediately, as a corrupt replica can't get corrected.

One outcome of this behavior presently is namenodes showing different block 
states post failover:
If a replica is marked corrupt, the Active NN will mark it as corrupt, mark it 
for deletion, and remove it from corruptReplicas and excessRedundancyMap.
If failover happens before the deletion of the replica, the standby Namenode 
will mark all the storages as stale.
It will then start processing IBRs; since the replicas would be on stale 
storage, it will skip the deletion and the removal from corruptReplicas.
Hence both namenodes will show different numbers and different corrupt 
replicas.
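
As a rough illustration of the proposed change (not the eventual patch), the 
postponement could be bypassed when the replica being invalidated is known to 
be corrupt; {{isCorruptReplica}} below is an assumed flag supplied by the 
caller, not an existing field:

{code:java}
// Illustrative sketch only: keep the stale-storage postponement for healthy
// replicas, but fall through and invalidate immediately when the replica is
// corrupt, since a corrupt replica can never become valid again.
if (!isCorruptReplica && nr.replicasOnStaleNodes() > 0) {
  blockLog.debug("BLOCK* invalidateBlocks: postponing " +
      "invalidation of {} on {} because {} replica(s) are located on " +
      "nodes with potentially out-of-date block reports", b, dn,
      nr.replicasOnStaleNodes());
  postponeBlock(b.getCorrupted());
  return false;
}
// corrupt replicas proceed straight to the invalidates list as before
{code}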


