[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2020-04-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15266:

Attachment: HDFS-15266-02.patch

> Add missing DFSOps Statistics in WebHDFS
> 
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15266-01.patch, HDFS-15266-02.patch
>
>
> A couple of operations don't increment the count of read/write ops or the 
> DFSOpsCountStatistics counters, e.g. getStoragePolicy.
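
For context, a minimal sketch of the accounting an op normally does in 
{{WebHdfsFileSystem}} (illustrative only, assuming the existing {{statistics}} 
and {{storageStatistics}} fields; not the attached patch):
{code:java}
// Illustrative sketch, not the attached patch: each op should bump both the
// FileSystem read/write counter and the per-op DFSOpsCountStatistics counter;
// the missing ops presumably gain calls like these.
public BlockStoragePolicy getStoragePolicy(Path src) throws IOException {
  statistics.incrementReadOps(1);                                  // FileSystem.Statistics
  storageStatistics.incrementOpCounter(OpType.GET_STORAGE_POLICY); // DFSOpsCountStatistics
  return fetchStoragePolicy(src); // stand-in for the existing WebHDFS request
}
{code}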



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082808#comment-17082808
 ] 

Xiaoqiao He commented on HDFS-15274:


Hi [~marvelrock], thanks for involving me here. I triggered Jenkins manually; 
please see https://builds.apache.org/job/PreCommit-HDFS-Build/29155/
Sorry, I am not very familiar with this feature. cc 
[~cnauroth], [~weichiu], [~elgoiri], would you like to take a look? Thanks.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between the two namespaces. The following logs are from the two NSs respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.
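
A minimal sketch of the disagreement being tolerated (a hypothetical 
illustration of the idea described above, not the attached patch; the patch's 
actual changes to DatanodeDescriptor#updateStorageStats are not shown here):
{code:java}
// Hypothetical sketch, not the attached patch: because the two snapshots in
// sendHeartBeat are taken at different times, a volume can fail in between,
// leaving a StorageReport that still says NORMAL alongside a
// VolumeFailureSummary that already counts it as failed. On the NN side,
// updateStorageStats could detect the disagreement and trust the summary:
int failedInReports = 0;
for (StorageReport report : reports) {
  if (report.isFailed()) {
    failedInReports++;
  }
}
int failedInSummary = volumeFailureSummary != null
    ? volumeFailureSummary.getFailedStorageLocations().length : 0;
if (failedInSummary > failedInReports) {
  // The summary saw failures the reports missed; treat the stale
  // DatanodeStorageInfo entries as failed instead of keeping their blocks.
}
{code}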



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15217) Add more information to longest write/read lock held log

2020-04-13 Thread Toshihiro Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081309#comment-17081309
 ] 

Toshihiro Suzuki edited comment on HDFS-15217 at 4/14/20, 1:10 AM:
---

I created a PR for this. After applying this patch, we can see additional 
information in the lock report message as follows:
{code:java}
2020-04-11 23:04:36,020 [IPC Server handler 5 on default port 62641] INFO  
namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(321)) - Number of 
suppressed write-lock reports: 0
Longest write-lock held at 2020-04-11 23:04:36,020+0900 for 3ms by 
delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null) via 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:302)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1746)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3274)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1130)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:724)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1016)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:944)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2948)

Total suppressed write-lock held time: 0.0

{code}
This patch adds the additional information *"by delete (ugi=bob 
(auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)"*, which is similar 
to the audit log format.
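
A self-contained sketch of the idea (hypothetical shape, not the actual PR; 
the class and method names below are made up for illustration):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical sketch of the HDFS-15217 idea: record who held the write
 *  lock longest, including an audit-log-style context supplied by the caller. */
class InstrumentedLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long lockStartMs;
  private long longestHeldMs;
  private String longestHeldBy = "";

  void writeLock() {
    lock.writeLock().lock();
    lockStartMs = System.currentTimeMillis();
  }

  // The Supplier keeps the context string lazy: it is only built when this
  // unlock actually sets a new longest-held record.
  void writeUnlock(String opName, Supplier<String> auditContext) {
    long heldMs = System.currentTimeMillis() - lockStartMs;
    if (heldMs > longestHeldMs) {
      longestHeldMs = heldMs;
      longestHeldBy = opName + " " + auditContext.get();
    }
    lock.writeLock().unlock();
  }

  String report() {
    return "Longest write-lock held for " + longestHeldMs + "ms by " + longestHeldBy;
  }
  // Usage, mirroring the report above:
  //   nsLock.writeUnlock("delete",
  //       () -> "(ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)");
}
{code}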


was (Author: brfrn169):
I created a PR for this. After this patch, we can see additional information in 
the lock report message as follows:
{code:java}
2020-04-11 23:04:36,020 [IPC Server handler 5 on default port 62641] INFO  
namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(321)) - Number of 
suppressed write-lock reports: 0
Longest write-lock held at 2020-04-11 23:04:36,020+0900 for 3ms by 
delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null) via 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:302)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1746)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3274)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1130)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:724)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1016)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:944)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2948)

Total suppressed write-lock held time: 0.0

{code}
This patch adds the additional information *"by delete (ugi=bob 
(auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)"* which is similar to 
the audit log format.

> Add more information to longest write/read lock held log
> 
>
> Key: HDFS-15217
> URL: https://issues.apache.org/jira/browse/HDFS-15217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Currently, we can see the stack trace in the longest write/read lock held 
> log, but 

[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082766#comment-17082766
 ] 

HuangTao commented on HDFS-15274:
-

cc [~hexiaoqiao], could you help me trigger Jenkins again?

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between the two namespaces. The following logs are from the two NSs respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2020-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082762#comment-17082762
 ] 

Hadoop QA commented on HDFS-15266:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  3m 14s{color} 
| {color:red} hadoop-hdfs-project generated 5 new + 743 unchanged - 5 fixed = 
748 total (was 748) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} hadoop-hdfs-project: The patch generated 5 new + 
155 unchanged - 0 fixed = 160 total (was 155) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 52s{color} 
| {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}105m 
12s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15266 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999828/HDFS-15266-01.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a2c7f27832d0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3edbe87 |
| maven | 

[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2020-04-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15266:

Status: Patch Available  (was: Open)

> Add missing DFSOps Statistics in WebHDFS
> 
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15266-01.patch
>
>
> A couple of operations don't increment the count of read/write ops or the 
> DFSOpsCountStatistics counters, e.g. getStoragePolicy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2020-04-13 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082643#comment-17082643
 ] 

Ayush Saxena commented on HDFS-15266:
-

I was trying to sync with {{DistributedFileSystem}}. It seems DFS was missing 
some ops present in {{WebHdfsFileSystem}}, so I added them there too. Hopefully 
I haven't missed any...

> Add missing DFSOps Statistics in WebHDFS
> 
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15266-01.patch
>
>
> A couple of operations don't increment the count of read/write ops or the 
> DFSOpsCountStatistics counters, e.g. getStoragePolicy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2020-04-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15266:

Attachment: HDFS-15266-01.patch

> Add missing DFSOps Statistics in WebHDFS
> 
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15266-01.patch
>
>
> A couple of operations don't increment the count of read/write ops or the 
> DFSOpsCountStatistics counters, e.g. getStoragePolicy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2020-04-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082597#comment-17082597
 ] 

Íñigo Goiri commented on HDFS-15275:


Can you make the indentation consistent with what was there before?
Everything else makes sense.

> HttpFS: Response of Create was not correct with noredirect and data are true
> 
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15275.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2020-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082497#comment-17082497
 ] 

Hadoop QA commented on HDFS-15275:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 18s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-httpfs: The 
patch generated 2 new + 296 unchanged - 46 fixed = 298 total (was 342) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
28s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15275 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999804/HDFS-15275.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3958ef638ea7 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29153/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29153/testReport/ |
| Max. process+thread count | 632 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-httpfs U: 
hadoop-hdfs-project/hadoop-hdfs-httpfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29153/console |
| Powered by | Apache 

[jira] [Updated] (HDFS-15276) Concat on INodeReference fails with illegal state exception

2020-04-13 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15276:
-
Description: 
Performing a concat operation on an INodeReference throws an IllegalStateException.

In verifySrcFiles, the src INode gets converted to an INodeFile:
{code:java}
final INode srcINode = iip.getLastINode();
final INodeFile srcINodeFile = INodeFile.valueOf(srcINode, src);{code}
If this INode is an INodeReference, it fails at Preconditions.checkState, 
because the child is a reference but we have converted it to a file:
{code:java}
INodeDirectory#removeChild
  final INode removed = children.remove(i);
  Preconditions.checkState(removed == child); {code}
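
A rough sketch of the identity mismatch and one defensive check (hypothetical; 
the eventual fix may differ):
{code:java}
// Hypothetical sketch, not the eventual fix: the children list holds the
// INodeReference while the concat path passes in the already-resolved
// INodeFile, so the identity check fails. Unwrapping references before
// comparing would make the check hold:
final INode removed = children.remove(i);
final INode removedResolved =
    removed.isReference() ? removed.asReference().getReferredINode() : removed;
final INode childResolved =
    child.isReference() ? child.asReference().getReferredINode() : child;
Preconditions.checkState(removedResolved == childResolved);
{code}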

> Concat on INodeReference fails with illegal state exception
> --
>
> Key: HDFS-15276
> URL: https://issues.apache.org/jira/browse/HDFS-15276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> Performing a concat operation on an INodeReference throws an IllegalStateException.
> In verifySrcFiles, the src INode gets converted to an INodeFile:
> {code:java}
> final INode srcINode = iip.getLastINode();
> final INodeFile srcINodeFile = INodeFile.valueOf(srcINode, src);{code}
> If this INode is an INodeReference, it fails at Preconditions.checkState, 
> because the child is a reference but we have converted it to a file:
> {code:java}
> INodeDirectory#removeChild
>   final INode removed = children.remove(i);
>   Preconditions.checkState(removed == child); {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15276) Concat on INodeReference fails with illegal state exception

2020-04-13 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15276:


 Summary: Concat on INodeReference fails with illegal state exception
 Key: HDFS-15276
 URL: https://issues.apache.org/jira/browse/HDFS-15276
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2020-04-13 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15275:
-
Attachment: HDFS-15275.001.patch
Status: Patch Available  (was: Open)

> HttpFS: Response of Create was not correct with noredirect and data are true
> 
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15275.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2020-04-13 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15275:


 Summary: HttpFS: Response of Create was not correct with 
noredirect and data are true
 Key: HDFS-15275
 URL: https://issues.apache.org/jira/browse/HDFS-15275
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised

2020-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082355#comment-17082355
 ] 

Hadoop QA commented on HDFS-1820:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
7s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-1820 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999777/HDFS-1820.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 3aa101c0a22d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29152/testReport/ |
| Max. process+thread count | 1519 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29152/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> FTPFileSystem attempts 

[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082348#comment-17082348
 ] 

HuangTao commented on HDFS-15274:
-

All the failed test cases pass locally.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between the two namespaces. The following logs are from the two NSs respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082347#comment-17082347
 ] 

HuangTao commented on HDFS-15274:
-

retrigger QA

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between the two namespaces. The following logs are from the two NSs respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082307#comment-17082307
 ] 

Hadoop QA commented on HDFS-15274:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 54s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999761/HDFS-15274.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1e1f35842a9d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29151/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29151/testReport/ |
| Max. process+thread count | 4358 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Updated] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised

2020-04-13 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HDFS-1820:
---
Attachment: HDFS-1820.002.patch
Status: Patch Available  (was: In Progress)

> FTPFileSystem attempts to close the outputstream even when it is not 
> initialised
> 
>
> Key: HDFS-1820
> URL: https://issues.apache.org/jira/browse/HDFS-1820
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 0.20.1
> Environment: occurs on all platforms
>Reporter: Sudharsan Sampath
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: hadoop
> Attachments: HDFS-1820.001.patch, HDFS-1820.002.patch
>
>
> FTPFileSystem's create method attempts to close the output stream even when it 
> is not initialized, causing a NullPointerException. In our case the Apache 
> Commons FTPClient was not able to create the destination file due to a 
> permissions issue. The FTPClient promptly reported a 553 (permissions) reply, 
> but it was overlooked in FTPFileSystem's create method. 
> The following code fails
> if (!FTPReply.isPositivePreliminary(client.getReplyCode())) {
>   // The ftpClient is an inconsistent state. Must close the stream
>   // which in turn will logout and disconnect from FTP server
>   fos.close();
>   throw new IOException("Unable to create file: " + file + ", Aborting");
> }
> because 'fos' is null. As a result, the proper error message "Unable to create 
> file XXX" is not reported; a NullPointerException is thrown instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15274:

Attachment: HDFS-15274.002.patch

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between the two namespaces. The following logs are from the two NSs respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised

2020-04-13 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-1820 started by Mikhail Pryakhin.
--
> FTPFileSystem attempts to close the outputstream even when it is not 
> initialised
> 
>
> Key: HDFS-1820
> URL: https://issues.apache.org/jira/browse/HDFS-1820
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 0.20.1
> Environment: occurs on all platforms
>Reporter: Sudharsan Sampath
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: hadoop
> Attachments: HDFS-1820.001.patch
>
>
> FTPFileSystem's create method attempts to close the output stream even when it 
> is not initialized, causing a NullPointerException. In our case the Apache 
> Commons FTPClient was not able to create the destination file due to a 
> permissions issue. The FTPClient promptly reported a 553 (permissions) reply, 
> but it was overlooked in FTPFileSystem's create method. 
> The following code fails
> if (!FTPReply.isPositivePreliminary(client.getReplyCode())) {
>   // The ftpClient is an inconsistent state. Must close the stream
>   // which in turn will logout and disconnect from FTP server
>   fos.close();
>   throw new IOException("Unable to create file: " + file + ", Aborting");
> }
> because 'fos' is null. As a result, the proper error message "Unable to create 
> file XXX" is not reported; a NullPointerException is thrown instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082151#comment-17082151
 ] 

Hadoop QA commented on HDFS-15274:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 7 new + 121 unchanged - 0 fixed = 128 total (was 121) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 49s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999743/HDFS-15274.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 94502884026d 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/testReport/ |
| Max. process+thread count | 3280 (vs. ulimit of 5500) |

[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2020-04-13 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082132#comment-17082132
 ] 

HuangTao commented on HDFS-15274:
-

[~daryn], please take a look.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch
>
>
> In our federation cluster, we found inconsistent failed volumes between two 
> namespaces. The following logs are from the two namespaces, respectively.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and simulating a disk failure with
> {code:bash}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
> // the disk may FAIL before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.
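
For illustration, here is a minimal sketch of how such a tolerance check could look. This is a simplified illustration only, not the actual HDFS-15274 patch; the class and method names are hypothetical.

{code:java}
import org.apache.hadoop.hdfs.server.protocol.StorageReport;
import org.apache.hadoop.hdfs.server.protocol.VolumeFailureSummary;

public class HeartbeatConsistencyCheck {
  /**
   * Hypothetical helper: returns true when the failed-volume count in the
   * VolumeFailureSummary is also visible in the StorageReports. If a disk
   * fails between the two reads in BPServiceActor#sendHeartBeat, the summary
   * counts more failures than the (stale) reports, and the NN should tolerate
   * the mismatch rather than trust the pair as-is.
   */
  static boolean reportsConsistent(StorageReport[] reports,
                                   VolumeFailureSummary summary) {
    int failedInReports = 0;
    for (StorageReport report : reports) {
      if (report.isFailed()) {
        failedInReports++;
      }
    }
    int failedInSummary = (summary != null)
        ? summary.getFailedStorageLocations().length
        : 0;
    return failedInSummary <= failedInReports;
  }
}
{code}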



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-04-13 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14476:

Target Version/s:   (was: 3.4.0)
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Committed to branch-2.10. Thanks, [~seanlook]. We should set "Fix Version/s" 
after the patch is committed.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, 
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, 
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner finds differences between on-disk and in-memory 
> blocks, it runs {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As we have about 6 million blocks per datanode and each 6-hour scan finds 
> about 25,000 abnormal blocks to fix, that leads to a long lock hold on the 
> FsDatasetImpl object.
> If every block needs 10ms to fix (because of SAS disk latency), the scan 
> takes 250 seconds to finish, so all reads and writes on that datanode are 
> blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> Commands from the NN take a long time to process because threads are 
> blocked, and the namenode sees a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, these abnormal blocks should be fixed in batches too, sleeping 2 
> seconds between batches to allow normal block reads and writes (see the 
> sketch below).
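
For illustration, a minimal sketch of the batched approach described above. The constants mirror the description (batches of 1000, 2-second sleeps); the lock object and the per-block fix action stand in for FsDatasetImpl and {{checkAndUpdate}}, so this is not the committed patch.

{code:java}
import java.util.List;

public class BatchedReconcile {
  private static final int BATCH_SIZE = 1000;            // like invalidate batches
  private static final long SLEEP_BETWEEN_BATCHES_MS = 2000;

  private final Object datasetLock = new Object();       // stands in for FsDatasetImpl

  /** Apply per-block fixes in batches, releasing the lock between batches. */
  void reconcileInBatches(List<Runnable> fixes) throws InterruptedException {
    for (int start = 0; start < fixes.size(); start += BATCH_SIZE) {
      int end = Math.min(start + BATCH_SIZE, fixes.size());
      synchronized (datasetLock) {                       // lock held per batch only
        for (Runnable fix : fixes.subList(start, end)) {
          fix.run();                                     // e.g. one checkAndUpdate call
        }
      }
      if (end < fixes.size()) {
        // Let normal block reads and writes proceed before the next batch.
        Thread.sleep(SLEEP_BETWEEN_BATCHES_MS);
      }
    }
  }
}
{code}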



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner

2020-04-13 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082108#comment-17082108
 ] 

Masatake Iwasaki commented on HDFS-15048:
-

+1 on the HDFS-15048-branch-2.10.002.patch.

> Fix findbug in DirectoryScanner
> ---
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a findbugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls 
> Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) 
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
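
For illustration, a minimal sketch of the pattern findbugs flags here and the usual remedy of moving the sleep outside the lock. This is a simplified stand-in, not the actual DirectoryScanner code.

{code:java}
public class SleepOutsideLock {
  private final Object lock = new Object();

  // What SWL_SLEEP_WITH_LOCK_HELD warns about: every thread that needs the
  // lock is blocked for the whole sleep.
  void throttledWorkBad(Runnable work, long throttleMs) throws InterruptedException {
    synchronized (lock) {
      work.run();
      Thread.sleep(throttleMs);  // sleeping with the lock held
    }
  }

  // Remedy: do the guarded work under the lock, then sleep after releasing it.
  void throttledWorkGood(Runnable work, long throttleMs) throws InterruptedException {
    synchronized (lock) {
      work.run();
    }
    Thread.sleep(throttleMs);    // no lock held during the sleep
  }
}
{code}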



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-04-13 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082107#comment-17082107
 ] 

Masatake Iwasaki commented on HDFS-14476:
-

+1 on the HDFS-14476-branch-2.10.02.patch. Committing this and 
HDFS-15048-branch-2.10.002.patch to branch-2.10.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, 
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, 
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner finds differences between on-disk and in-memory 
> blocks, it runs {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As we have about 6 million blocks per datanode and each 6-hour scan finds 
> about 25,000 abnormal blocks to fix, that leads to a long lock hold on the 
> FsDatasetImpl object.
> If every block needs 10ms to fix (because of SAS disk latency), the scan 
> takes 250 seconds to finish, so all reads and writes on that datanode are 
> blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> Commands from the NN take a long time to process because threads are 
> blocked, and the namenode sees a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, these abnormal blocks should be fixed in batches too, sleeping 2 
> seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner

2020-04-13 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082085#comment-17082085
 ] 

Masatake Iwasaki commented on HDFS-15048:
-

[~seanlook] I'm going to commit both HDFS-14476-branch-2.10.02.patch and 
HDFS-15048-branch-2.10.002.patch after testing locally. Thanks.

> Fix findbug in DirectoryScanner
> ---
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a findbugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls 
> Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) 
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org