[jira] [Commented] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130087#comment-17130087
 ] 

Hadoop QA commented on HDFS-15398:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
52s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
0s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m  2s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}189m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
|   | hadoop.hdfs.TestStripedFileAppend |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29417/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15398 |
| JIRA Patch URL | 

[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: HDFS-15098.006.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4 (see the sketch after this description):
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK
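> 
> A minimal end-to-end check of step 4 (a sketch: key1 and /benchmarks are just 
> the example names above, and the exact output depends on the cluster):
> {code:java}
> # confirm the key and the encryption zone exist
> hadoop key list
> hdfs crypto -listZones
> # write a file into the zone and read it back; the data is encrypted at rest
> hadoop fs -put README.txt /benchmarks/
> hadoop fs -cat /benchmarks/README.txt
> {code}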






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: (was: HDFS-15098.006.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130025#comment-17130025
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  
1s{color} | {color:blue} prototool was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
58s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 20m 36s{color} | 
{color:red} root generated 25 new + 137 unchanged - 25 fixed = 162 total (was 
162) {color} |
| {color:green}+1{color} | {color:green} golang {color} | {color:green} 20m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 20m 36s{color} 
| {color:red} root generated 4 new + 1865 unchanged - 0 fixed = 1869 total (was 
1865) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 19s{color} | {color:orange} root: The patch generated 4 new + 211 unchanged 
- 5 fixed = 215 total (was 216) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 18s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
11s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF 

[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-06-09, huhaiyang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130019#comment-17130019
 ] 

huhaiyang commented on HDFS-15175:
--

Hi [~caiyicong] [~wanchang], we encountered this bug in our online environment. 

The scenario performs concurrent truncate and append operations (a rough shape 
of the workload is sketched below). How did you solve it?
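
A rough shape of the triggering workload (a sketch only: the paths and file 
names are placeholders, and hitting the bug depends on all three ops landing 
in one editlog batch, so this does not deterministically reproduce the crash):
{code:java}
hdfs dfs -put local.dat /tmp/f            # first close
hdfs dfs -truncate -w 1000 /tmp/f         # truncate, waiting for block recovery
hdfs dfs -appendToFile more.dat /tmp/f    # append and second close
{code}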

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Critical
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> <RECORD>
>   <OPCODE>OP_REASSIGN_LEASE</OPCODE>
>   <DATA>
>     <TXID>32625021150</TXID>
>     <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
>     <PATH>..</PATH>
>     <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625023743</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816135883</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_TRUNCATE</OPCODE>
>   <DATA>
>     <TXID>32625024049</TXID>
>     <SRC>..</SRC>
>     <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
>     <CLIENTMACHINE>..</CLIENTMACHINE>
>     <NEWLENGTH>185818644</NEWLENGTH>
>     <TIMESTAMP>1581816136336</TIMESTAMP>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818648</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625024993</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816138774</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp 
> sequence is synchronized to the JournalNode in the same batch. The block used 
> by the two CloseOps is the same instance, which causes the first CloseOp to 
> record the wrong block size. When the SNN rolls the editlog, TruncateOp does 
> not put the file into the UnderConstruction state. Then, when the second 
> CloseOp is executed, the file is not under construction, and the SNN crashes.






[jira] [Updated] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09, Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15346:
---
Attachment: HDFS-15346.009.patch

> RBF: DistCpFedBalance implementation
> 
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch, 
> HDFS-15346.009.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This 
> is the second one. Detail can be found at HDFS-15294.






[jira] [Commented] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09, Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130008#comment-17130008
 ] 

Jinglun commented on HDFS-15346:


Uploaded v09; fixed the unit test.

> RBF: DistCpFedBalance implementation
> 
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch, 
> HDFS-15346.009.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This 
> is the second one. Detail can be found at HDFS-15294.






[jira] [Comment Edited] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129964#comment-17129964
 ] 

Hongbing Wang edited comment on HDFS-15398 at 6/10/20, 2:32 AM:


{quote}

Need to fix this. 

{quote}

Sorry, I made this mistake; it has been corrected. 

Based on a review of the Jenkins results, the test failures are unrelated. 
Please help confirm. Thanks.


was (Author: wanghongbing):
> Need to fix this. 

sorry I made this mistake, it has been corrected. 

Test failures are unrelated, by reviewing Jenkins. Please help confirm. Thanx.

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch, HDFS-15398.004.patch
>
>
>  When writing an EC file, if the client calls addBlock() to apply for the 
> second (or any later) block group and the quota happens to be exceeded at 
> that moment, the client program hangs forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g, and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data exceeds the quota when applying for the 
> second block group. At this point, the client hangs forever.
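> A quick check of that arithmetic (a sketch using only the numbers above):
> {code:java}
> logical data per block group   = 6 data blocks x 128M = 768M
> physical space per block group = (6 data + 3 parity) blocks x 128M = 1152M
> second block group             = 2 x 1152M = 2304M > 2048M (the 2g quota)
> {code}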
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> 

[jira] [Comment Edited] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129964#comment-17129964
 ] 

Hongbing Wang edited comment on HDFS-15398 at 6/10/20, 2:29 AM:


> Need to fix this. 

Sorry, I made this mistake; it has been corrected. 

Based on a review of the Jenkins results, the test failures are unrelated. 
Please help confirm. Thanks.


was (Author: wanghongbing):
{quote} Need to fix this. \{quote}

sorry I made this mistake, it has been corrected. 

Test failures are unrelated, by reviewing Jenkins. Please help confirm. Thanx.

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch, HDFS-15398.004.patch
>
>
>  When writing an EC file, if the client calls addBlock() to apply for the 
> second (or any later) block group and the quota happens to be exceeded at 
> that moment, the client program hangs forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g, and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data exceeds the quota when applying for the 
> second block group. At this point, the client hangs forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> 

[jira] [Comment Edited] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129964#comment-17129964
 ] 

Hongbing Wang edited comment on HDFS-15398 at 6/10/20, 2:28 AM:


{quote} Need to fix this. \{quote}

Sorry, I made this mistake; it has been corrected. 

Based on a review of the Jenkins results, the test failures are unrelated. 
Please help confirm. Thanks.


was (Author: wanghongbing):
 Need to fix this.

sorry I made this mistake, it has been corrected. 

Test failures are unrelated, by reviewing Jenkins. Please help confirm. Thanx.

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch, HDFS-15398.004.patch
>
>
>  When writing an EC file, if the client calls addBlock() to apply for the 
> second (or any later) block group and the quota happens to be exceeded at 
> that moment, the client program hangs forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g, and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data exceeds the quota when applying for the 
> second block group. At this point, the client hangs forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> 

[jira] [Commented] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129964#comment-17129964
 ] 

Hongbing Wang commented on HDFS-15398:
--

 Need to fix this.

Sorry, I made this mistake; it has been corrected. 

Based on a review of the Jenkins results, the test failures are unrelated. 
Please help confirm. Thanks.

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch, HDFS-15398.004.patch
>
>
>  When writing an EC file, if the client calls addBlock() to apply for the 
> second (or any later) block group and the quota happens to be exceeded at 
> that moment, the client program hangs forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g, and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data exceeds the quota when applying for the 
> second block group. At this point, the client hangs forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the 

[jira] [Updated] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09, Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15398:
-
Attachment: HDFS-15398.004.patch

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch, HDFS-15398.004.patch
>
>
>  When writing an EC file, if the client calls addBlock() to apply for the 
> second (or any later) block group and the quota happens to be exceeded at 
> that moment, the client program hangs forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g, and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data exceeds the quota when applying for the 
> second block group. At this point, the client hangs forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> waitEndBlocks(), waitEndBlocks will enter 

[jira] [Commented] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09, Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949
 ] 

Sean Chow commented on HDFS-15402:
--

My clients use webhdfs to put files a lot. I think this issue is caused not by 
webhdfs but by the jmx endpoint.

Occasionally the http port cannot be accessed:

{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue about this because no exception could be found.

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> 
>
> Key: HDFS-15402
> URL: https://issues.apache.org/jira/browse/HDFS-15402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.1.3
>Reporter: Sean Chow
>Priority: Major
>
> We access  {{http://127.0.0.1:50075/jmx}}  to get datanode metrics 
> periodically. But too many sockets pile up in the CLOSE-WAIT state, which 
> causes normal webhdfs requests to fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP 
> localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in my production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets does not increase anymore.
>  The version apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  
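> 
> A simple way to confirm that correlation (a sketch; 50075 is the datanode 
> HTTP port shown in this report):
> {code:java}
> # sample the CLOSE-WAIT count once a minute while the metrics script runs
> while true; do
>   date
>   ss -ant | grep 127.0.0.1:50075 | grep -c CLOSE-WAIT
>   sleep 60
> done
> {code}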






[jira] [Comment Edited] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09, Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949
 ] 

Sean Chow edited comment on HDFS-15402 at 6/10/20, 2:01 AM:


My clients use webhdfs to put files a lot. I think this issue is caused not by 
webhdfs but by the jmx endpoint.

Occasionally the http port cannot be accessed:
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue about this because no exception could be found.


was (Author: seanlook):
My clients use webhdfs put file a lot. I think this issue is not caused by 
webhdfs, but the jmx endpoint.

Occasionally the http port can not be accessed:

 
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
 

After restart datanodes, the CLOSE-WAIT disappears.

Currently I have no clue for this because not Exception could be found.

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> 
>
> Key: HDFS-15402
> URL: https://issues.apache.org/jira/browse/HDFS-15402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.1.3
>Reporter: Sean Chow
>Priority: Major
>
> We access  {{http://127.0.0.1:50075/jmx}}  to get datanode metrics 
> periodically. But too many sockets pile up in the CLOSE-WAIT state, which 
> causes normal webhdfs requests to fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP 
> localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in my production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets does not increase anymore.
>  The version apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: (was: HDFS-15098.006.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: HDFS-15098.006.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09, zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory, and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-09, Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129909#comment-17129909
 ] 

Hadoop QA commented on HDFS-15372:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
2s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 88 unchanged - 1 fixed = 88 total (was 89) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation |
|   | hadoop.hdfs.TestStripedFileAppend |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29415/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13005286/HDFS-15372.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 362471de8498 4.15.0-101-generic #102-Ubuntu SMP Mon 

[jira] [Created] (HDFS-15404) ShellCommandFencer should expose info about source

2020-06-09 Thread Chen Liang (Jira)
Chen Liang created HDFS-15404:
-

 Summary: ShellCommandFencer should expose info about source
 Key: HDFS-15404
 URL: https://issues.apache.org/jira/browse/HDFS-15404
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


Currently the HA fencing logic in ShellCommandFencer exposes environment 
variables about only the fencing target, i.e. the $target_* variables 
mentioned in this [document 
page|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html].

Sometimes it is also useful to expose info about the fencing source node. One 
use case is that it would allow the source and target nodes to identify 
themselves separately and run different commands/scripts.
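
As a rough illustration, a hedged sketch of a fence script: $target_host and 
$target_port are existing variables, while $source_host stands in for the kind 
of variable proposed here (its name is hypothetical, as are the helper 
commands):

{code}
#!/bin/bash
# Fence script invoked by ShellCommandFencer.
# $target_host/$target_port exist today; $source_host is the hypothetical
# variable this issue proposes. The helper commands are placeholders.
logger "fence request: source=$source_host target=$target_host:$target_port"
if [ "$source_host" = "nn1.example.com" ]; then
  # Behaviour specific to the fencing source becomes possible
  # once source info is exposed.
  exec /usr/local/bin/fence-via-ipmi "$target_host"
else
  exec /usr/local/bin/fence-via-ssh "$target_host"
fi
{code}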






[jira] [Commented] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129843#comment-17129843
 ] 

Hadoop QA commented on HDFS-15403:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
2s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.TestReconstructStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29414/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15403 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13005269/HDFS-15403.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux e6f23cfed551 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 

[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15372:
-
Attachment: HDFS-15372.005.patch

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where 
> the paths covered by the provider are snapshottable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> E.g., suppose we have the snapshottable path /data, which is Sentry managed. 
> The ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, pre-Hadoop 3.0 (when the attribute provider etc. was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java, which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode, and if you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it will return /data/tab1. It 
> resolves the snapshot path to its original location, but it is still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/user/.snapshot/snap1/tab", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots - /user/.snapshot/snap1/tab or /data/tab1. However, as the 
> behaviour has changed, I feel we should ensure the old behaviour is retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.
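
A minimal sketch of what such a switch could look like; the property name and 
default are hypothetical, purely for illustration:

{code:xml}
<!-- Hypothetical switch: when true, pass the resolved path (/data/tab1)
     to the attribute provider; when false, pass the raw snapshot path
     (/data/.snapshot/snap1/tab1). Not an existing Hadoop key. -->
<property>
  <name>dfs.namenode.snapshot.attribute-provider.use-resolved-path</name>
  <value>true</value>
</property>
{code}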






[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-09 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129808#comment-17129808
 ] 

Stephen O'Donnell commented on HDFS-15372:
--

[~hemanthboyina], thanks for your help on this change.

You are correct, the inode ID of the .snapshot/snap1 path is the same as that 
of the parent, so I have changed the code to use that in FSDirectory.

Rather than passing FSDirectory into FSPermissionChecker, I added a new method 
to INodesInPath:

{code}
  static INodesInPath resolveFromRoot(INode inode) {
    // Walk back the inode's parents to collect the chain from the root.
    INode[] inodes = getINodes(inode);
    // Full path components for the inode, from the same parent walk.
    byte[][] paths = INode.getPathComponents(inode.getFullPathName());
    // The first entry in the chain is the root directory.
    INodeDirectory rootDir = inodes[0].asDirectory();
    return resolve(rootDir, paths);
  }
{code}

It obtains the root inode by walking back the list of parents on the inode 
passed in. It needs to walk this list anyway to get the components, so this 
does not cost anything extra.

However, I found that by passing an inode to this new method, its component 
path always resolves to the correct thing, so I did not have to add any 
special logic like in FSDirectory to detect the .snapshot/snap1 inode.

I think it is better to leave FSDirectory using the different logic, as it 
already has the IIP object formed, and we don't need to form it again with 
INodesInPath.resolveFromRoot().

I will upload the latest patch now. Please let me know what you think.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where 
> the paths covered by the provider are snapshottable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> E.g., suppose we have the snapshottable path /data, which is Sentry managed. 
> The ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, pre-Hadoop 3.0 (when the attribute provider etc. was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java, which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode, and if you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it will return /data/tab1. It 
> resolves the snapshot path to its original location, but it is still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/user/.snapshot/snap1/tab", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be 

[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129793#comment-17129793
 ] 

Íñigo Goiri commented on HDFS-15351:


OK, let's commit then?

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from the reconstruction queue. 
> On removing the blocks from pending reconstruction, we need to decrement the 
> Blocks Scheduled count.






[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15403:
---
Description: 
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
at java.lang.Thread.run(Thread.java:748) {code}

  was:
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at 
java.lang.Thread.run(Thread.java:748) {code}


> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
> at java.lang.Thread.run(Thread.java:748) {code}





[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15403:
---
Description: 
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at 
java.lang.Thread.run(Thread.java:748) {code}

  was:
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at 
java.lang.Thread.run(Thread.java:748) {code}


> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}





[jira] [Resolved] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-15386.
--
Resolution: Fixed

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in the volumes. 
> In the following code (FsDatasetImpl), we keep the blocks that will be 
> invalidated in the *blkToInvalidate* map. However, as the key of the map is 
> the *bpid* (Block Pool ID), the entry for each block pool is overwritten as 
> each subsequent volume is removed. As a result, the map will contain only 
> the blocks of the last volume we are removing, and we invalidate only them:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
> volumeMap.replicas(bpid).iterator(); it.hasNext();) {
> ReplicaInfo block = it.next();
> final StorageLocation blockStorageLocation =
> block.getVolume().getStorageLocation();
> LOG.trace("checking for block " + block.getBlockId() +
> " with storageLocation " + blockStorageLocation);
> if (blockStorageLocation.equals(sdLocation)) {
>   blocks.add(block);
>   it.remove();
> }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]
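
A minimal sketch of one way to avoid the overwrite, merging per-bpid lists 
instead of replacing them (illustrative only, not necessarily the committed 
fix):

{code:java}
// Merge into any existing list for this block pool instead of replacing
// it, so blocks gathered from previously removed volumes are kept.
blkToInvalidate
    .computeIfAbsent(bpid, k -> new ArrayList<>())
    .addAll(blocks);
{code}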






[jira] [Commented] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129783#comment-17129783
 ] 

Íñigo Goiri commented on HDFS-15403:


Is there an easy test to add?

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-09 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129782#comment-17129782
 ] 

Stephen O'Donnell commented on HDFS-15386:
--

[~brfrn169] I have merged this change to 2.10 too. Thanks for the contribution 
and for tracking down this long-standing issue.

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in the volumes. 
> In the following code (FsDatasetImpl), we keep the blocks that will be 
> invalidated in the *blkToInvalidate* map. However, as the key of the map is 
> the *bpid* (Block Pool ID), the entry for each block pool is overwritten as 
> each subsequent volume is removed. As a result, the map will contain only 
> the blocks of the last volume we are removing, and we invalidate only them:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
> volumeMap.replicas(bpid).iterator(); it.hasNext();) {
> ReplicaInfo block = it.next();
> final StorageLocation blockStorageLocation =
> block.getVolume().getStorageLocation();
> LOG.trace("checking for block " + block.getBlockId() +
> " with storageLocation " + blockStorageLocation);
> if (blockStorageLocation.equals(sdLocation)) {
>   blocks.add(block);
>   it.remove();
> }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]






[jira] [Updated] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15386:
-
Fix Version/s: 2.10.1

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in the volumes. 
> In the following code (FsDatasetImpl), we keep the blocks that will be 
> invalidated in the *blkToInvalidate* map. However, as the key of the map is 
> the *bpid* (Block Pool ID), the entry for each block pool is overwritten as 
> each subsequent volume is removed. As a result, the map will contain only 
> the blocks of the last volume we are removing, and we invalidate only them:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
> volumeMap.replicas(bpid).iterator(); it.hasNext();) {
> ReplicaInfo block = it.next();
> final StorageLocation blockStorageLocation =
> block.getVolume().getStorageLocation();
> LOG.trace("checking for block " + block.getBlockId() +
> " with storageLocation " + blockStorageLocation);
> if (blockStorageLocation.equals(sdLocation)) {
>   blocks.add(block);
>   it.remove();
> }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]






[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15403:
-
Attachment: HDFS-15403.001.patch
Status: Patch Available  (was: Open)

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Created] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15403:


 Summary: NPE in FileIoProvider#transferToSocketFully
 Key: HDFS-15403
 URL: https://issues.apache.org/jira/browse/HDFS-15403
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina









[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15403:
-
Description: 
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at 
java.lang.Thread.run(Thread.java:748) {code}

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Commented] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129686#comment-17129686
 ] 

Hadoop QA commented on HDFS-15346:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
38s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  5m 
20s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} branch/hadoop-project no findbugs output file 
(findbugsXml.xml) {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} branch/hadoop-assemblies no findbugs output file 
(findbugsXml.xml) {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} branch/hadoop-tools/hadoop-tools-dist no findbugs 
output file (findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
36s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
20s{color} | {color:green} the patch passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} hadoop-project has no data from findbugs {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} hadoop-assemblies has no data from findbugs {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} hadoop-tools/hadoop-tools-dist has no 

[jira] [Commented] (HDFS-15292) Infinite loop in Lease Manager due to replica is missing in dn

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129676#comment-17129676
 ] 

Ayush Saxena commented on HDFS-15292:
-

It is already mentioned in the code that such a situation can lead to an 
infinite loop in the lease manager.

{code:java}
  // Cannot close file right now, since some blocks 
  // are not yet minimally replicated.
  // This may potentially cause infinite loop in lease recovery
  // if there are no valid replicas on data-nodes.
  String message = "DIR* NameSystem.internalReleaseLease: " +
      "Failed to release lease for file " + src +
      ". Committed blocks are waiting to be minimally replicated." +
      " Try again later.";
{code}

If this is a frequent occurrence, you shouldn't allow files to close with 
committed blocks at all; dfs.namenode.file.close.num-committed-allowed 
shouldn't be set.
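
For reference, a minimal hdfs-site.xml sketch keeping the default, strict 
behaviour (a file cannot close while any block is still only committed), which 
avoids this loop:

{code:xml}
<!-- 0 (the default) means a file may not close until all its blocks are
     COMPLETE, i.e. minimally replicated. -->
<property>
  <name>dfs.namenode.file.close.num-committed-allowed</name>
  <value>0</value>
</property>
{code}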

> Infinite loop in Lease Manager due to replica is missing in dn
> --
>
> Key: HDFS-15292
> URL: https://issues.apache.org/jira/browse/HDFS-15292
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3
>Reporter: Aaron Guo
>Priority: Major
>
> In our production environment, we found that the number of files under 
> construction keeps growing, and the lease manager is trying to release the 
> lease in an infinite loop:
> {code:java}
> 2020-04-18 23:10:57,816 WARN  namenode.LeaseManager 
> (LeaseManager.java:checkLeases(589)) - Cannot release the path 
> /user/hadoop/myTestFile.txt in the lease [Lease.  Holder: 
> go-hdfs-7VVGF3sGvHZcsZZC, pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file 
> /user/hadoop/myTestFile.txt. Committed blocks are waiting to be minimally 
> replicated. Try again later.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3391)
> at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:586)
> at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:524)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This is because the last block of this file can NOT meet the minimum 
> required replication of 1, an AlreadyBeingCreatedException gets thrown, and 
> it keeps retrying forever.
> This infinite loop also causes another issue: since the lease manager always 
> tries to release the first lease before moving to the next one, no lease 
> will ever be released.






[jira] [Commented] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-06-09 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129660#comment-17129660
 ] 

hemanthboyina commented on HDFS-12969:
--

In the present code, we can list all open files, or we can list only the open 
files which are blocking an ongoing decommission.

On calling dfsadmin -listOpenFiles -blockingDecommission we list only the 
files which are blocking decommission.

But on calling dfsadmin -listOpenFiles we list all open files, and some of 
these open files can be blocking an ongoing decommission. So for 
listOpenFiles, should we return the list based on type?
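
For context, the two existing invocations under discussion; grouping the 
output by type would be the new part:

{code}
# List every open file in the namespace.
hdfs dfsadmin -listOpenFiles

# List only the open files blocking an ongoing decommission.
hdfs dfsadmin -listOpenFiles -blockingDecommission
{code}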

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the 
> existing command 
> {{dfsadmin -listOpenFiles}}. But the reporting done by the command doesn't 
> differentiate the files based on type (like blocking decommission). In 
> order to change the reporting style, the proto format used for the base 
> command has to be updated to carry additional fields, which is better done 
> in a new jira outside of HDFS-11847. This jira is to track the end-to-end 
> enhancements needed for the dfsadmin -listOpenFiles console output.






[jira] [Commented] (HDFS-12733) Option to disable to namenode local edits

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129657#comment-17129657
 ] 

Ayush Saxena commented on HDFS-12733:
-

Thanx [~hexiaoqiao],
can you check this:

{code:java}
else if (dirNames.isEmpty()) {
  dirNames = Collections.singletonList(
  DFSConfigKeys.DFS_NAMENODE_EDITS_DIR_DEFAULT);
}
{code}
This block could have pitched in if noShared was empty. Now this will not 
happen; is this desirable?

{code:java}
if (!noSharedEditDirs.isEmpty()) {
{code}

Shouldn't we have a check that editsDirs isn't empty as well?

[~shv], if you have time, please check whether your previous concerns are 
solved with this approach.

> Option to disable to namenode local edits
> -
>
> Key: HDFS-12733
> URL: https://issues.apache.org/jira/browse/HDFS-12733
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, performance
>Reporter: Brahma Reddy Battula
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-12733-001.patch, HDFS-12733-002.patch, 
> HDFS-12733-003.patch, HDFS-12733.004.patch, HDFS-12733.005.patch, 
> HDFS-12733.006.patch, HDFS-12733.007.patch, HDFS-12733.008.patch, 
> HDFS-12733.009.patch
>
>
> As of now, edits are written to both local and shared locations, which is 
> redundant, and the local edits are never used in an HA setup.
> Disabling local edits gives a small performance improvement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129644#comment-17129644
 ] 

Hadoop QA commented on HDFS-15398:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
1s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
9 unchanged - 0 fixed = 10 total (was 9) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
59s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestStripedFileAppend |
\\
\\
|| Subsystem || Report/Notes ||

[jira] [Updated] (HDFS-15398) EC: hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15398:

Summary: EC: hdfs client hangs when writing EC file occurs an addBlock 
exception  (was: hdfs client hangs when writing EC file occurs an addBlock 
exception)

> EC: hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch
>
>
>  When writing EC files, if the client calls addBlock() to apply for the 
> second (or a later) block group and the quota happens to be exceeded at that 
> point, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data will exceed the quota when applying for 
> the second block group. At this point, the client will hang forever.
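> A minimal sketch of that arithmetic (illustrative, using only the numbers 
> above):
> {code:java}
> long blockSize = 128L * 1024 * 1024;            // 128M
> long logicalPerGroup  = 6 * blockSize;          // 768M of user data
> long physicalPerGroup = (6 + 3) * blockSize;    // 1152M of raw space
> long quota = 2L * 1024 * 1024 * 1024;           // 2g space quota
> // 800M of data needs a second block group: 2 * 1152M = 2304M > 2048M.
> System.out.println(2 * physicalPerGroup > quota);  // true, quota exceeded
> {code}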
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() 

[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129638#comment-17129638
 ] 

Ayush Saxena commented on HDFS-15398:
-

https://builds.apache.org/job/PreCommit-HDFS-Build/29411/artifact/out/diff-checkstyle-hadoop-hdfs-project.txt

Need to fix this.

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch
>
>
>  When writing EC files, if the client calls addBlock() to apply for the 
> second (or a later) block group and the quota happens to be exceeded at that 
> point, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data will exceed the quota when applying for 
> the second block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> 

[jira] [Commented] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129509#comment-17129509
 ] 

Wei-Chiu Chuang commented on HDFS-15402:


Initially I thought it was broken by HADOOP-15696 (KMS, httpfs and the Hadoop 
web UI all use Jetty), but no: the Hadoop web UI defaults to a 10-second idle 
timeout, and that patch doesn't change the web UI's behavior.

Instead, something was probably broken when we updated Jetty from 6 to 9 in 
Hadoop 3.0.
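
For reference, in Jetty 9 the idle timeout is set per connector; a minimal 
sketch (the port and the 10s value are assumptions for illustration, not the 
actual HttpServer2 code):
{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

static Server newDatanodeHttpServer() throws Exception {
  Server server = new Server();
  ServerConnector connector = new ServerConnector(server);
  connector.setPort(50075);
  // If this is too large (or got lost in the Jetty 6 -> 9 move), half-closed
  // client connections can pile up in CLOSE-WAIT on the server side.
  connector.setIdleTimeout(10_000);  // milliseconds
  server.addConnector(connector);
  return server;
}
{code}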

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> 
>
> Key: HDFS-15402
> URL: https://issues.apache.org/jira/browse/HDFS-15402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.1.3
>Reporter: Sean Chow
>Priority: Major
>
> We access {{http://127.0.0.1:50075/jmx}} to get datanode metrics 
> periodically. But so many sockets are left in the CLOSE-WAIT state that 
> normal webhdfs requests fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP 
> localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in my production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets does not increase anymore.
>  The version apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129499#comment-17129499
 ] 

Hadoop QA commented on HDFS-15398:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
53s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
9 unchanged - 0 fixed = 10 total (was 9) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
7s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 

[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129355#comment-17129355
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  
0s{color} | {color:blue} prototool was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
53s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 21m 58s{color} | 
{color:red} root generated 25 new + 137 unchanged - 25 fixed = 162 total (was 
162) {color} |
| {color:green}+1{color} | {color:green} golang {color} | {color:green} 21m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 21m 58s{color} 
| {color:red} root generated 3 new + 1861 unchanged - 0 fixed = 1864 total (was 
1861) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 35s{color} | {color:orange} root: The patch generated 4 new + 211 unchanged 
- 5 fixed = 215 total (was 216) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 42s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
6s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF 

[jira] [Updated] (HDFS-15395) The exception message of "DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY" is not precise

2020-06-09 Thread Yuanliang Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanliang Zhang updated HDFS-15395:
---
Issue Type: Improvement  (was: Bug)

> The exception message of "DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY" is not precise
> --
>
> Key: HDFS-15395
> URL: https://issues.apache.org/jira/browse/HDFS-15395
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yuanliang Zhang
>Priority: Major
>
> The exception message of "DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY" in 
> DFSUtil.java may not be precise.
> The current message is: "Incorrect configuration: namenode address 
> DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY *or* 
> DFS_NAMENODE_RPC_ADDRESS_KEY is not configured"
> {code:java}
> public static Map> 
> getNNServiceRpcAddresses(
>   Configuration conf) throws IOException {
> ...
> Map> addressList =
>   DFSUtilClient.getAddresses(conf, defaultAddress,
>  DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY,
>  DFS_NAMENODE_RPC_ADDRESS_KEY);
> if (addressList.isEmpty()) {
>   throw new IOException("Incorrect configuration: namenode address "
>   + DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY + " or "  
>   + DFS_NAMENODE_RPC_ADDRESS_KEY
>   + " is not configured.");
> }
> return addressList;
> }
> {code}
> However, from the doc:
> {quote}If the value of this property (dfs.namenode.servicerpc-address) is 
> unset the value of dfs.namenode.rpc-address will be used as the default.
> {quote}
> The code in NameNode.java also confirms this logic. So I think this message 
> may need to be refined to be consistent with the doc and the code logic, 
> namely that dfs.namenode.rpc-address should always be set.
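> A sketch of a refined message along those lines (the wording is only a 
> suggestion):
> {code:java}
> throw new IOException("Incorrect configuration: namenode address "
>     + DFS_NAMENODE_RPC_ADDRESS_KEY + " is not configured ("
>     + DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY
>     + " falls back to it when unset).");
> {code}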



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129353#comment-17129353
 ] 

Hadoop QA commented on HDFS-15372:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
25s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 88 unchanged - 1 fixed = 90 total (was 89) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 58s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestStripedFileAppend |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29409/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13005212/HDFS-15372.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 03c4e1ae5aa4 4.15.0-101-generic #102-Ubuntu SMP Mon 

[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129339#comment-17129339
 ] 

Hongbing Wang commented on HDFS-15398:
--

I have adjusted the position of the test. Thank you [~ayushtkn] very much for 
taking the time to review. ;)

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch
>
>
>  When writing EC files, if the client calls addBlock() to apply for the 
> second (or a later) block group and the quota happens to be exceeded at that 
> point, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data will exceed the quota when applying for 
> the second block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> 

[jira] [Updated] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15398:
-
Attachment: HDFS-15398.003.patch

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch, 
> HDFS-15398.003.patch
>
>
>  When writing EC files, if the client calls addBlock() to apply for the 
> second (or a later) block group and the quota happens to be exceeded at that 
> point, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data will exceed the quota when applying for 
> the second block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> waitEndBlocks(), waitEndBlocks will enter an infinite 
> loop because 

[jira] [Created] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09 Thread Sean Chow (Jira)
Sean Chow created HDFS-15402:


 Summary: Requesting http jmx metrics leads to too much CLOSE-WAIT 
on datanode
 Key: HDFS-15402
 URL: https://issues.apache.org/jira/browse/HDFS-15402
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: metrics
Affects Versions: 3.1.3
Reporter: Sean Chow


We access {{http://127.0.0.1:50075/jmx}} to get datanode metrics 
periodically. But so many sockets are left in the CLOSE-WAIT state that 
normal webhdfs requests fail.

 
{code:java}
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
6729
lsof -i:37296
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 
(CLOSE_WAIT)
{code}
 

The pid 101015 is the datanode's process id.

I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in my production, and both of 
them have the same issue. When the metric-retrieving script stops, the number of 
CLOSE-WAIT sockets does not increase anymore.

 The version apache-hadoop-2.9.2 does not have this issue with the same 
metric-retrieving script.
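
A minimal sketch of the kind of poller involved (illustrative, Java 9+; the 
real script differs): the client reads the full body and closes, after which 
the server side should not linger in CLOSE-WAIT.
{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

static String fetchJmx() throws Exception {
  HttpURLConnection conn =
      (HttpURLConnection) new URL("http://127.0.0.1:50075/jmx").openConnection();
  try (InputStream in = conn.getInputStream()) {
    return new String(in.readAllBytes());  // drain the body completely
  } finally {
    conn.disconnect();  // client closes; the server should then fully close too
  }
}
{code}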

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15211) EC: File write hangs during close in case of Exception during updatePipeline

2020-06-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129286#comment-17129286
 ] 

Hudson commented on HDFS-15211:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18340 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18340/])
HDFS-15211. EC: File write hangs during close in case of Exception 
(ayushsaxena: rev 852587456173f208f78d0c95046cfd0d8aa1c01c)
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedOutputStreamUpdatePipeline.java


> EC: File write hangs during close in case of Exception during updatePipeline
> 
>
> Key: HDFS-15211
> URL: https://issues.apache.org/jira/browse/HDFS-15211
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.3.0, 3.2.1
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15211-01.patch, HDFS-15211-02.patch, 
> HDFS-15211-03.patch, HDFS-15211-04.patch, HDFS-15211-05.patch, 
> TestToRepro-01.patch, Thread-Dump, Thread-Dump-02
>
>
> EC file write hangs during file close if there is an exception due to the 
> closure of a slow stream and the number of failed data streamers grows larger 
> than the number of parity blocks.
> During close, the stream will try to flush all the healthy streamers, but the 
> streamers won't have any result due to the exception, so the streamers will 
> stay stuck.
> Hence the close will also get stuck.
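> A minimal sketch of the shape of the fix (illustrative; it mirrors the 
> takeWithTimeout() idea from the stack traces rather than the actual patch):
> {code:java}
> // A timed poll instead of a blocking take lets close() notice streamers
> // that will never produce a result and bail out instead of hanging.
> Object result = resultQueue.poll(30, TimeUnit.SECONDS);
> if (result == null) {
>   throw new IOException("streamer produced no result; aborting close");
> }
> {code}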



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129284#comment-17129284
 ] 

Hongbing Wang commented on HDFS-15398:
--

Yahh, give me some minutes. You are too efficient!(y) Thanks!

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  When writing EC files, if the client calls addBlock() to apply for the 
> second (or a later) block group and the quota happens to be exceeded at that 
> point, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block 
> group needs to apply for 1152M of physical space to write 768M of logical 
> data. Therefore, writing 800M of data will exceed the quota when applying for 
> the second block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> waitEndBlocks(), waitEndBlocks 

[jira] [Commented] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129283#comment-17129283
 ] 

Jinglun commented on HDFS-15346:


Uploaded v08. All the comments are addressed except 'read only in normal 
federation' and 'optimization of unit tests'. Pending jenkins.

> RBF: DistCpFedBalance implementation
> 
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch
>
>
> The patch in HDFS-15294 is too big to review, so we split it into 2 patches. 
> This is the second one. Details can be found in HDFS-15294.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128305#comment-17128305
 ] 

Jinglun edited comment on HDFS-15346 at 6/9/20, 1:33 PM:
-

Hi [~linyiqun], thanks for your great comments and valuable suggestions! I'll 
need some time to work through all of them, so let me respond to the questions 
first.

 
{quote}Here we reset the permission to 0, which means no operation at all is 
allowed? Is this expected; why not 400 (only allow read)? The comment saying 
'cancelling the x permission of the source path' makes me confused.
{quote}
Yes, here we reset the permission to 0, so both reads and writes on the source 
path and all its sub-paths are denied. As far as I know, every read operation 
needs to check its parents' execute permission, so setting it to 400 can't make 
the path read-only: we still can't read its sub-paths. I think the only way to 
make it 'read-only' is to recursively reduce each directory's permission to 
555. Reducing a permission means: if the original permission is 777, change it 
to 555; if the original permission is 700, make it 500. Saving all the 
directories' permissions is very expensive. A better way may be letting the 
NameNode support a 'readonly-directory'. I think we can first use the '0 
permission' way to make sure the data is consistent, then start a sub-task to 
add the NameNode 'readonly-directory', and finally switch this over to the 
NameNode 'readonly-directory'.
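
A minimal sketch of the '0 permission' step (assuming a plain FileSystem 
handle; this is not the patch code itself):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

static void sealSourcePath(FileSystem fs, Path src) throws IOException {
  // Permission 000 on the source root denies both read and write below it,
  // because every access must traverse (execute) this directory.
  fs.setPermission(src, new FsPermission((short) 0));
}
{code}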

 
{quote}One follow-up task I am thinking of: we can have a separate config 
file, something like fedbalance-default.xml for the fedbalance tool, the way 
distcp-default.xml works for the distcp tool now. I'd prefer not to add all 
tool config settings into hdfs-default.xml.
{quote}
Agree with you! Using a fedbalance-default.xml is much better.

 
{quote}The test needs quite a long time to execute the whole test.
{quote}
I'll try to figure it out, but it might be quite tricky, as the unit tests use 
both MiniDFSCluster and MiniMRYarnCluster and there are many rounds of distcp. 
Please tell me if you have any suggestions, thanks!


was (Author: lijinglun):
Hi [~linyiqun], thanks for your great comments and valuable suggestions! I'll 
need some time to work through all of them, so let me respond to the questions 
first.

 
{quote}Here we reset the permission to 0, which means no operation at all is 
allowed? Is this expected; why not 400 (only allow read)? The comment saying 
'cancelling the x permission of the source path' makes me confused.
{quote}
Yes, here we reset the permission to 0, so both reads and writes on the source 
path and all its sub-paths are denied. As far as I know, every read operation 
needs to check its parents' execute permission, so setting it to 400 can't make 
the path read-only: we still can't read its sub-paths. I think the only way to 
make it 'read-only' is to recursively reduce each directory's permission to 
555. Reducing a permission means: if the original permission is 777, change it 
to 555; if the original permission is 700, make it 500. Saving all the 
directories' permissions is very expensive. A better way may be letting the 
NameNode support a 'readonly-directory'. I think we can first use the '0 
permission' way to make sure the data is consistent, then start a sub-task to 
add the NameNode 'readonly-directory', and finally switch this over to the 
NameNode 'readonly-directory'.

> RBF: DistCpFedBalance implementation
> 
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This 
> is the second one. Details can be found at HDFS-15294.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15346) RBF: DistCpFedBalance implementation

2020-06-09 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15346:
---
Attachment: HDFS-15346.008.patch

> RBF: DistCpFedBalance implementation
> 
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This 
> is the second one. Details can be found at HDFS-15294.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129273#comment-17129273
 ] 

Ayush Saxena commented on HDFS-15398:
-

I have pushed the missing code from HDFS-15211; you can try pulling the code 
again. :)

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> 

[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129255#comment-17129255
 ] 

Ayush Saxena commented on HDFS-15398:
-

The code LGTM.
Regarding the location of the test: {{TestDFSStripedOutputStreamWithFailure}} 
already has a cluster running, and it has a child class 
{{TestDFSStripedOutputStreamWithFailureWithRandomECPolicy}} as well, so the test 
will run twice, once as part of 
{{TestDFSStripedOutputStreamWithFailureWithRandomECPolicy}} too.
 bq.  HDFS-15211 maybe not merged.
Yeah, just noticed it isn't merged. Give me some 5-10 minutes and I will commit 
it; post that, you can add your test there. You may change the test class name 
as well to make it generic (your wish).

+1 once done

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> 

[jira] [Updated] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15398:

Status: Patch Available  (was: Open)

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> waitEndBlocks(), waitEndBlocks will enter an infinite 
> loop because the queue in endBlocks is 

[jira] [Commented] (HDFS-15399) Support include or exclude datanode by configure file

2020-06-09 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129247#comment-17129247
 ] 

Stephen O'Donnell commented on HDFS-15399:
--

Should this be a HDDS jira, rather than HDFS?

> Support include or exclude datanode by configure file
> -
>
> Key: HDFS-15399
> URL: https://issues.apache.org/jira/browse/HDFS-15399
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
>
> When I dislike a datanode, or just want to let only specific datanodes join the 
> SCM, I want to have this feature to limit the datanode list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129236#comment-17129236
 ] 

Hongbing Wang commented on HDFS-15398:
--

Thanks [~ayushtkn]. Thank you for your suggestions about the 
{{closeAllStreams();}} code. I think it is necessary to take your advice and 
reuse your {{closeAllStreams();}} method.

In addition, the test {{testECWriteHang();}} you gave is very useful. I have 
added it to TestDFSStripedOutputStreamWithFailure because the tests in 
HDFS-15211 may not be merged yet.

My current changes have met my expectations; do you have any other 
suggestions?
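For reference, the shape of the check is roughly the sketch below. It is not 
the exact test from the patch: the quota, sizes and names are illustrative, and 
it assumes the usual {{cluster}} fixture and static imports of the surrounding 
test class.
{code:java}
// Sketch only: exceed the space quota on the second EC block group and
// assert the client fails fast instead of hanging.
@Test(timeout = 90000)
public void testECWriteHangSketch() throws Exception {
  DistributedFileSystem dfs = cluster.getFileSystem();
  Path dir = new Path("/ec-quota");
  dfs.mkdirs(dir);
  dfs.setErasureCodingPolicy(dir, "RS-6-3-1024k");
  // 2 GB raw-space quota; one full RS-6-3 block group needs 1152 MB.
  dfs.setQuota(dir, HdfsConstants.QUOTA_DONT_SET, 2L * 1024 * 1024 * 1024);
  byte[] chunk = new byte[1024 * 1024];
  boolean quotaHit = false;
  FSDataOutputStream out = dfs.create(new Path(dir, "800m"));
  try {
    for (int i = 0; i < 800; i++) {
      out.write(chunk); // the second block group allocation must fail fast
    }
    out.close();
  } catch (QuotaExceededException e) {
    quotaHit = true; // the client returned instead of hanging
  }
  assertTrue("expected the write to fail with a quota error", quotaHit);
}
{code}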

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> 

[jira] [Updated] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15398:
-
Attachment: HDFS-15398.002.patch

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch, HDFS-15398.002.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() -> waitEndBlocks(), waitEndBlocks will enter an infinite 
> loop because the queue in endBlocks is 

[jira] [Updated] (HDFS-15398) hdfs client hangs when writing EC file occurs an addBlock exception

2020-06-09 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15398:
-
Priority: Critical  (was: Major)
 Summary: hdfs client hangs when writing EC file occurs an addBlock 
exception  (was: hdfs client may hang forever when writing EC file)

> hdfs client hangs when writing EC file occurs an addBlock exception
> ---
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Critical
> Attachments: HDFS-15398.001.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> 

[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: HDFS-15098.006.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: (was: HDFS-15098.006.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15401) Namenode should log warning if concat/append finds file with large number of blocks

2020-06-09 Thread Lokesh Jain (Jira)
Lokesh Jain created HDFS-15401:
--

 Summary: Namenode should log warning if concat/append finds file 
with large number of blocks
 Key: HDFS-15401
 URL: https://issues.apache.org/jira/browse/HDFS-15401
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain


Namenode should log a warning if concat/append finds a file with more than a 
configured number of blocks.

This is based on [~weichiu]'s comment 
https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15400) fsck should log a warning if it finds a file with large number of blocks

2020-06-09 Thread Lokesh Jain (Jira)
Lokesh Jain created HDFS-15400:
--

 Summary: fsck should log a warning if it finds a file with large 
number of blocks
 Key: HDFS-15400
 URL: https://issues.apache.org/jira/browse/HDFS-15400
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain


fsck should log a warning if it finds a file with more than a configured number 
of blocks.

This is based on [~weichiu]'s comment 
https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129157#comment-17129157
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  
0s{color} | {color:blue} prototool was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
36s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 13s{color} | 
{color:red} root generated 31 new + 131 unchanged - 31 fixed = 162 total (was 
162) {color} |
| {color:green}+1{color} | {color:green} golang {color} | {color:green} 19m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 19m 13s{color} 
| {color:red} root generated 4 new + 1865 unchanged - 0 fixed = 1869 total (was 
1865) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m  8s{color} | {color:orange} root: The patch generated 4 new + 211 unchanged 
- 5 fixed = 215 total (was 216) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
28s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
9s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} 

[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15372:
-
Attachment: HDFS-15372.004.patch

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch
>
>
> Given a cluster with an authorization provider configured (eg Sentry) and the 
> paths covered by the provider are snapshotable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> This picks the last resolved inode, and if you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It 
> resolves the snapshot path to its original location, but it's still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/user/.snapshot/snap1/tab", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider - 
> /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However, as 
> the behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
  Attachment: HDFS-15098.006.patch
Release Note: Modify whitespace and add a supplementary description of 
hadoop.security.openssl.engine.id  (was: When OpenSSL exists in the 
environment, the system will use the SM4 algorithm to encrypt and decrypt through 
OpenSSL. When OpenSSL is not available, the system will use the SM4 algorithm to 
encrypt and decrypt through JCE.)
  Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client may hang forever when writing EC file

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128931#comment-17128931
 ] 

Ayush Saxena commented on HDFS-15398:
-

Technically, all threads should be closed as part of close() only, IMO; that is 
why we need to call close() even if there is an exception. Otherwise, in the 
case of an exception we could handle everything there itself and avoid calling 
close() at all, but that isn't true: we still call close() to clean up the 
other stuff. So I feel that if an exception occurs, we should close only the 
streamers, since post that no write is possible on them, and let the other 
stuff be handled as part of the default close() process.
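Something along these lines in the failure path, as a sketch (simplified; the 
names and signatures are illustrative, not the actual DFSStripedOutputStream 
internals):
{code:java}
// Sketch of the idea: once addBlock() has failed, force-close every striped
// streamer so a later close() cannot block on their end-block queues, and
// leave the rest of the cleanup to the default close() path.
private void closeStreamersOnFailure(IOException cause) throws IOException {
  for (StripedDataStreamer streamer : streamers) {
    // No further write is possible on the streamer after the exception,
    // so shutting it down here is safe.
    streamer.close(true);
  }
  throw cause; // surface the original addBlock failure to the caller
}
{code}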

> hdfs client may hang forever when writing EC file
> -
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15398.001.patch
>
>
>  In the operation of writing EC files, when the client calls addBlock() 
> applying for the second block group (or >= the second block group) and it 
> happens to exceed quota at this time, the client program will hang forever. 
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and EC 6-3 policy, a block group 
> needs to apply for 1152M physical space to write 768M logical data. 
> Therefore, writing 800M data will exceed quota when applying for the second 
> block group. At this point, the client will hang forever.
> The exception stack of client is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x8009d5d8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a 
> org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at 
> 

[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-09 Thread lindongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128921#comment-17128921
 ] 

lindongdong commented on HDFS-15098:


[~seanlau] get it~

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK: place bcprov-ext-jdk15on-165.jar in the
> $JAVA_HOME/jre/lib/ext directory and add
> "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider"
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4 (a hedged JCE smoke test follows this description):
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. openssl version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK
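
Not part of the original report, but as a hedged illustration of steps 2 and 4 above: once the Bouncy Castle provider is visible to the JDK, SM4/CTR/NoPadding should be reachable through the standard JCE API. A minimal smoke test might look like the sketch below (it registers the provider programmatically, which has the same effect as the java.security edit; the class name and the zero IV are made up for the test):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.Security;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class Sm4SmokeTest {
  public static void main(String[] args) throws Exception {
    // Programmatic equivalent of the security.provider.10=... java.security edit.
    Security.addProvider(new BouncyCastleProvider());

    SecretKey key = KeyGenerator.getInstance("SM4", "BC").generateKey(); // 128-bit
    IvParameterSpec iv = new IvParameterSpec(new byte[16]); // zero IV: test only!

    Cipher enc = Cipher.getInstance("SM4/CTR/NoPadding", "BC");
    enc.init(Cipher.ENCRYPT_MODE, key, iv);
    byte[] ct = enc.doFinal("hello sm4".getBytes(StandardCharsets.UTF_8));

    Cipher dec = Cipher.getInstance("SM4/CTR/NoPadding", "BC");
    dec.init(Cipher.DECRYPT_MODE, key, iv);
    // Round trip should print the original plaintext.
    System.out.println(new String(dec.doFinal(ct), StandardCharsets.UTF_8));
  }
}
{code}

If the round trip prints the plaintext, the provider is wired up correctly and the KMS side can then be exercised with the shell commands above.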



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15398) hdfs client may hang forever when writing EC file

2020-06-09 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128912#comment-17128912
 ] 

Hongbing Wang commented on HDFS-15398:
--

Thanks [~ayushtkn] for the link and tests. I will try some tests later. In
addition, does it make sense to keep the other threads alive if an exception occurs?

> hdfs client may hang forever when writing EC file
> -
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15398.001.patch
>
>
> When writing an EC file, if the client calls addBlock() to apply for the
> second block group (or a later one) and that request happens to exceed the
> space quota, the client program will hang forever.
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block
> group needs to allocate 1152M of physical space to write 768M of logical data.
> Therefore, writing 800M of data exceeds the quota when applying for the second
> block group. At this point, the client hangs forever.
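
An editorial aside, not part of the quoted report: the arithmetic above can be sanity-checked with a few lines of plain Java (no HDFS dependencies; the class name is made up).

{code:java}
public class EcQuotaMath {
  public static void main(String[] args) {
    final long blockSize = 128L << 20;         // 128M internal block size
    final int dataUnits = 6, parityUnits = 3;  // RS-6-3-1024k
    final long logicalPerGroup = dataUnits * blockSize;                  // 768M
    final long physicalPerGroup = (dataUnits + parityUnits) * blockSize; // 1152M
    final long spaceQuota = 2L << 30;          // 2g
    System.out.println(logicalPerGroup >> 20);  // 768  -> 800M needs a 2nd group
    System.out.println(physicalPerGroup >> 20); // 1152 -> charged per block group
    // The second block group would bring physical usage to 2304M > 2048M.
    System.out.println(2 * physicalPerGroup > spaceQuota); // true
  }
}
{code}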
> The client's exception stack is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x8009d5d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
> {code}
> When an exception occurs in addBlock, the program will call 
> DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> 
> allocateNewBlock() 
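
An editorial aside: the hang described above follows a generic producer/consumer pattern that can be illustrated with self-contained toy Java (hypothetical names, not the DFSStripedOutputStream source). The closing thread polls a queue for an end-of-block event that a dead streamer will never enqueue; without some failure signal it retries forever, which is the TIMED_WAITING loop in the stack traces.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EcCloseHangSketch {
  private static final LinkedBlockingQueue<String> endBlocks =
      new LinkedBlockingQueue<>();
  private static volatile boolean streamerFailed = false;

  public static void main(String[] args) throws InterruptedException {
    // Analogue of a streamer hitting an addBlock() exception: it dies
    // before ever enqueueing its end-of-block event.
    Thread streamer = new Thread(() -> streamerFailed = true);
    streamer.start();
    streamer.join();

    // Analogue of waitEndBlocks()/takeWithTimeout(): poll-and-retry.
    while (true) {
      String event = endBlocks.poll(100, TimeUnit.MILLISECONDS);
      if (event != null) {
        break;
      }
      // Without this check the loop would spin forever -- the reported hang.
      if (streamerFailed) {
        System.out.println("streamer dead; aborting close instead of hanging");
        break;
      }
    }
  }
}
{code}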

[jira] [Comment Edited] (HDFS-15398) hdfs client may hang forever when writing EC file

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128897#comment-17128897
 ] 

Ayush Saxena edited comment on HDFS-15398 at 6/9/20, 6:11 AM:
--

Check: something like this seems to reproduce it for me:

{code:java}

  @Test
  public void testECWriteHang() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setLong(DFS_BLOCK_SIZE_KEY, 1 * 1024 * 1024);
    try (MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build()) {
      cluster.waitActive();
      final DistributedFileSystem dfs = cluster.getFileSystem();
      // Create a directory with an EC policy and start writing a file into it.
      Path dir = new Path("/test");
      dfs.mkdirs(dir);
      dfs.enableErasureCodingPolicy("XOR-2-1-1024k");
      dfs.setErasureCodingPolicy(dir, "XOR-2-1-1024k");
      Path filePath = new Path("/test/file");
      FSDataOutputStream out = dfs.create(filePath);
      for (int i = 0; i < 1024 * 1024 * 2; i++) {
        out.write(i);
      }
      // Shrink the quota so the next block allocation fails.
      dfs.setQuota(dir, 5, 0);
      try {
        for (int i = 0; i < 1024 * 1024 * 2; i++) {
          out.write(i);
        }
      } catch (Exception e) {
        dfs.delete(filePath, true);
      } finally {
        // The close should succeed, shouldn't get stuck.
        IOUtils.closeStream(out);
      }
    }
  }
{code}

You can try using this; the test can be added in the same file as HDFS-15211.
Would just {{closeAllStreamers()}} work? I guess so.
The rest of the threads we should keep for {{closeImpl}} to close?
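
An editorial aside on the question above: a toy sketch of the "close everything on allocation failure" idea, with all names hypothetical (the real {{closeAllStreamers()}} lives inside DFSStripedOutputStream and may differ).

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ToyStreamer implements Closeable {
  private final Thread worker = new Thread(() -> { /* stream packets */ });
  void start() { worker.start(); }
  @Override public void close() { worker.interrupt(); } // nothing waits on it anymore
}

class ToyStripedStream {
  private final List<ToyStreamer> streamers =
      Arrays.asList(new ToyStreamer(), new ToyStreamer(), new ToyStreamer());

  void allocateNewBlock() throws IOException {
    try {
      // Stand-in for the addBlock() RPC failing with QuotaExceededException.
      throw new IOException("simulated quota failure from addBlock()");
    } catch (IOException e) {
      // The closeAllStreamers() idea: stop every streamer on failure so the
      // close path has nothing left to block on, then surface the error.
      for (ToyStreamer s : streamers) {
        s.close();
      }
      throw e;
    }
  }
}
{code}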


was (Author: ayushtkn):
Check: something like this seems to reproduce it for me:

{code:java}

  @Test
  public void testECWriteHang() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setLong(DFS_BLOCK_SIZE_KEY, 1 * 1024 * 1024);
    try (MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build()) {
      cluster.waitActive();
      final DistributedFileSystem dfs = cluster.getFileSystem();
      // Create a directory with an EC policy and start writing a file into it.
      Path dir = new Path("/test");
      dfs.mkdirs(dir);
      dfs.enableErasureCodingPolicy("XOR-2-1-1024k");
      dfs.setErasureCodingPolicy(dir, "XOR-2-1-1024k");
      Path filePath = new Path("/test/file");
      FSDataOutputStream out = dfs.create(filePath);
      for (int i = 0; i < 1024 * 1024 * 2; i++) {
        out.write(i);
      }
      // Shrink the quota so the next block allocation fails.
      dfs.setQuota(dir, 5, 0);
      try {
        for (int i = 0; i < 1024 * 1024 * 2; i++) {
          out.write(i);
        }
      } catch (Exception e) {
        dfs.delete(filePath, true);
      } finally {
        // The close should succeed, shouldn't get stuck.
        IOUtils.closeStream(out);
      }
    }
  }
{code}

You can try using this; the test can be added in the same file as HDFS-15211.

> hdfs client may hang forever when writing EC file
> -
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15398.001.patch
>
>
> When writing an EC file, if the client calls addBlock() to apply for the
> second block group (or a later one) and that request happens to exceed the
> space quota, the client program will hang forever.
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block
> group needs to allocate 1152M of physical space to write 768M of logical data.
> Therefore, writing 800M of data exceeds the quota when applying for the second
> block group. At this point, the client hangs forever.
> The client's exception stack is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x8009d5d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at 
> 

[jira] [Commented] (HDFS-15398) hdfs client may hang forever when writing EC file

2020-06-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128897#comment-17128897
 ] 

Ayush Saxena commented on HDFS-15398:
-

Check: something like this seems to reproduce it for me:

{code:java}

  @Test
  public void testECWriteHang() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setLong(DFS_BLOCK_SIZE_KEY, 1 * 1024 * 1024);
    try (MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build()) {
      cluster.waitActive();
      final DistributedFileSystem dfs = cluster.getFileSystem();
      // Create a directory with an EC policy and start writing a file into it.
      Path dir = new Path("/test");
      dfs.mkdirs(dir);
      dfs.enableErasureCodingPolicy("XOR-2-1-1024k");
      dfs.setErasureCodingPolicy(dir, "XOR-2-1-1024k");
      Path filePath = new Path("/test/file");
      FSDataOutputStream out = dfs.create(filePath);
      for (int i = 0; i < 1024 * 1024 * 2; i++) {
        out.write(i);
      }
      // Shrink the quota so the next block allocation fails.
      dfs.setQuota(dir, 5, 0);
      try {
        for (int i = 0; i < 1024 * 1024 * 2; i++) {
          out.write(i);
        }
      } catch (Exception e) {
        dfs.delete(filePath, true);
      } finally {
        // The close should succeed, shouldn't get stuck.
        IOUtils.closeStream(out);
      }
    }
  }
{code}

You can try using this; the test can be added in the same file as HDFS-15211.

> hdfs client may hang forever when writing EC file
> -
>
> Key: HDFS-15398
> URL: https://issues.apache.org/jira/browse/HDFS-15398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs-client
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15398.001.patch
>
>
> When writing an EC file, if the client calls addBlock() to apply for the
> second block group (or a later one) and that request happens to exceed the
> space quota, the client program will hang forever.
>  See the demo below:
> {code:java}
> $ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
> $ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
> $ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
> Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
> $ hadoop fs -put 800m /user/wanghongbing/quota/ec
> ^@^@^@^@^@^@^@^@^Z
> {code}
> In the case of blocksize=128M, spaceQuota=2g and the EC 6-3 policy, a block
> group needs to allocate 1152M of physical space to write 768M of logical data.
> Therefore, writing 800M of data exceeds the quota when applying for the second
> block group. At this point, the client hangs forever.
> The client's exception stack is as follows:
> {code:java}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x8009d5d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
> - locked <0x8009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
>