[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159671#comment-17159671
 ] 

Ayush Saxena commented on HDFS-15443:
-

Thanx [~elgoiri] 
No objections from my side

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads is to specify the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that need to tune carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> intcurXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
> thrownewIOException("Xceiver count " + curXceiverCount
> + " exceeds the limit of concurrent xceivers: "
> + maxXceiverCount);
> }
> {code}
> There are many issues that caused by not setting this param to an appropriate 
> value. However, there is no any check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> number of cores, main memory etc, *we can prevent users from setting this 
> value to an absolute wrong value by accident.* (e.g. a negative value that 
> totally break the availability of datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159662#comment-17159662
 ] 

Íñigo Goiri commented on HDFS-15443:


+1 on  [^HDFS-15443.001.patch].
[~ayushtkn] any objections?

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads is to specify the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that need to tune carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> intcurXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
> thrownewIOException("Xceiver count " + curXceiverCount
> + " exceeds the limit of concurrent xceivers: "
> + maxXceiverCount);
> }
> {code}
> There are many issues that caused by not setting this param to an appropriate 
> value. However, there is no any check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> number of cores, main memory etc, *we can prevent users from setting this 
> value to an absolute wrong value by accident.* (e.g. a negative value that 
> totally break the availability of datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-16 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159657#comment-17159657
 ] 

AMC-team commented on HDFS-15443:
-

[~elgoiri]

Sorry about my late feedback. I have uploaded a new patch. Thanks for reviewing 
it.

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads is to specify the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that need to tune carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> intcurXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
> thrownewIOException("Xceiver count " + curXceiverCount
> + " exceeds the limit of concurrent xceivers: "
> + maxXceiverCount);
> }
> {code}
> There are many issues that caused by not setting this param to an appropriate 
> value. However, there is no any check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> number of cores, main memory etc, *we can prevent users from setting this 
> value to an absolute wrong value by accident.* (e.g. a negative value that 
> totally break the availability of datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-16 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: HDFS-15443.001.patch
Status: Patch Available  (was: Open)

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads is to specify the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that need to tune carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> intcurXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
> thrownewIOException("Xceiver count " + curXceiverCount
> + " exceeds the limit of concurrent xceivers: "
> + maxXceiverCount);
> }
> {code}
> There are many issues that caused by not setting this param to an appropriate 
> value. However, there is no any check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> number of cores, main memory etc, *we can prevent users from setting this 
> value to an absolute wrong value by accident.* (e.g. a negative value that 
> totally break the availability of datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-16 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Attachment: HDFS-15439.000.patch
Status: Patch Available  (was: Open)

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add checking code of "dfs.mover.retry.max.attempts" to accept only 
> non-negative value or change the if statement condition when retry count 
> exceeds max attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-16 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Attachment: (was: HDFS-15439.000.patch)

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add checking code of "dfs.mover.retry.max.attempts" to accept only 
> non-negative value or change the if statement condition when retry count 
> exceeds max attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-16 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Attachment: (was: HDFS-15438.000.patch)

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch
>
>
> In HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number 
> of errors we can ignore for a specific move between two disks before it is 
> abandoned.
> The parameter can accept value that >= 0. And setting the value to 0 should 
> mean no error tolerance. However, setting the value to 0 will simply don't do 
> the block copy even there is no disk error occur because the while loop 
> condition *item.getErrorCount() < getMaxError(item)* will not satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>  DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
> try {
>   ... //get the block
> }  catch (IOException e) {
> item.incErrorCount();
> }
>if (item.getErrorCount() >= getMaxError(item)) {
> item.setErrMsg("Error count exceeded.");
> LOG.info("Maximum error count exceeded. Error count: {} Max error:{} 
> ",
> item.getErrorCount(), item.getMaxDiskErrors());
>   }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-16 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Attachment: HDFS-15438.000.patch
Status: Patch Available  (was: Open)

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch
>
>
> In HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number 
> of errors we can ignore for a specific move between two disks before it is 
> abandoned.
> The parameter can accept value that >= 0. And setting the value to 0 should 
> mean no error tolerance. However, setting the value to 0 will simply don't do 
> the block copy even there is no disk error occur because the while loop 
> condition *item.getErrorCount() < getMaxError(item)* will not satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>  DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
> try {
>   ... //get the block
> }  catch (IOException e) {
> item.incErrorCount();
> }
>if (item.getErrorCount() >= getMaxError(item)) {
> item.setErrMsg("Error count exceeded.");
> LOG.info("Maximum error count exceeded. Error count: {} Max error:{} 
> ",
> item.getErrorCount(), item.getMaxDiskErrors());
>   }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15025) Applying NVDIMM storage media to HDFS

2020-07-16 Thread huangtianhua (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159614#comment-17159614
 ] 

huangtianhua commented on HDFS-15025:
-

[~weichiu][~ayushtkn][~brahma] Would you please help to review this, thanks 
very much:)

> Applying NVDIMM storage media to HDFS
> -
>
> Key: HDFS-15025
> URL: https://issues.apache.org/jira/browse/HDFS-15025
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
>Reporter: hadoop_hdfs_hw
>Priority: Major
> Attachments: Applying NVDIMM to HDFS.pdf, HDFS-15025.001.patch, 
> HDFS-15025.002.patch, HDFS-15025.003.patch, HDFS-15025.004.patch, 
> HDFS-15025.005.patch, HDFS-15025.006.patch, NVDIMM_patch(WIP).patch
>
>
> The non-volatile memory NVDIMM is faster than SSD, it can be used 
> simultaneously with RAM, DISK, SSD. The data of HDFS stored directly on 
> NVDIMM can not only improves the response rate of HDFS, but also ensure the 
> reliability of the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15404) ShellCommandFencer should expose info about source

2020-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159551#comment-17159551
 ] 

Hadoop QA commented on HDFS-15404:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
55s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
11s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 52s{color} | {color:orange} root: The patch generated 4 new + 67 unchanged - 
0 fixed = 71 total (was 67) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
26s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 31s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
55s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}256m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.viewfs.TestViewFileSystemLinkFallback |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29521/artifact/out/Dockerfile
 |
| JIRA 

[jira] [Updated] (HDFS-15468) Active namenode crashed when no edit recover

2020-07-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15468:
---
Component/s: journal-node
 ha

> Active namenode crashed when no edit recover
> 
>
> Key: HDFS-15468
> URL: https://issues.apache.org/jira/browse/HDFS-15468
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Priority: Critical
>
> if namenode is under safe mode and let restart two journal node for 
> maintenance activity.
>  In this case, the journal node will not finalize the last edit segment which 
> is edit in-progress. 
>  This last edit segment will be finalized or recovered when edit rolling 
> operation else when epoch change due to namenode failover.
>  But the current scenario is no failover, just namenode is under safe mode. 
> If we leave the safe mode then active namenode will crash.
>  Ie.
>  the current open segment is edits_inprogress_10356376710 but it is 
> not recovered or finalized post JN2 restart. I think we need to recover the 
> edits after JN restart. 
> {code:java}
> Journal node 
> 2020-06-20 16:11:53,458 INFO  server.Journal 
> (Journal.java:scanStorageForLatestEdits(193)) - Latest log is 
> EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_10356376710,first=10356376710,last=10356376710,inProgress=true,hasCorruptHeader=false)
> 2020-06-20 16:19:06,397 INFO  ipc.Server (Server.java:logException(2435)) - 
> IPC Server handler 3 on 8485, call 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 
> 10.x.x.x:28444 Call#49083225 Retry#0
> org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't 
> write, no segment open
>         at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
> {code}
> {code:java}
> {code:java}
> Namenode log:
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 1 successful responses:
> 10.x.x.x:8485: null [success]
> 2 exceptions thrown:
> 10.y.y.y:8485: Can't write, no segment open
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15404) ShellCommandFencer should expose info about source

2020-07-16 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159408#comment-17159408
 ] 

Chen Liang commented on HDFS-15404:
---

Thanks for taking a look [~shv]! In general, I agree that should be fine with 
tooling, I don't have specific example, was mainly brainstorming any potential 
concerns. 

Uploaded v005 patch.

> ShellCommandFencer should expose info about source
> --
>
> Key: HDFS-15404
> URL: https://issues.apache.org/jira/browse/HDFS-15404
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15404.001.patch, HDFS-15404.002.patch, 
> HDFS-15404.003.patch, HDFS-15404.004.patch, HDFS-15404.005.patch
>
>
> Currently the HA fencing logic in ShellCommandFencer exposes environment 
> variable about only the fencing target. i.e. the $target_* variables as 
> mentioned in this [document 
> page|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html]).
>  
> But here only the fencing target variables are getting exposed. Sometimes it 
> is useful to expose info about the fencing source node. One use case is would 
> allow source and target node to identify themselves separately and run 
> different commands/scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15404) ShellCommandFencer should expose info about source

2020-07-16 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-15404:
--
Attachment: HDFS-15404.005.patch

> ShellCommandFencer should expose info about source
> --
>
> Key: HDFS-15404
> URL: https://issues.apache.org/jira/browse/HDFS-15404
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15404.001.patch, HDFS-15404.002.patch, 
> HDFS-15404.003.patch, HDFS-15404.004.patch, HDFS-15404.005.patch
>
>
> Currently the HA fencing logic in ShellCommandFencer exposes environment 
> variable about only the fencing target. i.e. the $target_* variables as 
> mentioned in this [document 
> page|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html]).
>  
> But here only the fencing target variables are getting exposed. Sometimes it 
> is useful to expose info about the fencing source node. One use case is would 
> allow source and target node to identify themselves separately and run 
> different commands/scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15472) Erasure Coding: Support fallback read when zero copy is not supported

2020-07-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159403#comment-17159403
 ] 

Wei-Chiu Chuang commented on HDFS-15472:


thanks for the patch. I think the patch is fine.

We should consider using StreamCapabilities class to probe for these features 
(adding a ZEROBYTEREAD?). DFSStripedInputStream should extend hasCapability() 
and return streaming capabilities accordingly.

> Erasure Coding: Support fallback read when zero copy is not supported
> -
>
> Key: HDFS-15472
> URL: https://issues.apache.org/jira/browse/HDFS-15472
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: HDFS-15472.000.patch
>
>
> [HDFS-8203|https://issues.apache.org/jira/browse/HDFS-8203] 
> ec does not support zeor copy read, but currently does not support fallback 
> read, it will throw an exception.
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Not support enhanced byte 
> buffer access.
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.read(DFSStripedInputStream.java:524)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:188)
> at 
> org.apache.hadoop.hive.shims.ZeroCopyShims$ZeroCopyAdapter.readBuffer(ZeroCopyShims.java:79)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15473) Add listSnapshots command to list all snapshots

2020-07-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159399#comment-17159399
 ] 

Wei-Chiu Chuang commented on HDFS-15473:


[~hemanthboyina] can you make an example?

Say the cluster has snapshottable directories /tmp and /user, and they have 
snapshots s0, s1 and s2, s3 respectively. Is this command going to list all 
snapshots s0, s1, s2 and s3?

> Add listSnapshots command to list all snapshots
> ---
>
> Key: HDFS-15473
> URL: https://issues.apache.org/jira/browse/HDFS-15473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
>
> In a cluster where snapshots are highly used , it will benefit to have a 
> command to list all the snapshots under root



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14498:
---
Fix Version/s: 3.1.5
   2.10.1
   3.2.2
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-14498-branch-2.10.001.patch, 
> HDFS-14498-branch-2.10.002.patch, HDFS-14498-branch-2.10.002.patch, 
> HDFS-14498-branch-2.10.003.patch, HDFS-14498-branch-2.10.004.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.1.001.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.2.001.patch, 
> HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159331#comment-17159331
 ] 

Xiaoqiao He commented on HDFS-14498:


+1. Commit to branch-3.2/branch-3.1/branch-2.10.
Thanks [~sodonnell] and every reviewers.

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14498-branch-2.10.001.patch, 
> HDFS-14498-branch-2.10.002.patch, HDFS-14498-branch-2.10.002.patch, 
> HDFS-14498-branch-2.10.003.patch, HDFS-14498-branch-2.10.004.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.1.001.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.2.001.patch, 
> HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159219#comment-17159219
 ] 

Stephen O'Donnell commented on HDFS-14498:
--

With  HDFS-14498-branch-2.10.004.patch:

* Checkstyle issues are fixed.
* The Javadoc warnings are not revelant I think - they are referring to exiting 
code untouched by this patch.
* Unit test failures are not relevant.

I think we are good to commit the 2.10, 3.1 and 3.2 branch patches - what do 
you think [~hexiaoqiao] ?

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14498-branch-2.10.001.patch, 
> HDFS-14498-branch-2.10.002.patch, HDFS-14498-branch-2.10.002.patch, 
> HDFS-14498-branch-2.10.003.patch, HDFS-14498-branch-2.10.004.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.1.001.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.2.001.patch, 
> HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159204#comment-17159204
 ] 

Hadoop QA commented on HDFS-14498:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
51s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} branch-2.10 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-hdfs in branch-2.10 failed with JDK Oracle 
Corporation-1.7.0_95-b00. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
3s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  1m 
21s{color} | {color:red} 
hadoop-hdfs-project_hadoop-hdfs-jdkOracleCorporation-1.7.0_95-b00 with JDK 
Oracle Corporation-1.7.0_95-b00 generated 9 new + 0 unchanged - 0 fixed = 9 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | 

[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159196#comment-17159196
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  
1s{color} | {color:blue} prototool was not available. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
1s{color} | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
13s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
26s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 49s{color} | 
{color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} golang {color} | {color:red}  0m 
49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 49s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 44s{color} | {color:orange} root: The patch generated 126 new + 229 
unchanged - 7 fixed = 355 total (was 236) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
19s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 42 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  0m 
42s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| 

[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete

2020-07-16 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15313:
-
Fix Version/s: 3.1.5

> Ensure inodes in active filesytem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.patch, 
> HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> delete snapshot ends up cleaning up inodes from active fs which can be 
> referred from only one snapshot as the isLastReference() check for the parent 
> dir introduced in HDFS-13101 may return true in certain cases. The aim of 
> this Jira to add a check to ensure if the Inodes are being referred in the 
> active fs , should not get deleted while deletion of snapshot happens.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159134#comment-17159134
 ] 

Stephen O'Donnell commented on HDFS-14498:
--

Upload a v004 branch-2.10 patch to address the checkstyle line length issue 
from 003. Other than that, it is identical.

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14498-branch-2.10.001.patch, 
> HDFS-14498-branch-2.10.002.patch, HDFS-14498-branch-2.10.002.patch, 
> HDFS-14498-branch-2.10.003.patch, HDFS-14498-branch-2.10.004.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.1.001.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.2.001.patch, 
> HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14498:
-
Attachment: HDFS-14498-branch-2.10.004.patch

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14498-branch-2.10.001.patch, 
> HDFS-14498-branch-2.10.002.patch, HDFS-14498-branch-2.10.002.patch, 
> HDFS-14498-branch-2.10.003.patch, HDFS-14498-branch-2.10.004.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.1.001.patch, 
> HDFS-14498-branch-3.1.001.patch, HDFS-14498-branch-3.2.001.patch, 
> HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15469) Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE

2020-07-16 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159080#comment-17159080
 ] 

jianghua zhu commented on HDFS-15469:
-

[~hexiaoqiao] , thank you very much.

> Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE
> 
>
> Key: HDFS-15469
> URL: https://issues.apache.org/jira/browse/HDFS-15469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15469.001.patch
>
>
> Now the value of PacketReceiver#MAX_PACKET_SIZE is fixed and the size is 16M. 
> This value should be configurable to facilitate better performance in 
> different environments. For example, when the network environment is poor, or 
> the machine quality is not good, and the hard disk quality is not good, this 
> value should be set below 16M, such as 8M, which will be more conducive to 
> the stability of the cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-16 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: HDFS-15098.009.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-16 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch
>
>
> SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-07-16 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158991#comment-17158991
 ] 

Hongbing Wang commented on HDFS-15240:
--

[~marvelrock] We have the same problem. At the same time as this problem, there 
are frequent FullGCs (every few seconds). Dump and MAT found that there are 
lots of ecWorker objects that almost fill the entire heap.
!image-2020-07-16-15-56-38-608.png|width=722,height=591!

look forward this patch into trunk.

 

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, 
> image-2020-07-16-15-56-38-608.png
>
>
> When read some lzo files we found some blocks were broken.
> I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from 
> DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') 
> blocks. And find the longest common sequenece(LCS) between b6'(decoded) and 
> b6(read from DN)(b7'/b7 and b8'/b8).
> After selecting 6 blocks of the block group in combinations one time and 
> iterating through all cases, I find one case that the length of LCS is the 
> block length - 64KB, 64KB is just the length of ByteBuffer used by 
> StripedBlockReader. So the corrupt reconstruction block is made by a dirty 
> buffer.
> The following log snippet(only show 2 of 28 cases) is my check program 
> output. In my case, I known the 3th block is corrupt, so need other 5 blocks 
> to decode another 3 blocks, then find the 1th block's LCS substring is block 
> length - 64kb.
> It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the 
> dirty buffer was used before read the 1th block.
> Must be noted that StripedBlockReader read from the offset 0 of the 1th block 
> after used the dirty buffer.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 
> 27197440  # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
> Now I know the dirty buffer causes reconstruction block error, but how does 
> the dirty buffer come about?
> After digging into the code and DN log, I found this following DN log is the 
> root reason.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Reading from DN may timeout(hold by a future(F)) and output the INFO log, but 
> the futures that contains the future(F)  is cleared, 
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) cause NPE. So the EC reconstruction is failed. In the 
> finally phase, the 

[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-07-16 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15240:
-
Attachment: image-2020-07-16-15-56-38-608.png

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, 
> image-2020-07-16-15-56-38-608.png
>
>
> When read some lzo files we found some blocks were broken.
> I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from 
> DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') 
> blocks. And find the longest common sequenece(LCS) between b6'(decoded) and 
> b6(read from DN)(b7'/b7 and b8'/b8).
> After selecting 6 blocks of the block group in combinations one time and 
> iterating through all cases, I find one case that the length of LCS is the 
> block length - 64KB, 64KB is just the length of ByteBuffer used by 
> StripedBlockReader. So the corrupt reconstruction block is made by a dirty 
> buffer.
> The following log snippet(only show 2 of 28 cases) is my check program 
> output. In my case, I known the 3th block is corrupt, so need other 5 blocks 
> to decode another 3 blocks, then find the 1th block's LCS substring is block 
> length - 64kb.
> It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the 
> dirty buffer was used before read the 1th block.
> Must be noted that StripedBlockReader read from the offset 0 of the 1th block 
> after used the dirty buffer.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 
> 27197440  # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
> Now I know the dirty buffer causes reconstruction block error, but how does 
> the dirty buffer come about?
> After digging into the code and DN log, I found this following DN log is the 
> root reason.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Reading from DN may timeout(hold by a future(F)) and output the INFO log, but 
> the futures that contains the future(F)  is cleared, 
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) cause NPE. So the EC reconstruction is failed. In the 
> finally phase, the code snippet in *getStripedReader().close()* 
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> free buffer firstly, but the StripedBlockReader still holds the buffer and 
> write it.



--
This message was sent by Atlassian 

[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158921#comment-17158921
 ] 

Hadoop QA commented on HDFS-14498:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 2s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} branch-2.10 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-hdfs in branch-2.10 failed with JDK Oracle 
Corporation-1.7.0_95-b00. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
56s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
53s{color} | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 228 unchanged - 0 fixed = 229 total (was 228) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  1m 
24s{color} | {color:red} 
hadoop-hdfs-project_hadoop-hdfs-jdkOracleCorporation-1.7.0_95-b00 with JDK 
Oracle Corporation-1.7.0_95-b00 generated 9 new + 0 unchanged - 0 fixed = 9 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestMaintenanceState |
|   |