[jira] [Commented] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164728#comment-17164728
 ] 

Hadoop QA commented on HDFS-15442:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 22s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15442 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008389/HDFS-15442.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29562/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative 
> value
> ---
>
> Key: HDFS-15442
> URL: https://issues.apache.org/jira/browse/HDFS-15442
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15442.000.patch
>
>
> In the current implementation of checkpoint image transfer, if the file
> length is greater than the configured value of dfs.image.transfer.chunksize,
> chunked streaming mode is used to avoid internal buffering. This mode should
> be used only when more than chunkSize bytes are present to upload; otherwise
> the upload may sometimes not happen.
> {code:java}
> //TransferFsImage.java
> int chunkSize = (int) conf.getLongBytes(
>     DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_KEY,
>     DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_DEFAULT);
> if (imageFile.length() > chunkSize) {
>   // using chunked streaming mode to support upload of 2GB+ files and to
>   // avoid internal buffering.
>   // this mode should be used only if more than chunkSize data is present
>   // to upload. otherwise upload may not happen sometimes.
>   connection.setChunkedStreamingMode(chunkSize);
> }
> {code}
> There is no validation for this parameter, so a user may accidentally set it
> to an invalid value. If chunkSize is negative, imageFile.length() > chunkSize
> is always true, so chunked streaming mode will always be used.
> setChunkedStreamingMode(chunkSize) itself contains correction code: if
> chunkSize is <= 0, it is changed to DEFAULT_CHUNK_SIZE.
> {code:java}
> public void setChunkedStreamingMode (int chunklen) {
>   if (connected) {
>     throw new IllegalStateException ("Can't set streaming mode: already connected");
>   }
>   if (fixedContentLength != -1 || fixedContentLengthLong != -1) {
>     throw new IllegalStateException ("Fixed length streaming mode set");
>   }
>   chunkLength = chunklen <= 0 ? DEFAULT_CHUNK_SIZE : chunklen;
> }
> {code}
> However,
>  *if the user sets dfs.image.transfer.chunksize to a value <= 0, even images
> whose imageFile.length() < DEFAULT_CHUNK_SIZE will use chunked streaming
> mode, and the upload may fail as mentioned above.* *(This scenario may not be
> common, but we can prevent users from setting this parameter to an extremely
> small value.)*
> *How to fix:*
> Add validation or correction code right after parsing the config value,
> before the value is actually used (setChunkedStreamingMode).
>   
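The fix proposed in the description can be sketched as follows. This is a minimal illustration, not the attached HDFS-15442.000.patch: the class ChunkSizeCheck, its method names, and the 64 KB default are assumptions made for the example, and plain long values stand in for conf.getLongBytes(...) and imageFile.length().

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class ChunkSizeCheck {
  // Assumed stand-in for DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_DEFAULT.
  static final int DEFAULT_CHUNK_SIZE = 64 * 1024;

  // Returns the configured value if it is a usable int chunk size,
  // otherwise falls back to the default right after parsing.
  static int sanitizeChunkSize(long configured) {
    if (configured <= 0 || configured > Integer.MAX_VALUE) {
      return DEFAULT_CHUNK_SIZE;
    }
    return (int) configured;
  }

  // Chunked streaming is chosen only when the file is larger than one
  // validated chunk, which is the condition the original comment asks for.
  static boolean useChunkedStreaming(long fileLength, long configured) {
    return fileLength > sanitizeChunkSize(configured);
  }

  public static void main(String[] args) {
    // Negative setting, 1 KB image: no chunked streaming after the fallback.
    System.out.println(useChunkedStreaming(1024, -1));
    // Negative setting, 1 MB image: chunked streaming is still used.
    System.out.println(useChunkedStreaming(1024 * 1024, -1));
  }
}
```

With the fallback applied before the comparison, a negative setting no longer forces small images into chunked streaming mode, while images larger than the default chunk size still use it.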



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164724#comment-17164724
 ] 

AMC-team edited comment on HDFS-15442 at 7/25/20, 3:05 AM:
---

Uploaded a patch that falls back to the default value right after parsing when 
the configured chunksize is invalid.


was (Author: amc-team):
Uploaded a patch that falls back to the default value when the configured chunksize is invalid.




[jira] [Commented] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164724#comment-17164724
 ] 

AMC-team commented on HDFS-15442:
-

Uploaded a patch that falls back to the default value when the configured chunksize is invalid.




[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15442:

Attachment: (was: HDFS-15442.000.patch)




[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15442:

Attachment: HDFS-15442.000.patch
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15442:

Attachment: HDFS-15442.000.patch




[jira] [Commented] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164723#comment-17164723
 ] 

Hadoop QA commented on HDFS-15440:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15440 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008387/HDFS-15440.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29561/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.  
> --
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make clear to me how the value
> is actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is 
> still active. So it might not be possible to move the exactly specified 
> amount of data. So tolerance allows us to define a percentage which defines a 
> good enough move.
> {quote}
> So I referred to the [official doc of HDFS disk balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> where the description is:
> {quote}The tolerance percent specifies when we have reached a good enough 
> value for any copy step. For example, if you specify 10 then getting close to 
> 10% of the target value is good enough. It is to say if the move operation is 
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered 
> successful.
> {quote}
> However, the source code in DiskBalancer.java does something different:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>  ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is
> still not enough, because 20 > 18 + 18*0.1 = 19.8.
> To match the documentation, we should instead check whether
> 18 >= 20*(1-0.1).
> The calculation in isLessThanNeeded() (which checks whether a given block is
> smaller than the size needed to meet our goal) is unintuitive in the same way.
> Also, this parameter has no upper-bound check, which means you can even set
> it to 100%, which is obviously a wrong value.
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the doc
> and the code consistent, and to refine the description in hdfs-default.xml
> to make it more precise and clear.
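
The discrepancy can be reproduced with the numbers from the description. This is a minimal sketch that uses plain long arguments in place of DiskBalancerWorkItem. The class and method names are hypothetical. closeEnoughCurrent mirrors the quoted isCloseEnough(), and closeEnoughDocumented is one possible reading of the documentation (it uses >= so that exactly 18GB of 20GB passes), not the attached patch.

```java
// Hypothetical comparison of the two interpretations; not Hadoop code.
final class ToleranceCheck {
  // Mirrors the quoted isCloseEnough(): inflate bytesCopied by the tolerance
  // and compare the result against the target.
  static boolean closeEnoughCurrent(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    long inflated = bytesCopied + (bytesCopied * tolerancePercent) / 100;
    return bytesToCopy < inflated;
  }

  // Reading of the documentation: shrink the target by the tolerance and
  // compare bytesCopied against that reduced target.
  static boolean closeEnoughDocumented(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    long required = bytesToCopy - (bytesToCopy * tolerancePercent) / 100;
    return bytesCopied >= required;
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    // 18GB copied of a 20GB move with 10% tolerance.
    System.out.println(closeEnoughCurrent(20 * gb, 18 * gb, 10));
    System.out.println(closeEnoughDocumented(20 * gb, 18 * gb, 10));
  }
}
```

Under the current formula, a 20GB move with 10% tolerance only counts as close enough once more than roughly 18.18GB (20 / 1.1) has been copied, whereas the documented example accepts 18GB.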






[jira] [Commented] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164721#comment-17164721
 ] 

AMC-team commented on HDFS-15440:
-

Uploaded a patch that changes the current logic and refines the parameter check.




[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15440:

Attachment: HDFS-15440.000.patch
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15440:

Description: 
In HDFS disk balancer, configuration parameter 
"dfs.disk.balancer.block.tolerance.percent" is to set a percentage (e.g. 10 
means 10%) which defines a good enough move.

The description in hdfs-default.xml is not so clear to me how the value 
actually calculates and works
{quote}When a disk balancer copy operation is proceeding, the datanode is still 
active. So it might not be possible to move the exactly specified amount of 
data. So tolerance allows us to define a percentage which defines a good enough 
move.
{quote}
So I refer to the [official doc of HDFS disk 
balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html]
 and the description is:
{quote}The tolerance percent specifies when we have reached a good enough value 
for any copy step. For example, if you specify 10 then getting close to 10% of 
the target value is good enough. It is to say if the move operation is 20GB in 
size, if we can move 18GB (20 * (1-10%)) that operation is considered 
successful.
{quote}
However from the source code in DiskBalancer.java
{code:java}
// Inflates bytesCopied and returns true or false. This allows us to stop
// copying if we have reached close enough.
private boolean isCloseEnough(DiskBalancerWorkItem item) {
  long temp = item.getBytesCopied() +
 ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
  return (item.getBytesToCopy() >= temp) ? false : true;
}
{code}
Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is 
still not enough because 20 > 18 + 18*0.1
Here, we should check whether 18 > 20*(1-0.1).
 The calculation in isLessThanNeeded() (Checks if a given block is less than 
needed size to meet our goal.) is also not intuitive in the same way.

Also, this parameter doesn't have upper bound check, which means you can even 
set it to 100% which is obviously wrong value.

*How to fix*

Although this may not lead severe failure, it is better to make it consistent 
between doc and code, and also better to refine the description in 
hdfs-default.xml to make it more precise and clear.

  was:
In HDFS disk balancer, configuration parameter 
"dfs.disk.balancer.block.tolerance.percent" is to set a percentage (e.g. 10 
means 10%) which defines a good enough move.

The description in hdfs-default.xml is not so clear to me how the value 
actually calculates and works
{quote}When a disk balancer copy operation is proceeding, the datanode is still 
active. So it might not be possible to move the exactly specified amount of 
data. So tolerance allows us to define a percentage which defines a good enough 
move.
{quote}
So I refer to the [official doc of HDFS disk 
balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html]
 and the description is:
{quote}The tolerance percent specifies when we have reached a good enough value 
for any copy step. For example, if you specify 10 then getting close to 10% of 
the target value is good enough. It is to say if the move operation is 20GB in 
size, if we can move 18GB (20 * (1-10%)) that operation is considered 
successful.
{quote}
However from the source code in DiskBalancer.java
{code:java}
// Inflates bytesCopied and returns true or false. This allows us to stop
// copying if we have reached close enough.
private boolean isCloseEnough(DiskBalancerWorkItem item) {
  long temp = item.getBytesCopied() +
 ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
  return (item.getBytesToCopy() >= temp) ? false : true;
}
{code}
Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is 
still not enough because 20 > 18 + 18*0.1
 The calculation in isLessThanNeeded() (Checks if a given block is less than 
needed size to meet our goal.) is also not intuitive in the same way.

*How to fix*

Although this may not lead severe failure, it is better to make it consistent 
between doc and code, and also better to refine the description in 
hdfs-default.xml to make it more precise and clear.


> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.  
> --
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
>
> In HDFS disk balancer, configuration parameter 
> "dfs.disk.balancer.block.tolerance.percent" is to set a percentage (e.g. 10 
> means 10%) which defines a good enough move.
> The description in hdfs-default.xml is not so clear to me how the value 
> actually calculates and works
> {quote}When a disk balancer copy 

[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15440:

Summary: The usage of dfs.disk.balancer.block.tolerance.percent is not 
intuitive.(was: The doc of dfs.disk.balancer.block.tolerance.percent is 
misleading)

> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.  
> --
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
>
> In HDFS disk balancer, configuration parameter 
> "dfs.disk.balancer.block.tolerance.percent" is to set a percentage (e.g. 10 
> means 10%) which defines a good enough move.
> The description in hdfs-default.xml is not so clear to me how the value 
> actually calculates and works
> {quote}When a disk balancer copy operation is proceeding, the datanode is 
> still active. So it might not be possible to move the exactly specified 
> amount of data. So tolerance allows us to define a percentage which defines a 
> good enough move.
> {quote}
> So I refer to the [official doc of HDFS disk 
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html]
>  and the description is:
> {quote}The tolerance percent specifies when we have reached a good enough 
> value for any copy step. For example, if you specify 10 then getting close to 
> 10% of the target value is good enough. It is to say if the move operation is 
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered 
> successful.
> {quote}
> However from the source code in DiskBalancer.java
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>  ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is 
> still not enough, because 20 > 18 + 18*0.1 = 19.8.
>  The calculation in isLessThanNeeded() (which checks whether a given block is 
> smaller than the size needed to meet our goal) is unintuitive in the same way.
> *How to fix*
> Although this may not lead to severe failure, it is better to make the doc 
> and the code consistent, and also to refine the description in 
> hdfs-default.xml to make it more precise and clear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164706#comment-17164706
 ] 

AMC-team edited comment on HDFS-15443 at 7/25/20, 1:52 AM:
---

Thanks [~ayushtkn] for the great feedback.

I refined the patch (changed maxXceiverCount to this.maxXceiverCount).

I will check the failed test.


was (Author: amc-team):
Thanks [~ayushtkn] for the great feedback.

I refined the patch  (change maxXceiverCount to this.maxXceiverCount)

I also checked standard output of the consistently failed test  
hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader and I think it is not 
relevant to this patch:
{quote}java.lang.IllegalArgumentException: Path /test is not under 
hdfs://localhost:36213/test
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
 at 
org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
 at 
org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
 at 
org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote}

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> The configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital parameter that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this parameter to an appropriate 
> value. However, there is no check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> the number of cores, main memory, etc., *we can prevent users from setting 
> this value to an absolutely wrong value by accident* (e.g. a negative value 
> that totally breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.
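
As a rough illustration of what such a check could look like, here is a minimal sketch. The class and method names are hypothetical, and the lower bound of 1 is an assumption made for this example; it is not taken from any of the attached patches.

```java
// Illustrative sketch only: names are hypothetical, not DataXceiverServer code.
final class TransferThreadsCheckSketch {

  // Mirrors the quoted guard: a connection is refused when the current
  // xceiver count is above the configured maximum.
  static boolean exceedsLimit(int curXceiverCount, int maxXceiverCount) {
    return curXceiverCount > maxXceiverCount;
  }

  // Hypothetical validation; rejecting values below 1 is an assumption.
  static int checkMaxTransferThreads(int configured) {
    if (configured < 1) {
      throw new IllegalArgumentException(
          "dfs.datanode.max.transfer.threads must be at least 1, got "
              + configured);
    }
    return configured;
  }

  public static void main(String[] args) {
    // With a negative limit, even a single xceiver is over the limit.
    System.out.println(exceedsLimit(1, -1));            // prints true
    System.out.println(checkMaxTransferThreads(4096));  // prints 4096
  }
}
```

Failing fast at configuration time would surface the mistake at startup instead of as refused connections later.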






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164715#comment-17164715
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 18s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15098 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008383/HDFS-15098.009.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29560/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Configure Hadoop KMS
>  2. Test HDFS SM4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
>  1. OpenSSL version >= 1.1.1






[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164714#comment-17164714
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29559/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>
> In the HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors we 
> can ignore for a specific move between two disks before it is abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, with the value set to 0 the block copy is never 
> performed, even when no disk error occurs, because the while loop condition 
> *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>  DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
> try {
>   ... //get the block
> }  catch (IOException e) {
> item.incErrorCount();
> }
>if (item.getErrorCount() >= getMaxError(item)) {
> item.setErrMsg("Error count exceeded.");
> LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
> item.getErrorCount(), item.getMaxDiskErrors());
>   }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   
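
The effect of the loop condition can be seen in a small standalone simulation. This is only a sketch: the class, the method, and the boolean list standing in for block reads are all hypothetical, and the `<=` variant is an assumed shape of the fix, not the content of the attached patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: names are hypothetical, not DiskBalancer code.
final class ErrorBudgetSketch {

  // Walks the simulated reads (true = block read succeeds, false = the read
  // throws) and returns how many blocks were fetched before the error budget
  // ran out. strictLessThan = true models the quoted "<" loop condition;
  // false models an assumed "<=" variant.
  static int blocksFetched(List<Boolean> reads, int maxErrors,
                           boolean strictLessThan) {
    int errorCount = 0;
    int fetched = 0;
    Iterator<Boolean> iter = reads.iterator();
    while (iter.hasNext()
        && (strictLessThan ? errorCount < maxErrors
                           : errorCount <= maxErrors)) {
      if (iter.next()) {
        fetched++;
      } else {
        errorCount++;
      }
    }
    return fetched;
  }

  public static void main(String[] args) {
    List<Boolean> healthy = Arrays.asList(true, true, true);
    System.out.println(blocksFetched(healthy, 0, true));   // prints 0
    System.out.println(blocksFetched(healthy, 0, false));  // prints 3
  }
}
```

With the strict comparison and a limit of 0, the loop body never runs even though no read fails; with the inclusive comparison, a limit of n tolerates exactly n failed reads.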






[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164713#comment-17164713
 ] 

AMC-team commented on HDFS-15438:
-

Thanks [~ayushtkn] for the feedback. I uploaded a patch that changes the while 
loop condition and the if condition to support the value 0.

What's more, IMHO, the patched logic may be more intuitive. Previously, if we 
set dfs.disk.balancer.max.disk.errors to n, it could actually tolerate only 
n-1 errors. Now it can tolerate n errors, which is more consistent with the 
parameter's documentation:
{quote}During a block move from a source to destination disk, we might 
encounter various errors. *This defines how many errors we can tolerate* before 
we declare a move between 2 disks (or a step) has failed.
{quote}




[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Attachment: HDFS-15438.001.patch




[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164711#comment-17164711
 ] 

Hadoop QA commented on HDFS-15439:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 28s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15439 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008382/HDFS-15439.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29558/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch
>
>
> The configuration parameter "dfs.mover.retry.max.attempts" defines the 
> maximum number of retries before the mover considers the move failed. There 
> is no checking code, so this parameter accepts any int value.
> Theoretically, setting this value to <= 0 should mean no retry at all. 
> However, if the value is negative, the condition that detects retry failure 
> is never satisfied, because the if statement is "*if (retryCount.get() == 
> retryMaxAttempts)*". After each failure the retry count is incremented by 
> retryCount.incrementAndGet(), so it never *equals* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... // wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
>     if (retryCount.get() == retryMaxAttempts) {
>       result.setRetryFailed();
>       LOG.error("Failed to move some block's after "
>           + retryMaxAttempts + " retries.");
>       return result;
>     } else {
>       retryCount.incrementAndGet();
>     }
>   } else {
>     // Reset retry count if no failure.
>     retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add checking code so that "dfs.mover.retry.max.attempts" accepts only 
> non-negative values, or change the if statement to test whether the retry 
> count has reached or exceeded the max attempts.
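
Both options mentioned in the fix can be sketched in a few lines. This is a hypothetical illustration, not the attached patch: the class and method names are invented, and whether the real change validates the value, relaxes the comparison, or does both is not stated in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: names are hypothetical, not Mover code.
final class RetryLimitSketch {

  // Option 1 (assumed): reject negative values when the config is read.
  static int checkRetryMaxAttempts(int configured) {
    if (configured < 0) {
      throw new IllegalArgumentException(
          "dfs.mover.retry.max.attempts must be >= 0, got " + configured);
    }
    return configured;
  }

  // Option 2 (assumed): compare with >= instead of ==, so the check also
  // fires when the limit is zero or negative. Returns true when retries are
  // exhausted; otherwise counts one more retry and returns false.
  static boolean retriesExhausted(AtomicInteger retryCount,
                                  int retryMaxAttempts) {
    if (retryCount.get() >= retryMaxAttempts) {
      return true;
    }
    retryCount.incrementAndGet();
    return false;
  }

  public static void main(String[] args) {
    // A negative limit now stops immediately instead of looping forever.
    System.out.println(retriesExhausted(new AtomicInteger(0), -1)); // prints true
  }
}
```

Using `>=` keeps the behaviour for positive limits unchanged, since the counter only ever grows by one per failed round.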






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: HDFS-15098.009.patch
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: (was: HDFS-15098.009.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Configure Hadoop KMS
>  2. Test HDFS SM4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
>  1. OpenSSL version >= 1.1.1






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Status: Open  (was: Patch Available)




[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164710#comment-17164710
 ] 

AMC-team commented on HDFS-15439:
-

Uploaded a patch based on [~ayushtkn]'s suggestion. Thanks!




[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Attachment: HDFS-15439.001.patch




[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164706#comment-17164706
 ] 

AMC-team commented on HDFS-15443:
-

Thanks [~ayushtkn] for the great feedback.

I refined the patch  (change maxXceiverCount to this.maxXceiverCount)

I also checked standard output of the consistently failed test  
hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader and I think it is not 
relevant to this patch:
{quote}java.lang.IllegalArgumentException: Path /test is not under 
hdfs://localhost:36213/test
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
 at 
org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
 at 
org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
 at 
org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote}

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital parameter that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this parameter to an appropriate 
> value. However, there is no check code to restrict the parameter. 
> Although a hard-and-fast rule is difficult to define because we need to 
> consider the number of cores, main memory, etc., *we can prevent users from 
> setting this value to an absolutely wrong value by accident* (e.g. a negative 
> value that totally breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.
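
The check proposed in the description could look roughly like the sketch below. It is not the attached patch: it uses java.util.Properties in place of Hadoop's Configuration so that it runs standalone, the class and method names are hypothetical, and both the fallback value of 4096 and the lower bound of 1 are assumptions chosen for illustration.

```java
import java.util.Properties;

// Hypothetical standalone sketch; not the code from HDFS-15443.*.patch.
class MaxTransferThreadsCheck {
  // Key name taken from the issue description.
  static final String KEY = "dfs.datanode.max.transfer.threads";
  // Assumed fallback used only when the key is absent.
  static final int ASSUMED_DEFAULT = 4096;

  /** Reads the configured limit and rejects any value below 1. */
  static int readMaxXceiverCount(Properties conf) {
    String raw = conf.getProperty(KEY, String.valueOf(ASSUMED_DEFAULT));
    int value = Integer.parseInt(raw.trim());
    if (value < 1) {
      // Fail fast at startup instead of refusing every transfer later.
      throw new IllegalArgumentException(
          KEY + " must be at least 1, but was " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "-1");
    try {
      readMaxXceiverCount(conf);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing at configuration time makes the misconfiguration visible immediately, whereas the quoted runtime check would only surface it as an "exceeds the limit of concurrent xceivers" error on each request.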



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164707#comment-17164707
 ] 

Hadoop QA commented on HDFS-15443:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15443 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008381/HDFS-15443.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29557/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>






[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: HDFS-15443.003.patch

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>






[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164698#comment-17164698
 ] 

Hadoop QA commented on HDFS-15443:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15443 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008379/HDFS-15443.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29556/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>






[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: (was: HDFS-15443.003.patch)

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>






[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: HDFS-15443.003.patch

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>






[jira] [Issue Comment Deleted] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Comment: was deleted

(was: I uploaded a new patch based on [~ayushtkn]'s suggestion.

IMHO, the current logic may be better because previously, if we set 
"dfs.disk.balancer.max.disk.errors" to n, it could actually tolerate only n-1 
errors because of the while loop condition. Now it can tolerate n errors, which 
is more consistent with the documentation:

{quote}During a block move from a source to destination disk, we might 
encounter various errors. This defines how many errors we can tolerate before 
we declare a move between 2 disks (or a step) has failed.{quote})

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch
>
>
> In the HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number 
> of errors we can ignore for a specific move between two disks before it is 
> abandoned.
> The parameter accepts any value >= 0, and setting the value to 0 should 
> mean no error tolerance. However, setting the value to 0 simply skips 
> the block copy even when no disk error occurs, because the while loop 
> condition *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>                                      DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... //get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>   if (item.getErrorCount() >= getMaxError(item)) {
>     item.setErrMsg("Error count exceeded.");
>     LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
>         item.getErrorCount(), item.getMaxDiskErrors());
>   }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   
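
To make the effect of the loop condition concrete, here is a small standalone simulation. It is a simplified model, not the DiskBalancer code: a list of strings stands in for the block iterator, a null entry stands in for a block whose lookup throws an IOException, and the relaxed condition (<=) is only one hypothetical way to support the value 0. All names are made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the error-count loop; not the HDFS implementation.
class ErrorToleranceSketch {
  /**
   * Counts how many blocks are copied before the loop stops.
   * strict = true  uses the quoted condition: errorCount <  maxErrors
   * strict = false uses an assumed variant:   errorCount <= maxErrors
   */
  static int copiedBlocks(List<String> blocks, int maxErrors, boolean strict) {
    int errorCount = 0;
    int copied = 0;
    Iterator<String> iter = blocks.iterator();
    while (iter.hasNext()
        && (strict ? errorCount < maxErrors : errorCount <= maxErrors)) {
      String block = iter.next();
      if (block == null) {
        errorCount++; // simulated disk error
      } else {
        copied++;     // simulated successful copy
      }
    }
    return copied;
  }

  public static void main(String[] args) {
    List<String> clean = Arrays.asList("b1", "b2", "b3");
    List<String> flaky = Arrays.asList("b1", null, "b2", null, "b3");
    System.out.println(copiedBlocks(clean, 0, true));
    System.out.println(copiedBlocks(clean, 0, false));
    System.out.println(copiedBlocks(flaky, 1, true));
    System.out.println(copiedBlocks(flaky, 1, false));
  }
}
```

In this model the strict condition copies nothing when the limit is 0, even though no error occurs, and with a limit of n it stops at the n-th error rather than after it.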






[jira] [Issue Comment Deleted] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Comment: was deleted

(was: Thanks [~ayushtkn] for the great feedback. 
 I refined the patch (changed *maxXceiverCount* to *this.maxXceiverCount*).

 I also checked the standard output of the consistently failing test 
*hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader*, and I think it is 
not related to this patch:

{quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown
java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote})

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>






[jira] [Issue Comment Deleted] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Comment: was deleted

(was: uploaded a new patch based on [~ayushtkn]'s feedback)

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" defines the 
> maximum number of retries before the mover considers the move failed. There is 
> no checking code, so this parameter can accept any int value.
> Theoretically, setting this value to <= 0 should mean no retry at all. 
> However, if you set it to a negative value, the check for retry failure is 
> never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count is always incremented 
> by retryCount.incrementAndGet() after a failure but never equals 
> *retryMaxAttempts*. 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
>     if (retryCount.get() == retryMaxAttempts) {
>       result.setRetryFailed();
>       LOG.error("Failed to move some block's after "
>           + retryMaxAttempts + " retries.");
>       return result;
>     } else {
>       retryCount.incrementAndGet();
>     }
>   } else {
>     // Reset retry count if no failure.
>     retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add checking code for "dfs.mover.retry.max.attempts" to accept only 
> non-negative values, or change the if statement condition so that it also 
> triggers when the retry count exceeds max attempts.
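
The difference between the two conditions can be seen in a small standalone model. This is a sketch under assumptions, not the Mover code: every round is treated as a failed migration, the round cap exists only so the example terminates, and the class and method names are hypothetical.

```java
// Hypothetical model of the retry check; not the HDFS Mover implementation.
class RetryLimitSketch {
  /**
   * Simulates consecutive failed rounds and returns the round in which the
   * mover gives up, or -1 if it has not given up after `cap` rounds.
   * useEquality = true  models the quoted check: retryCount == retryMaxAttempts
   * useEquality = false models an assumed fix:   retryCount >= retryMaxAttempts
   */
  static int roundsUntilGiveUp(int retryMaxAttempts, boolean useEquality, int cap) {
    int retryCount = 0;
    for (int round = 1; round <= cap; round++) {
      boolean giveUp = useEquality
          ? retryCount == retryMaxAttempts
          : retryCount >= retryMaxAttempts;
      if (giveUp) {
        return round;
      }
      retryCount++; // mirrors retryCount.incrementAndGet() after a failure
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(roundsUntilGiveUp(2, true, 1000));
    System.out.println(roundsUntilGiveUp(-1, true, 1000));
    System.out.println(roundsUntilGiveUp(-1, false, 1000));
  }
}
```

With a non-negative limit both checks stop at the same round, so switching to >= would change behavior only for negative values, which it treats as "no retry".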






[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Attachment: (was: HDFS-15438.001.patch)

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch
>
>






[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Attachment: (was: HDFS-15439.001.patch)

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>






[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: (was: HDFS-15443.003.patch)

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>






[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164690#comment-17164690
 ] 

AMC-team commented on HDFS-15438:
-

I uploaded a new patch based on [~ayushtkn]'s suggestion.

IMHO, the current logic may be better because previously, if we set 
"dfs.disk.balancer.max.disk.errors" to n, it could actually tolerate only n-1 
errors because of the while loop condition. Now it can tolerate n errors, which 
is more consistent with the documentation:

{quote}During a block move from a source to destination disk, we might 
encounter various errors. This defines how many errors we can tolerate before 
we declare a move between 2 disks (or a step) has failed.{quote}

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>






[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164689#comment-17164689
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008378/HDFS-15438.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29555/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>






[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:

Attachment: HDFS-15438.001.patch

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>
> In the HDFS disk balancer, the configuration parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors 
> that can be ignored for a specific move between two disks before the move is 
> abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, with a value of 0 the block copy is never 
> performed, even when no disk error occurs, because the while loop condition 
> *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>     DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... //get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>     if (item.getErrorCount() >= getMaxError(item)) {
>       item.setErrMsg("Error count exceeded.");
>       LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
>           item.getErrorCount(), item.getMaxDiskErrors());
>     }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   
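One way to read the proposed fix is to keep looping while the error count has not yet exceeded the limit, so that a limit of 0 still allows copying until the first error. The sketch below is only an illustration under that assumption and is not the attached patch: `WorkItem`, `BlockSource`, `nextBlockToCopy`, and `fromList` are hypothetical stand-ins for `DiskBalancerWorkItem`, `FsVolumeSpi.BlockIterator`, and `getBlockToCopy`.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for DiskBalancerWorkItem: tracks errors against a limit.
class WorkItem {
  final long maxErrors;
  long errorCount;
  String errMsg;

  WorkItem(long maxErrors) {
    this.maxErrors = maxErrors;
  }
}

class BlockCopySketch {
  // Hypothetical stand-in for the block iterator; reading a block may fail.
  interface BlockSource {
    boolean atEnd();
    String nextBlock() throws IOException;
  }

  // Returns the next block, or null when the source is exhausted or the
  // error count has exceeded the limit. Using "<=" instead of "<" means a
  // limit of 0 still lets the loop run until the first error occurs.
  static String nextBlockToCopy(BlockSource source, WorkItem item) {
    while (!source.atEnd() && item.errorCount <= item.maxErrors) {
      try {
        String block = source.nextBlock();
        if (block != null) {
          return block;
        }
      } catch (IOException e) {
        item.errorCount++;
      }
    }
    if (item.errorCount > item.maxErrors) {
      item.errMsg = "Error count exceeded.";
    }
    return null;
  }

  // Helper for trying the sketch out: a source backed by an in-memory list.
  static BlockSource fromList(List<String> blocks) {
    Iterator<String> it = blocks.iterator();
    return new BlockSource() {
      public boolean atEnd() {
        return !it.hasNext();
      }

      public String nextBlock() {
        return it.next();
      }
    };
  }
}
```

With this reading, a limit of N tolerates N errors and abandons the move on error N + 1; whether that matches the intended semantics of the parameter is a design decision for the patch.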






[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164688#comment-17164688
 ] 

Hadoop QA commented on HDFS-15439:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 16s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15439 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008376/HDFS-15439.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29554/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch
>
>
> The configuration parameter "dfs.mover.retry.max.attempts" defines the 
> maximum number of retries before the mover considers the move failed. There is 
> no validation code, so this parameter accepts any int value.
> Theoretically, setting this value to <= 0 should mean no retry at all. 
> However, if the value is negative, the check for retry failure is never 
> satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". After each failure the retry count 
> is incremented by retryCount.incrementAndGet(), so it never equals 
> *retryMaxAttempts*.
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
>     if (retryCount.get() == retryMaxAttempts) {
>       result.setRetryFailed();
>       LOG.error("Failed to move some block's after "
>           + retryMaxAttempts + " retries.");
>       return result;
>     } else {
>       retryCount.incrementAndGet();
>     }
>   } else {
>     // Reset retry count if no failure.
>     retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation for "dfs.mover.retry.max.attempts" so that it accepts only 
> non-negative values, or change the if statement to test whether the retry 
> count has reached or exceeded the maximum attempts.
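The two options in the description can be sketched side by side. This is a minimal illustration, not the attached patch: the class `MoverRetrySketch` and the method `recordIteration` are hypothetical names, and only the configuration key and the `retryCount` / `retryMaxAttempts` fields come from the issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the retry bookkeeping described in the issue.
class MoverRetrySketch {
  private final int retryMaxAttempts;
  private final AtomicInteger retryCount = new AtomicInteger(0);

  MoverRetrySketch(int configuredMaxAttempts) {
    // Option 1: reject negative values as soon as the configuration is read.
    if (configuredMaxAttempts < 0) {
      throw new IllegalArgumentException(
          "dfs.mover.retry.max.attempts must be >= 0, got "
              + configuredMaxAttempts);
    }
    this.retryMaxAttempts = configuredMaxAttempts;
  }

  // Option 2: compare with ">=" rather than "==", so the limit is detected
  // even if the counter is already past it.
  // Returns true when the caller should stop retrying.
  boolean recordIteration(boolean hasFailed, boolean hasSuccess) {
    if (hasFailed && !hasSuccess) {
      if (retryCount.get() >= retryMaxAttempts) {
        return true;
      }
      retryCount.incrementAndGet();
    } else {
      // Reset retry count if no failure.
      retryCount.set(0);
    }
    return false;
  }
}
```

Either option alone prevents the endless retry; the sketch applies both only to show them together.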






[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164684#comment-17164684
 ] 

AMC-team commented on HDFS-15439:
-

Uploaded a new patch based on [~ayushtkn]'s feedback.




[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15439:

Attachment: HDFS-15439.001.patch




[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164680#comment-17164680
 ] 

Hadoop QA commented on HDFS-15443:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15443 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008375/HDFS-15443.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29553/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> The configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital parameter that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this parameter to an appropriate 
> value. However, there is no check code to restrict the parameter. 
> Although a hard-and-fast rule is difficult to define because we need to 
> consider the number of cores, main memory, etc., *we can prevent users from 
> accidentally setting this value to an absolutely wrong value* (e.g. a 
> negative value that totally breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.
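A check of the kind the description asks for could look roughly like the sketch below. It is an assumption-laden illustration, not the attached patch: `XceiverLimitSketch` and `checkXceiverCount` are hypothetical names, and the lower bound of 1 is an assumed threshold chosen only for the example.

```java
import java.io.IOException;

// Hypothetical sketch: validate the limit once, then enforce it per request.
class XceiverLimitSketch {
  private final int maxXceiverCount;

  XceiverLimitSketch(int configuredMaxTransferThreads) {
    // Fail fast on values that can never work (assumed lower bound: 1).
    if (configuredMaxTransferThreads < 1) {
      throw new IllegalArgumentException(
          "dfs.datanode.max.transfer.threads should not be less than 1, got "
              + configuredMaxTransferThreads);
    }
    this.maxXceiverCount = configuredMaxTransferThreads;
  }

  // Same comparison as the quoted DataXceiverServer snippet.
  void checkXceiverCount(int curXceiverCount) throws IOException {
    if (curXceiverCount > maxXceiverCount) {
      throw new IOException("Xceiver count " + curXceiverCount
          + " exceeds the limit of concurrent xceivers: "
          + maxXceiverCount);
    }
  }
}
```

Validating in the constructor means a bad value is reported at startup with the configuration key in the message, rather than surfacing later as a confusing "exceeds the limit" error on every transfer.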






[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164678#comment-17164678
 ] 

Hadoop QA commented on HDFS-15443:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 28m 48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15443 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008374/HDFS-15443.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29552/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.






[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164677#comment-17164677
 ] 

AMC-team edited comment on HDFS-15443 at 7/24/20, 11:34 PM:


Thanks [~ayushtkn] for the great feedback. 
 I refined the patch (changed *maxXceiverCount* to *this.maxXceiverCount*).

 I also checked the standard output of the consistently failing test 
*hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader*, and I think the 
failure is not related to this patch:

{quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown
java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote}


was (Author: amc-team):
Thanks [~ayushtkn] for the great feedback. 
 I refined the patch (change *maxXceiverCount* to *this.maxXceiverCount*)
 I also checked the Standard Output of the consistently failed test 
*hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader.* And I think it is 
not related to this patch

{quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown
java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote}




[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164677#comment-17164677
 ] 

AMC-team commented on HDFS-15443:
-

Thanks [~ayushtkn] for the great feedback. 
 I refined the patch (changed *maxXceiverCount* to *this.maxXceiverCount*).
 I also checked the standard output of the consistently failing test 
*hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader*, and I think the 
failure is not related to this patch:

{quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown
java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73)
at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136)
at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107)
{quote}




[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: HDFS-15443.003.patch




[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: (was: HDFS-15443.003.patch)

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>



[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15443:

Attachment: HDFS-15443.003.patch




[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164356#comment-17164356
 ] 

Chengwei Wang commented on HDFS-15493:
--

Thanks, Stephen O'Donnell, for the information about HDFS-13693. I will try to 
apply and test it. I'd really appreciate it if you could help review this patch.




> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates 
> the name cache and block map after adding each inode file to its inode 
> directory. Running these steps in parallel would reduce the time cost of 
> fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, loading the 
> fsimage (220M files & 240M blocks) takes 470s; with this patch, the time cost 
> is reduced to 410s.
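The general idea of moving the two bookkeeping updates off the loading thread can be sketched as follows. This is only an illustration of the technique and makes no claim about how HDFS-15493.001.patch implements it: `ParallelLoadSketch`, its fields, and the use of two single-thread executors are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the loader thread adds files to the directory while
// two background workers update the name cache and the block map.
class ParallelLoadSketch {
  final Map<String, Integer> nameCache = new ConcurrentHashMap<>();
  final Map<Long, String> blockMap = new ConcurrentHashMap<>();
  final List<String> directory = new ArrayList<>();

  // Each entry is (file name, block id).
  void load(List<Map.Entry<String, Long>> files) throws InterruptedException {
    ExecutorService nameCacheUpdater = Executors.newSingleThreadExecutor();
    ExecutorService blockMapUpdater = Executors.newSingleThreadExecutor();
    try {
      for (Map.Entry<String, Long> file : files) {
        String name = file.getKey();
        long blockId = file.getValue();
        // Adding to the directory stays on the loader thread.
        directory.add(name);
        // The two updates are handed off and run concurrently with loading.
        nameCacheUpdater.execute(() -> nameCache.merge(name, 1, Integer::sum));
        blockMapUpdater.execute(() -> blockMap.put(blockId, name));
      }
    } finally {
      nameCacheUpdater.shutdown();
      blockMapUpdater.shutdown();
    }
    // Wait for the queued updates before the loaded state is used.
    if (!nameCacheUpdater.awaitTermination(1, TimeUnit.MINUTES)
        || !blockMapUpdater.awaitTermination(1, TimeUnit.MINUTES)) {
      throw new IllegalStateException("Background updates did not finish");
    }
  }
}
```

Using one single-thread executor per structure keeps the updates to each structure in submission order, which avoids reasoning about concurrent writers to the same map.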






[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164350#comment-17164350
 ] 

Ayush Saxena commented on HDFS-15443:
-

In such a case there are only two solutions. The first is to fail the operation 
and raise an alarm as soon as you learn that the conf is invalid. The second is 
to detect that the value is invalid, correct it, and use the default one, as is 
done in many places, such as {{DatanodeAdminMonitorBase}} and a bunch of other 
places. The only thing I feel we can't do is tolerate the invalid 
value and go ahead with it, giving it a pass where it creates 
trouble, which is what HDFS-15439 initially tended to do. That is why I thought 
that, if you don't want to crash, it is better to change to the default. The 
choice between approaches #1 and #2 is made on a case-by-case basis.

Here, the Datanode is a long-running service and one of the 
critical parts of the cluster, so I think crashing and alarming on a wrong conf 
is the better option.

 

[~AMC-team] I think we can keep the current patch; just confirm the Jenkins 
warnings aren't related.
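The second approach mentioned above (correct an invalid value by falling back to the default) can be sketched as below; the first approach is simply throwing an exception at the same point. `ConfFallbackSketch` and `positiveOrDefault` are hypothetical names introduced for illustration, and the sketch does not reproduce what {{DatanodeAdminMonitorBase}} actually does.

```java
// Hypothetical sketch of approach #2: warn and fall back to the default.
class ConfFallbackSketch {
  // Returns the configured value when it is positive; otherwise logs a
  // warning and returns the default instead of failing.
  static int positiveOrDefault(String key, int configured, int defaultValue) {
    if (configured > 0) {
      return configured;
    }
    System.err.println("Invalid value " + configured + " for " + key
        + "; using default value " + defaultValue);
    return defaultValue;
  }
}
```

Falling back keeps a short-lived tool running with sane behaviour, while throwing makes the misconfiguration impossible to miss, which is the trade-off the comment weighs for a long-running service such as the Datanode.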

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>



[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164346#comment-17164346
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  0s{color} | {color:blue} markdownlint was not available. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  0s{color} | {color:blue} prototool was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 23m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 25m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 45s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 58s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m  1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 34s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 45s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 14s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m 14s{color} | 
{color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} golang {color} | {color:red}  1m 
14s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 14s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 26s{color} | {color:orange} root: The patch generated 3 new + 213 unchanged 
- 8 fixed = 216 total (was 221) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
38s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
51s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
4s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  0m 
58s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
47s{color} | {color:red} 

[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164336#comment-17164336
 ] 

Stephen O'Donnell commented on HDFS-15493:
--

This looks like another good speed improvement. I will try to review this in 
the next day or two.

For info, there is also HDFS-13693 which may give you some additional 
improvement.

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Comment Edited] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164307#comment-17164307
 ] 

Chengwei Wang edited comment on HDFS-15493 at 7/24/20, 10:03 AM:
-

Submitted patch v001.

Similar to HDFS-14617, it uses threads to update the name cache and blocks map in 
parallel. In our test case, it reduces the time cost of loading the fsimage by 
more than 10%.

The feature can be enabled/disabled and tuned via config:
        dfs.image.blocksmap.update.async=true
        dfs.image.blocksmap.update.threads=4

 


was (Author: smarthan):
submit patch v001.

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164316#comment-17164316
 ] 

liusheng commented on HDFS-15098:
-

Hi [~lindongdong],

Sorry, what do you mean by "for this two old methods, pls handle them in 
native code"? Please check the new 0009 patch. I don't think there are 
compatibility problems in those places; I have tested the functionality and 
the tests run OK locally (both AES and SM4). Can you please explain more?

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated HDFS-15493:
-
External issue ID: HDFS-14617

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated HDFS-15493:
-
External issue ID:   (was: HDFS-14617)

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164312#comment-17164312
 ] 

Hadoop QA commented on HDFS-15493:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
28s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15493 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008339/HDFS-15493.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29551/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated HDFS-15493:
-
Description: 
While loading the INodeDirectorySection of the fsimage, the name cache and 
block map are updated after each inode file is added to its inode directory. 
Running these steps in parallel would reduce the time cost of fsimage loading.

In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
is reduced to 410s.

  was:
While loading the INodeDirectorySection of the fsimage, the name cache and 
block map are updated after each inode file is added to its inode directory. 
Running these steps in parallel would reduce the time cost of fsimage loading.

In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
is 410s.


> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.






[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164307#comment-17164307
 ] 

Chengwei Wang commented on HDFS-15493:
--

submit patch v001.

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is 410s.






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated HDFS-15493:
-
Attachment: HDFS-15493.001.patch
Status: Patch Available  (was: Open)

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is 410s.






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated HDFS-15493:
-
Description: 
While loading the INodeDirectorySection of the fsimage, the name cache and 
block map are updated after each inode file is added to its inode directory. 
Running these steps in parallel would reduce the time cost of fsimage loading.

In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
is 410s.

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is 410s.






[jira] [Created] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-24 Thread Chengwei Wang (Jira)
Chengwei Wang created HDFS-15493:


 Summary: Update block map and name cache in parallel while loading 
fsimage.
 Key: HDFS-15493
 URL: https://issues.apache.org/jira/browse/HDFS-15493
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chengwei Wang









[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread lindongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164267#comment-17164267
 ] 

lindongdong commented on HDFS-15098:


[~seanlau], hi, the latest patch is still not OK.

 

for this two old methods, pls handle them in native code:


private native long init(long context, int mode, int alg, int padding,
    byte[] key, byte[] iv);
private native void clean(long context);
 

also, add the method for the same reason:


private OpensslCipher(long context, int alg, int padding) {
 

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1






[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163966#comment-17163966
 ] 

AMC-team edited comment on HDFS-15443 at 7/24/20, 8:40 AM:
---

Sure, thanks for the reminder!

Before that, I'm wondering whether we can fall back to the parameter's 
default value (4096) and log a message, just as [~ayushtkn] suggested 
in [HDFS-15439|https://issues.apache.org/jira/browse/HDFS-15439]. Since this is 
a sanity check, falling back to the default value is a safe and conservative 
choice.

Do you have any suggestions? [~elgoiri] [~ayushtkn] [~jianghuazhu]


was (Author: amc-team):
Sure

But before that, I'm thinking that can we fall back the parameter value to its 
default value (4096) and give a log message. Just like what [~ayushtkn] suggest 
in [HDFS-15439|https://issues.apache.org/jira/browse/HDFS-15439]. Since this is 
a sanity check, falling back to default value can be a safe and conservative 
choice.

How do you think? [~elgoiri] [~ayushtkn] [~jianghuazhu]
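
The fallback proposed in this comment could look roughly like the sketch below. It is not taken from any of the attached patches: the class and method names are hypothetical, `java.util.Properties` stands in for Hadoop's `Configuration`, and the lower bound of 1 is an assumption (the thread does not say what the smallest acceptable value is). Only the key name and the default of 4096 come from the discussion.

```java
import java.util.Properties;
import java.util.logging.Logger;

public class TransferThreadsCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(TransferThreadsCheckSketch.class.getName());

  static final String KEY = "dfs.datanode.max.transfer.threads";
  static final int DEFAULT = 4096; // default value quoted in the comment
  static final int MIN_VALID = 1;  // assumed lower bound, for illustration only

  // Returns the configured value, or the default (with a warning) if it is
  // missing, not a number, or below the assumed minimum.
  static int getMaxTransferThreads(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null) {
      return DEFAULT;
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      LOG.warning(KEY + "=" + raw + " is not an integer; using " + DEFAULT);
      return DEFAULT;
    }
    if (value < MIN_VALID) {
      LOG.warning(KEY + "=" + value + " is invalid; using " + DEFAULT);
      return DEFAULT;
    }
    return value;
  }
}
```

Falling back instead of throwing keeps the datanode available after a misconfiguration, at the cost of silently ignoring the operator's value unless the log is read.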

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital parameter that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this parameter to an appropriate 
> value. However, there is no check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> the number of cores, main memory, etc., *we can prevent users from setting this 
> value to an absolutely wrong value by accident* (e.g. a negative value that 
> totally breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.






[jira] [Comment Edited] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163974#comment-17163974
 ] 

AMC-team edited comment on HDFS-15439 at 7/24/20, 8:39 AM:
---

Thanks [~ayushtkn] for the great suggestion! That's actually what I wanted to do 
initially: correct the parameter value at the beginning and not let the 
invalid value go through the program. 

I will try to upload a patch soon.


was (Author: amc-team):
Thanks [~ayushtkn] for the great suggestion! That's actually what I want to do 
initially: To correct the parameter value at the beginning and don't let the 
invalid value going through the program. 

 

I will try to upload a patch soon.

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" defines the 
> maximum number of retries before the mover considers the move failed. There is 
> no checking code, so this parameter can accept any int value.
> Theoretically, setting this value to <=0 should mean no retry at all. 
> However, if you set it to a negative value, the condition that detects 
> retry failure will never be satisfied, because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count is incremented by 
> retryCount.incrementAndGet() after each failure but never equals 
> *retryMaxAttempts*. 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
>     if (retryCount.get() == retryMaxAttempts) {
>       result.setRetryFailed();
>       LOG.error("Failed to move some block's after "
>           + retryMaxAttempts + " retries.");
>       return result;
>     } else {
>       retryCount.incrementAndGet();
>     }
>   } else {
>     // Reset retry count if no failure.
>     retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add checking code for "dfs.mover.retry.max.attempts" so that it accepts only 
> non-negative values, or change the if statement to trigger when the retry count 
> reaches or exceeds the max attempts.
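
To make the two suggested fixes concrete, here is a small self-contained sketch that models the retry loop. It is an illustration only, not the Mover code or the attached patch: the class, `passesUntilGiveUp`, `sanitize`, and the `maxPasses` cap are hypothetical names introduced here, and every simulated pass is assumed to fail.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MoverRetrySketch {
  // Simulates consecutive failed passes. Returns the pass number on which the
  // loop gives up, or -1 if it has not given up after maxPasses passes.
  static int passesUntilGiveUp(int retryMaxAttempts, boolean useGreaterOrEqual,
      int maxPasses) {
    AtomicInteger retryCount = new AtomicInteger(0);
    for (int pass = 1; pass <= maxPasses; pass++) {
      boolean giveUp = useGreaterOrEqual
          ? retryCount.get() >= retryMaxAttempts   // proposed comparison
          : retryCount.get() == retryMaxAttempts;  // comparison quoted above
      if (giveUp) {
        return pass;
      }
      retryCount.incrementAndGet();
    }
    return -1;
  }

  // Alternative fix: clamp a negative configured value to zero up front.
  static int sanitize(int configured) {
    return Math.max(0, configured);
  }

  public static void main(String[] args) {
    System.out.println(passesUntilGiveUp(-1, false, 1000));           // -1
    System.out.println(passesUntilGiveUp(-1, true, 1000));            // 1
    System.out.println(passesUntilGiveUp(sanitize(-1), false, 1000)); // 1
    System.out.println(passesUntilGiveUp(3, false, 1000));            // 4
  }
}
```

With the equality check and a negative limit the loop never gives up within the cap, while either fix makes it stop on the first failed pass, which matches the "no retry at all" reading of a non-positive value.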






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164252#comment-17164252
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 29m 
18s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15098 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008335/HDFS-15098.009.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29550/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164243#comment-17164243
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 24m 
36s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15098 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008334/HDFS-15098.009.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29549/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure).
>  SM4 was proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.Configure Hadoop KMS
>  2.test HDFS sm4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
>  1.openssl version >=1.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: (was: HDFS-15098.009.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: HDFS-15098.009.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch, HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164228#comment-17164228
 ] 

liusheng commented on HDFS-15098:
-

Hi [~lindongdong],

Thanks for helping with the review; I have updated the 009 patch.

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: HDFS-15098.009.patch
  Assignee: (was: zZtai)
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: (was: HDFS-15098.009.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread lindongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164185#comment-17164185
 ] 

lindongdong commented on HDFS-15098:


[~zZtai], hi, I found some code that looks like it is only for debugging; maybe remove it:
{code:java}
System.out.println("Now Codec is OpensslAesCtrCryptoCodec");{code}
 

and this one:
{code:java}
public void log(GeneralSecurityException e) {
  LOG.warn(e.getMessage());
}{code}
 

For compatibility, I think it is better to keep the old method, the one without the 
engine parameter:
{code:java}
// existing signature, without engine
private native long init(long context, int mode, int alg, int padding,
    byte[] key, byte[] iv);
// signature in the patch, with engine
private native long init(long context, int mode, int alg, int padding,
    byte[] key, byte[] iv, long engine);

private native void clean(long context, long engine);{code}
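
One way to read the compatibility suggestion above is to keep the engine-less overload and have it forward to the new one. The following is only a minimal sketch under stated assumptions: the class name CipherInitCompat, the NO_ENGINE sentinel and the lastEngine field are hypothetical, and the native call is replaced by a plain Java stand-in so the example can run on its own. It is not the code from the patch.

```java
// Hypothetical sketch; names are illustrative and do not come from the patch.
final class CipherInitCompat {
    /** Assumed sentinel meaning "no OpenSSL engine supplied". */
    static final long NO_ENGINE = 0L;

    /** Records the engine handle seen by the most recent init call (for illustration only). */
    static long lastEngine = NO_ENGINE;

    /** Old entry point, kept so that existing callers keep working unchanged. */
    static long init(long context, int mode, int alg, int padding,
                     byte[] key, byte[] iv) {
        return init(context, mode, alg, padding, key, iv, NO_ENGINE);
    }

    /** New entry point that additionally accepts an engine handle. */
    static long init(long context, int mode, int alg, int padding,
                     byte[] key, byte[] iv, long engine) {
        // Stand-in for the native call: remember the engine and hand the context back.
        lastEngine = engine;
        return context;
    }
}
```

With this shape, callers that never pass an engine go through the six-argument overload and behave as before, while SM4 callers can pass an engine handle explicitly.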
 

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Commented] (HDFS-15486) Costly sendResponse operation slows down async editlog handling

2020-07-24 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164173#comment-17164173
 ] 

Yiqun Lin commented on HDFS-15486:
--

Hi [~yuanbo], thanks for the comment. We haven't changed the CentOS version 
in our cluster, so this doesn't seem to be related.

[~John Smith], the place you pointed out is exactly what we want to improve.

> Costly sendResponse operation slows down async editlog handling
> ---
>
> Key: HDFS-15486
> URL: https://issues.apache.org/jira/browse/HDFS-15486
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Yiqun Lin
>Priority: Major
> Attachments: Async-profile-(2).jpg, async-profile-(1).jpg
>
>
> When the NameNode in our cluster is under very high load, we find that it often 
> gets stuck in async editlog handling.
> We used the async-profiler tool to get a flame graph.
> !Async-profile-(2).jpg!
> This happens because the async editlog thread consumes an Edit from the queue 
> and then triggers the sendResponse call.
> The sendResponse call is somewhat expensive here, because our cluster has 
> security enabled and performs some encoding operations when returning the 
> response.
> We often catch moments of costly sendResponse operations when the RPC call 
> queue is full.
> !async-profile-(1).jpg!
> Slow consumption of Edits in the async editlog thread lets the pending Edit 
> queue fill up easily, which then blocks the enqueue operation that is 
> invoked from write-lock methods in the FSNamesystem class.
> The proposed enhancement is to use multiple threads to execute sendResponse 
> calls in parallel. sendResponse doesn't need the write lock for 
> protection, so this change is safe.
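
The enhancement described above can be sketched roughly as follows. This is a minimal illustration and not the actual change: ParallelResponder, PendingEdit and drain are hypothetical names, the queue stands in for the pending Edit queue, and the thread-pool size is an assumed parameter. The consumer thread only dequeues and hands each sendResponse call to a pool, so one slow response does not delay consumption of the next Edit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class ParallelResponder {
    /** Stand-in for a queued edit whose RPC response still has to be sent. */
    interface PendingEdit {
        void sendResponse();
    }

    /**
     * Drains the queue on the calling (consumer) thread and submits each
     * sendResponse call to a fixed-size pool. Returns the number of edits
     * consumed, after all submitted responses have completed.
     */
    static int drain(BlockingQueue<PendingEdit> queue, int responderThreads) {
        ExecutorService responders = Executors.newFixedThreadPool(responderThreads);
        int consumed = 0;
        try {
            PendingEdit edit;
            while ((edit = queue.poll()) != null) {
                // A real consumer would log and sync the edit here before responding.
                final PendingEdit current = edit;
                responders.execute(current::sendResponse);
                consumed++;
            }
        } finally {
            responders.shutdown();
        }
        try {
            if (!responders.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("responses still pending after timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for responses", e);
        }
        return consumed;
    }
}
```

Note that responses may complete in a different order than the edits were consumed; the sketch assumes, as the description does, that sendResponse needs no lock and that responses to different calls are independent.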






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Attachment: HDFS-15098.009.patch
Status: Patch Available  (was: Open)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch
>
>






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-24 Thread liusheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liusheng updated HDFS-15098:

Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch
>
>


