[jira] [Commented] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164728#comment-17164728 ]

Hadoop QA commented on HDFS-15442:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 22s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15442 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008389/HDFS-15442.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29562/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

> Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
> -----------------------------------------------------------------------------------
>
>         Key: HDFS-15442
>         URL: https://issues.apache.org/jira/browse/HDFS-15442
>     Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: AMC-team
>    Priority: Major
> Attachments: HDFS-15442.000.patch
>
>
> In the current implementation of checkpoint image transfer, if the file
> length is greater than the configured value of dfs.image.transfer.chunksize,
> chunked streaming mode is used to avoid internal buffering. This mode should
> be used only when more than chunkSize bytes are to be uploaded; otherwise the
> upload may sometimes not happen at all.
> {code:java}
> // TransferFsImage.java
> int chunkSize = (int) conf.getLongBytes(
>     DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_KEY,
>     DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_DEFAULT);
> if (imageFile.length() > chunkSize) {
>   // using chunked streaming mode to support upload of 2GB+ files and to
>   // avoid internal buffering.
>   // this mode should be used only if more than chunkSize data is present
>   // to upload. otherwise upload may not happen sometimes.
>   connection.setChunkedStreamingMode(chunkSize);
> }
> {code}
> There is no validation for this parameter, so a user may accidentally set it
> to a wrong value. If the user sets chunkSize to a negative value, chunked
> streaming mode will always be used. setChunkedStreamingMode(chunkSize) does
> contain correction code: if the chunk length is <= 0, it is changed to
> DEFAULT_CHUNK_SIZE.
> {code:java}
> public void setChunkedStreamingMode(int chunklen) {
>   if (connected) {
>     throw new IllegalStateException("Can't set streaming mode: already connected");
>   }
>   if (fixedContentLength != -1 || fixedContentLengthLong != -1) {
>     throw new IllegalStateException("Fixed length streaming mode set");
>   }
>   chunkLength = chunklen <= 0 ? DEFAULT_CHUNK_SIZE : chunklen;
> }
> {code}
> However,
> *if the user sets dfs.image.transfer.chunksize to a value <= 0, even images
> with imageFile.length() < DEFAULT_CHUNK_SIZE will use chunked streaming mode,
> and the upload may fail as described above.* *(This scenario may not be
> common, but we can prevent users from setting this parameter to an extremely
> small value.)*
> *How to fix:*
> Add validation or correction code right after parsing the config value,
> before the value is actually used (in setChunkedStreamingMode).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
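The fix the description asks for can be sketched as follows. This is a hypothetical illustration, not the actual HDFS-15442.000.patch: the helper names are invented, and the 64 KB value for DEFAULT_CHUNK_SIZE is an assumption for the example.

```java
// Sketch of the proposed guard: fall back to the default chunk size when the
// configured value is non-positive, so a negative dfs.image.transfer.chunksize
// can no longer force chunked streaming mode for small images.
public class ChunkSizeGuard {
    // Assumed stand-in for DFSConfigKeys.DFS_IMAGE_TRANSFER_CHUNKSIZE_DEFAULT.
    static final int DEFAULT_CHUNK_SIZE = 64 * 1024;

    // Correction applied right after parsing the config value, before it is
    // ever passed to setChunkedStreamingMode.
    static int sanitizeChunkSize(long configured) {
        if (configured <= 0 || configured > Integer.MAX_VALUE) {
            return DEFAULT_CHUNK_SIZE; // do not trust a bad configured value
        }
        return (int) configured;
    }

    // Chunked streaming only when more than chunkSize bytes will be sent.
    static boolean shouldUseChunkedStreaming(long imageLength, int chunkSize) {
        return imageLength > chunkSize;
    }

    public static void main(String[] args) {
        int chunk = sanitizeChunkSize(-1);
        System.out.println(chunk);                                  // 65536
        // A small image no longer triggers chunked streaming mode.
        System.out.println(shouldUseChunkedStreaming(4096, chunk)); // false
    }
}
```

With this guard, the later `imageFile.length() > chunkSize` comparison is made against a sane positive value, which is the behavior the reporter's patch description ("fall back the invalid chunksize value to default right after parsing") implies.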
[jira] [Comment Edited] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164724#comment-17164724 ]

AMC-team edited comment on HDFS-15442 at 7/25/20, 3:05 AM:
-----------------------------------------------------------

Upload a patch to fall back the invalid chunksize value to the default right after parsing.

was (Author: amc-team): Upload a patch to fall back the invalid chunksize value to the default.
[jira] [Commented] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164724#comment-17164724 ]

AMC-team commented on HDFS-15442:
---------------------------------

Upload a patch to fall back the invalid chunksize value to the default.
[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HDFS-15442:
----------------------------
    Attachment: (was: HDFS-15442.000.patch)
[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HDFS-15442:
----------------------------
    Attachment: HDFS-15442.000.patch
        Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-15442) Image upload may fail if dfs.image.transfer.chunksize wrongly set to negative value
[ https://issues.apache.org/jira/browse/HDFS-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HDFS-15442:
----------------------------
    Attachment: HDFS-15442.000.patch
[jira] [Commented] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164723#comment-17164723 ]

Hadoop QA commented on HDFS-15440:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15440 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008387/HDFS-15440.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29561/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
> -------------------------------------------------------------------------
>
>         Key: HDFS-15440
>         URL: https://issues.apache.org/jira/browse/HDFS-15440
>     Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>    Reporter: AMC-team
>    Priority: Major
> Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10 means
> 10%) that defines a good-enough move.
> The description in hdfs-default.xml does not make clear how the value is
> actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is
> still active. So it might not be possible to move the exactly specified
> amount of data. So tolerance allows us to define a percentage which defines a
> good enough move.
> {quote}
> The [official doc of the HDFS disk
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html]
> says:
> {quote}The tolerance percent specifies when we have reached a good enough
> value for any copy step. For example, if you specify 10 then getting close to
> 10% of the target value is good enough. It is to say if the move operation is
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered
> successful.
> {quote}
> However, the source code in DiskBalancer.java reads:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>       ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is
> still not enough, because 20 > 18 + 18*0.1. Instead, we should check whether
> 18 >= 20*(1-0.1).
> The calculation in isLessThanNeeded() (which checks whether a given block is
> less than the needed size to meet our goal) is unintuitive in the same way.
> Also, this parameter has no upper-bound check, which means it can even be set
> to 100%, an obviously wrong value.
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the doc
> and the code consistent, and to refine the description in hdfs-default.xml to
> make it more precise and clear.
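The mismatch described above can be sketched as follows. This is a hypothetical rework matching the documented semantics, not the committed HDFS-15440 patch; the standalone method signature (raw longs instead of a DiskBalancerWorkItem) is an adaptation for illustration.

```java
// Sketch of isCloseEnough rewritten to match the documented semantics:
// a copy step is good enough once bytesCopied reaches
// (1 - tolerancePercent/100) of bytesToCopy.
public class ToleranceCheck {
    static boolean isCloseEnough(long bytesToCopy, long bytesCopied,
                                 long tolerancePercent) {
        // Threshold computed from the *target*, as the official doc describes,
        // rather than inflating bytesCopied as the current code does.
        long threshold = bytesToCopy - (bytesToCopy * tolerancePercent) / 100;
        return bytesCopied >= threshold;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 18 GB copied out of 20 GB with 10% tolerance: good enough per the
        // doc, but rejected by the current inflate-bytesCopied logic
        // (20 > 18 + 18*0.1).
        System.out.println(isCloseEnough(20 * gb, 18 * gb, 10)); // true
    }
}
```

An upper-bound check on the configured percentage (for example rejecting values above some sane limit and falling back to the default) would belong in the same place, since, as noted above, the current code accepts even 100%.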
[jira] [Commented] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164721#comment-17164721 ]

AMC-team commented on HDFS-15440:
---------------------------------

Upload a patch to change the current logic and refine the parameter check.
[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HDFS-15440:
----------------------------
    Attachment: HDFS-15440.000.patch
        Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HDFS-15440:
----------------------------
   Description: updated to add the corrected comparison (18 >= 20*(1-0.1)) and the note that the parameter has no upper-bound check; the full updated description is quoted in the Hadoop QA comment above.
[jira] [Updated] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15440: Summary: The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive. (was: The doc of dfs.disk.balancer.block.tolerance.percent is misleading) > The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive. > -- > > Key: HDFS-15440 > URL: https://issues.apache.org/jira/browse/HDFS-15440 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164706#comment-17164706 ] AMC-team edited comment on HDFS-15443 at 7/25/20, 1:52 AM: --- Thanks [~ayushtkn] for the great feedback. I refined the patch (changed maxXceiverCount to this.maxXceiverCount). I will check the failed test. was (Author: amc-team): Thanks [~ayushtkn] for the great feedback. I refined the patch (changed maxXceiverCount to this.maxXceiverCount). I also checked the standard output of the consistently failing test hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader, and I think it is not relevant to this patch: {quote}java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote} > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch, HDFS-15443.003.patch > > > Configuration parameter dfs.datanode.max.transfer.threads specifies the > maximum number of threads to use for transferring data in and out of the DN. > This is a vital param that needs to be tuned carefully. > {code:java} > // DataXceiverServer.java > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xceivers: " > + maxXceiverCount); > } > {code} > Many issues are caused by not setting this param to an appropriate > value, yet there is no check code to restrict the parameter. > Although having a hard-and-fast rule is difficult because we need to consider > the number of cores, main memory, etc., *we can prevent users from accidentally setting this > value to an absolutely wrong one* (e.g. a negative value that > totally breaks the availability of the datanode). > *How to fix:* > Add a proper check for the parameter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
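A bounds check of the kind the issue asks for could look like the following sketch. This is not the actual HDFS-15443 patch; the class and method names are hypothetical and only illustrate validating the parameter once at startup:

```java
// Hypothetical sketch of a startup-time sanity check for
// dfs.datanode.max.transfer.threads (the real fix is in the HDFS-15443 patch).
public class TransferThreadsCheck {

    // Rejects non-positive values before the datanode starts serving,
    // instead of failing strangely on every transfer later.
    static int validateMaxTransferThreads(int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException(
                "dfs.datanode.max.transfer.threads must be positive, got: "
                    + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        // A reasonable value passes through unchanged.
        System.out.println(validateMaxTransferThreads(4096));
        try {
            validateMaxTransferThreads(-1); // rejected up front
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast with a clear message is preferable to the silent unavailability described in the report, where every xceiver is refused because the count always exceeds a negative limit.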
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164715#comment-17164715 ] Hadoop QA commented on HDFS-15098: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 18s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15098 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008383/HDFS-15098.009.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29560/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature > Affects Versions: 3.4.0 > Reporter: liusheng > Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. Please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use SM4 on HDFS as follows:* > 1. Configure Hadoop KMS > 2. Test HDFS SM4: > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *Requires:* > 1. openssl version >= 1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164714#comment-17164714 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29559/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" controls the maximum number > of errors we can ignore for a specific move between two disks before the move is > abandoned. > The parameter accepts values >= 0, and setting it to 0 should > mean no error tolerance. However, setting the value to 0 simply skips > the block copy even when no disk error occurs, because the while loop > condition *item.getErrorCount() < getMaxError(item)* is never satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... // get the block > } catch (IOException e) { > item.incErrorCount(); > } > if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support the value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
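The effect of max.disk.errors = 0 on the loop guard can be reproduced with a simplified model. The class below is a hypothetical reduction of getBlockToCopy() in which block iteration is just a counter, so only the guard itself is exercised:

```java
// Simplified standalone model of the getBlockToCopy loop guard from the
// issue description; "blocks" stands in for the real block iterator.
public class MaxErrorGuard {

    // Returns how many copy attempts the loop admits for a given guard style.
    static int attemptsAdmitted(int blocks, int errorCount, int maxErrors,
                                boolean inclusiveGuard) {
        int attempts = 0;
        int i = 0;
        while (i < blocks
               && (inclusiveGuard ? errorCount <= maxErrors
                                  : errorCount < maxErrors)) {
            attempts++;
            i++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Original strict guard with maxErrors = 0: no block is ever copied,
        // even though no error has occurred.
        System.out.println(attemptsAdmitted(5, 0, 0, false)); // 0
        // An inclusive guard lets copying proceed as long as the error count
        // has not exceeded the limit.
        System.out.println(attemptsAdmitted(5, 0, 0, true));  // 5
    }
}
```

This is only a model of the guard, not the patched loop itself; the actual fix is in the attached HDFS-15438 patches.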
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164713#comment-17164713 ] AMC-team commented on HDFS-15438: - Thanks [~ayushtkn] for the feedback. I uploaded a patch that changes the while loop condition and the if condition to support the value 0. What's more, IMHO, the patched code logic is more intuitive: previously, if we set dfs.disk.balancer.max.disk.errors to n, it could actually tolerate only n-1 errors. Now it can tolerate n errors, which is more consistent with the parameter's documentation: {quote}During a block move from a source to destination disk, we might encounter various errors. *This defines how many errors we can tolerate* before we declare a move between 2 disks (or a step) has failed. {quote} > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15438: Attachment: HDFS-15438.001.patch > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164711#comment-17164711 ] Hadoop QA commented on HDFS-15439: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 28s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15439 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008382/HDFS-15439.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29558/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.mover.retry.max.attempts to negative value will retry forever. > -- > > Key: HDFS-15439 > URL: https://issues.apache.org/jira/browse/HDFS-15439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch > > > Configuration parameter "dfs.mover.retry.max.attempts" defines the > maximum number of retries before the mover considers the move failed. There is > no checking code, so this parameter accepts any int value. > Theoretically, setting this value to <= 0 should mean no retry at all. > However, if you set it to a negative value, the condition that marks the retry as failed is > never satisfied, because the if statement is "*if > (retryCount.get() == retryMaxAttempts)*". The retry count is always incremented by > retryCount.incrementAndGet() after a failure but never *equals* *retryMaxAttempts.* > {code:java} > private Result processNamespace() throws IOException { > ... // wait for pending move to finish and retry the failed migration > if (hasFailed && !hasSuccess) { > if (retryCount.get() == retryMaxAttempts) { > result.setRetryFailed(); > LOG.error("Failed to move some block's after " > + retryMaxAttempts + " retries."); > return result; > } else { > retryCount.incrementAndGet(); > } > } else { > // Reset retry count if no failure. > retryCount.set(0); > } > ... > } > {code} > *How to fix* > Add checking code for "dfs.mover.retry.max.attempts" to accept only > non-negative values, or change the if condition to fire when the retry count > exceeds retryMaxAttempts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
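The difference between the equality test and a >= test can be demonstrated with a simplified model of the retry loop. The class below is hypothetical; a cap is added so the otherwise-endless negative-value case terminates in the demo:

```java
// Standalone model of the Mover retry check: with the original '==' test a
// negative retryMaxAttempts is never matched, so retries continue forever
// (bounded here only by the simulation cap).
public class RetryCheckModel {

    // Counts how many retries happen before the "retry failed" branch fires,
    // simulating a migration that fails on every pass.
    static int retriesBeforeGivingUp(int retryMaxAttempts, boolean useEquality,
                                     int cap) {
        int retryCount = 0;
        while (retryCount < cap) {
            boolean limitReached = useEquality
                ? retryCount == retryMaxAttempts
                : retryCount >= retryMaxAttempts;
            if (limitReached) {
                return retryCount; // the check fired: give up here
            }
            retryCount++;
        }
        return retryCount; // cap hit: the check never fired
    }

    public static void main(String[] args) {
        // retryMaxAttempts = -1 with '==': the limit is never hit.
        System.out.println(retriesBeforeGivingUp(-1, true, 1000));  // 1000
        // retryMaxAttempts = -1 with '>=': treated as "no retries at all".
        System.out.println(retriesBeforeGivingUp(-1, false, 1000)); // 0
    }
}
```

Either proposed fix works in this model: validating the config to be non-negative removes the bad input, while switching to >= makes negative values behave as "no retries".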
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: HDFS-15098.009.patch Status: Patch Available (was: Open) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature > Affects Versions: 3.4.0 > Reporter: liusheng > Priority: Major > Labels: sm4 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: (was: HDFS-15098.009.patch) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature > Affects Versions: 3.4.0 > Reporter: liusheng > Priority: Major > Labels: sm4 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Status: Open (was: Patch Available) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature > Affects Versions: 3.4.0 > Reporter: liusheng > Priority: Major > Labels: sm4 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164710#comment-17164710 ] AMC-team commented on HDFS-15439: - Uploaded a patch based on [~ayushtkn]'s suggestion. Thanks! > Setting dfs.mover.retry.max.attempts to negative value will retry forever. > -- > > Key: HDFS-15439 > URL: https://issues.apache.org/jira/browse/HDFS-15439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch >
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15439: Attachment: HDFS-15439.001.patch > Setting dfs.mover.retry.max.attempts to negative value will retry forever. > -- > > Key: HDFS-15439 > URL: https://issues.apache.org/jira/browse/HDFS-15439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164706#comment-17164706 ] AMC-team commented on HDFS-15443: - Thanks [~ayushtkn] for the great feedback. I refined the patch (changed maxXceiverCount to this.maxXceiverCount). I also checked the standard output of the consistently failing test hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader, and I think it is not relevant to this patch: {quote}java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote} > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch, HDFS-15443.003.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164707#comment-17164707 ] Hadoop QA commented on HDFS-15443: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15443 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008381/HDFS-15443.003.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29557/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch, HDFS-15443.003.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: HDFS-15443.003.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164698#comment-17164698 ] Hadoop QA commented on HDFS-15443: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15443 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008379/HDFS-15443.003.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29556/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: (was: HDFS-15443.003.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: HDFS-15443.003.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15438: Comment: was deleted (was: I uploaded a new patch based on [~ayushtkn]'s suggestion. IMHO, the current logic may be better because previously, if we set "dfs.disk.balancer.max.disk.errors" to n, it could actually tolerate only n-1 errors because of the while loop condition. Now it can tolerate n errors, which is more consistent with the documentation: {quote}During a block move from a source to destination disk, we might encounter various errors. This defines how many errors we can tolerate before we declare a move between 2 disks (or a step) has failed.{quote}) > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15438.000.patch > > > In the HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors > we can ignore for a specific move between two disks before it is > abandoned. > The parameter accepts any value >= 0, and setting the value to 0 should > mean no error tolerance. However, setting the value to 0 simply skips the > block copy even when no disk error occurs, because the while loop > condition *item.getErrorCount() < getMaxError(item)* is never satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... // get the block > } catch (IOException e) { > item.incErrorCount(); > } > if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error: {}", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support the value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
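The off-by-one in the disk balancer's loop guard can be reproduced with a toy simulation (hypothetical class and method names; the real logic lives in `getBlockToCopy` quoted above):

```java
// Toy simulation of the disk balancer copy-loop guard. With the original
// exclusive guard (errors < maxErrors) and maxErrors = 0, the loop body
// never runs, so no block is copied even though no error ever occurs.
// An inclusive guard (errors <= maxErrors) lets maxErrors = 0 mean
// "copy, but abort on the first error".
public class MaxErrorDemo {

    static int blocksCopied(int totalBlocks, int maxErrors, boolean inclusive) {
        int copied = 0;
        int errors = 0;               // stays 0: this run hits no disk errors
        while (copied < totalBlocks
                && (inclusive ? errors <= maxErrors : errors < maxErrors)) {
            copied++;                 // "copy" the next block
        }
        return copied;
    }
}
```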
[jira] [Issue Comment Deleted] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Comment: was deleted (was: Thanks [~ayushtkn] for the great feedback. I refined the patch (changed *maxXceiverCount* to *this.maxXceiverCount*). I also checked the Standard Output of the consistently failing test *hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader* and I think it is not related to this patch: {quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote}) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15439: Comment: was deleted (was: uploaded a new patch based on [~ayushtkn]'s feedback) > Setting dfs.mover.retry.max.attempts to negative value will retry forever. > -- > > Key: HDFS-15439 > URL: https://issues.apache.org/jira/browse/HDFS-15439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover > Reporter: AMC-team > Priority: Major > Attachments: HDFS-15439.000.patch > > > The configuration parameter "dfs.mover.retry.max.attempts" defines the > maximum number of retries before the mover considers the move failed. There is > no checking code, so this parameter can accept any int value. > Theoretically, setting this value to <= 0 should mean no retry at all. > However, if you set the value to a negative value, the retry-failure check is > never satisfied, because the if statement is "*if > (retryCount.get() == retryMaxAttempts)*". The retry count is always incremented by > retryCount.incrementAndGet() after a failure but never *equals* *retryMaxAttempts*. > {code:java} > private Result processNamespace() throws IOException { > ... // wait for pending moves to finish and retry the failed migration > if (hasFailed && !hasSuccess) { > if (retryCount.get() == retryMaxAttempts) { > result.setRetryFailed(); > LOG.error("Failed to move some blocks after " > + retryMaxAttempts + " retries."); > return result; > } else { > retryCount.incrementAndGet(); > } > } else { > // Reset retry count if no failure. > retryCount.set(0); > } > ... > } > {code} > *How to fix* > Add checking code so that "dfs.mover.retry.max.attempts" accepts only a > non-negative value, or change the if statement condition to trigger when the retry count > exceeds the maximum attempts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
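The non-terminating equality check can be demonstrated in isolation (hypothetical names; the real guard sits in the mover's processNamespace loop quoted above):

```java
// Demonstrates why '== retryMaxAttempts' never fires for a negative
// setting: retryCount only counts up from 0, so it can never equal -1.
// A '>=' comparison gives up immediately for any non-positive setting.
public class RetryGuardDemo {

    // Returns how many retries happen before the guard triggers; 'cap'
    // bounds the simulation so a broken guard cannot loop forever here.
    static int retriesBeforeGivingUp(int retryMaxAttempts, boolean useGte, int cap) {
        int retryCount = 0;
        while (retryCount < cap) {
            boolean giveUp = useGte ? retryCount >= retryMaxAttempts
                                    : retryCount == retryMaxAttempts;
            if (giveUp) {
                return retryCount;
            }
            retryCount++;             // another failed move, retry again
        }
        return retryCount;            // hit the cap: effectively retried forever
    }
}
```

Either fix named in the issue works: reject negative settings up front, or switch the guard to `>=` so any non-positive value means "no retries".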
[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15438: Attachment: (was: HDFS-15438.001.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15439: Attachment: (was: HDFS-15439.001.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: (was: HDFS-15443.003.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164690#comment-17164690 ] AMC-team commented on HDFS-15438: - I uploaded a new patch based on [~ayushtkn]'s suggestion. IMHO, the current logic may be better because previously, if we set "dfs.disk.balancer.max.disk.errors" to n, it could actually tolerate only n-1 errors because of the while loop condition. Now it can tolerate n errors, which is more consistent with the documentation: {quote}During a block move from a source to destination disk, we might encounter various errors. This defines how many errors we can tolerate before we declare a move between 2 disks (or a step) has failed.{quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164689#comment-17164689 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008378/HDFS-15438.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29555/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15438: Attachment: HDFS-15438.001.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164688#comment-17164688 ] Hadoop QA commented on HDFS-15439: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 16s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15439 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008376/HDFS-15439.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29554/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164684#comment-17164684 ] AMC-team commented on HDFS-15439: - Uploaded a new patch based on [~ayushtkn]'s feedback. 
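The fix discussed above can be sketched in isolation. The following is a hypothetical standalone helper, not the actual Mover patch; it combines both proposed remedies, sanitizing a negative configured maximum at startup and using ">=" instead of "==" so a negative maximum still terminates retries:

```java
// Hypothetical sketch of the HDFS-15439 fix; names are invented for the
// illustration, the real change lives in the Mover's processNamespace().
class MoverRetrySketch {
    static final int DEFAULT_MAX_ATTEMPTS = 10; // assumed default for dfs.mover.retry.max.attempts

    // Remedy 1: reject a negative configured value up front, falling back
    // to the default instead of looping forever.
    static int sanitizeMaxAttempts(int configured) {
        return configured < 0 ? DEFAULT_MAX_ATTEMPTS : configured;
    }

    // Remedy 2: ">=" instead of "==" -- a count that has already passed the
    // (possibly negative) maximum still ends the retry loop.
    static boolean retriesExhausted(int retryCount, int maxAttempts) {
        return retryCount >= maxAttempts;
    }

    public static void main(String[] args) {
        // With "==", retryCount 0 vs maxAttempts -1 would never match and the
        // mover would retry forever; ">=" terminates immediately.
        System.out.println(retriesExhausted(0, -1)); // true
        System.out.println(sanitizeMaxAttempts(-5)); // 10
    }
}
```

Either remedy alone closes the infinite-retry hole; applying both also gives a sane meaning to explicit zero (no retries).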
[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15439: Attachment: HDFS-15439.001.patch
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164680#comment-17164680 ] Hadoop QA commented on HDFS-15443: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15443 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008375/HDFS-15443.003.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29553/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch, HDFS-15443.003.patch > > > The configuration parameter dfs.datanode.max.transfer.threads specifies the > maximum number of threads to use for transferring data in and out of the DN. > This is a vital param that needs careful tuning. > {code:java} > // DataXceiverServer.java > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xceivers: " > + maxXceiverCount); > } > {code} > There are many issues caused by not setting this param to an appropriate > value. However, there is no check code to restrict the parameter. > Although having a hard-and-fast rule is difficult because we need to consider > the number of cores, main memory, etc., *we can prevent users from setting this > value to an absolutely wrong value by accident.* (e.g. a negative value that > totally breaks the availability of the datanode.) > *How to fix:* > Add proper check code for the parameter. 
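A startup-time check in the spirit of the proposal above could look like the following. This is a hypothetical sketch, not the actual HDFS-15443 patch; the default of 4096 matches the documented default of dfs.datanode.max.transfer.threads:

```java
// Hypothetical validation for dfs.datanode.max.transfer.threads; the real
// check would live in DataXceiverServer's constructor.
class TransferThreadsCheck {
    static final int DEFAULT_MAX_TRANSFER_THREADS = 4096; // documented HDFS default

    // Fail fast on a non-positive limit instead of letting every incoming
    // transfer die later with "Xceiver count ... exceeds the limit".
    static int validate(int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException(
                "dfs.datanode.max.transfer.threads must be positive, got "
                    + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(validate(DEFAULT_MAX_TRANSFER_THREADS)); // 4096
        try {
            validate(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the value at DataNode startup surfaces the misconfiguration immediately, rather than as "strange failures" on every block transfer.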
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164678#comment-17164678 ] Hadoop QA commented on HDFS-15443: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 28m 48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15443 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008374/HDFS-15443.003.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29552/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated.
[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164677#comment-17164677 ] AMC-team edited comment on HDFS-15443 at 7/24/20, 11:34 PM: Thanks [~ayushtkn] for the great feedback. I refined the patch (change *maxXceiverCount* to *this.maxXceiverCount*) I also checked the Standard Output of the consistently failed test *hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader.* And I think it is not related to this patch {quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote} was (Author: amc-team): Thanks [~ayushtkn] for the great feedback. 
I refined the patch (change *maxXceiverCount* to *this.maxXceiverCount*) I also checked the Standard Output of the consistently failed test *hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader.* And I think it is not related to this patch {quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote}
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164677#comment-17164677 ] AMC-team commented on HDFS-15443: - Thanks [~ayushtkn] for the great feedback. I refined the patch (change *maxXceiverCount* to *this.maxXceiverCount*) I also checked the Standard Output of the consistently failed test *hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader.* And I think it is not related to this patch {quote}(AbstractContractMultipartUploaderTest.java:teardown(110)) - Exeception in teardown java.lang.IllegalArgumentException: Path /test is not under hdfs://localhost:36213/test at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.checkPath(AbstractMultipartUploader.java:73) at org.apache.hadoop.fs.impl.AbstractMultipartUploader.abortUploadsUnderPath(AbstractMultipartUploader.java:136) at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.teardown(AbstractContractMultipartUploaderTest.java:107) {quote}
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: HDFS-15443.003.patch
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: (was: HDFS-15443.003.patch)
[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AMC-team updated HDFS-15443: Attachment: HDFS-15443.003.patch
[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164356#comment-17164356 ] Chengwei Wang commented on HDFS-15493: -- Thanks Stephen O'Donnell for the info about HDFS-13693; I will try to apply and test it. I'd really appreciate it if you could help me review this patch. > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading the INodeDirectorySection of the fsimage, the name cache and > block map are updated after each inode file is added to its inode directory. > Running these steps in parallel would reduce the time cost of fsimage loading. > In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load > the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost > reduces to 410s. 
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164350#comment-17164350 ] Ayush Saxena commented on HDFS-15443: - In such a case there are only two solutions. First, as soon as you learn the conf is invalid, you fail the operation and alarm it out. Second, when you observe the value is invalid, you correct it and use the default one, as is done in many places, like {{DatanodeAdminMonitorBase}} and a bunch of others. The only thing that I feel we can't do is tolerate the invalid value and go ahead with it, giving it a pass where it is creating trouble, which initially HDFS-15439 tended to do. That is why I thought that if you don't want to crash, better to change to the default. The choice between approaches #1 and #2 depends on the case. Here, in the case of the Datanode, it is a long-running service and one of the critical parts of the cluster, so I think crashing and alarming on a wrong conf is better. [~AMC-team] I think we can keep the current patch; just confirm the jenkins warnings aren't related.
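The two remedies contrasted in this comment, failing fast versus falling back to the default, can be sketched side by side. These are hypothetical helpers for illustration, not code from either patch; only the class name DatanodeAdminMonitorBase referenced above is real:

```java
// Sketch of the two strategies discussed above for handling an invalid
// configuration value; invented helper names, not actual Hadoop code.
class InvalidConfStrategies {
    // Strategy 1: fail fast and alarm -- preferred above for a long-running,
    // critical service such as the DataNode.
    static int failFast(String key, int value, int min) {
        if (value < min) {
            throw new IllegalArgumentException(
                key + " must be >= " + min + ", got " + value);
        }
        return value;
    }

    // Strategy 2: log and fall back to the default, the pattern used in
    // places like DatanodeAdminMonitorBase.
    static int useDefault(String key, int value, int min, int dflt) {
        if (value < min) {
            System.err.println(key + "=" + value
                + " is invalid, falling back to default " + dflt);
            return dflt;
        }
        return value;
    }

    public static void main(String[] args) {
        // The mover case: a negative retry maximum silently becomes the default.
        System.out.println(useDefault("dfs.mover.retry.max.attempts", -3, 0, 10)); // 10
    }
}
```

The third option, tolerating the invalid value and proceeding with it, is the one the comment rules out: it hides the misconfiguration exactly where it causes trouble.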
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164346#comment-17164346 ] Hadoop QA commented on HDFS-15098: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue} 0m 0s{color} | {color:blue} markdownlint was not available. {color} | | {color:blue}0{color} | {color:blue} prototool {color} | {color:blue} 0m 0s{color} | {color:blue} prototool was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 4s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 23m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 25m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 58s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 1s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 34s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 45s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 14s{color} | {color:red} root in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 14s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} golang {color} | {color:red} 1m 14s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 14s{color} | {color:red} root in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 26s{color} | {color:orange} root: The patch generated 3 new + 213 unchanged - 8 fixed = 216 total (was 221) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 38s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 51s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 0m 58s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 36s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 47s{color} | {color:red}
[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164336#comment-17164336 ] Stephen O'Donnell commented on HDFS-15493: -- This looks like another good speed improvement. I will try to review this in the next day or two. For info, there is also HDFS-13693 which may give you some additional improvement.
[jira] [Comment Edited] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164307#comment-17164307 ] Chengwei Wang edited comment on HDFS-15493 at 7/24/20, 10:03 AM: - submit patch v001. Similar to HDFS-14617, it uses threads to update the name cache and blocks map in parallel. In our test case, it can reduce the time cost of loading the fsimage by more than 10%. The feature can be enabled/disabled by config dfs.image.blocksmap.update.async=true dfs.image.blocksmap.update.threads=4 was (Author: smarthan): submit patch v001.
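The idea behind the patch, handing the side-structure updates to a worker pool while the main thread keeps attaching inodes, can be illustrated with a toy sketch. All names here are invented for the illustration; the real patch works on FSImageFormatPBINode internals, and only the two config keys above come from the comment:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy illustration of HDFS-15493's approach: block-map and name-cache
// updates run on a pool while the caller continues with directory wiring.
class ParallelSideUpdateSketch {
    // Thread-safe stand-ins for the real block map and name cache.
    static final ConcurrentHashMap<Long, String> blockMap = new ConcurrentHashMap<>();
    static final ConcurrentHashMap<String, String> nameCache = new ConcurrentHashMap<>();

    static void load(List<String> files) {
        // dfs.image.blocksmap.update.threads=4 in the proposed config.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        long blockId = 0;
        for (String name : files) {
            final long id = blockId++;
            // The main thread would attach the inode to its directory here;
            // the side-structure updates are submitted asynchronously.
            pool.execute(() -> blockMap.put(id, name));
            pool.execute(() -> nameCache.put(name, name));
        }
        // Drain pending updates before loading is declared complete.
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        load(List.of("a", "b", "c", "d"));
        System.out.println(blockMap.size() + " " + nameCache.size()); // 4 4
    }
}
```

The barrier at the end matters: the loader must not report success until every queued update has landed, which is why the sketch shuts the pool down and awaits termination.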
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164316#comment-17164316 ] liusheng commented on HDFS-15098: - Hi [~lindongdong], sorry, what do you mean by "for these two old methods, please handle them in native code"? Please check the new 0009 patch. Actually, I don't think there are compatibility problems in those places; I have tested the functionality and the tests run OK locally (both AES and SM4). Can you please explain more? > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. Please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use SM4 on HDFS as follows:* > 1. Configure Hadoop KMS > 2. Test HDFS SM4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *Requires:* > 1. openssl version >= 1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
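Much of wiring a new suite like 'SM4/CTR/NoPadding' into the crypto layer comes down to recognizing the algorithm/mode/padding triple in the suite string. A minimal, stdlib-only parse of that form (the class name is hypothetical; Hadoop's own CipherSuite enum does the real mapping) looks like:

{code:java}
// Hypothetical sketch: split a JCE-style cipher suite string such as
// "SM4/CTR/NoPadding" into its algorithm, mode, and padding components.
class CipherSuiteParts {
    final String algorithm;
    final String mode;
    final String padding;

    CipherSuiteParts(String suite) {
        String[] parts = suite.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected ALG/MODE/PADDING, got: " + suite);
        }
        algorithm = parts[0];  // e.g. "SM4" or "AES"
        mode = parts[1];       // e.g. "CTR"
        padding = parts[2];    // e.g. "NoPadding"
    }
}
{code}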
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengwei Wang updated HDFS-15493: - External issue ID: HDFS-14617 > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduc to 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengwei Wang updated HDFS-15493: - External issue ID: (was: HDFS-14617) > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduc to 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164312#comment-17164312 ] Hadoop QA commented on HDFS-15493: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 28s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15493 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008339/HDFS-15493.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29551/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduc to 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengwei Wang updated HDFS-15493: - Description: While loading INodeDirectorySection of fsimage, it will update name cache and block map after added inode file to inode directory. It would reduce time cost of fsimage loading to enable these steps run in parallel. In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost reduc to 410s. was: While loading INodeDirectorySection of fsimage, it will update name cache and block map after added inode file to inode directory. It would reduce time cost of fsimage loading to enable these steps run in parallel. In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost is 410s. > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduc to 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164307#comment-17164307 ] Chengwei Wang commented on HDFS-15493: -- submit patch v001. > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > is 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengwei Wang updated HDFS-15493: - Attachment: HDFS-15493.001.patch Status: Patch Available (was: Open) > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > Attachments: HDFS-15493.001.patch > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > is 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengwei Wang updated HDFS-15493: - Description: While loading INodeDirectorySection of fsimage, it will update name cache and block map after added inode file to inode directory. It would reduce time cost of fsimage loading to enable these steps run in parallel. In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost is 410s. > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Priority: Major > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > is 410s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
Chengwei Wang created HDFS-15493: Summary: Update block map and name cache in parallel while loading fsimage. Key: HDFS-15493 URL: https://issues.apache.org/jira/browse/HDFS-15493 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chengwei Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164267#comment-17164267 ] lindongdong commented on HDFS-15098: [~seanlau], Hi, the latest patch is still not OK. For these two old methods, please handle them in native code: private native long init(long context, int mode, int alg, int padding, byte[] key, byte[] iv); private native void clean(long context); Also, add this method for the same reason: private OpensslCipher(long context, int alg, int padding) { > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. Please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use SM4 on HDFS as follows:* > 1. Configure Hadoop KMS > 2. Test HDFS SM4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *Requires:* > 1. openssl version >= 1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163966#comment-17163966 ] AMC-team edited comment on HDFS-15443 at 7/24/20, 8:40 AM: --- Sure, thanks for the reminder! Before that, I'm wondering whether we can fall back to the parameter's default value (4096) and give a log message, just like [~ayushtkn] suggested in [HDFS-15439|https://issues.apache.org/jira/browse/HDFS-15439]. Since this is a sanity check, falling back to the default value is a safe and conservative choice. Do you have any suggestions? [~elgoiri] [~ayushtkn] [~jianghuazhu] was (Author: amc-team): Sure. But before that, I'm wondering whether we can fall back to the parameter's default value (4096) and give a log message, just like [~ayushtkn] suggested in [HDFS-15439|https://issues.apache.org/jira/browse/HDFS-15439]. Since this is a sanity check, falling back to the default value is a safe and conservative choice. What do you think? [~elgoiri] [~ayushtkn] [~jianghuazhu] > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch > > > The configuration parameter dfs.datanode.max.transfer.threads specifies the > maximum number of threads to use for transferring data in and out of the DN. > This is a vital param that needs to be tuned carefully. 
> {code:java} > // DataXceiverServer.java > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xceivers: " > + maxXceiverCount); > } > {code} > There are many issues caused by not setting this param to an appropriate > value. However, there is no check code to restrict the parameter. > Although having a hard-and-fast rule is difficult because we need to consider > the number of cores, main memory, etc., *we can prevent users from setting this > value to an absolutely wrong value by accident* (e.g. a negative value that > totally breaks the availability of the datanode). > *How to fix:* > Add proper check code for the parameter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
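The fall-back idea discussed in this thread could take roughly the following shape (the class, method, and constant names are hypothetical, not the actual patch): reject a non-positive configured limit and revert to the default of 4096 with a warning, instead of letting every transfer fail the xceiver-count check.

{code:java}
// Hypothetical sanity check for dfs.datanode.max.transfer.threads:
// a negative or zero limit would make every incoming transfer exceed
// maxXceiverCount, so fall back to the default and warn.
class MaxTransferThreadsCheck {
    static final int DEFAULT_MAX_TRANSFER_THREADS = 4096;

    static int sanitize(int configured) {
        if (configured <= 0) {
            System.err.println("Invalid dfs.datanode.max.transfer.threads="
                + configured + "; falling back to "
                + DEFAULT_MAX_TRANSFER_THREADS);
            return DEFAULT_MAX_TRANSFER_THREADS;
        }
        return configured;
    }
}
{code}

Running the configured value through such a check once at startup keeps the xceiver-count guard in DataXceiverServer unchanged while making an accidental negative setting harmless.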
[jira] [Comment Edited] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.
[ https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163974#comment-17163974 ] AMC-team edited comment on HDFS-15439 at 7/24/20, 8:39 AM: --- Thanks [~ayushtkn] for the great suggestion! That's actually what I wanted to do initially: correct the parameter value at the beginning and don't let the invalid value go through the program. I will try to upload a patch soon. was (Author: amc-team): Thanks [~ayushtkn] for the great suggestion! That's actually what I wanted to do initially: correct the parameter value at the beginning and don't let the invalid value go through the program. I will try to upload a patch soon. > Setting dfs.mover.retry.max.attempts to negative value will retry forever. > -- > > Key: HDFS-15439 > URL: https://issues.apache.org/jira/browse/HDFS-15439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15439.000.patch > > > The configuration parameter "dfs.mover.retry.max.attempts" defines the > maximum number of retries before the mover considers the move failed. There is > no checking code, so this parameter can accept any int value. > Theoretically, setting this value to <=0 should mean no retries at all. > However, if you set the value to a negative value, the checking condition for > a failed retry will never be satisfied, because the if statement is "*if > (retryCount.get() == retryMaxAttempts)*". The retry count is always incremented by > retryCount.incrementAndGet() after a failure but never equals *retryMaxAttempts*. > {code:java} > private Result processNamespace() throws IOException { > ... 
//wait for pending move to finish and retry the failed migration > if (hasFailed && !hasSuccess) { > if (retryCount.get() == retryMaxAttempts) { > result.setRetryFailed(); > LOG.error("Failed to move some block's after " > + retryMaxAttempts + " retries."); > return result; > } else { > retryCount.incrementAndGet(); > } > } else { > // Reset retry count if no failure. > retryCount.set(0); > } > ... > } > {code} > *How to fix* > Add checking code of "dfs.mover.retry.max.attempts" to accept only > non-negative value or change the if statement condition when retry count > exceeds max attempts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
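The failure mode described above can be demonstrated with a stdlib-only model (class and method names are hypothetical): with the current `==` guard, an incrementing counter can never reach a negative retryMaxAttempts, whereas a `>=` guard terminates for any configured value.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the mover retry loop: returns how many retries run
// before the guard fires, or -1 if the guard never fires within a bound
// (the bound stands in for "retries forever").
class RetryGuardSketch {
    static int retriesBeforeGiveUp(int retryMaxAttempts, boolean useGte) {
        AtomicInteger retryCount = new AtomicInteger(0);
        for (int i = 0; i < 1000; i++) {
            boolean giveUp = useGte
                ? retryCount.get() >= retryMaxAttempts  // proposed fix
                : retryCount.get() == retryMaxAttempts; // current check
            if (giveUp) {
                return retryCount.get();
            }
            retryCount.incrementAndGet(); // the "failed migration" path
        }
        return -1; // guard never fired
    }
}
{code}

For non-negative settings the two guards behave identically, so switching to `>=` (or clamping the config to >= 0 at startup) only changes the pathological case.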
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164252#comment-17164252 ] Hadoop QA commented on HDFS-15098: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 29m 18s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15098 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008335/HDFS-15098.009.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29550/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. 
please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164243#comment-17164243 ] Hadoop QA commented on HDFS-15098: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 24m 36s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15098 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008334/HDFS-15098.009.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29549/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. 
please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: (was: HDFS-15098.009.patch) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: HDFS-15098.009.patch Status: Patch Available (was: Open) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch, HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Status: Open (was: Patch Available) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164228#comment-17164228 ] liusheng commented on HDFS-15098: - Hi [~lindongdong], Thanks for help to review, I have updated the 0009 patch. > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: HDFS-15098.009.patch Assignee: (was: zZtai) Status: Patch Available (was: Open) > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: liusheng >Priority: Major > Labels: sm4 > Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, > HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, > HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, > HDFS-15098.009.patch > > > SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). > SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] > > *Use sm4 on hdfs as follows:* > 1.Configure Hadoop KMS > 2.test HDFS sm4 > hadoop key create key1 -cipher 'SM4/CTR/NoPadding' > hdfs dfs -mkdir /benchmarks > hdfs crypto -createZone -keyName key1 -path /benchmarks > *requires:* > 1.openssl version >=1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: (was: HDFS-15098.009.patch)
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164185#comment-17164185 ] lindongdong commented on HDFS-15098: [~zZtai], hi, I found some code that looks debug-only; it may be better to remove it:
{code:java}
System.out.println("Now Codec is OpensslAesCtrCryptoCodec");{code}
and this one:
{code:java}
public void log(GeneralSecurityException e) {
  LOG.warn(e.getMessage());
}{code}
For compatibility, I think it is better to also keep the old native method, the one without the engine argument:
{code:java}
// old signature, kept for compatibility
private native long init(long context, int mode, int alg, int padding,
    byte[] key, byte[] iv);
// new engine-aware signatures
private native long init(long context, int mode, int alg, int padding,
    byte[] key, byte[] iv, long engine);
private native void clean(long context, long engine);{code}
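The compatibility suggestion above (keep the engine-less `init` and have it delegate to the engine-aware one with a null engine handle) can be sketched in plain Java. The class name and the stub bodies here are illustrative stand-ins for the real JNI methods, not the actual Hadoop code:

```java
// Sketch of the backward-compatible overload: the old engine-less signature
// delegates to the engine-aware one, passing 0 as the "no engine" handle.
// Plain Java stand-ins replace the native methods so the idea is runnable.
public class OpensslCipherCompatSketch {

    // engine-aware implementation (in the real code this would be a JNI call);
    // the stub just echoes the engine handle so delegation is observable
    static long init(long context, int mode, int alg, int padding,
                     byte[] key, byte[] iv, long engine) {
        return engine;
    }

    // old signature kept for compatibility: callers compiled against the
    // pre-engine API keep working, with engine defaulted to 0 (none)
    static long init(long context, int mode, int alg, int padding,
                     byte[] key, byte[] iv) {
        return init(context, mode, alg, padding, key, iv, 0L);
    }

    public static void main(String[] args) {
        long withEngine = init(0L, 1, 0, 0, new byte[16], new byte[16], 42L);
        long withoutEngine = init(0L, 1, 0, 0, new byte[16], new byte[16]);
        System.out.println(withEngine + " " + withoutEngine); // prints: 42 0
    }
}
```

The design point is that only the engine-aware variant needs a native implementation; the legacy overload is a thin Java-side shim, so no extra JNI surface is added.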
[jira] [Commented] (HDFS-15486) Costly sendResponse operation slows down async editlog handling
[ https://issues.apache.org/jira/browse/HDFS-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164173#comment-17164173 ] Yiqun Lin commented on HDFS-15486: -- Hi [~yuanbo], thanks for the comment. We haven't changed the CentOS version in our cluster, so that does not seem related. [~John Smith], the place you pointed out is exactly what we want to improve.
> Costly sendResponse operation slows down async editlog handling
> ---
>
> Key: HDFS-15486
> URL: https://issues.apache.org/jira/browse/HDFS-15486
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Yiqun Lin
> Priority: Major
> Attachments: Async-profile-(2).jpg, async-profile-(1).jpg
>
> When our cluster's NameNode is under very high load, we find it often gets stuck in async-editlog handling.
> We used the async-profile tool to get the flamegraph.
> !Async-profile-(2).jpg!
> This happens because the async editlog thread consumes an Edit from the queue and then triggers the sendResponse call.
> Here the sendResponse call is fairly expensive, since our cluster has the security environment enabled and performs encoding operations when returning the response.
> We often catch costly sendResponse operations at moments when the RPC call queue is full.
> !async-profile-(1).jpg!
> Slow consumption of Edits in the async editlog easily drives the pending Edit queue into a full state, which then blocks its enqueue operation, invoked from write-lock-protected methods in the FSNamesystem class.
> The enhancement is to use multiple threads to execute sendResponse calls in parallel. sendResponse does not need the write lock for protection, so this change is safe.
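The proposed enhancement (handing sendResponse off to a pool of workers so the single edit-consuming thread keeps draining the queue) can be sketched with a plain `ExecutorService`. Class and method names below are illustrative, not Hadoop's actual API, and a counter stands in for the real encode-and-respond work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: instead of running the costly sendResponse inline on the async
// editlog thread, each durable edit's response is submitted to a small
// worker pool. The consumer thread never blocks on response encoding.
public class ParallelResponderSketch {
    final ExecutorService responders =
        Executors.newFixedThreadPool(4); // parallel sendResponse workers

    // stands in for the expensive encode + sendResponse step
    void sendResponse(int callId, AtomicInteger sent) {
        sent.incrementAndGet();
    }

    // called by the edit-consuming thread once an edit is durable;
    // no write lock is needed here, responding is independent of namespace state
    void onEditDurable(int callId, AtomicInteger sent) {
        responders.submit(() -> sendResponse(callId, sent));
    }

    void shutdownAndWait() throws InterruptedException {
        responders.shutdown();
        responders.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ParallelResponderSketch s = new ParallelResponderSketch();
        AtomicInteger sent = new AtomicInteger();
        for (int i = 0; i < 100; i++) s.onEditDurable(i, sent);
        s.shutdownAndWait();
        System.out.println(sent.get()); // prints: 100
    }
}
```

The safety argument mirrors the issue description: submission is cheap and ordered per edit, while the actual response encoding runs concurrently outside any FSNamesystem lock.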
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Attachment: HDFS-15098.009.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheng updated HDFS-15098: Status: Open (was: Patch Available)