[jira] [Updated] (HDDS-2032) Ozone client should retry writes in case of any ratis/stateMachine exceptions
[ https://issues.apache.org/jira/browse/HDDS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-2032:
--------------------------------------
Description:
Currently, the Ozone client retries writes on a different pipeline or container for some specific exceptions. But when it sees an exception such as DISK_FULL, CONTAINER_UNHEALTHY, or any corruption, it simply aborts the write. In general, every such exception should be retriable in the Ozone client, and for certain exceptions it should also take a more specific action, such as excluding the affected containers or pipelines while retrying, or informing SCM of a corrupt replica.

(was: Currently, Ozone client retry writes on a different pipeline or container in case of some specific exceptions. But in case, it sees exception such as DISK_FULL, CONTAINER_UNHEALTHY or any corruption , it just aborts the right. In general, the every such exception on the client should be a retriable exception in ozone client and on some specific exceptions, it should take some more specific exception like excluding certain containers or pipelines while retrying or informing SCM of a corrupt replica etc.)

> Ozone client should retry writes in case of any ratis/stateMachine exceptions
> -----------------------------------------------------------------------------
>
> Key: HDDS-2032
> URL: https://issues.apache.org/jira/browse/HDDS-2032
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
>
> Currently, the Ozone client retries writes on a different pipeline or container for some specific exceptions. But when it sees an exception such as DISK_FULL, CONTAINER_UNHEALTHY, or any corruption, it simply aborts the write.
> In general, every such exception should be retriable in the Ozone client, and for certain exceptions it should also take a more specific action, such as excluding the affected containers or pipelines while retrying, or informing SCM of a corrupt replica.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
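The classification HDDS-2032 argues for can be sketched as a small policy table: every ratis/stateMachine failure becomes retriable, and a few named failures carry an extra action. This is a minimal, hypothetical sketch; the class, enum, and error-code names are illustrative, not the actual Ozone client API (only DISK_FULL and CONTAINER_UNHEALTHY appear in the report).

```java
// Hypothetical sketch of the retry classification proposed in HDDS-2032.
// Names are illustrative, not the real Ozone client API.
public class RetryPolicySketch {
    public enum Action {
        RETRY,                       // generic ratis/stateMachine failure: just retry
        RETRY_EXCLUDE_PIPELINE,      // retry, but avoid the failed pipeline
        RETRY_EXCLUDE_CONTAINER,     // retry, but avoid the failed container
        RETRY_REPORT_CORRUPT_REPLICA // retry elsewhere and tell SCM about the corrupt replica
    }

    // Every exception maps to some retry action; nothing aborts the write outright.
    public static Action classify(String errorCode) {
        switch (errorCode) {
            case "DISK_FULL":           return Action.RETRY_EXCLUDE_PIPELINE;
            case "CONTAINER_UNHEALTHY": return Action.RETRY_EXCLUDE_CONTAINER;
            case "CHECKSUM_MISMATCH":   return Action.RETRY_REPORT_CORRUPT_REPLICA;
            default:                    return Action.RETRY;
        }
    }
}
```

The point of the default branch is the thesis of the ticket: an unrecognized server-side exception falls through to a plain retry instead of an abort.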
[jira] [Updated] (HDDS-2032) Ozone client should retry writes in case of any ratis/stateMachine exceptions
[ https://issues.apache.org/jira/browse/HDDS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-2032:
--------------------------------------
Summary: Ozone client should retry writes in case of any ratis/stateMachine exceptions (was: Ozone client retry writes in case of any ratis/stateMachine exceptions)

> Ozone client should retry writes in case of any ratis/stateMachine exceptions
> -----------------------------------------------------------------------------
>
> Key: HDDS-2032
> URL: https://issues.apache.org/jira/browse/HDDS-2032
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
>
> Currently, the Ozone client retries writes on a different pipeline or container for some specific exceptions. But when it sees an exception such as DISK_FULL, CONTAINER_UNHEALTHY, or any corruption, it simply aborts the write. In general, every such exception should be retriable in the Ozone client, and for certain exceptions it should also take a more specific action, such as excluding the affected containers or pipelines while retrying, or informing SCM of a corrupt replica.
[jira] [Created] (HDDS-2032) Ozone client retry writes in case of any ratis/stateMachine exceptions
Shashikant Banerjee created HDDS-2032:
--------------------------------------
Summary: Ozone client retry writes in case of any ratis/stateMachine exceptions
Key: HDDS-2032
URL: https://issues.apache.org/jira/browse/HDDS-2032
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
Fix For: 0.5.0

Currently, the Ozone client retries writes on a different pipeline or container for some specific exceptions. But when it sees an exception such as DISK_FULL, CONTAINER_UNHEALTHY, or any corruption, it simply aborts the write. In general, every such exception should be retriable in the Ozone client, and for certain exceptions it should also take a more specific action, such as excluding the affected containers or pipelines while retrying, or informing SCM of a corrupt replica.
[jira] [Commented] (HDFS-12904) Add DataTransferThrottler to the Datanode transfers
[ https://issues.apache.org/jira/browse/HDFS-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915466#comment-16915466 ]

Lisheng Sun commented on HDFS-12904:
------------------------------------
I confirmed that the two UTs pass in my local environment, so the UT failures are unrelated to this patch. [~elgoiri] Could you help review the patch? Thank you.

> Add DataTransferThrottler to the Datanode transfers
> ---------------------------------------------------
>
> Key: HDFS-12904
> URL: https://issues.apache.org/jira/browse/HDFS-12904
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Íñigo Goiri
> Assignee: Lisheng Sun
> Priority: Minor
> Attachments: HDFS-12904.000.patch, HDFS-12904.001.patch, HDFS-12904.002.patch, HDFS-12904.003.patch
>
> The {{DataXceiverServer}} already uses throttling for the balancing. The Datanode should also allow throttling the regular data transfers.
[jira] [Commented] (HDFS-14748) Make DataNodePeerMetrics#minOutlierDetectionSamples configurable
[ https://issues.apache.org/jira/browse/HDFS-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915464#comment-16915464 ]

Lisheng Sun commented on HDFS-14748:
------------------------------------
Hi [~xkrogen] [~jojochuang], could you help review this patch? Thank you.

> Make DataNodePeerMetrics#minOutlierDetectionSamples configurable
> ----------------------------------------------------------------
>
> Key: HDFS-14748
> URL: https://issues.apache.org/jira/browse/HDFS-14748
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.9.0, 3.0.0-alpha4
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14748.001.patch, HDFS-14748.002.patch, HDFS-14748.003.patch
>
> Slow node monitoring requires 1000 packets to be transferred between DataNodes within three hours before a node is eligible to calculate and upload transmission delays to the NameNode. But if the written data is very small and the number of packets is less than 1000, the slow node will not be reported to the NameNode, so make DataNodePeerMetrics#minOutlierDetectionSamples configurable.
[jira] [Commented] (HDFS-11246) FSNameSystem#logAuditEvent should be called outside the read or write locks
[ https://issues.apache.org/jira/browse/HDFS-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915460#comment-16915460 ]

Wei-Chiu Chuang commented on HDFS-11246:
----------------------------------------
+1. Anyone else like to review/comment? Otherwise I'll commit the v011 patch in a few days before it goes stale again.

> FSNameSystem#logAuditEvent should be called outside the read or write locks
> ---------------------------------------------------------------------------
>
> Key: HDFS-11246
> URL: https://issues.apache.org/jira/browse/HDFS-11246
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Kuhu Shukla
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-11246.001.patch, HDFS-11246.002.patch, HDFS-11246.003.patch, HDFS-11246.004.patch, HDFS-11246.005.patch, HDFS-11246.006.patch, HDFS-11246.007.patch, HDFS-11246.008.patch, HDFS-11246.009.patch, HDFS-11246.010.patch, HDFS-11246.011.patch
>
> {code}
> readLock();
> boolean success = true;
> ContentSummary cs;
> try {
>   checkOperation(OperationCategory.READ);
>   cs = FSDirStatAndListingOp.getContentSummary(dir, src);
> } catch (AccessControlException ace) {
>   success = false;
>   logAuditEvent(success, operationName, src);
>   throw ace;
> } finally {
>   readUnlock(operationName);
> }
> {code}
> It would be nice to have audit logging outside the lock, esp. in scenarios where applications hammer a given operation several times.
[jira] [Commented] (HDFS-11246) FSNameSystem#logAuditEvent should be called outside the read or write locks
[ https://issues.apache.org/jira/browse/HDFS-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915452#comment-16915452 ]

He Xiaoqiao commented on HDFS-11246:
------------------------------------
Checked the failed unit tests; it seems most of them are related to OOM. I tested them locally and all passed. Please help check if you have time. Thanks.

> FSNameSystem#logAuditEvent should be called outside the read or write locks
> ---------------------------------------------------------------------------
>
> Key: HDFS-11246
> URL: https://issues.apache.org/jira/browse/HDFS-11246
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Kuhu Shukla
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-11246.001.patch, HDFS-11246.002.patch, HDFS-11246.003.patch, HDFS-11246.004.patch, HDFS-11246.005.patch, HDFS-11246.006.patch, HDFS-11246.007.patch, HDFS-11246.008.patch, HDFS-11246.009.patch, HDFS-11246.010.patch, HDFS-11246.011.patch
>
> {code}
> readLock();
> boolean success = true;
> ContentSummary cs;
> try {
>   checkOperation(OperationCategory.READ);
>   cs = FSDirStatAndListingOp.getContentSummary(dir, src);
> } catch (AccessControlException ace) {
>   success = false;
>   logAuditEvent(success, operationName, src);
>   throw ace;
> } finally {
>   readUnlock(operationName);
> }
> {code}
> It would be nice to have audit logging outside the lock, esp. in scenarios where applications hammer a given operation several times.
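The restructuring HDFS-11246 asks for can be modeled in a few lines: record the outcome while holding the read lock, but emit the audit event only after `readUnlock()`. The class below is a self-contained, hypothetical model (it borrows FSNamesystem method names for clarity but is not the actual patch); it uses a `ReentrantReadWriteLock` and checks the hold count at audit time to demonstrate the property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative model of moving logAuditEvent outside the read lock.
// Not the actual FSNamesystem code; names are borrowed for clarity.
public class AuditOutsideLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<String> auditLog = new ArrayList<>();
    boolean lockHeldDuringAudit = true; // flipped to false if audit runs unlocked

    void logAuditEvent(boolean success, String op) {
        // In the proposed layout this should observe the lock already released.
        lockHeldDuringAudit = lock.getReadHoldCount() > 0;
        auditLog.add(op + " success=" + success);
    }

    String getContentSummary(String src) {
        boolean success = true;
        lock.readLock().lock();
        try {
            return "summary-of-" + src; // stands in for FSDirStatAndListingOp
        } finally {
            lock.readLock().unlock();
            // Audit logging now happens after the unlock, so a slow audit
            // appender can no longer extend the lock hold time.
            logAuditEvent(success, "contentSummary");
        }
    }
}
```

The benefit is exactly the one the ticket describes: when applications hammer `getContentSummary`, audit I/O no longer serializes behind the namesystem lock.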
[jira] [Updated] (HDDS-1930) Test Topology Aware Job scheduling with Ozone Topology
[ https://issues.apache.org/jira/browse/HDDS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen updated HDDS-1930:
-----------------------------
Description:
My initial results with Terasort do not seem to report the counters properly. Most of the requests are handled rack-local but none node-local. This ticket is opened to add more system testing to validate the feature.

Total Allocated Containers: 3778
Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.

|| || Node Local Request || Rack Local Request || Off Switch Request ||
| Num Node Local Containers (satisfied by) | 0 | | |
| Num Rack Local Containers (satisfied by) | 0 | 3648 | |
| Num Off Switch Containers (satisfied by) | 0 | 96 | 34 |

(was: My initial results with Terasort does not seem to report the counter properly. Most of the requests are handled by rack locl but no node local. This ticket is opened to add more system testing to validate the feature. Total Allocated Containers: 3778 Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests. Node Local Request Rack Local Request Off Switch Request Num Node Local Containers (satisfied by)0 Num Rack Local Containers (satisfied by)0 3648 Num Off Switch Containers (satisfied by)0 96 34)

> Test Topology Aware Job scheduling with Ozone Topology
> ------------------------------------------------------
>
> Key: HDDS-1930
> URL: https://issues.apache.org/jira/browse/HDDS-1930
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Xiaoyu Yao
> Priority: Major
>
> My initial results with Terasort do not seem to report the counters properly. Most of the requests are handled rack-local but none node-local. This ticket is opened to add more system testing to validate the feature.
> Total Allocated Containers: 3778
> Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
> || || Node Local Request || Rack Local Request || Off Switch Request ||
> | Num Node Local Containers (satisfied by) | 0 | | |
> | Num Rack Local Containers (satisfied by) | 0 | 3648 | |
> | Num Off Switch Containers (satisfied by) | 0 | 96 | 34 |
[jira] [Created] (HDDS-2031) Choose datanode for pipeline creation based on network topology
Sammi Chen created HDDS-2031:
-----------------------------
Summary: Choose datanode for pipeline creation based on network topology
Key: HDDS-2031
URL: https://issues.apache.org/jira/browse/HDDS-2031
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen

There are regular heartbeats between datanodes in a pipeline. Choose datanodes based on network topology, to guarantee data reliability and reduce heartbeat network latency.
[jira] [Updated] (HDFS-14568) setStoragePolicy should check quota and update consume on storage type quota.
[ https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinglun updated HDFS-14568:
---------------------------
Attachment: HDFS-14568.004.patch
Status: Patch Available (was: Open)

Re-upload patch-003 as patch-004 to trigger jenkins.

> setStoragePolicy should check quota and update consume on storage type quota.
> -----------------------------------------------------------------------------
>
> Key: HDFS-14568
> URL: https://issues.apache.org/jira/browse/HDFS-14568
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.1.0
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch, HDFS-14568.002.patch, HDFS-14568.003.patch, HDFS-14568.004.patch
>
> The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed. For example:
> 1. Set quota StorageType.SSD fileSpace-1 on the parent dir;
> 2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} under it;
> 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and expect a QuotaByStorageTypeExceededException.
> Because the quota and consume are not handled, the expected exception is not thrown.
>
> There are 3 reasons why we should handle the consume and the quota.
> 1. Replication uses the new storage policy. Consider a file with BlockType CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". Now we change the policy to "ONE_SSD". If a DN goes down and the file needs replication, the NN will choose storages per policy "ONE_SSD" and replicate the block to an SSD storage.
> 2. We actually have a cluster storing both HOT and COLD data. We have a background process searching all the files to find those that have not been accessed for a period of time. Then we set them to COLD and start a mover to move the replicas. After moving, all the replicas are consistent with the storage policy.
> 3. The NameNode manages the global state of the cluster.
> If there is any inconsistent situation, such as the replicas not matching the storage policy of the file, we should take the NameNode as the standard and make the cluster match the NameNode. Block replication is a good example of this rule. When we count the consume of a file (CONTIGUOUS), we multiply the replication factor by the file's length, no matter whether the file is under-replicated or excess-replicated. So should the storage type quota and consume work.
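The charging rule argued above (quota is computed from the NameNode's own records: policy layout times file length, regardless of where replicas actually sit) can be sketched in a few lines. This is a hypothetical illustration, not HDFS's actual QuotaCounts code; the enum and the ONE_SSD-style layout are simplified stand-ins.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch of per-storage-type consumption as argued in HDFS-14568:
// charge policy x length from the NameNode's records, not from actual
// replica placement. Illustrative only, not the real QuotaCounts logic.
public class StorageTypeConsume {
    public enum StorageType { DISK, SSD, ARCHIVE }

    // policy lists one entry per expected replica, e.g. a ONE_SSD-like
    // layout with replication 3 is {SSD, DISK, DISK}.
    public static Map<StorageType, Long> consume(StorageType[] policy, long fileLen) {
        Map<StorageType, Long> byType = new EnumMap<>(StorageType.class);
        for (StorageType t : policy) {
            byType.merge(t, fileLen, Long::sum); // each expected replica charges fileLen
        }
        return byType;
    }
}
```

With this rule, changing a file's policy immediately changes what is charged against the SSD quota, which is exactly why setStoragePolicy needs the quota check the ticket adds.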
[jira] [Updated] (HDFS-14568) setStoragePolicy should check quota and update consume on storage type quota.
[ https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinglun updated HDFS-14568:
---------------------------
Status: Open (was: Patch Available)

> setStoragePolicy should check quota and update consume on storage type quota.
> -----------------------------------------------------------------------------
>
> Key: HDFS-14568
> URL: https://issues.apache.org/jira/browse/HDFS-14568
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.1.0
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch, HDFS-14568.002.patch, HDFS-14568.003.patch
[jira] [Work logged] (HDDS-2013) Add flag gdprEnabled for BucketInfo in OzoneManager proto
[ https://issues.apache.org/jira/browse/HDDS-2013?focusedWorklogId=300985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300985 ]

ASF GitHub Bot logged work on HDDS-2013:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Aug/19 02:45
Start Date: 26/Aug/19 02:45
Worklog Time Spent: 10m
Work Description: dineshchitlangia commented on issue #1345: HDDS-2013. Add flag gdprEnabled for BucketInfo in OzoneManager proto
URL: https://github.com/apache/hadoop/pull/1345#issuecomment-524695251
Thanks @bharatviswa504 for review.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 300985)
Time Spent: 0.5h (was: 20m)

> Add flag gdprEnabled for BucketInfo in OzoneManager proto
> ---------------------------------------------------------
>
> Key: HDDS-2013
> URL: https://issues.apache.org/jira/browse/HDDS-2013
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Dinesh Chitlangia
> Assignee: Dinesh Chitlangia
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-14633) The StorageType quota and consume in QuotaFeature is not handled when rename and setStoragePolicy etc.
[ https://issues.apache.org/jira/browse/HDFS-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinglun updated HDFS-14633:
---------------------------
Attachment: HDFS-14633.007.patch

> The StorageType quota and consume in QuotaFeature is not handled when rename and setStoragePolicy etc.
> ------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14633
> URL: https://issues.apache.org/jira/browse/HDFS-14633
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-14633-testcases-explanation, HDFS-14633.002.patch, HDFS-14633.003.patch, HDFS-14633.004.patch, HDFS-14633.005.patch, HDFS-14633.006.patch, HDFS-14633.007.patch
>
> The NameNode manages the global state of the cluster. We should always take the NameNode's records as the sole criterion, because no matter what inconsistency arises, the NameNode should finally make everything right based on its records. Let's call it rule NSC (NameNode is the Sole Criterion). That means when we handle all quota-related RPCs, we do the quota check according to the NameNode's records regardless of any inconsistent situation, such as the replicas not matching the storage policy of the file, or the replica count not matching the file's set replication.
> SPS deals with wrongly placed replicas. There is a thought about putting off the consume update of the DirectoryQuota until all replicas are re-placed by SPS. I can't agree with that, because if we do so we abandon letting the NameNode's records be the sole criterion. Block replication is a good example of rule NSC. When we count the consume of a file (CONTIGUOUS), we multiply the replication factor by the file's length, no matter whether the blocks are under-replicated or excess-replicated. We should do the same thing for the storage type quota.
> Another concern is that the change will let setStoragePolicy throw QuotaByStorageTypeExceededException, which it didn't before.
> I don't think it's a big problem since setStoragePolicy already throws IOException. Or we could wrap the QuotaByStorageTypeExceededException in an IOException, but I wouldn't recommend that because it's ugly.
> To make storage type consume follow rule NSC, we need to change rename (moving a file with storage policy inherited from its parent) and setStoragePolicy.
[jira] [Commented] (HDFS-14633) The StorageType quota and consume in QuotaFeature is not handled when rename and setStoragePolicy etc.
[ https://issues.apache.org/jira/browse/HDFS-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915427#comment-16915427 ]

Jinglun commented on HDFS-14633:
--------------------------------
Upload patch-007, fix unit test failures.

> The StorageType quota and consume in QuotaFeature is not handled when rename and setStoragePolicy etc.
> ------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14633
> URL: https://issues.apache.org/jira/browse/HDFS-14633
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-14633-testcases-explanation, HDFS-14633.002.patch, HDFS-14633.003.patch, HDFS-14633.004.patch, HDFS-14633.005.patch, HDFS-14633.006.patch, HDFS-14633.007.patch
[jira] [Updated] (HDDS-2013) Add flag gdprEnabled for BucketInfo in OzoneManager proto
[ https://issues.apache.org/jira/browse/HDDS-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia updated HDDS-2013:
------------------------------------
Status: Patch Available (was: Open)

> Add flag gdprEnabled for BucketInfo in OzoneManager proto
> ---------------------------------------------------------
>
> Key: HDDS-2013
> URL: https://issues.apache.org/jira/browse/HDDS-2013
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Dinesh Chitlangia
> Assignee: Dinesh Chitlangia
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

guojh updated HDFS-14768:
-------------------------
Summary: In some cases, erasure blocks are corrupted when they are reconstructed. (was: In some cases, erasure blocks are corrupted when they are rebuilt.)

> In some cases, erasure blocks are corrupted when they are reconstructed.
> ------------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], and we decommission indices [3,4] and increase the index-6 datanode's pendingReplicationWithoutTargets to make it larger than replicationStreamsHardLimit (we set 14). Then, after BlockManager#chooseSourceDatanodes, the liveBlockIndices are [0,1,2,3,4,5,7,8], and the block counters are Live: 7, Decommission: 2.
> In BlockManager#scheduleReconstruction, additionalReplRequired is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an erasure coding task to the target datanodes.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below:
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] always stays 0 from its initial value.
> The StripedReader always creates readers from the first 6 live block indices, i.e. [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L bug: block index 6's data is corrupted (all zeros).
> I wrote a unit test that reproduces this stably:
> {code:java}
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor =
>       cluster.getNameNode().getNamesystem().getBlockManager()
>           .getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the nodes which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>       client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!", fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>       null, blockGroupSize);
> }
> {code}
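The failure mode above can be demonstrated with a self-contained model of the targetIndices construction: when the loop finds fewer missing indices than requested targets, the remaining slots keep their default value 0, which is a valid block index. The `-1` sentinel below is one possible guard, shown for illustration; it is not the actual HDFS-14768 patch.

```java
import java.util.Arrays;
import java.util.BitSet;

// Model of the targetIndices construction from the report. With live
// indices [0,1,2,3,4,5,7,8] (only 6 missing) and two requested targets,
// the loop fills targetIndices[0] = 6 and leaves the second slot at its
// default -- in the real code that default is 0, a valid block index.
public class TargetIndicesSketch {
    public static short[] buildTargetIndices(BitSet liveBitSet, int blkNum, int targets) {
        short[] targetIndices = new short[targets];
        // Hypothetical guard: mark unfilled slots with -1 so callers can
        // tell "no target" apart from "target is block 0".
        Arrays.fill(targetIndices, (short) -1);
        int m = 0;
        for (int i = 0; i < blkNum; i++) {
            if (!liveBitSet.get(i) && m < targets) {
                targetIndices[m++] = (short) i;
            }
        }
        return targetIndices;
    }
}
```

Run against the scenario in the report (9 block indices, index 6 missing, 2 targets requested), the result is `[6, -1]`; without the sentinel it would be `[6, 0]`, which is exactly what feeds the corrupting reconstruction.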
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ]

Xudong Cao edited comment on HDFS-14646 at 8/26/19 2:08 AM:
------------------------------------------------------------
Thanks [~csun], our production environment uses Hadoop version 2.7.2, but with the multi-SBN feature merged. We have about 20,000 machines (divided into dozens of HDFS clusters). Not long ago, we frequently encountered the problem described by this jira, and after this patch was merged, the errors are no longer reported.

was (Author: xudongcao): Thanks [~csun] , our production environment uses Hadoop version 2.7.2, but merged multi-SBN feature. We have about 20,000 machines (divided into dozens of HDFS clusters). Not long ago, we frequently encountered the problem described by the jira. and after this patch was merged, the errors are no longer reported.

> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> ------------------------------------------------------------------------
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, HDFS-14646.002.patch, HDFS-14646.003.patch
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the FsImage completely to the peer NN, and will not read the peer NN's reply until the put is completed.
> Depending on the version of Jetty, this behavior can lead to different consequences. I tested it under 2.7.2 and trunk.
> *1.In Hadoop 2.7.2 (with Jetty 6.1.26)* > After peer NN called HttpServletResponse.sendError(), the underlying TCP > connection will still be established, and the data SNN sent will be read by > Jetty framework itself in the peer NN side, so the SNN will insignificantly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of FsImage can often > reach about 30GB, This is indeed a big waste. > *2.In trunk version (with Jetty 9.3.27)* > After peer NN called HttpServletResponse.sendError(), the underlying TCP > connection will be auto closed, and then SNN will directly get an "Error > writing request body to server" exception, as below, note this test needs a > relatively big FSImage (e.g. 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. 
> java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at >
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao commented on HDFS-14646: --- Thanks [~csun] , our production environment uses Hadoop version 2.7.2, but merged the multi-SBN feature. We have about 20,000 machines (divided into dozens of clusters). Not long ago, we frequently encountered the problem described by this Jira, and after this patch was merged, the errors are no longer reported. > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, > HDFS-14646.002.patch, HDFS-14646.003.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult > .OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put > process immediately, but will put the FsImage completely to the peer NN, and > will not read the peer NN's reply until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences, I tested it under 2.7.2 and trunk version. > *1.In Hadoop 2.7.2 (with Jetty 6.1.26)* > After peer NN called HttpServletResponse.sendError(), the underlying TCP > connection will still be established, and the data SNN sent will be read by > Jetty framework itself in the peer NN side, so the SNN will insignificantly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth.
In a relatively large HDFS cluster, the size of FsImage can often > reach about 30GB, This is indeed a big waste. > *2.In trunk version (with Jetty 9.3.27)* > After peer NN called HttpServletResponse.sendError(), the underlying TCP > connection will be auto closed, and then SNN will directly get an "Error > writing request body to server" exception, as below, note this test needs a > relatively big FSImage (e.g. 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, 
fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode, > when it plans to put a FsImage to the peer NN,
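The early-abort idea in the Solution section above can be sketched as follows. This is a hypothetical illustration, not the actual HDFS patch: the PeerStatus interface and the method names are invented for the example, and the real TransferFsImage code differs. The point is simply that the sender checks for an error reply between chunks instead of streaming the whole multi-GB image after the peer has already called sendError().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedUploader {
  /** Hypothetical hook: some way to learn that the peer NN has already
   *  replied with an error (e.g. NOT_ACTIVE_NAMENODE_FAILURE). */
  interface PeerStatus {
    boolean peerReportedError();
  }

  /**
   * Stream the image in 4 KB chunks, but stop as soon as the peer has
   * signalled an error, rather than pushing the remaining data.
   * Returns the number of bytes actually sent.
   */
  static long copyUntilError(InputStream image, OutputStream toPeer,
                             PeerStatus status) throws IOException {
    byte[] buf = new byte[4096];
    long sent = 0;
    int n;
    while ((n = image.read(buf)) > 0) {
      if (status.peerReportedError()) {
        break; // abort the put immediately on any error reply
      }
      toPeer.write(buf, 0, n);
      sent += n;
    }
    return sent;
  }
}
```

With this shape, a 30 GB fsimage upload to a non-active NN costs at most one chunk instead of the full transfer.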
[jira] [Commented] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915404#comment-16915404 ] Feilong He commented on HDFS-14745: --- Thanks [~rakeshr] for your comment. I have renamed the patch. Yes, backport for other branch-3.x will be done one by one. > Backport HDFS persistent memory read cache support to branch-3.1 > > > Key: HDFS-14745 > URL: https://issues.apache.org/jira/browse/HDFS-14745 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: cache, datanode > Fix For: 3.3.0 > > Attachments: HDFS-14745-branch-3.1-000.patch > > > We are proposing to backport the patches for HDFS-13762, HDFS persistent > memory read cache support, to branch-3.1. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: (was: HDFS-14745.002.patch)
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: (was: HDFS-14745-branch-3.1.002.patch)
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: (was: HDFS-14745.000.patch)
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: HDFS-14745-branch-3.1-000.patch
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: (was: HDFS-14745.001.patch)
[jira] [Updated] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14745: -- Attachment: HDFS-14745-branch-3.1.002.patch
[jira] [Commented] (HDFS-13220) Change lastCheckpointTime to use fsimage mostRecentCheckpointTime
[ https://issues.apache.org/jira/browse/HDFS-13220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915401#comment-16915401 ] Hadoop QA commented on HDFS-13220: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 7s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s{color} | {color:red} The patch generated 7 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup | | | hadoop.hdfs.TestBlockTokenWrappingQOP | | | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestMaintenanceState | | | hadoop.hdfs.TestDFSStripedInputStream | | | hadoop.hdfs.server.balancer.TestBalancerService | | | hadoop.hdfs.server.datanode.TestDataNodeInitStorage | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.datanode.TestBPOfferService | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.TestErasureCodingExerciseAPIs | | | hadoop.hdfs.TestPread | | | hadoop.hdfs.TestErasureCodingPolicies | | | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy | | | hadoop.hdfs.TestHDFSFileSystemContract | | | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer | | | hadoop.hdfs.TestFileChecksumCompositeCrc | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.TestQuota | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are rebuilt.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915395#comment-16915395 ] guojh commented on HDFS-14768: -- [~drankye] Can we talk about this issue? > In some cases, erasure blocks are corruption when they are rebuilt. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch > > > Policy is RS-6-3-1024K, version is Hadoop 3.0.2. > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], and we decommission > indices [3,4] and increase the index-6 datanode's > pendingReplicationWithoutTargets to make it larger than > replicationStreamsHardLimit (we set 14). Then, after the method > chooseSourceDatanodes of BlockManager, the liveBlockIndices is > [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. > In the method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it will assign an > erasure coding task to the target datanodes. > When the datanode gets the task, it will build targetIndices from liveBlockIndices > and the target length; the code is below. > {code:java} > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get(i)) { > if (reconstructor.getBlockLen(i) > 0) { > if (m < targets.length) { > targetIndices[m++] = (short) i; > hasValidTargets = true; > } > } > } > } > } > {code} > targetIndices[0] = 6, but targetIndices[1] always remains 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices, > i.e. [0,1,2,3,4,5]. > Using indices [0,1,2,3,4,5] to rebuild target indices [6,0] will trigger the ISA-L > bug: block index 6's data is corrupted (all data is zero). > I wrote a unit test that can stably reproduce it. > {code:java} > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > for (int i = 0; i < 100; i++) { > datanodeDescriptor.incrementPendingReplicationWithoutTargets(); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List<DatanodeInfo> decommisionNodes = new ArrayList<>(); > // add the nodes which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); > //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs)); > // Ensure decommissioned datanode is not automatically shutdown > DFSClient client = getDfsClient(cluster.getNameNode(0), conf); > 
assertEquals("All datanodes must be alive", numDNs, > client.datanodeReport(DatanodeReportType.LIVE).length); > FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); > Assert.assertTrue("Checksum mismatches!", > fileChecksum1.equals(fileChecksum2)); > StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, > null, blockGroupSize); > } > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
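The trailing-zero problem in the report above can be demonstrated outside HDFS. Below is a minimal, hypothetical stand-in for the datanode's initTargetIndices logic (names and signature are simplified for illustration; this is not the actual Hadoop API). It shows one possible fix: truncating the result to the indices actually assigned, so an unfilled slot is never misread as "reconstruct block index 0".

```java
import java.util.Arrays;
import java.util.BitSet;

public class TargetIndicesDemo {
    // Simplified sketch of building target indices from a live-index bitset.
    // Instead of returning a fixed-size array whose unfilled slots stay 0,
    // truncate to the indices actually found.
    public static short[] initTargetIndices(BitSet liveBitSet, int blockNum,
                                            int maxTargets) {
        short[] targetIndices = new short[maxTargets];
        int m = 0;
        for (int i = 0; i < blockNum; i++) {
            if (!liveBitSet.get(i) && m < maxTargets) {
                targetIndices[m++] = (short) i;
            }
        }
        // Only the first m slots are valid; drop the zero padding.
        return Arrays.copyOf(targetIndices, m);
    }

    public static void main(String[] args) {
        // liveBlockIndices = [0,1,2,3,4,5,7,8] as in the report:
        // only block index 6 is actually missing.
        BitSet live = BitSet.valueOf(new long[] { 0b110111111L });
        short[] targets = initTargetIndices(live, 9, 2);
        System.out.println(Arrays.toString(targets)); // [6], not [6, 0]
    }
}
```

With the original fixed-size array, two target slots but only one missing index would yield [6, 0], and index 0 would be "rebuilt" over healthy data; truncating avoids that.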
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are rebuilt.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh updated HDFS-14768:
-
Attachment: (was: HDFS-14768.000.patch)

> In some cases, erasure blocks are corruption when they are rebuilt.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are rebuilt.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh updated HDFS-14768:
-
Attachment: HDFS-14768.000.patch
Fix Version/s: 3.3.0
Target Version/s: 3.3.0
Labels: patch (was: )
Status: Patch Available (was: Open)

fixed the block swell and build bug

> In some cases, erasure blocks are corruption when they are rebuilt.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail:
[jira] [Commented] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)
[ https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915364#comment-16915364 ] Hadoop QA commented on HDFS-14771: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 19s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Patch 
Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 535 unchanged - 3 fixed = 538 total (was 538) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 31s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da675796017 | | JIRA Issue | HDFS-14771 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978529/HDFS-14771.branch-2.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 0d8425cc7cb9 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC
[jira] [Commented] (HDFS-11246) FSNameSystem#logAuditEvent should be called outside the read or write locks
[ https://issues.apache.org/jira/browse/HDFS-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915355#comment-16915355 ] Hadoop QA commented on HDFS-11246: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 164 unchanged - 2 fixed = 164 total (was 166) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 5s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}161m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.tools.TestECAdmin | | | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-11246 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978525/HDFS-11246.011.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 177306f0b57a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-12904) Add DataTransferThrottler to the Datanode transfers
[ https://issues.apache.org/jira/browse/HDFS-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915350#comment-16915350 ] Hadoop QA commented on HDFS-12904: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 45s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}157m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-12904 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978524/HDFS-12904.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 88d2a2940b18 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d2225c8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27666/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27666/testReport/ | | Max. process+thread count | 3945 (vs. ulimit of 5500) | | modules | C:
[jira] [Resolved] (HDFS-14776) Log more detail for slow RPC
[ https://issues.apache.org/jira/browse/HDFS-14776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhang resolved HDFS-14776.
---
Resolution: Abandoned

> Log more detail for slow RPC
>
> Key: HDFS-14776
> URL: https://issues.apache.org/jira/browse/HDFS-14776
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Priority: Major
>
> The current implementation only logs the processing time:
> {code:java}
> if ((rpcMetrics.getProcessingSampleCount() > minSampleSize) &&
>     (processingTime > threeSigma)) {
>   LOG.warn("Slow RPC : {} took {} {} to process from client {}",
>       methodName, processingTime, RpcMetrics.TIMEUNIT, call);
>   rpcMetrics.incrSlowRpc();
> }
> {code}
> We need to log more details to help locate the problem (e.g. how long it takes to acquire the lock, hold the lock, or do other work).

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14776) Log more detail for slow RPC
Chen Zhang created HDFS-14776:
-
Summary: Log more detail for slow RPC
Key: HDFS-14776
URL: https://issues.apache.org/jira/browse/HDFS-14776
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chen Zhang

The current implementation only logs the processing time:
{code:java}
if ((rpcMetrics.getProcessingSampleCount() > minSampleSize) &&
    (processingTime > threeSigma)) {
  LOG.warn("Slow RPC : {} took {} {} to process from client {}",
      methodName, processingTime, RpcMetrics.TIMEUNIT, call);
  rpcMetrics.incrSlowRpc();
}
{code}
We need to log more details to help locate the problem (e.g. how long it takes to acquire the lock, hold the lock, or do other work).

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
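One way the extra detail could look, sketched as standalone Java. This is a hypothetical illustration, not Hadoop's actual RpcMetrics/Server code: the lock, method names, and fixed threshold are all assumptions standing in for the real instrumentation. The idea is simply to time the lock-wait and lock-held phases separately so the slow-RPC warning says where the time went.

```java
import java.util.concurrent.locks.ReentrantLock;

public class SlowRpcTiming {
    // Illustrative fixed threshold; the real code derives one from metrics.
    static final long SLOW_THRESHOLD_MS = 100;
    // Stand-in for the namesystem lock the RPC handler contends on.
    static final ReentrantLock fsLock = new ReentrantLock();

    // Runs `work` under the lock; returns the warning line for a slow call,
    // or null if the call finished under the threshold.
    public static String process(String methodName, Runnable work) {
        long start = System.currentTimeMillis();
        fsLock.lock();
        long lockWait = System.currentTimeMillis() - start;
        long lockedAt = System.currentTimeMillis();
        long lockHeld;
        try {
            work.run();
        } finally {
            lockHeld = System.currentTimeMillis() - lockedAt;
            fsLock.unlock();
        }
        long total = System.currentTimeMillis() - start;
        if (total > SLOW_THRESHOLD_MS) {
            return String.format(
                "Slow RPC : %s took %d ms (lock wait %d ms, lock held %d ms)",
                methodName, total, lockWait, lockHeld);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(process("fastCall", () -> { }));
        System.out.println(process("slowCall", () -> {
            try { Thread.sleep(150); } catch (InterruptedException e) { }
        }));
    }
}
```

A fast call produces no warning; a call that sleeps past the threshold produces a line breaking down wait vs. hold time.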
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915284#comment-16915284 ] Chen Zhang commented on HDFS-14775:
---
cc [~xkrogen] and [~linyiqun].

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhang updated HDFS-14775:
--
Description:
HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.

was:
HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. We need to add the timestamp for the {{longestWriteLockHeldStackTrace}}.

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhang reassigned HDFS-14775:
-
Assignee: Chen Zhang

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. We need to add the timestamp for the {{longestWriteLockHeldStackTrace}}.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
Chen Zhang created HDFS-14775:
-
Summary: Add Timestamp for longest FSN write/read lock held log
Key: HDFS-14775
URL: https://issues.apache.org/jira/browse/HDFS-14775
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chen Zhang

HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
In some conditions, we need to locate the detailed call information (user, ip, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. We need to add the timestamp for the {{longestWriteLockHeldStackTrace}}.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
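The proposal above can be sketched as a small standalone class: record the wall-clock time at which the longest hold started, alongside the held interval and stack trace, so the report can be matched against audit-log entries. Class and method names here are illustrative assumptions; in HDFS this state actually lives in FSNamesystemLock.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class LongestLockLog {
    private long longestHeldIntervalMs;
    private long longestHeldStartMs;   // new: when the longest hold began
    private String longestStackTrace;

    // Called on unlock with how long the lock was held and when it was taken.
    public synchronized void recordRelease(long heldMs, long acquiredAtMs,
                                           String stack) {
        if (heldMs > longestHeldIntervalMs) {
            longestHeldIntervalMs = heldMs;
            longestHeldStartMs = acquiredAtMs;
            longestStackTrace = stack;
        }
    }

    public synchronized String report() {
        // Same timestamp format HDFS logs use, so it lines up with audit logs.
        String when = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")
            .format(new Date(longestHeldStartMs));
        return "Longest write-lock held interval: " + longestHeldIntervalMs
            + " ms, started at " + when + "\n" + longestStackTrace;
    }
}
```

With the timestamp included, a 10s throttle interval no longer matters: the report itself says which audit-log window to search.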
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915281#comment-16915281 ] Lisheng Sun commented on HDFS-14648: Thank [~zhangchen] for good suggestion. I will upload the a design doc which describes this patch in detail later. > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch > > > This Jira constructs DeadNodeDetector state machine model. The function it > implements as follow: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to inaccessible, put the DataNode into > DeadNodeDetector#deadnode.(HDFS-14649) will optimize this part. Because when > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode.Therefore, it needs to be confirmed by re-probing and > requires a higher priority processing. > # DeadNodeDetector will periodically detect the Node in > DeadNodeDetector#deadnode, If the access is successful, the Node will be > moved from DeadNodeDetector#deadnode. Continuous detection of the dead node > is necessary. The DataNode need rejoin the cluster due to a service > restart/machine repair. The DataNode may be permanently excluded if there is > no added probe mechanism. > # DeadNodeDetector#dfsInputStreamNodes Record the DFSInputstream using > DataNode. When the DFSInputstream is closed, it will be moved from > DeadNodeDetector#dfsInputStreamNodes. > # Every time get the global deanode, update the DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode Equals to the intersection of the old > DeadNodeDetector#deadnode and the Datanodes are by > DeadNodeDetector#dfsInputStreamNodes. 
> # DeadNodeDetector has a switch that is turned off by default. When it is > off, each DFSInputstream still uses its own local deadnode set. > # This feature has been used in the XIAOMI production environment for a long > time; it reduced HBase read stalls caused by hung nodes. > # Simply turn on the DeadNodeDetector switch to use it; there are no other > restrictions. If you don't want to use DeadNodeDetector, just turn it off. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
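The four-step model described in the issue above can be sketched as a small class. This is an editor's illustration only, not the actual HDFS-14648 patch: the class and method names (`SharedDeadNodeRegistry`, `recordAccessFailure`, `probe`, `closeStream`) are hypothetical, and plain strings stand in for DataNode and stream identifiers.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a dead-node registry shared by all input streams of
// one client, with a per-stream record of which DataNodes each stream uses.
public class SharedDeadNodeRegistry {

    // Globally suspected dead nodes, shared across all streams.
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    // Which DataNodes each open stream has seen fail.
    private final Map<String, Set<String>> streamNodes = new ConcurrentHashMap<>();

    public void registerStream(String streamId) {
        streamNodes.putIfAbsent(streamId, ConcurrentHashMap.newKeySet());
    }

    // Step 1: an access failure marks the node as suspected dead.
    public void recordAccessFailure(String streamId, String node) {
        streamNodes.get(streamId).add(node);
        deadNodes.add(node);
    }

    // Step 2: a periodic probe rescues nodes that came back after a
    // service restart or machine repair.
    public void probe(String node, boolean reachable) {
        if (reachable) {
            deadNodes.remove(node);
        }
    }

    // Steps 3 and 4: drop the closed stream, then keep only dead nodes that
    // are still referenced by some open stream (the intersection rule).
    public void closeStream(String streamId) {
        streamNodes.remove(streamId);
        deadNodes.removeIf(n ->
            streamNodes.values().stream().noneMatch(s -> s.contains(n)));
    }

    public boolean isDead(String node) {
        return deadNodes.contains(node);
    }
}
```

The real patch additionally runs the probing on a daemon thread (as the `{code:java}` snippet above shows) and keeps per-stream local dead nodes when the shared switch is off.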
[jira] [Commented] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)
[ https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915273#comment-16915273 ] He Xiaoqiao commented on HDFS-14771: Thanks [~linyiqun] for your feedback. The demo patch [^HDFS-14771.branch-2.001.patch] is exactly the same as the patch merged to trunk, with no other changes. {quote}I prefer to convert this JIRA to an independent JIRA and add a link to HDFS-14617 since HDFS-14617 has been done and closed.{quote} +1, considering this feature is not completely ready yet, e.g. the OIV support mentioned in HDFS-14617. I think we may need another Über-jira? Thanks again. > Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing > sub-sections to the fsimage index) > > > Key: HDFS-14771 > URL: https://issues.apache.org/jira/browse/HDFS-14771 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.10.0 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14771.branch-2.001.patch > > > This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load time > by writing sub-sections to the fsimage index. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915262#comment-16915262 ] He Xiaoqiao commented on HDFS-14648: Thanks [~leosun08] for your contributions. I went through [^HDFS-14648.004.patch] and have some minor comments: 1. Jenkins reported some checkstyle issues above; please take a look. 2. It is better to group configuration items for the same module together in hdfs-default.xml so that relevant items are easy to find; I suggest defining `dfs.client.deadnode.detect.enabled` together with the other `dfs.client.*` items. 3. Some method names are open to different interpretations. For instance, with `DeadNodeDetector#removeNodeFromDetect`, my first impression was that it removes a node from the `deadNodes` set shared by all DFSInputStreams in the same DFSClient; however, it actually just removes the node from `localNodes`, which is visible only to a single DFSInputStream. Right? 4. About DeadNodeDetector, I do not understand why different STATEs are set for `DeadNodeDetector`; will they be used by follow-up implementations? 5. About DeadNodeDetector#run, does it need to catch InterruptedException outside the while loop and return? 6. Some constants such as '5000'/'1' are used without explanation; I think we should define these constants at the beginning of the class and add some comments. 7. IIUC, once some nodes are detected as dead, they should not be in the next pipeline, right? But I do not see anywhere that these deadNodes are added to `excludeNodes`. Thanks [~leosun08] again. > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch > > > This Jira constructs the DeadNodeDetector state machine model. It implements > the following functions: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to be inaccessible, the DataNode is put into > DeadNodeDetector#deadnode (HDFS-14649 will optimize this part). When a > DataNode is not accessible, it is likely that the replica has been removed > from it, so the node needs to be confirmed by re-probing and requires > higher-priority processing. > # DeadNodeDetector periodically probes the nodes in > DeadNodeDetector#deadnode; if an access succeeds, the node is removed from > DeadNodeDetector#deadnode. Continuous detection of dead nodes is necessary: a > DataNode may need to rejoin the cluster after a service restart or machine > repair, and without such a probe mechanism it could be excluded permanently. > # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each > DFSInputstream. When a DFSInputstream is closed, its entries are removed from > DeadNodeDetector#dfsInputStreamNodes. > # Every time the global deadnode set is fetched, DeadNodeDetector#deadnode is > updated: the new DeadNodeDetector#deadnode equals the intersection of the old > DeadNodeDetector#deadnode and the DataNodes referenced by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. When it is > off, each DFSInputstream still uses its own local deadnode set. > # This feature has been used in the XIAOMI production environment for a long > time; it reduced HBase read stalls caused by hung nodes. > # Simply turn on the DeadNodeDetector switch to use it; there are no other > restrictions. If you don't want to use DeadNodeDetector, just turn it off. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)
[ https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915252#comment-16915252 ] Yiqun Lin commented on HDFS-14771: -- Thanks for backporting this to branch-2, [~hexiaoqiao]. It's a great improvement. I took a deep look at the logic in trunk but have two comments before reviewing the branch-2 patch in detail: * For the branch-2 patch, are there any other major changes compared with the trunk patch? * I prefer to convert this JIRA to an independent JIRA and add a link to HDFS-14617 since HDFS-14617 has been done and closed. > Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing > sub-sections to the fsimage index) > > > Key: HDFS-14771 > URL: https://issues.apache.org/jira/browse/HDFS-14771 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.10.0 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14771.branch-2.001.patch > > > This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load time > by writing sub-sections to the fsimage index. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13220) Change lastCheckpointTime to use fsimage mostRecentCheckpointTime
[ https://issues.apache.org/jira/browse/HDFS-13220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-13220: - Attachment: HDFS-13220.patch Status: Patch Available (was: Open) > Change lastCheckpointTime to use fsimage mostRecentCheckpointTime > - > > Key: HDFS-13220 > URL: https://issues.apache.org/jira/browse/HDFS-13220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nie Gus >Assignee: hemanthboyina >Priority: Minor > Attachments: HDFS-13220.patch > > > We found that our standby NN did not perform checkpoints and the checkpoint > alert kept firing. We use the JMX last-checkpoint time and > dfs.namenode.checkpoint.period for the monitoring check. > > After checking the code and logs, we found that the standby NN uses > monotonicNow, not the fsimage checkpoint time, so when the standby NN > restarts or switches to active, lastCheckpointTime in doWork is reset. There > is therefore a risk that a standby NN restart or a standby-to-active switch > will delay the checkpoint. > StandbyCheckpointer.java > {code:java} > private void doWork() { > final long checkPeriod = 1000 * checkpointConf.getCheckPeriod(); > // Reset checkpoint time so that we don't always checkpoint > // on startup.
> lastCheckpointTime = monotonicNow(); > while (shouldRun) { > boolean needRollbackCheckpoint = namesystem.isNeedRollbackFsImage(); > if (!needRollbackCheckpoint) { > try { > Thread.sleep(checkPeriod); > } catch (InterruptedException ie) { > } > if (!shouldRun) { > break; > } > } > try { > // We may have lost our ticket since last checkpoint, log in again, just in > case > if (UserGroupInformation.isSecurityEnabled()) { > UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab(); > } > final long now = monotonicNow(); > final long uncheckpointed = countUncheckpointedTxns(); > final long secsSinceLast = (now - lastCheckpointTime) / 1000; > boolean needCheckpoint = needRollbackCheckpoint; > if (needCheckpoint) { > LOG.info("Triggering a rollback fsimage for rolling upgrade."); > } else if (uncheckpointed >= checkpointConf.getTxnCount()) { > LOG.info("Triggering checkpoint because there have been " + > uncheckpointed + " txns since the last checkpoint, which " + > "exceeds the configured threshold " + > checkpointConf.getTxnCount()); > needCheckpoint = true; > } else if (secsSinceLast >= checkpointConf.getPeriod()) { > LOG.info("Triggering checkpoint because it has been " + > secsSinceLast + " seconds since the last checkpoint, which " + > "exceeds the configured interval " + checkpointConf.getPeriod()); > needCheckpoint = true; > } > synchronized (cancelLock) { > if (now < preventCheckpointsUntil) { > LOG.info("But skipping this checkpoint since we are about to failover!"); > canceledCount++; > continue; > } > assert canceler == null; > canceler = new Canceler(); > } > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > 
LOG.info("Checkpoint was cancelled: " + ce.getMessage()); > canceledCount++; > } catch (InterruptedException ie) { > LOG.info("Interrupted during checkpointing", ie); > // Probably requested shutdown. > continue; > } catch (Throwable t) { > LOG.error("Exception in doCheckpoint", t); > } finally { > synchronized (cancelLock) { > canceler = null; > } > } > } > } > } > {code} > > Can we use the fsimage's mostRecentCheckpointTime to do this check? > > thanks, > Gus -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
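The timing bug reported above can be reduced to a small sketch. This is an editor's illustration, not the HDFS-13220 patch: `CheckpointTimer` and its method names are hypothetical, and the idea is simply that seeding the timer from the fsimage's persisted `mostRecentCheckpointTime` (a wall-clock value) keeps an overdue checkpoint overdue, whereas resetting it to `monotonicNow()` on restart makes it look fresh.

```java
// Hypothetical sketch of the proposed check: compute staleness from the
// persisted checkpoint time instead of a timer reset on process start.
public class CheckpointTimer {

    public static long secondsSinceLastCheckpoint(long lastCheckpointMs, long nowMs) {
        return (nowMs - lastCheckpointMs) / 1000;
    }

    // With the original code, a standby restart resets lastCheckpointTime to
    // "now", so an already-overdue checkpoint is postponed by a full period;
    // seeding from mostRecentCheckpointTime avoids that delay.
    public static boolean needCheckpoint(long lastCheckpointMs, long nowMs,
                                         long periodSecs) {
        return secondsSinceLastCheckpoint(lastCheckpointMs, nowMs) >= periodSecs;
    }
}
```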
[jira] [Reopened] (HDFS-14497) Write lock held by metasave impact following RPC processing
[ https://issues.apache.org/jira/browse/HDFS-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao reopened HDFS-14497: > Write lock held by metasave impact following RPC processing > --- > > Key: HDFS-14497 > URL: https://issues.apache.org/jira/browse/HDFS-14497 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14497-addendum.001.patch, HDFS-14497.001.patch > > > NameNode metasave currently holds the global write lock, so subsequent RPC > read/write requests or NameNode internal threads can be paused if they try to > acquire the global read/write lock, having to wait until metasave releases > it. > I propose to change the write lock to a read lock so that read requests can > be processed normally. Allowing read requests should not change the > information that metasave tries to collect. > Actually, we need to ensure that only one thread executes metaSave at a time; > otherwise, the output streams could hit exceptions, especially when both > streams hold the same file handle or share the same output stream. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
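The locking scheme proposed in the issue above can be sketched as follows. This is an editor's illustration under stated assumptions, not the HDFS-14497 patch: the class name `MetaSaveLocking` is hypothetical, and `Runnable` stands in for the actual state-dumping logic. The key ideas are the two it names: metaSave takes only the global read lock (so concurrent readers proceed), and a dedicated final monitor object serializes metaSave invocations so two threads never share the same output stream.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: read lock for the dump, separate final lock
// to ensure only one metaSave runs at a time.
public class MetaSaveLocking {

    private final ReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final Object metaSaveLock = new Object(); // final, per the addendum

    public void metaSave(Runnable dumpState) {
        synchronized (metaSaveLock) {     // at most one metaSave at a time
            fsLock.readLock().lock();     // writers are excluded, readers are not
            try {
                dumpState.run();          // dump namesystem state to the output
            } finally {
                fsLock.readLock().unlock();
            }
        }
    }
}
```

Making `metaSaveLock` final (the subject of the addendum patch) guarantees all threads synchronize on the same monitor object for the lifetime of the namesystem.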
[jira] [Comment Edited] (HDFS-14497) Write lock held by metasave impact following RPC processing
[ https://issues.apache.org/jira/browse/HDFS-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915230#comment-16915230 ] He Xiaoqiao edited comment on HDFS-14497 at 8/25/19 1:45 PM: - Thanks [~jojochuang], {quote}suggests metaSaveLock should be a final object{quote} it makes sense to me. I reopened this JIRA and submitted [^HDFS-14497-addendum.001.patch], which changes metaSaveLock to a final object. Please help review. was (Author: hexiaoqiao): Thanks [~jojochuang], {quote}suggests metaSaveLock should be a final object{quote} it makes sense to me. [^HDFS-14497-addendum.001.patch] tries to change metaSaveLock to a final object. Please help review. > Write lock held by metasave impact following RPC processing > --- > > Key: HDFS-14497 > URL: https://issues.apache.org/jira/browse/HDFS-14497 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14497-addendum.001.patch, HDFS-14497.001.patch > > > NameNode metasave currently holds the global write lock, so subsequent RPC > read/write requests or NameNode internal threads can be paused if they try to > acquire the global read/write lock, having to wait until metasave releases > it. > I propose to change the write lock to a read lock so that read requests can > be processed normally. Allowing read requests should not change the > information that metasave tries to collect. > Actually, we need to ensure that only one thread executes metaSave at a time; > otherwise, the output streams could hit exceptions, especially when both > streams hold the same file handle or share the same output stream. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)
[ https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14771: --- Attachment: HDFS-14771.branch-2.001.patch Status: Patch Available (was: Open) Submitted a demo patch following HDFS-14617; pending what Jenkins says. > Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing > sub-sections to the fsimage index) > > > Key: HDFS-14771 > URL: https://issues.apache.org/jira/browse/HDFS-14771 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.10.0 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14771.branch-2.001.patch > > > This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load time > by writing sub-sections to the fsimage index. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14758) Decrease lease hard limit
[ https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915244#comment-16915244 ] hemanthboyina commented on HDFS-14758: -- Yes [~zhangchen], even decreasing the hard limit doesn't help in all scenarios. As [~jojochuang] mentioned, _there can be network partitions or the client may simply crash (an NN crash doesn't lose state). HDFS-14694 doesn't address all failure scenarios._ So I feel it is better to have both, which covers all the failure scenarios. > Decrease lease hard limit > - > > Key: HDFS-14758 > URL: https://issues.apache.org/jira/browse/HDFS-14758 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Eric Payne >Assignee: hemanthboyina >Priority: Minor > > The hard limit is currently hard-coded to be 1 hour. This also determines the > NN automatic lease recovery interval. Something like 20 min would make more > sense. > After the 5 min soft limit, other clients can recover the lease. If no one > else takes the lease away, the original client can still renew the lease > within the hard limit. So even after an NN full GC of 8 minutes, leases can > still be valid. > However, there is one risk in reducing the hard limit, e.g. reduced to 20 > min: if the NN crashes and the manual failover takes more than 20 minutes, > clients will abort. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
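The soft/hard lease interplay discussed above can be made concrete with a small sketch. This is an editor's illustration, not HDFS code: `LeaseLimits` and its method names are hypothetical, and the 20-minute hard limit is the value proposed in the issue, not the current hard-coded 1 hour.

```java
// Hypothetical sketch of the two lease thresholds (values in milliseconds).
public class LeaseLimits {

    static final long SOFT_LIMIT_MS = 5L * 60 * 1000;            // 5 minutes
    static final long PROPOSED_HARD_LIMIT_MS = 20L * 60 * 1000;  // 20 minutes

    // After the soft limit, another client may recover the lease.
    public static boolean softExpired(long msSinceLastRenew) {
        return msSinceLastRenew > SOFT_LIMIT_MS;
    }

    // After the hard limit, the NN recovers the lease automatically;
    // until then the original client can still renew it.
    public static boolean hardExpired(long msSinceLastRenew) {
        return msSinceLastRenew > PROPOSED_HARD_LIMIT_MS;
    }
}
```

Note how this captures the risk stated in the description: a renewal gap of 8 minutes (e.g. an NN full GC) is soft-expired but not hard-expired, so the lease survives, whereas a 25-minute manual failover would exceed the proposed hard limit and abort the client.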
[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"
[ https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915242#comment-16915242 ] hemanthboyina commented on HDFS-14762: -- [~zsxwing], should we then handle the exception with a proper message? What is your expectation for the issue? > "Path(Path/String parent, String child)" will fail when "child" contains ":" > > > Key: HDFS-14762 > URL: https://issues.apache.org/jira/browse/HDFS-14762 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shixiong Zhu >Assignee: hemanthboyina >Priority: Major > > When the "child" parameter contains ":", "Path(Path/String parent, String > child)" will throw the following exception: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: ... > {code} > Not sure if this is a legit bug. But the following places will hit this error > when seeing a Path with a file name containing ":": > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101 > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"
[ https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-14762: - Comment: was deleted (was: [~zsxwing] , then we need to handle the exception with a proper message ? whats your expecting point on this ?) > "Path(Path/String parent, String child)" will fail when "child" contains ":" > > > Key: HDFS-14762 > URL: https://issues.apache.org/jira/browse/HDFS-14762 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shixiong Zhu >Assignee: hemanthboyina >Priority: Major > > When the "child" parameter contains ":", "Path(Path/String parent, String > child)" will throw the following exception: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: ... > {code} > Not sure if this is a legit bug. But the following places will hit this error > when seeing a Path with a file name containing ":": > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101 > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"
[ https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915241#comment-16915241 ] hemanthboyina commented on HDFS-14762: -- [~zsxwing], should we then handle the exception with a proper message? What is your expectation for this? > "Path(Path/String parent, String child)" will fail when "child" contains ":" > > > Key: HDFS-14762 > URL: https://issues.apache.org/jira/browse/HDFS-14762 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shixiong Zhu >Assignee: hemanthboyina >Priority: Major > > When the "child" parameter contains ":", "Path(Path/String parent, String > child)" will throw the following exception: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: ... > {code} > Not sure if this is a legit bug. But the following places will hit this error > when seeing a Path with a file name containing ":": > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101 > https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
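The failure mode in HDFS-14762 above can be reproduced with plain `java.net.URI`, on which Hadoop's `Path` is built. This is an editor's illustration: `ColonInChild` and `tryBuild` are hypothetical names, and the scheme-sniffing branch is a simplified stand-in for what a `Path`-style constructor does, not the actual `Path` source.

```java
import java.net.URI;
import java.net.URISyntaxException;

// A ':' before the first '/' is treated as a scheme separator, and
// java.net.URI then rejects the remainder as a relative path inside an
// absolute (scheme-bearing) URI -- the exception quoted in the issue.
public class ColonInChild {

    public static String tryBuild(String child) {
        int colon = child.indexOf(':');
        int slash = child.indexOf('/');
        try {
            if (colon != -1 && (slash == -1 || colon < slash)) {
                // "data:1" is sniffed as scheme "data" + relative path "1"
                new URI(child.substring(0, colon), null,
                        child.substring(colon + 1), null, null);
            } else {
                // no colon in the first segment: parses as a plain path
                new URI(null, null, child, null, null);
            }
            return "ok";
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }
}
```

A colon in a later segment (e.g. `dir/a:b`) parses fine, which is why a commonly suggested workaround (an assumption here, not necessarily what the JIRA will adopt) is to prefix the child with `./` so the first segment contains no colon.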
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300837 ] ASF GitHub Bot logged work on HDDS-2030: Author: ASF GitHub Bot Created on: 25/Aug/19 12:29 Start Date: 25/Aug/19 12:29 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts URL: https://github.com/apache/hadoop/pull/1348#discussion_r317395504 ## File path: hadoop-ozone/dev-support/checks/acceptance.sh ## @@ -16,7 +16,19 @@ DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" cd "$DIR/../../.." || exit 1 +REPORT_DIR=${OUTPUT_DIR:-"$DIR/../../../target/acceptance"} +mkdir -p "$REPORT_DIR" + OZONE_VERSION=$(grep "" "$DIR/../../pom.xml" | sed 's/<[^>]*>//g'| sed 's/^[ \t]*//') -cd "$DIR/../../dist/target/ozone-$OZONE_VERSION/compose" || exit 1 +DIST_DIR="$DIR/../../dist/target/ozone-$OZONE_VERSION" + +if [ ! -d "$DIST_DIR" ]; then +echo "Distribution dir is missing. Doing a full build" +"$DIR/build.sh" +fi + +cd "$DIST_DIR/compose" || exit 1 ./test-all.sh +cp results/* "$REPORT_DIR/" Review comment: ```suggestion cp result/* "$REPORT_DIR/" ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 300837) Time Spent: 1h (was: 50m) > Generate simplifed reports by the dev-support/checks/*.sh scripts > - > > Key: HDDS-2030 > URL: https://issues.apache.org/jira/browse/HDDS-2030 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains shell scripts to execute > different type of code checks (findbugs, checkstyle, etc.) > Currently the contract is very simple. Every shell script executes one (and > only one) check and the shell response code is set according to the result > (non-zero code if failed). > To have better reporting in the github pr build, it would be great to improve > the scripts to generate simple summary files and save the relevant files for > archiving. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300838 ] ASF GitHub Bot logged work on HDDS-2030: Author: ASF GitHub Bot Created on: 25/Aug/19 12:29 Start Date: 25/Aug/19 12:29 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts URL: https://github.com/apache/hadoop/pull/1348#discussion_r317395338 ## File path: hadoop-ozone/dev-support/checks/unit.sh ## @@ -13,12 +13,20 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" +cd "$DIR/../../.." || exit 1 +set -x + export MAVEN_OPTS="-Xmx4096m" -mvn -fn test -f pom.ozone.xml -pl \!:hadoop-ozone-integration-test,\!:hadoop-ozone-filesystem,\!:hadoop-ozone-tools -module_failed_tests=$(find "." -name 'TEST*.xml' -print0 \ -| xargs -n1 -0 "grep" -l -E " Generate simplifed reports by the dev-support/checks/*.sh scripts > - > > Key: HDDS-2030 > URL: https://issues.apache.org/jira/browse/HDDS-2030 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains shell scripts to execute > different type of code checks (findbugs, checkstyle, etc.) > Currently the contract is very simple. Every shell script executes one (and > only one) check and the shell response code is set according to the result > (non-zero code if failed). > To have better reporting in the github pr build, it would be great to improve > the scripts to generate simple summary files and save the relevant files for > archiving. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300835 ] ASF GitHub Bot logged work on HDDS-2030: Author: ASF GitHub Bot Created on: 25/Aug/19 12:29 Start Date: 25/Aug/19 12:29 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts URL: https://github.com/apache/hadoop/pull/1348#discussion_r317395567 ## File path: hadoop-ozone/dev-support/checks/README.md ## @@ -0,0 +1,27 @@ + + +# Ozone checks + +This directory contains a collection of easy-to-user helper scripts to execute various type of tests on the ozone/hdds codebase. Review comment: ```suggestion This directory contains a collection of easy-to-use helper scripts to execute various type of tests on the ozone/hdds codebase. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 300835) Time Spent: 50m (was: 40m) > Generate simplifed reports by the dev-support/checks/*.sh scripts > - > > Key: HDDS-2030 > URL: https://issues.apache.org/jira/browse/HDDS-2030 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains shell scripts to execute > different type of code checks (findbugs, checkstyle, etc.) > Currently the contract is very simple. Every shell script executes one (and > only one) check and the shell response code is set according to the result > (non-zero code if failed). 
> To have better reporting in the github pr build, it would be great to improve > the scripts to generate simple summary files and save the relevant files for > archiving. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300839 ] ASF GitHub Bot logged work on HDDS-2030: Author: ASF GitHub Bot Created on: 25/Aug/19 12:29 Start Date: 25/Aug/19 12:29 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts URL: https://github.com/apache/hadoop/pull/1348#discussion_r317395273 ## File path: hadoop-ozone/dev-support/checks/_mvn_unit_report.sh ## @@ -0,0 +1,53 @@ +#!/usr/bin/env bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +## generate summary txt file +find "." 
-name 'TEST*.xml' -print0 \ +| xargs -n1 -0 "grep" -l -E " "$SUMMARY_FILE" +for TEST_RESULT_FILE in $(find "$REPORT_DIR" -name "*.txt" | grep -v output); do + +FAILURES=$(grep FAILURE "$TEST_RESULT_FILE" | grep "Tests run" | awk '{print $18}' | sort | uniq) + +for FAILURE in $FAILURES; do +TEST_RESULT_LOCATION="$(realpath --relative-to="$REPORT_DIR" "$TEST_RESULT_FILE")" +TEST_OUTPUT_LOCATION="${TEST_RESULT_LOCATION//.txt/-output.txt/}" +printf " * [%s](%s) ([output](%s))" "$FAILURE" "$TEST_RESULT_LOCATION" "$TEST_OUTPUT_LOCATION" >> "$SUMMARY_FILE" Review comment: ```suggestion printf " * [%s](%s) ([output](%s))" "$FAILURE" "$TEST_RESULT_LOCATION" "$TEST_OUTPUT_LOCATION\n" >> "$SUMMARY_FILE" ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 300839) Time Spent: 1h (was: 50m) > Generate simplifed reports by the dev-support/checks/*.sh scripts > - > > Key: HDDS-2030 > URL: https://issues.apache.org/jira/browse/HDDS-2030 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains shell scripts to execute > different type of code checks (findbugs, checkstyle, etc.) > Currently the contract is very simple. Every shell script executes one (and > only one) check and the shell response code is set according to the result > (non-zero code if failed). > To have better reporting in the github pr build, it would be great to improve > the scripts to generate simple summary files and save the relevant files for > archiving. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
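The printf review above is about where the newline ends up: bash printf expands backslash escapes such as \n in the format string, not in arguments substituted through %s, so appending "\n" to the last argument (as the suggestion block does) would emit a literal backslash-n. A minimal sketch of the intended behavior, with hypothetical stand-ins for $FAILURE and the file locations:

```shell
# In bash printf, \n is expanded in the FORMAT string, not in %s
# arguments, so the newline belongs at the end of the format.
# "TestFoo" and the paths are hypothetical stand-ins.
printf ' * [%s](%s) ([output](%s))\n' \
  "TestFoo" "unit/TestFoo.txt" "unit/TestFoo-output.txt"
# prints " * [TestFoo](unit/TestFoo.txt) ([output](unit/TestFoo-output.txt))"
```

An argument-side "\n" would only be interpreted if the conversion were %b instead of %s.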
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300836 ]

ASF GitHub Bot logged work on HDDS-2030:

Author: ASF GitHub Bot
Created on: 25/Aug/19 12:29
Start Date: 25/Aug/19 12:29
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts
URL: https://github.com/apache/hadoop/pull/1348#discussion_r317395730

## File path: hadoop-ozone/dev-support/checks/acceptance.sh
@@ -16,7 +16,19 @@
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 cd "$DIR/../../.." || exit 1
+REPORT_DIR=${OUTPUT_DIR:-"$DIR/../../../target/acceptance"}
+mkdir -p "$REPORT_DIR"
+
 OZONE_VERSION=$(grep "" "$DIR/../../pom.xml" | sed 's/<[^>]*>//g'| sed 's/^[ \t]*//')
-cd "$DIR/../../dist/target/ozone-$OZONE_VERSION/compose" || exit 1
+DIST_DIR="$DIR/../../dist/target/ozone-$OZONE_VERSION"
+
+if [ ! -d "$DIST_DIR" ]; then
+echo "Distribution dir is missing. Doing a full build"
+"$DIR/build.sh"
+fi
+
+cd "$DIST_DIR/compose" || exit 1
 ./test-all.sh
+cp results/* "$REPORT_DIR/"
+cp "$REPORT_DIR/log.html" "$REPORT_DIR/summary.html"
 exit $?

Review comment:
Now this `$?` reflects the result of `cp`, so result of `test-all.sh` should be saved and used for `exit`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 300836) Time Spent: 1h (was: 50m) > Generate simplifed reports by the dev-support/checks/*.sh scripts > - > > Key: HDDS-2030 > URL: https://issues.apache.org/jira/browse/HDDS-2030 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains shell scripts to execute > different type of code checks (findbugs, checkstyle, etc.) > Currently the contract is very simple. Every shell script executes one (and > only one) check and the shell response code is set according to the result > (non-zero code if failed). > To have better reporting in the github pr build, it would be great to improve > the scripts to generate simple summary files and save the relevant files for > archiving. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
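The acceptance.sh review above notes that `exit $?` after the two `cp` calls reports the status of the last `cp`, not of `./test-all.sh`. A minimal sketch of the suggested fix, using a stand-in function instead of the real test runner:

```shell
# run_tests is a hypothetical stand-in for ./test-all.sh; it returns 3
# to simulate a failing acceptance run.
run_tests() { return 3; }

rc=0
run_tests || rc=$?   # capture the runner's exit status immediately
true                 # stand-in for the later cp calls, which reset $?
echo "$rc"           # prints 3; the real script would end with: exit "$rc"
```

The point is simply that $? is overwritten by every command, so the interesting status has to be saved into a variable before any housekeeping runs.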
[jira] [Commented] (HDFS-14497) Write lock held by metasave impact following RPC processing
[ https://issues.apache.org/jira/browse/HDFS-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915230#comment-16915230 ]

He Xiaoqiao commented on HDFS-14497:
---
Thanks [~jojochuang].
{quote}suggests metaSaveLock should be a final object{quote}
This makes sense to me. [^HDFS-14497-addendum.001.patch] changes metaSaveLock to a final object. Please help review.

> Write lock held by metasave impact following RPC processing
> ---
>
> Key: HDFS-14497
> URL: https://issues.apache.org/jira/browse/HDFS-14497
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14497-addendum.001.patch, HDFS-14497.001.patch
>
> NameNode metasave currently holds the global write lock, so subsequent RPC
> read/write requests or NameNode internal threads can be paused if they try to
> acquire the global read/write lock and have to wait until metasave releases it.
> I propose changing the write lock to a read lock so that some read requests
> can be processed normally. Taking only the read lock should not change the
> information metasave tries to collect.
> We do need to ensure that only one thread executes metaSave at a time;
> otherwise the output streams could hit exceptions, especially if both threads
> hold the same file handle or share the same output stream.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14497) Write lock held by metasave impact following RPC processing
[ https://issues.apache.org/jira/browse/HDFS-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-14497:
---
Attachment: HDFS-14497-addendum.001.patch

> Write lock held by metasave impact following RPC processing
> ---
>
> Key: HDFS-14497
> URL: https://issues.apache.org/jira/browse/HDFS-14497
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14497-addendum.001.patch, HDFS-14497.001.patch
>
> NameNode metasave currently holds the global write lock, so subsequent RPC
> read/write requests or NameNode internal threads can be paused if they try to
> acquire the global read/write lock and have to wait until metasave releases it.
> I propose changing the write lock to a read lock so that some read requests
> can be processed normally. Taking only the read lock should not change the
> information metasave tries to collect.
> We do need to ensure that only one thread executes metaSave at a time;
> otherwise the output streams could hit exceptions, especially if both threads
> hold the same file handle or share the same output stream.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14711) RBF: RBFMetrics throws NullPointerException if stateStore disabled
[ https://issues.apache.org/jira/browse/HDFS-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915227#comment-16915227 ]

Chen Zhang commented on HDFS-14711:
---
Hi [~ayushtkn], since HDFS-14656 has not been updated for over a month, I uploaded a patch here to add a NULL check; hope you don't mind.

> RBF: RBFMetrics throws NullPointerException if stateStore disabled
> --
>
> Key: HDFS-14711
> URL: https://issues.apache.org/jira/browse/HDFS-14711
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-14711.001.patch, HDFS-14711.002.patch, HDFS-14711.003.patch
>
> In the current implementation, if {{stateStore}} initialization fails, we only
> log an error message, but RBFMetrics can't actually work normally in this state.
> {code:java}
> 2019-08-08 22:43:58,024 [qtp812446698-28] ERROR jmx.JMXJsonServlet
> (JMXJsonServlet.java:writeAttribute(345)) - getting attribute FilesTotal of
> Hadoop:service=NameNode,name=FSNamesystem-2 threw an exception
> javax.management.RuntimeMBeanException: java.lang.NullPointerException
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
> at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:338)
> at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:316)
> at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
> at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.doFilter(ProxyUserAuthenticationFilter.java:104)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
> at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:51)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:539)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at
[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index
[ https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915221#comment-16915221 ]

He Xiaoqiao commented on HDFS-14617:
---
[~csun], [~sodonnell], [~jojochuang]: HDFS-14771 tracks backporting this patch to branch-2. I will try to test and cover most FsImage cases once the branch-2 patch is ready.

> Improve fsimage load time by writing sub-sections to the fsimage index
> --
>
> Key: HDFS-14617
> URL: https://issues.apache.org/jira/browse/HDFS-14617
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14617.001.patch, ParallelLoading.svg, SerialLoading.svg, dirs-single.svg, flamegraph.parallel.svg, flamegraph.serial.svg, inodes.svg
>
> Loading an fsimage is basically a single threaded process. The current
> fsimage is written out in sections, eg iNode, iNode_Directory, Snapshots,
> Snapshot_Diff etc. Then at the end of the file, an index is written that
> contains the offset and length of each section. The image loader code uses
> this index to initialize an input stream to read and process each section. It
> is important that one section is fully loaded before another is started, as
> the next section depends on the results of the previous one.
> What I would like to propose is the following:
> 1. When writing the image, we can optionally output sub_sections to the
> index. That way, a given section would effectively be split into several
> sections, eg:
> {code:java}
>    inode_section offset 10 length 1000
>      inode_sub_section offset 10 length 500
>      inode_sub_section offset 510 length 500
>
>    inode_dir_section offset 1010 length 1000
>      inode_dir_sub_section offset 1010 length 500
>      inode_dir_sub_section offset 1510 length 500
> {code}
> Here you can see we still have the original section index, but then we also
> have sub-section entries that cover the entire section. Then a processor can
> either read the full section in serial, or read each sub-section in parallel.
> 2. In the Image Writer code, we should set a target number of sub-sections,
> and then based on the total inodes in memory, it will create that many
> sub-sections per major image section. I think the only sections worth doing
> this for are inode, inode_reference, inode_dir and snapshot_diff. All others
> tend to be fairly small in practice.
> 3. If there are under some threshold of inodes (eg 10M) then don't bother
> with the sub-sections as a serial load only takes a few seconds at that scale.
> 4. The image loading code can then have a switch to enable 'parallel loading'
> and a 'number of threads' where it uses the sub-sections, or if not enabled
> falls back to the existing logic to read the entire section in serial.
> Working with a large image of 316M inodes and 35GB on disk, I have a proof of
> concept of this change working, allowing just inode and inode_dir to be
> loaded in parallel, but I believe inode_reference and snapshot_diff can be
> made parallel with the same technique.
> Some benchmarks I have are as follows:
> {code:java}
> Threads     1    2    3    4
> inodes    448  290  226  189
> inode_dir 326  211  170  161
> Total     927  651  535  488   (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the
> inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to better than half the > load time of the two sections. With the patch in HDFS-13694 it would take a > further 100 seconds off the run time, going from 927 seconds to 388, which is > a significant improvement. Adding more threads beyond 4 has diminishing > returns as there are some synchronized points in the loading code to protect > the in memory structures. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11246) FSNameSystem#logAuditEvent should be called outside the read or write locks
[ https://issues.apache.org/jira/browse/HDFS-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915218#comment-16915218 ]

He Xiaoqiao commented on HDFS-11246:
---
Thanks [~jojochuang]. [^HDFS-11246.011.patch] corrects the locking around #addCachePool and fixes the checkstyle issues. Pending what Jenkins says.

> FSNameSystem#logAuditEvent should be called outside the read or write locks
> ---
>
> Key: HDFS-11246
> URL: https://issues.apache.org/jira/browse/HDFS-11246
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Kuhu Shukla
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-11246.001.patch, HDFS-11246.002.patch, HDFS-11246.003.patch, HDFS-11246.004.patch, HDFS-11246.005.patch, HDFS-11246.006.patch, HDFS-11246.007.patch, HDFS-11246.008.patch, HDFS-11246.009.patch, HDFS-11246.010.patch, HDFS-11246.011.patch
>
> {code}
> readLock();
> boolean success = true;
> ContentSummary cs;
> try {
>   checkOperation(OperationCategory.READ);
>   cs = FSDirStatAndListingOp.getContentSummary(dir, src);
> } catch (AccessControlException ace) {
>   success = false;
>   logAuditEvent(success, operationName, src);
>   throw ace;
> } finally {
>   readUnlock(operationName);
> }
> {code}
> It would be nice to have audit logging outside the lock esp. in scenarios
> where applications hammer a given operation several times.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11246) FSNameSystem#logAuditEvent should be called outside the read or write locks
[ https://issues.apache.org/jira/browse/HDFS-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-11246:
---
Attachment: HDFS-11246.011.patch

> FSNameSystem#logAuditEvent should be called outside the read or write locks
> ---
>
> Key: HDFS-11246
> URL: https://issues.apache.org/jira/browse/HDFS-11246
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Kuhu Shukla
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-11246.001.patch, HDFS-11246.002.patch, HDFS-11246.003.patch, HDFS-11246.004.patch, HDFS-11246.005.patch, HDFS-11246.006.patch, HDFS-11246.007.patch, HDFS-11246.008.patch, HDFS-11246.009.patch, HDFS-11246.010.patch, HDFS-11246.011.patch
>
> {code}
> readLock();
> boolean success = true;
> ContentSummary cs;
> try {
>   checkOperation(OperationCategory.READ);
>   cs = FSDirStatAndListingOp.getContentSummary(dir, src);
> } catch (AccessControlException ace) {
>   success = false;
>   logAuditEvent(success, operationName, src);
>   throw ace;
> } finally {
>   readUnlock(operationName);
> }
> {code}
> It would be nice to have audit logging outside the lock esp. in scenarios
> where applications hammer a given operation several times.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12904) Add DataTransferThrottler to the Datanode transfers
[ https://issues.apache.org/jira/browse/HDFS-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915199#comment-16915199 ] Lisheng Sun commented on HDFS-12904: Uploaded the v003 patch. > Add DataTransferThrottler to the Datanode transfers > --- > > Key: HDFS-12904 > URL: https://issues.apache.org/jira/browse/HDFS-12904 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Reporter: Íñigo Goiri >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-12904.000.patch, HDFS-12904.001.patch, > HDFS-12904.002.patch, HDFS-12904.003.patch > > > The {{DataXceiverServer}} already uses throttling for the balancing. The > Datanode should also allow throttling the regular data transfers. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12904) Add DataTransferThrottler to the Datanode transfers
[ https://issues.apache.org/jira/browse/HDFS-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-12904: --- Attachment: HDFS-12904.003.patch > Add DataTransferThrottler to the Datanode transfers > --- > > Key: HDFS-12904 > URL: https://issues.apache.org/jira/browse/HDFS-12904 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Reporter: Íñigo Goiri >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-12904.000.patch, HDFS-12904.001.patch, > HDFS-12904.002.patch, HDFS-12904.003.patch > > > The {{DataXceiverServer}} already uses throttling for the balancing. The > Datanode should also allow throttling the regular data transfers. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300826 ]

ASF GitHub Bot logged work on HDDS-2030:

Author: ASF GitHub Bot
Created on: 25/Aug/19 07:25
Start Date: 25/Aug/19 07:25
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on issue #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts
URL: https://github.com/apache/hadoop/pull/1348#issuecomment-524607886

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| 0 | reexec | 145 | Docker mode activated. |
||| _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| 0 | shelldocs | 1 | Shelldocs was not available. |
| 0 | @author | 0 | Skipping @author checks as author.sh has been patched. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ trunk Compile Tests _ |
| +1 | mvninstall | 1223 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 1542 | branch has no errors when building and testing our client artifacts. |
||| _ Patch Compile Tests _ |
| +1 | mvninstall | 1218 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| -1 | shellcheck | 2 | The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 775 | patch has no errors when building and testing our client artifacts. |
||| _ Other Tests _ |
| +1 | unit | 119 | hadoop-hdds in the patch passed. |
| +1 | unit | 322 | hadoop-ozone in the patch passed. |
| +1 | asflicense | 50 | The patch does not generate ASF License warnings. |
| | | 5905 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1348/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1348 |
| Optional Tests | dupname asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux 2a2de9de6b88 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / d2225c8 |
| shellcheck | https://builds.apache.org/job/hadoop-multibranch/job/PR-1348/1/artifact/out/diff-patch-shellcheck.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1348/1/testReport/ |
| Max. process+thread count | 329 (vs. ulimit of 5500) |
| modules | C: hadoop-ozone U: hadoop-ozone |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1348/1/console |
| versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 300826)
Time Spent: 40m (was: 0.5h)

> Generate simplifed reports by the dev-support/checks/*.sh scripts
> -
>
> Key: HDDS-2030
> URL: https://issues.apache.org/jira/browse/HDDS-2030
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> hadoop-ozone/dev-support/checks directory contains shell scripts to execute
> different type of code checks (findbugs, checkstyle, etc.)
> Currently the contract is very simple.
Every shell script executes one (and
> only one) check and the shell response code is set according to the result
> (non-zero code if failed).
> To have better reporting in the github pr build, it would be great to improve
> the scripts to generate simple summary files and save the relevant files for
> archiving.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300825 ]

ASF GitHub Bot logged work on HDDS-2030:

Author: ASF GitHub Bot
Created on: 25/Aug/19 07:25
Start Date: 25/Aug/19 07:25
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts
URL: https://github.com/apache/hadoop/pull/1348#discussion_r317385576

## File path: hadoop-ozone/dev-support/checks/unit.sh
@@ -13,12 +13,20 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+cd "$DIR/../../.." || exit 1
+set -x
+
 export MAVEN_OPTS="-Xmx4096m"
-mvn -fn test -f pom.ozone.xml -pl \!:hadoop-ozone-integration-test,\!:hadoop-ozone-filesystem,\!:hadoop-ozone-tools
-module_failed_tests=$(find "." -name 'TEST*.xml' -print0 \
-| xargs -n1 -0 "grep" -l -E "

> Generate simplifed reports by the dev-support/checks/*.sh scripts
> -
>
> Key: HDDS-2030
> URL: https://issues.apache.org/jira/browse/HDDS-2030
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> hadoop-ozone/dev-support/checks directory contains shell scripts to execute
> different type of code checks (findbugs, checkstyle, etc.)
> Currently the contract is very simple. Every shell script executes one (and
> only one) check and the shell response code is set according to the result
> (non-zero code if failed).
> To have better reporting in the github pr build, it would be great to improve
> the scripts to generate simple summary files and save the relevant files for
> archiving.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2030) Generate simplifed reports by the dev-support/checks/*.sh scripts
[ https://issues.apache.org/jira/browse/HDDS-2030?focusedWorklogId=300824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300824 ]

ASF GitHub Bot logged work on HDDS-2030:

Author: ASF GitHub Bot
Created on: 25/Aug/19 07:25
Start Date: 25/Aug/19 07:25
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #1348: HDDS-2030. Generate simplifed reports by the dev-support/checks/*.sh scripts
URL: https://github.com/apache/hadoop/pull/1348#discussion_r317385575

## File path: hadoop-ozone/dev-support/checks/integration.sh
@@ -20,10 +20,14 @@
 export MAVEN_OPTS="-Xmx4096m"
 mvn -B install -f pom.ozone.xml -DskipTests
 mvn -B -fn test -f pom.ozone.xml -pl :hadoop-ozone-integration-test,:hadoop-ozone-filesystem,:hadoop-ozone-tools \
 -Dtest=\!TestMiniChaosOzoneCluster
-module_failed_tests=$(find "." -name 'TEST*.xml' -print0 \
-| xargs -0 -n1 "grep" -l -E "

> Generate simplifed reports by the dev-support/checks/*.sh scripts
> -
>
> Key: HDDS-2030
> URL: https://issues.apache.org/jira/browse/HDDS-2030
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> hadoop-ozone/dev-support/checks directory contains shell scripts to execute
> different type of code checks (findbugs, checkstyle, etc.)
> Currently the contract is very simple. Every shell script executes one (and
> only one) check and the shell response code is set according to the result
> (non-zero code if failed).
> To have better reporting in the github pr build, it would be great to improve
> the scripts to generate simple summary files and save the relevant files for
> archiving.

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
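The unit.sh and integration.sh diffs above each removed an inline `module_failed_tests=$(find ... | xargs grep ...)` pipeline in favor of the shared `_mvn_unit_report.sh`. A rough sketch of that kind of helper follows; the function name and the exact grep pattern are assumptions, not the actual script:

```shell
# failed_test_files: list surefire TEST*.xml result files under $1 that
# record a failure or error element. A hedged stand-in for the pipeline
# the PR factors out of unit.sh and integration.sh.
failed_test_files() {
  find "$1" -name 'TEST*.xml' -print0 \
    | xargs -0 -r grep -l -E '<failure|<error'
}
```

Using `-print0` with `xargs -0` keeps paths with spaces intact, and `grep -l` prints only the matching file names, which is what a summary report needs.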
[jira] [Resolved] (HDDS-2006) Autogenerated docker config fails with space in the file name issue.
[ https://issues.apache.org/jira/browse/HDDS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan resolved HDDS-2006.
---
Resolution: Cannot Reproduce

> Autogenerated docker config fails with space in the file name issue.
> 
>
> Key: HDDS-2006
> URL: https://issues.apache.org/jira/browse/HDDS-2006
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Anu Engineer
> Assignee: Aravindan Vijayan
> Priority: Major
>
> If you follow the instructions in the "Local multi-container cluster" section,
> generate docker-config, and later try to use it, the docker-compose up -d
> command will fail with:
>
> *ERROR: In file ~/testOzoneInstructions/docker-config: environment variable
> name 'Setting up environment!' may not contains whitespace.*

-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
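The quoted docker-compose error fires when an env-file entry's variable name contains whitespace, e.g. a stray "Setting up environment!" banner line captured into the generated docker-config. Since the issue was resolved as Cannot Reproduce, the following is only a hypothetical pre-flight check (not part of the Ozone scripts) that would flag such lines before docker-compose sees them:

```shell
# check_env_file: print env-file lines whose variable name (the text
# before the first '=', or the whole line when there is no '=')
# contains whitespace, which docker-compose rejects.
# Hypothetical helper, not part of hadoop-ozone.
check_env_file() {
  awk -F= 'NF && $1 ~ /[[:space:]]/ { print NR": "$0 }' "$1"
}
```

Usage: `check_env_file docker-config` prints the line number and content of each offending entry; an empty result means every non-empty line has a whitespace-free name before its `=`.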