[jira] [Updated] (HDDS-3683) Ozone fuse support
[ https://issues.apache.org/jira/browse/HDDS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maobaolong updated HDDS-3683: - Description: https://github.com/opendataio/hcfsfuse design doc will be updated here. https://docs.google.com/document/d/1IY9xhRTeo42Sfzw6U-NngHOTLO_B7_0BiUsonvPKvh8/edit?usp=sharing was:https://github.com/opendataio/hcfsfuse

> Ozone fuse support
> ------------------
>
> Key: HDDS-3683
> URL: https://issues.apache.org/jira/browse/HDDS-3683
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Affects Versions: 0.6.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
> https://github.com/opendataio/hcfsfuse
> design doc will be updated here.
> https://docs.google.com/document/d/1IY9xhRTeo42Sfzw6U-NngHOTLO_B7_0BiUsonvPKvh8/edit?usp=sharing

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1054: Hdds 3772. Add LOG to S3ErrorTable for easier problem locating.
codecov-commenter commented on pull request #1054: URL: https://github.com/apache/hadoop-ozone/pull/1054#issuecomment-642411148

# [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1054?src=pr=h1) Report

> :exclamation: No coverage uploaded for pull request base (`master@67244e5`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1054/graphs/tree.svg?width=650=150=pr=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1054?src=pr=tree)

```diff
@@            Coverage Diff            @@
##             master    #1054   +/-  ##
=========================================
  Coverage          ?   69.43%
  Complexity        ?     9113
=========================================
  Files             ?      961
  Lines             ?    48150
  Branches          ?     4679
=========================================
  Hits              ?    33435
  Misses            ?    12499
  Partials          ?     2216
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1054?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1054?src=pr=footer). Last update [67244e5...06c521d](https://codecov.io/gh/apache/hadoop-ozone/pull/1054?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HDDS-3481) SCM ask too many datanodes to replicate the same container
[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDDS-3481: -- Status: Patch Available (was: Open)

> SCM ask too many datanodes to replicate the same container
> ----------------------------------------------------------
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Blocker
> Labels: Triaged, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
> *What's the problem?*
> As the images show, SCM asked 31 datanodes to replicate container 2037 every 10 minutes starting at 2020-04-17 23:38:51. Then, at 2020-04-18 08:58:52, SCM found that container 2037 had 12 replicas and asked 11 datanodes to delete it.
> !screenshot-1.png!
> !screenshot-2.png!
> *What's the reason?*
> SCM checks whether (container replica count + inflightReplication.get(containerId).size() - inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks a datanode to replicate the container and adds the action to inflightReplication.get(containerId). The replicate action times out after 10 minutes; on timeout, SCM removes the action from inflightReplication.get(containerId), as the image shows. The expression then drops below 3 again, and SCM asks yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot finish within 10 minutes, so 31 datanodes ended up replicating the container every 10 minutes. 19 of the 31 datanodes replicated from the same source datanode, which also puts heavy pressure on that source datanode and makes replication even slower. In fact, the first replication took 4 hours to finish.
> !screenshot-4.png!
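The failure mode described above comes down to a simple counting check. Here is a minimal, self-contained Java sketch of that check; the class and method names are illustrative, not Ozone's actual ReplicationManager code:

```java
/**
 * Sketch of the under-replication check described in HDDS-3481.
 * Names are illustrative; Ozone's real ReplicationManager tracks in-flight
 * actions per container in maps keyed by container ID.
 */
public final class UnderReplicationCheck {

    private UnderReplicationCheck() { }

    /**
     * A container needs another replication command when
     * replicaCount + inflightAdds - inflightDeletes < replicationFactor.
     */
    public static boolean needsReplication(int replicaCount,
            int inflightAdds, int inflightDeletes, int replicationFactor) {
        return replicaCount + inflightAdds - inflightDeletes
                < replicationFactor;
    }

    public static void main(String[] args) {
        // Two copies already in flight: 1 + 2 - 0 = 3, no new command needed.
        System.out.println(needsReplication(1, 2, 0, 3)); // false
        // The in-flight commands time out and are dropped from tracking while
        // the copies are still running, so SCM schedules yet another replica.
        System.out.println(needsReplication(1, 0, 0, 3)); // true
    }
}
```

The report's point is that dropping a timed-out action from the tracking map makes the same container look under-replicated again even though the earlier copy may still complete, so the check fires again every timeout period.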
[GitHub] [hadoop-ozone] leosunli commented on a change in pull request #1033: HDDS-3667. If we gracefully stop datanode it would be better to notify scm and r…
leosunli commented on a change in pull request #1033: URL: https://github.com/apache/hadoop-ozone/pull/1033#discussion_r438540923 ## File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/UnRegisterEndpointTask.java ## @@ -0,0 +1,262 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.container.common.states.endpoint; + +import java.io.IOException; +import java.util.UUID; +import java.util.concurrent.Callable; +import java.util.concurrent.Future; + +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.hdds.conf.ConfigurationSource; +import org.apache.hadoop.hdds.protocol.DatanodeDetails; +import org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos.ContainerReportsProto; +import org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos.NodeReportProto; +import org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos.PipelineReportsProto; +import org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos.SCMUNRegisteredResponseProto; +import org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine; +import org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine.EndPointStates; +import org.apache.hadoop.ozone.container.common.statemachine.StateContext; +import org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; + +/** + * UnRegister a datanode with SCM. + */ +public final class UnRegisterEndpointTask implements +Callable { + static final Logger LOG = + LoggerFactory.getLogger(UnRegisterEndpointTask.class); + + private final EndpointStateMachine rpcEndPoint; + private final ConfigurationSource conf; + private Future result; + private DatanodeDetails datanodeDetails; + private final OzoneContainer datanodeContainerManager; + private StateContext stateContext; + + /** + * Creates a register endpoint task. 
+ * + * @param rpcEndPoint - endpoint + * @param conf - conf + * @param ozoneContainer - container + */ + @VisibleForTesting + public UnRegisterEndpointTask(EndpointStateMachine rpcEndPoint, + ConfigurationSource conf, OzoneContainer ozoneContainer, + StateContext context) { +this.rpcEndPoint = rpcEndPoint; +this.conf = conf; +this.datanodeContainerManager = ozoneContainer; +this.stateContext = context; + + } + + /** + * Get the DatanodeDetails. + * + * @return DatanodeDetailsProto + */ + public DatanodeDetails getDatanodeDetails() { +return datanodeDetails; + } + + /** + * Set the contiainerNodeID Proto. + * + * @param datanodeDetails - Container Node ID. + */ + public void setDatanodeDetails( + DatanodeDetails datanodeDetails) { +this.datanodeDetails = datanodeDetails; + } + + /** + * Computes a result, or throws an exception if unable to do so. + * + * @return computed result + * @throws Exception if unable to compute a result + */ + @Override + public EndpointStateMachine.EndPointStates call() throws Exception { + +if (getDatanodeDetails() == null) { + LOG.error("DatanodeDetails cannot be null in RegisterEndpoint task, " + + "shutting down the endpoint."); + return rpcEndPoint.setState(EndpointStateMachine.EndPointStates.SHUTDOWN); +} + +rpcEndPoint.lock(); +try { + + if (rpcEndPoint.getState() + .equals(EndPointStates.SHUTDOWN)) { +ContainerReportsProto containerReport = +datanodeContainerManager.getController().getContainerReport(); +NodeReportProto nodeReport = datanodeContainerManager.getNodeReport(); +PipelineReportsProto pipelineReportsProto = +datanodeContainerManager.getPipelineReport(); +// TODO : Add responses to the command Queue. +SCMUNRegisteredResponseProto response = rpcEndPoint.getEndPoint() +.unregister(datanodeDetails.getProtoBufMessage(), nodeReport, +containerReport,
[jira] [Updated] (HDDS-3481) SCM ask too many datanodes to replicate the same container
[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDDS-3481: -- Labels: Triaged pull-request-available (was: TriagePending pull-request-available)

> SCM ask too many datanodes to replicate the same container
> ----------------------------------------------------------
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Blocker
> Labels: Triaged, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
> *What's the problem?*
> As the images show, SCM asked 31 datanodes to replicate container 2037 every 10 minutes starting at 2020-04-17 23:38:51. Then, at 2020-04-18 08:58:52, SCM found that container 2037 had 12 replicas and asked 11 datanodes to delete it.
> !screenshot-1.png!
> !screenshot-2.png!
> *What's the reason?*
> SCM checks whether (container replica count + inflightReplication.get(containerId).size() - inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks a datanode to replicate the container and adds the action to inflightReplication.get(containerId). The replicate action times out after 10 minutes; on timeout, SCM removes the action from inflightReplication.get(containerId), as the image shows. The expression then drops below 3 again, and SCM asks yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot finish within 10 minutes, so 31 datanodes ended up replicating the container every 10 minutes. 19 of the 31 datanodes replicated from the same source datanode, which also puts heavy pressure on that source datanode and makes replication even slower. In fact, the first replication took 4 hours to finish.
> !screenshot-4.png!
[GitHub] [hadoop-ozone] ChenSammi opened a new pull request #1054: Hdds 3772. Add LOG to S3ErrorTable for easier problem locating.
ChenSammi opened a new pull request #1054: URL: https://github.com/apache/hadoop-ozone/pull/1054 https://issues.apache.org/jira/browse/HDDS-3772
[jira] [Updated] (HDDS-3772) Add LOG to S3ErrorTable for easier problem locating
[ https://issues.apache.org/jira/browse/HDDS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-3772: - Description:

Currently it is hard to tell the failure reason directly when something unexpected happens. Here is an example when downloading a file through the AWS Java SDK.

com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 0a5f2404-71bb-4edf-b488-2a5de9f6b753), S3 Extended Request ID: rv8deQRJyX3zCEk
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1015)
    at com.amazonaws.services.s3.transfer.TransferManager.doDownload(TransferManager.java:939)
    at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:795)
    at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:713)
    at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:667)
    at AWSS3UtilTest$AWSS3Util.download(AWSS3UtilTest.java:213)
    at AWSS3UtilTest.test08_downloadAsyn(AWSS3UtilTest.java:107)
    at AWSS3UtilTest.main(AWSS3UtilTest.java:47)

> Add LOG to S3ErrorTable for easier problem locating
> ---------------------------------------------------
>
> Key: HDDS-3772
> URL: https://issues.apache.org/jira/browse/HDDS-3772
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> Currently it is hard to tell the failure reason directly when something unexpected happens. Here is an example when downloading a file through the AWS Java SDK.
> com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 0a5f2404-71bb-4edf-b488-2a5de9f6b753), S3 Extended Request ID: rv8deQRJyX3zCEk
> at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
> at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
> at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
> at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
> at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
> at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1015)
> at com.amazonaws.services.s3.transfer.TransferManager.doDownload(TransferManager.java:939)
> at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:795)
> at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:713)
> at com.amazonaws.services.s3.transfer.TransferManager.download(TransferManager.java:667)
> at AWSS3UtilTest$AWSS3Util.download(AWSS3UtilTest.java:213)
> at AWSS3UtilTest.test08_downloadAsyn(AWSS3UtilTest.java:107)
> at AWSS3UtilTest.main(AWSS3UtilTest.java:47)
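The improvement the issue proposes can be sketched as follows. This is a standalone illustration, not Ozone's actual S3ErrorTable (which uses slf4j); the class and method names here are assumptions, and java.util.logging is used only so the sketch runs on a bare JDK:

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the logging HDDS-3772 asks for: record the error
// code, HTTP status and resource server-side at the moment the S3 error is
// built, so a client-side "404 Not Found" can be correlated with a log line.
public final class S3ErrorLogging {

    private static final Logger LOG =
            Logger.getLogger(S3ErrorLogging.class.getName());

    private S3ErrorLogging() { }

    /** Builds the error message and leaves a server-side trace of it. */
    public static String newError(String code, int httpStatus,
            String resource) {
        String message = String.format(
                "S3 error: code=%s status=%d resource=%s",
                code, httpStatus, resource);
        LOG.warning(message);
        return message;
    }

    public static void main(String[] args) {
        // Without such a line, the server log stays silent and only the AWS
        // SDK stack trace is visible to the user.
        newError("NoSuchKey", 404, "bucket1/key1");
    }
}
```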
[jira] [Created] (HDDS-3776) Upgrading RocksDB version to avoid java heap issue
Li Cheng created HDDS-3776: -- Summary: Upgrading RocksDB version to avoid java heap issue Key: HDDS-3776 URL: https://issues.apache.org/jira/browse/HDDS-3776 Project: Hadoop Distributed Data Store Issue Type: Bug Components: upgrade Affects Versions: 0.5.0 Reporter: Li Cheng

We currently use RocksDB 6.6.4 as the major version, and some tests hit JVM crashes related to RocksDB core dumps (seen in [https://github.com/apache/hadoop-ozone/pull/1019]). We may upgrade to 6.8.1 to avoid this issue.

{{JRE version: Java(TM) SE Runtime Environment (8.0_211-b12) (build 1.8.0_211-b12) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode bsd-amd64 compressed oops) # Problematic frame: # C [librocksdbjni2954960755376440018.jnilib+0x602b8] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8}}

See full dump at [https://the-asf.slack.com/files/U0159PV5Z6U/F0152UAJF0S/hs_err_pid90655.log?origin_team=T4S1WH2J3_channel=D014L2URB6E]
[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1019: HDDS-3679. Add unit tests for PipelineManagerV2.
timmylicheng commented on pull request #1019: URL: https://github.com/apache/hadoop-ozone/pull/1019#issuecomment-642370476 @elek Tests seem to pass here. I created https://issues.apache.org/jira/browse/HDDS-3776 to track the RocksDB upgrade.
[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132853#comment-17132853 ] Li Cheng commented on HDDS-3499: [~arp] Our internal production deployment is still on schedule, but we have run internal tests to verify that the steps work. Resolving this now...

> Address compatibility issue by SCM DB instances change
> ------------------------------------------------------
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Li Cheng
> Assignee: Marton Elek
> Priority: Blocker
> Labels: Triaged
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has a single RocksDB instance instead of multiple DB instances.
> For a running Ozone cluster, we need to address compatibility issues. One possible way is to have a side-way tool to migrate the old metadata from the multiple DBs to the current single DB.
[jira] [Resolved] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3499. Fix Version/s: 0.6.0 Resolution: Fixed

> Address compatibility issue by SCM DB instances change
> ------------------------------------------------------
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Li Cheng
> Assignee: Marton Elek
> Priority: Blocker
> Labels: Triaged
> Fix For: 0.6.0
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has a single RocksDB instance instead of multiple DB instances.
> For a running Ozone cluster, we need to address compatibility issues. One possible way is to have a side-way tool to migrate the old metadata from the multiple DBs to the current single DB.
[GitHub] [hadoop-ozone] codecov-commenter edited a comment on pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
codecov-commenter edited a comment on pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#issuecomment-642271726 # [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=h1) Report > Merging [#986](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=desc) into [master](https://codecov.io/gh/apache/hadoop-ozone/commit/f7e95d9b015e764ca93cfe2ccfc96d95160931bc=desc) will **decrease** coverage by `0.10%`. > The diff coverage is `65.28%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/986/graphs/tree.svg?width=650=150=pr=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #986 +/- ## - Coverage 69.48% 69.38% -0.11% + Complexity 9110 9102 -8 Files 961 961 Lines 4813248123 -9 Branches 4672 4676 +4 - Hits 3344633388 -58 - Misses1246812519 +51 + Partials 2218 2216 -2 ``` | [Impacted Files](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...main/java/org/apache/hadoop/ozone/OzoneConsts.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3Avb3pvbmUvT3pvbmVDb25zdHMuamF2YQ==) | `84.21% <ø> (ø)` | `1.00 <0.00> (ø)` | | | [...apache/hadoop/hdds/utils/db/RocksDBCheckpoint.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvZnJhbWV3b3JrL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy91dGlscy9kYi9Sb2Nrc0RCQ2hlY2twb2ludC5qYXZh) | `90.90% <ø> (+0.90%)` | `5.00 <0.00> (-3.00)` | :arrow_up: | | [.../java/org/apache/hadoop/ozone/om/OMConfigKeys.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL296b25lL29tL09NQ29uZmlnS2V5cy5qYXZh) | `100.00% <ø> (ø)` | `1.00 <0.00> (ø)` | | | 
[.../apache/hadoop/ozone/om/OMDBCheckpointServlet.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9PTURCQ2hlY2twb2ludFNlcnZsZXQuamF2YQ==) | `66.26% <ø> (-4.27%)` | `8.00 <0.00> (-2.00)` | | | [...a/org/apache/hadoop/ozone/om/ha/OMNodeDetails.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9oYS9PTU5vZGVEZXRhaWxzLmphdmE=) | `86.66% <ø> (ø)` | `12.00 <0.00> (ø)` | | | [...p/ozone/om/ratis/utils/OzoneManagerRatisUtils.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9yYXRpcy91dGlscy9Pem9uZU1hbmFnZXJSYXRpc1V0aWxzLmphdmE=) | `67.44% <0.00%> (-19.13%)` | `39.00 <0.00> (ø)` | | | [.../java/org/apache/hadoop/ozone/om/OzoneManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9Pem9uZU1hbmFnZXIuamF2YQ==) | `64.22% <12.50%> (-0.37%)` | `185.00 <1.00> (-1.00)` | | | [...adoop/ozone/om/ratis/OzoneManagerStateMachine.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9yYXRpcy9Pem9uZU1hbmFnZXJTdGF0ZU1hY2hpbmUuamF2YQ==) | `58.03% <90.00%> (+2.29%)` | `27.00 <4.00> (+1.00)` | | | [.../org/apache/hadoop/hdds/scm/pipeline/Pipeline.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9zY20vcGlwZWxpbmUvUGlwZWxpbmUuamF2YQ==) | `85.71% <100.00%> (+0.20%)` | `44.00 <0.00> (+1.00)` | | | 
[.../org/apache/hadoop/ozone/om/helpers/OmKeyInfo.java](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree#diff-aGFkb29wLW96b25lL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL296b25lL29tL2hlbHBlcnMvT21LZXlJbmZvLmphdmE=) | `86.25% <100.00%> (+0.33%)` | `42.00 <0.00> (+2.00)` | | | ... and [28 more](https://codecov.io/gh/apache/hadoop-ozone/pull/986/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
[jira] [Updated] (HDDS-3737) Improve OM performance
[ https://issues.apache.org/jira/browse/HDDS-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3737: - Labels: pull-request-available (was: )

> Improve OM performance
> ----------------------
>
> Key: HDDS-3737
> URL: https://issues.apache.org/jira/browse/HDDS-3737
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Labels: pull-request-available
[GitHub] [hadoop-ozone] runzhiwang opened a new pull request #1053: HDDS-3737. Avoid serialization between UUID and String
runzhiwang opened a new pull request #1053: URL: https://github.com/apache/hadoop-ozone/pull/1053

## What changes were proposed in this pull request?

**What's the problem?**
Serialization between UUID and String (UUID.toString, which I have already improved, and UUID.fromString) not only costs CPU, because encoding and decoding the String both cost CPU, but also makes the proto bigger: a UUID is just a 16-byte number, while converting it to a string needs 32 bytes.

**How to fix?**
The JDK implements UUID as two long fields, `mostSigBits` and `leastSigBits`. In `UUID.fromString`, the JDK parses `mostSigBits` and `leastSigBits` out of the String and constructs a new UUID object. So we can carry a UUID as two long fields in the proto, which makes serializing and deserializing a UUID faster and the proto smaller.

![image](https://user-images.githubusercontent.com/51938049/84329780-37fed080-abb8-11ea-8b49-a981334fcb8c.png)
![image](https://user-images.githubusercontent.com/51938049/84329867-6f6d7d00-abb8-11ea-8815-71b7ae57d4c1.png)

## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-3763

## How was this patch tested?
Existing tests.
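The two-longs idea above can be demonstrated with a small standalone Java sketch (the helper names `toLongs`/`fromLongs` are illustrative, not the patch's actual proto API):

```java
import java.util.UUID;

// Sketch of HDDS-3737's proposal: carry a UUID as two 64-bit fields instead
// of a 36-character string, avoiding UUID.toString/UUID.fromString entirely.
public final class UuidAsLongs {

    private UuidAsLongs() { }

    /** The two halves the JDK already stores inside java.util.UUID. */
    public static long[] toLongs(UUID id) {
        return new long[] {
                id.getMostSignificantBits(), id.getLeastSignificantBits() };
    }

    /** Rebuilds the UUID without any string parsing. */
    public static UUID fromLongs(long msb, long lsb) {
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        long[] wire = toLongs(original); // 16 bytes instead of 36 characters
        UUID decoded = fromLongs(wire[0], wire[1]);
        System.out.println(original.equals(decoded)); // true
    }
}
```

Since `UUID(long, long)` just assigns the two fields, the decode path does no character parsing at all, which is where the CPU saving comes from.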
[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1002: HDDS-3642. Stop/Pause Background services while replacing OM DB with checkpoint from Leader
codecov-commenter commented on pull request #1002: URL: https://github.com/apache/hadoop-ozone/pull/1002#issuecomment-642325738

# [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1002?src=pr=h1) Report

> Merging [#1002](https://codecov.io/gh/apache/hadoop-ozone/pull/1002?src=pr=desc) into [master](https://codecov.io/gh/apache/hadoop-ozone/commit/f7e95d9b015e764ca93cfe2ccfc96d95160931bc=desc) will **decrease** coverage by `0.03%`.
> The diff coverage is `0.00%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/graphs/tree.svg?width=650=150=pr=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1002?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #1002      +/-   ##
============================================
- Coverage     69.48%   69.45%   -0.04%
- Complexity     9110     9114       +4
============================================
  Files           961      961
  Lines         48132    48155      +23
  Branches       4672     4679       +7
============================================
  Hits          33446    33446
- Misses        12468    12494      +26
+ Partials       2218     2215       -3
```

| [Impacted Files](https://codecov.io/gh/apache/hadoop-ozone/pull/1002?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [.../java/org/apache/hadoop/ozone/om/OzoneManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9Pem9uZU1hbmFnZXIuamF2YQ==) | `64.27% <0.00%> (-0.32%)` | `186.00 <0.00> (ø)` | |
| [...er/common/transport/server/GrpcXceiverService.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3RyYW5zcG9ydC9zZXJ2ZXIvR3JwY1hjZWl2ZXJTZXJ2aWNlLmphdmE=) | `70.00% <0.00%> (-10.00%)` | `3.00% <0.00%> (ø%)` | |
| [...ache/hadoop/ozone/om/codec/S3SecretValueCodec.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9jb2RlYy9TM1NlY3JldFZhbHVlQ29kZWMuamF2YQ==) | `90.90% <0.00%> (-9.10%)` | `3.00% <0.00%> (-1.00%)` | |
| [.../transport/server/ratis/ContainerStateMachine.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3RyYW5zcG9ydC9zZXJ2ZXIvcmF0aXMvQ29udGFpbmVyU3RhdGVNYWNoaW5lLmphdmE=) | `69.36% <0.00%> (-6.76%)` | `59.00% <0.00%> (-5.00%)` | |
| [...ozone/container/ozoneimpl/ContainerController.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvb3pvbmVpbXBsL0NvbnRhaW5lckNvbnRyb2xsZXIuamF2YQ==) | `63.15% <0.00%> (-5.27%)` | `11.00% <0.00%> (-1.00%)` | |
| [...iner/common/transport/server/ratis/CSMMetrics.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3RyYW5zcG9ydC9zZXJ2ZXIvcmF0aXMvQ1NNTWV0cmljcy5qYXZh) | `67.69% <0.00%> (-3.08%)` | `19.00% <0.00%> (-1.00%)` | |
| [.../ozone/container/common/volume/AbstractFuture.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3ZvbHVtZS9BYnN0cmFjdEZ1dHVyZS5qYXZh) | `29.87% <0.00%> (-0.52%)` | `19.00% <0.00%> (-1.00%)` | |
| [...doop/ozone/container/keyvalue/KeyValueHandler.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIva2V5dmFsdWUvS2V5VmFsdWVIYW5kbGVyLmphdmE=) | `61.55% <0.00%> (-0.45%)` | `63.00% <0.00%> (-1.00%)` | |
| [...adoop/ozone/om/request/key/OMKeyCommitRequest.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9yZXF1ZXN0L2tleS9PTUtleUNvbW1pdFJlcXVlc3QuamF2YQ==) | `97.00% <0.00%> (ø)` | `18.00% <0.00%> (+1.00%)` | |
| [.../org/apache/hadoop/hdds/scm/pipeline/Pipeline.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1002/diff?src=pr=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9zY20vcGlwZWxpbmUvUGlwZWxpbmUuamF2YQ==) | `85.71% <0.00%> (+0.20%)` | `44.00% <0.00%> (+1.00%)` | |

... and [15
[GitHub] [hadoop-ozone] codecov-commenter edited a comment on pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
codecov-commenter edited a comment on pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#issuecomment-642271726

# [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=h1) Report

> :exclamation: No coverage uploaded for pull request base (`master@3328d7d`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/986/graphs/tree.svg?width=650=150=pr=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=tree)

```diff
@@            Coverage Diff            @@
##             master     #986   +/-  ##
========================================
  Coverage          ?   69.48%
  Complexity        ?     9112
========================================
  Files             ?      961
  Lines             ?    48107
  Branches          ?     4669
========================================
  Hits              ?    33428
  Misses            ?    12468
  Partials          ?     2211
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=footer). Last update [3328d7d...b2bda39](https://codecov.io/gh/apache/hadoop-ozone/pull/986?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #1052: HDDS-3749. Addendum: Fix checkstyle issue.
bharatviswa504 merged pull request #1052: URL: https://github.com/apache/hadoop-ozone/pull/1052 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1034: HDDS-3749. Improve OM performance with 3.7% by avoid stream.collect
bharatviswa504 commented on pull request #1034: URL: https://github.com/apache/hadoop-ozone/pull/1034#issuecomment-642305014 Hi @xiaoyuyao This has caused checkstyle issues and PR's are failing with CI run. Posted a PR to fix this https://github.com/apache/hadoop-ozone/pull/1052 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 opened a new pull request #1052: HDDS-3749. Addendum: Fix checkstyle issue.
bharatviswa504 opened a new pull request #1052: URL: https://github.com/apache/hadoop-ozone/pull/1052 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## What is the link to the Apache JIRA (Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HDDS-. Fix a typo in YYY.) Please replace this section with the link to the Apache JIRA) ## How was this patch tested? (Please explain how this patch was tested. Ex: unit tests, manual tests) (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3749) Improve OM performance with 3.7% by avoid stream.collect
[ https://issues.apache.org/jira/browse/HDDS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao resolved HDDS-3749. -- Fix Version/s: 0.6.0 Resolution: Fixed > Improve OM performance with 3.7% by avoid stream.collect > > > Key: HDDS-3749 > URL: https://issues.apache.org/jira/browse/HDDS-3749 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > I started an Ozone cluster with 1000 datanodes and 10 S3 gateways, ran it for two > weeks with a heavy workload, and profiled the OM. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png! > !screenshot-4.png! > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
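[Editor's note] The HDDS-3749 change above replaces stream pipelines on OM hot paths with plain loops. The sketch below is not the actual OM code; the names and the `"key-"` mapping are hypothetical, and it only illustrates the pattern of swapping `stream().collect(...)` for a pre-sized loop to avoid per-call pipeline and collector allocations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class AvoidCollectSketch {
    // Hot-path style before the fix: allocates a Stream pipeline and a
    // Collector on every call.
    static List<String> withStream(List<Integer> ids) {
        return ids.stream()
                .map(id -> "key-" + id)
                .collect(Collectors.toList());
    }

    // After the fix: a plain loop over a pre-sized ArrayList avoids the
    // stream and collector allocations while producing the same result.
    static List<String> withLoop(List<Integer> ids) {
        List<String> out = new ArrayList<>(ids.size());
        for (Integer id : ids) {
            out.add("key-" + id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);
        // Both variants are equivalent; only allocation behavior differs.
        System.out.println(withLoop(ids).equals(withStream(ids)));
        System.out.println(withLoop(ids)); // prints [key-1, key-2, key-3]
    }
}
```

Whether the loop wins in practice depends on the call frequency; the 3.7% figure in the Jira comes from profiling a loaded cluster, not from this toy example.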
[GitHub] [hadoop-ozone] xiaoyuyao merged pull request #1034: HDDS-3749. Improve OM performance with 3.7% by avoid stream.collect
xiaoyuyao merged pull request #1034: URL: https://github.com/apache/hadoop-ozone/pull/1034 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438435882 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); +return lastAppliedIndex; Review comment: So, that is the reason get lastAppliedIndex first, then flush. If we change the order, it will lead to data loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
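[Editor's note] The ordering argument in the review comment above (capture `lastAppliedIndex` first, then flush) can be sketched as a toy model. This is not the OM implementation; the fields and `flush()` here are stand-ins for the state machine's applied index and the double-buffer flush to RocksDB.

```java
public class SnapshotOrderSketch {
    // Illustrative model: lastApplied advances as transactions are applied;
    // flush() makes everything applied so far durable in the DB.
    static long lastApplied = 0;
    static long flushedToDb = 0;

    static void flush() {
        flushedToDb = lastApplied; // double-buffer flush persists applied state
    }

    // Safe order: capture the index first, then flush. The returned index is
    // then guaranteed to be <= what is durable, so Ratis may purge log
    // entries up to it without risking data loss.
    static long takeSnapshot() {
        long snapshotIndex = lastApplied; // 1. capture
        flush();                          // 2. flush
        return snapshotIndex;
    }

    public static void main(String[] args) {
        lastApplied = 100;
        long idx = takeSnapshot();
        // If the order were reversed (flush, then capture), a transaction
        // applied between the two steps would be reported as snapshotted
        // before its effects were durable, and purging its log entry could
        // lose it on restart.
        System.out.println(idx <= flushedToDb); // prints true
        System.out.println("snapshot index = " + idx);
    }
}
```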
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438435468 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); +return lastAppliedIndex; Review comment: Why it will lead to data loss. I am returning already flushed index, and ratis only does log purge which have been flushed to DB. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438435468 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); +return lastAppliedIndex; Review comment: Why it will lead to data loss. We are returning already flushed index, and ratis only does log purge which have been flushed to DB. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438434496 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); Review comment: Done ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -515,13 +528,12 @@ private synchronized void computeAndUpdateLastAppliedIndex( } } - public void updateLastAppliedIndexWithSnaphsotIndex() { + public void updateLastAppliedIndexWithSnaphsotIndex() throws IOException { Review comment: Done ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3168,8 +3172,8 @@ File replaceOMDBWithCheckpoint(long lastAppliedIndex, Path checkpointPath) * All the classes which use/ store MetadataManager should also be updated * with the new MetadataManager instance. 
*/ - void reloadOMState(long newSnapshotIndex, - long newSnapShotTermIndex) throws IOException { + void reloadOMState(long newSnapshotIndex, long newSnapShotTermIndex) + throws IOException { Review comment: Done ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3033,32 +3024,47 @@ public TermIndex installSnapshot(String leaderId) { DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); Path newDBlocation = omDBcheckpoint.getCheckpointLocation(); -// Check if current ratis log index is smaller than the downloaded -// snapshot index. If yes, proceed by stopping the ratis server so that -// the OM state can be re-initialized. If no, then do not proceed with -// installSnapshot. +LOG.info("Downloaded checkpoint from Leader {}, in to the location {}", +leaderId, newDBlocation); + long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex(); -long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); -long checkpointSnapshotTermIndex = -omDBcheckpoint.getRatisSnapshotTerm(); -if (checkpointSnapshotIndex <= lastAppliedIndex) { - LOG.error("Failed to install checkpoint from OM leader: {}. The last " + - "applied index: {} is greater than or equal to the checkpoint's" - + " " + - "snapshot index: {}. Deleting the downloaded checkpoint {}", - leaderId, - lastAppliedIndex, checkpointSnapshotIndex, + +// Check if current ratis log index is smaller than the downloaded +// checkpoint transaction index. If yes, proceed by stopping the ratis +// server so that the OM state can be re-initialized. If no, then do not +// proceed with installSnapshot. 
+ +OMTransactionInfo omTransactionInfo = null; + +Path dbDir = newDBlocation.getParent(); +if (dbDir == null) { + LOG.error("Incorrect DB location path {} received from checkpoint.", newDBlocation); - try { -FileUtils.deleteFully(newDBlocation); - } catch (IOException e) { -LOG.error("Failed to fully delete the downloaded DB checkpoint {} " + -"from OM leader {}.", newDBlocation, -leaderId, e); - } return null; } +try { + omTransactionInfo = + OzoneManagerRatisUtils.getTransactionInfoFromDownloadedSnapshot( + configuration, dbDir); +} catch (Exception ex) { + LOG.error("Failed during opening downloaded snapshot from " + + "{} to obtain transaction index", newDBlocation, ex); + return null; +} + +boolean canProceed = +OzoneManagerRatisUtils.verifyTransactionInfo(omTransactionInfo, +lastAppliedIndex, leaderId, newDBlocation); + Review comment: Done. ## File path:
[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#issuecomment-642292921 Thank You @hanishakoneru for the review. I have addressed review comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] hanishakoneru commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
hanishakoneru commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438432283 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); +return lastAppliedIndex; Review comment: So this needs to be fixed then. Or could lead to data loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438431531 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -338,20 +352,19 @@ public void unpause(long newLastAppliedSnaphsotIndex, } /** - * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index - * is the log index corresponding to the last applied transaction on the OM - * State Machine. + * Take OM Ratis snapshot is a dummy operation as when double buffer + * flushes the lastAppliedIndex is flushed to DB and that is used as + * snapshot index. * * @return the last applied index on the state machine which has been * stored in the snapshot file. */ @Override public long takeSnapshot() throws IOException { -LOG.info("Saving Ratis snapshot on the OM."); -if (ozoneManager != null) { - return ozoneManager.saveRatisSnapshot().getIndex(); -} -return 0; +LOG.info("Current Snapshot Index {}", getLastAppliedTermIndex()); +long lastAppliedIndex = getLastAppliedTermIndex().getIndex(); +ozoneManager.getMetadataManager().getStore().flush(); +return lastAppliedIndex; Review comment: Yes. It would not. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
bharatviswa504 commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438407648 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OzoneManagerSnapshotProvider.java ## @@ -112,16 +112,16 @@ public OzoneManagerSnapshotProvider(ConfigurationSource conf, */ public DBCheckpoint getOzoneManagerDBSnapshot(String leaderOMNodeID) throws IOException { -String snapshotFileName = OM_SNAPSHOT_DB + "_" + System.currentTimeMillis(); -File targetFile = new File(omSnapshotDir, snapshotFileName + ".tar.gz"); +String snapshotTime = Long.toString(System.currentTimeMillis()); +String snapshotFileName = Paths.get(omSnapshotDir.getAbsolutePath(), +snapshotTime, OM_DB_NAME).toFile().getAbsolutePath(); +File targetFile = new File(snapshotFileName + ".tar.gz"); Review comment: Still we need this. As We use DBStore. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] hanishakoneru commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.
hanishakoneru commented on a change in pull request #986: URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r438353082 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java ## @@ -259,16 +261,25 @@ public void start(OzoneConfiguration configuration) throws IOException { rocksDBConfiguration.setSyncOption(true); } - DBStoreBuilder dbStoreBuilder = DBStoreBuilder.newBuilder(configuration, - rocksDBConfiguration).setName(OM_DB_NAME) - .setPath(Paths.get(metaDir.getPath())); + this.store = loadDB(configuration, metaDir); - this.store = addOMTablesAndCodecs(dbStoreBuilder).build(); + // This value will be used internally, not to be exposed to end users. Review comment: We can remove this comment now. ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3033,32 +3024,47 @@ public TermIndex installSnapshot(String leaderId) { DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); Path newDBlocation = omDBcheckpoint.getCheckpointLocation(); -// Check if current ratis log index is smaller than the downloaded -// snapshot index. If yes, proceed by stopping the ratis server so that -// the OM state can be re-initialized. If no, then do not proceed with -// installSnapshot. +LOG.info("Downloaded checkpoint from Leader {}, in to the location {}", +leaderId, newDBlocation); + long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex(); -long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); -long checkpointSnapshotTermIndex = -omDBcheckpoint.getRatisSnapshotTerm(); -if (checkpointSnapshotIndex <= lastAppliedIndex) { - LOG.error("Failed to install checkpoint from OM leader: {}. The last " + - "applied index: {} is greater than or equal to the checkpoint's" - + " " + - "snapshot index: {}. 
Deleting the downloaded checkpoint {}", - leaderId, - lastAppliedIndex, checkpointSnapshotIndex, + +// Check if current ratis log index is smaller than the downloaded +// checkpoint transaction index. If yes, proceed by stopping the ratis +// server so that the OM state can be re-initialized. If no, then do not +// proceed with installSnapshot. + +OMTransactionInfo omTransactionInfo = null; + +Path dbDir = newDBlocation.getParent(); +if (dbDir == null) { + LOG.error("Incorrect DB location path {} received from checkpoint.", newDBlocation); - try { -FileUtils.deleteFully(newDBlocation); - } catch (IOException e) { -LOG.error("Failed to fully delete the downloaded DB checkpoint {} " + -"from OM leader {}.", newDBlocation, -leaderId, e); - } return null; } +try { + omTransactionInfo = + OzoneManagerRatisUtils.getTransactionInfoFromDownloadedSnapshot( + configuration, dbDir); +} catch (Exception ex) { + LOG.error("Failed during opening downloaded snapshot from " + + "{} to obtain transaction index", newDBlocation, ex); + return null; +} + +boolean canProceed = +OzoneManagerRatisUtils.verifyTransactionInfo(omTransactionInfo, +lastAppliedIndex, leaderId, newDBlocation); + Review comment: The lastAppliedIndex could have been updated between its assignment and the canProceed check. This check should be synchronous. Or at least the assignment should happen after reading the transactionInfo from DB. ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3168,8 +3172,8 @@ File replaceOMDBWithCheckpoint(long lastAppliedIndex, Path checkpointPath) * All the classes which use/ store MetadataManager should also be updated * with the new MetadataManager instance. 
*/ - void reloadOMState(long newSnapshotIndex, - long newSnapShotTermIndex) throws IOException { + void reloadOMState(long newSnapshotIndex, long newSnapShotTermIndex) + throws IOException { Review comment: NIT: SnapShot -> Snapshot ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OzoneManagerSnapshotProvider.java ## @@ -112,16 +112,16 @@ public OzoneManagerSnapshotProvider(ConfigurationSource conf, */ public DBCheckpoint getOzoneManagerDBSnapshot(String leaderOMNodeID) throws IOException { -String snapshotFileName = OM_SNAPSHOT_DB + "_" + System.currentTimeMillis(); -File targetFile = new File(omSnapshotDir, snapshotFileName + ".tar.gz"); +String snapshotTime = Long.toString(System.currentTimeMillis()); +String snapshotFileName = Paths.get(omSnapshotDir.getAbsolutePath(), +snapshotTime,
[jira] [Commented] (HDDS-1134) OzoneFileSystem#create should allocate alteast one block for future writes.
[ https://issues.apache.org/jira/browse/HDDS-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132701#comment-17132701 ] Bharat Viswanadham commented on HDDS-1134: -- Hi [~msingh] I see this is being handled in OzoneManager, if the length passed is zero, we allocate at least one block. Code link. [#codelink|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCreateRequest.java#L104] > OzoneFileSystem#create should allocate alteast one block for future writes. > --- > > Key: HDDS-1134 > URL: https://issues.apache.org/jira/browse/HDDS-1134 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > Attachments: HDDS-1134.001.patch > > > While opening a new key, OM should at least allocate one block for the key, > this should be done in case the client is not sure about the number of block. > However for users of OzoneFS, if the key is being created for a directory, > then no blocks should be allocated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
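[Editor's note] The behavior described above (allocating at least one block in `createKey` when the requested length is zero, but none for directory keys) amounts to a small guard in the key-create path. The sketch below is hypothetical, not the `OMKeyCreateRequest` code; the block size and method names are illustrative only.

```java
public class BlockAllocationSketch {
    static final long BLOCK_SIZE = 256L * 1024 * 1024; // hypothetical 256 MB

    // Number of blocks to pre-allocate for a new key. Even when the client
    // passes a zero length (size unknown at open time), allocate one block so
    // the first write does not need an extra round trip to the OM. Directory
    // keys carry no data, so they get no blocks.
    static long blocksToAllocate(long requestedLength, boolean isDirectory) {
        if (isDirectory) {
            return 0;
        }
        long blocks = (requestedLength + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceil
        return Math.max(1, blocks);
    }

    public static void main(String[] args) {
        System.out.println(blocksToAllocate(0, false));              // prints 1
        System.out.println(blocksToAllocate(0, true));               // prints 0
        System.out.println(blocksToAllocate(BLOCK_SIZE + 1, false)); // prints 2
    }
}
```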
[jira] [Resolved] (HDDS-1134) OzoneFileSystem#create should allocate alteast one block for future writes.
[ https://issues.apache.org/jira/browse/HDDS-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-1134. -- Fix Version/s: 0.6.0 Resolution: Fixed This has been already fixed. Right now, we allocate at least one block in createKey call. This has been taken care of during the OM HA refactor. > OzoneFileSystem#create should allocate alteast one block for future writes. > --- > > Key: HDDS-1134 > URL: https://issues.apache.org/jira/browse/HDDS-1134 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > Attachments: HDDS-1134.001.patch > > > While opening a new key, OM should at least allocate one block for the key, > this should be done in case the client is not sure about the number of block. > However for users of OzoneFS, if the key is being created for a directory, > then no blocks should be allocated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] smengcl edited a comment on pull request #1046: HDDS-3767. [OFS] Address merge conflicts after HDDS-3627
smengcl edited a comment on pull request #1046: URL: https://github.com/apache/hadoop-ozone/pull/1046#issuecomment-642120503 > Thanks this patch @smengcl (And sorry for the master changes, we worked paralell.) > > Understanding this PR is really challenging. Finally, I fetched the PR branch and compared with the master and everything seems to be the right place. Thanks for the review @elek . Actually I put a [compare link](https://github.com/smengcl/hadoop-ozone/compare/HDDS-2665-ofs...HDDS-3767) in a previous comment which should have make the review easier in theory. > > One question: Why did you deleted `TestRootedOzoneFileSystemWithMocks.java`? I removed `TestRootedOzoneFileSystemWithMocks.java` because HDDS-3627 removed `TestOzoneFileSystemWithMocks.java`. I have just restored `TestRootedOzoneFileSystemWithMocks.java` under `ozonefs`. > > And one comment: `META-INF/services/...FileSystem` entries can be created for ofs, too. (In the future) I believe we could only put one implementation [here](https://github.com/apache/hadoop-ozone/blob/072370b947416d89fae11d00a84a1d9a6b31beaa/hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem#L16)? Maybe later we can replace `org.apache.hadoop.fs.ozone.OzoneFileSystem` with `org.apache.hadoop.fs.ozone.RootedOzoneFileSystem`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3685) Remove replay logic from actual request logic
[ https://issues.apache.org/jira/browse/HDDS-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-3685: - Priority: Critical (was: Major) > Remove replay logic from actual request logic > - > > Key: HDDS-3685 > URL: https://issues.apache.org/jira/browse/HDDS-3685 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Critical > > HDDS-3476 uses the transaction info persisted in the OM DB during double buffer > flush when OM is restarted. The log index and term of this transaction info are > used as the snapshot index, so we can remove the replay logic from the actual > request logic. (A transaction that has already been applied to the OM DB will > never be replayed to the DB.)
[jira] [Comment Edited] (HDDS-3707) UUID can be non unique for a huge samples
[ https://issues.apache.org/jira/browse/HDDS-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132675#comment-17132675 ] Arpit Agarwal edited comment on HDDS-3707 at 6/10/20, 7:52 PM: --- Hi [~maobaolong] the probability is so infinitesimal I don't think it is worth trying to change it. :) was (Author: arpitagarwal): Hi [~maobaolong] the probability is so infinitesimal I don't think it is worth trying to fix it. :) > UUID can be non unique for a huge samples > - > > Key: HDDS-3707 > URL: https://issues.apache.org/jira/browse/HDDS-3707 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, Ozone Manager, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Priority: Minor > Labels: Triaged > > Now, we use UUID as an id in many places, for example DataNodeId and > pipelineId. The chance of a collision should be very small, but if a > collision does occur, we are in trouble.
[jira] [Commented] (HDDS-3707) UUID can be non unique for a huge samples
[ https://issues.apache.org/jira/browse/HDDS-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132675#comment-17132675 ] Arpit Agarwal commented on HDDS-3707: - Hi [~maobaolong] the probability is so infinitesimal I don't think it is worth trying to fix it. :) > UUID can be non unique for a huge samples > - > > Key: HDDS-3707 > URL: https://issues.apache.org/jira/browse/HDDS-3707 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, Ozone Manager, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Priority: Minor > Labels: Triaged > > Now, we use UUID as an id in many places, for example DataNodeId and > pipelineId. The chance of a collision should be very small, but if a > collision does occur, we are in trouble.
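The "infinitesimal" remark above can be made concrete with a quick birthday-bound estimate. The sketch below is standalone Java, not Ozone code; it assumes version-4 (random) UUIDs, which carry 122 random bits:

```java
public class UuidCollisionBound {
    public static void main(String[] args) {
        // A version-4 UUID has 122 random bits, so 2^122 possible values.
        double space = Math.pow(2, 122);
        // Birthday bound: P(collision) ~= n * (n - 1) / (2 * space) for n << sqrt(space).
        double n = 1e9; // hypothetical: one billion datanode/pipeline ids
        double p = n * (n - 1) / (2 * space);
        // Roughly 1e-19: far below any practical failure probability.
        System.out.printf("collision probability for %.0e ids: %.2e%n", n, p);
    }
}
```

Even at a billion generated IDs the bound sits around 1e-19, which supports the comment that a collision is not worth engineering against.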
[jira] [Resolved] (HDDS-3639) Maintain FileHandle Information in OMMetadataManager
[ https://issues.apache.org/jira/browse/HDDS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru resolved HDDS-3639. -- Resolution: Fixed > Maintain FileHandle Information in OMMetadataManager > > > Key: HDDS-3639 > URL: https://issues.apache.org/jira/browse/HDDS-3639 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem >Reporter: Prashant Pogde >Assignee: Prashant Pogde >Priority: Major > Labels: pull-request-available > > Maintain FileHandle Information in OMMetadataManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3775) Add documentation for flame graph
Wei-Chiu Chuang created HDDS-3775: - Summary: Add documentation for flame graph Key: HDDS-3775 URL: https://issues.apache.org/jira/browse/HDDS-3775 Project: Hadoop Distributed Data Store Issue Type: Task Reporter: Wei-Chiu Chuang HDDS-1116 added a flame graph feature, but there appears to be no documentation on how to enable it. To enable it: add the configuration hdds.profiler.endpoint.enabled = true to ozone-site.xml; download the profiler from https://github.com/jvm-profiling-tools/async-profiler to a local directory, say /tmp; start the DataNode with the java system property -Dasync.profiler.home=/tmp or the environment variable $ASYNC_PROFILER_HOME; then go to the datanode servlet, say dn1:9883/prof, to see the graph.
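The steps above boil down to one configuration property plus a JVM flag. A sketch of the ozone-site.xml fragment, using exactly the property name quoted in the issue:

```xml
<!-- ozone-site.xml: enable the async-profiler servlet (per the steps above) -->
<property>
  <name>hdds.profiler.endpoint.enabled</name>
  <value>true</value>
</property>
```

Then start the DataNode with `-Dasync.profiler.home=/tmp` (or export `ASYNC_PROFILER_HOME`) and browse to the servlet, e.g. `dn1:9883/prof`, as described above.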
[GitHub] [hadoop-ozone] vivekratnavel commented on pull request #1047: HDDS-3726. Upload code coverage data to Codecov and enable checks in …
vivekratnavel commented on pull request #1047: URL: https://github.com/apache/hadoop-ozone/pull/1047#issuecomment-642162869 @elek Thanks for the review and merge!
[GitHub] [hadoop-ozone] maobaolong commented on pull request #1051: Redundancy if condition code in ListPipelinesSubcommand
maobaolong commented on pull request #1051: URL: https://github.com/apache/hadoop-ozone/pull/1051#issuecomment-642148224 @bhemanthkumar Thanks for working on this, please fix the style problem. Also, please update the description from the given template. Reference this PR. https://github.com/apache/hadoop-ozone/pull/920
[GitHub] [hadoop-ozone] maobaolong commented on a change in pull request #1051: Redundancy if condition code in ListPipelinesSubcommand
maobaolong commented on a change in pull request #1051: URL: https://github.com/apache/hadoop-ozone/pull/1051#discussion_r438285600
## File path: hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java
##
@@ -54,17 +54,13 @@
   @Override
   public Void call() throws Exception {
     try (ScmClient scmClient = parent.getParent().createScmClient()) {
-      if (Strings.isNullOrEmpty(factor) && Strings.isNullOrEmpty(state)) {
-        scmClient.listPipelines().forEach(System.out::println);
-      } else {
-        scmClient.listPipelines().stream()
-            .filter(p -> ((Strings.isNullOrEmpty(factor) ||
-                (p.getFactor().toString().compareToIgnoreCase(factor) == 0))
-                && (Strings.isNullOrEmpty(state) ||
-                (p.getPipelineState().toString().compareToIgnoreCase(state)
-                == 0
+      scmClient.listPipelines().stream()
Review comment: Please reduce the indent to fix the checkstyle failure.
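The simplification under review can be illustrated with a standalone sketch (the `Pipeline` class and field names below are stand-ins, not Ozone's real API): an empty filter string matches every pipeline, so a single stream covers both the filtered and unfiltered cases, which is why the removed outer if/else was redundant.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineFilterSketch {
    // Hypothetical stand-in for Ozone's Pipeline; only the two filtered
    // fields matter for this sketch.
    static class Pipeline {
        final String factor;
        final String state;
        Pipeline(String factor, String state) { this.factor = factor; this.state = state; }
    }

    static boolean isNullOrEmpty(String s) { return s == null || s.isEmpty(); }

    // One stream handles both cases: an empty filter matches everything,
    // so no separate "no filters" branch is needed.
    static List<Pipeline> filter(List<Pipeline> all, String factor, String state) {
        return all.stream()
            .filter(p -> isNullOrEmpty(factor) || p.factor.equalsIgnoreCase(factor))
            .filter(p -> isNullOrEmpty(state) || p.state.equalsIgnoreCase(state))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pipeline> all = Arrays.asList(
            new Pipeline("THREE", "OPEN"),
            new Pipeline("ONE", "CLOSED"));
        System.out.println(filter(all, "", "").size());      // both pipelines match
        System.out.println(filter(all, "three", "").size()); // only the THREE pipeline
    }
}
```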
[jira] [Updated] (HDDS-3726) Upload code coverage to Codecov and enable checks in PR workflow of Github Actions
[ https://issues.apache.org/jira/browse/HDDS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek updated HDDS-3726: -- Fix Version/s: 0.6.0 > Upload code coverage to Codecov and enable checks in PR workflow of Github > Actions > -- > > Key: HDDS-3726 > URL: https://issues.apache.org/jira/browse/HDDS-3726 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: build >Affects Versions: 0.6.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > HDDS-3170 aggregates code coverage across all components. We need to upload > the reports to codecov to be able to keep track of coverage and coverage > diffs to be able to tell if a PR does not do a good job on writing unit tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] smengcl commented on pull request #1046: HDDS-3767. [OFS] Address merge conflicts after HDDS-3627
smengcl commented on pull request #1046: URL: https://github.com/apache/hadoop-ozone/pull/1046#issuecomment-642120503 > Thanks for this patch @smengcl (And sorry for the master changes, we worked in parallel.) > > Understanding this PR is really challenging. Finally, I fetched the PR branch and compared it with master, and everything seems to be in the right place. Thanks for the review @elek . Actually I put a [compare link](https://github.com/smengcl/hadoop-ozone/compare/HDDS-2665-ofs...HDDS-3767) in a previous comment which should have made the review easier in theory. > > One question: Why did you delete `TestRootedOzoneFileSystemWithMocks.java`? I removed `TestRootedOzoneFileSystemWithMocks.java` because HDDS-3627 removed `TestOzoneFileSystemWithMocks.java`. Shall we put it back? > > And one comment: `META-INF/services/...FileSystem` entries can be created for ofs, too. (In the future) I believe we could only put one implementation [here](https://github.com/apache/hadoop-ozone/blob/072370b947416d89fae11d00a84a1d9a6b31beaa/hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem#L16)? Maybe later we can replace `org.apache.hadoop.fs.ozone.OzoneFileSystem` with `org.apache.hadoop.fs.ozone.RootedOzoneFileSystem`.
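For context on the `META-INF/services` discussion above: a Hadoop `FileSystem` provider-configuration file is a plain list of implementation class names read via `java.util.ServiceLoader` (lines starting with `#` are comments). Registering both schemes side by side might look like this sketch:

```
# META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.ozone.OzoneFileSystem
org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
```

Each listed class is discovered at runtime, so o3fs and ofs could both be auto-registered without replacing one with the other.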
[jira] [Comment Edited] (HDDS-3747) Remove the redundancy if condition code in ListPipelinesSubcommand
[ https://issues.apache.org/jira/browse/HDDS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130830#comment-17130830 ] hemanthboyina edited comment on HDDS-3747 at 6/10/20, 4:02 PM: --- raised a PR : [https://github.com/apache/hadoop-ozone/pull/1051] please review was (Author: hemanthboyina): raised a PR : [https://github.com/apache/hadoop-ozone/pull/1051] > Remove the redundancy if condition code in ListPipelinesSubcommand > -- > > Key: HDDS-3747 > URL: https://issues.apache.org/jira/browse/HDDS-3747 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone CLI >Affects Versions: 0.7.0 >Reporter: maobaolong >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3747) Remove the redundancy if condition code in ListPipelinesSubcommand
[ https://issues.apache.org/jira/browse/HDDS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130830#comment-17130830 ] hemanthboyina commented on HDDS-3747: - raised a PR : [https://github.com/apache/hadoop-ozone/pull/1051] > Remove the redundancy if condition code in ListPipelinesSubcommand > -- > > Key: HDDS-3747 > URL: https://issues.apache.org/jira/browse/HDDS-3747 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone CLI >Affects Versions: 0.7.0 >Reporter: maobaolong >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bhemanthkumar opened a new pull request #1051: Redundancy if condition code in ListPipelinesSubcommand
bhemanthkumar opened a new pull request #1051: URL: https://github.com/apache/hadoop-ozone/pull/1051 Remove the redundancy if condition code in ListPipelinesSubcommand ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## What is the link to the Apache JIRA (Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HDDS-. Fix a typo in YYY.) Please replace this section with the link to the Apache JIRA) ## How was this patch tested? (Please explain how this patch was tested. Ex: unit tests, manual tests) (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)
[jira] [Commented] (HDDS-3512) s3g multi-upload saved content incorrect when client uses aws java sdk 1.11.* jar
[ https://issues.apache.org/jira/browse/HDDS-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130792#comment-17130792 ] Marton Elek commented on HDDS-3512: --- Is it still a problem? I tried to reproduce it with freon: {code:java} ozone freon s3kg -e http://s3g:9878 -n 10 -s 5242880 {code} But chunk files created with the same size: {code:java} -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385031656.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385031657.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385031658.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385162731.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385424876.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385424877.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385424878.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385424879.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/104320375385490416.block -rw-r--r-- 1 hadoop users 5242880 Jun 10 15:23 ./hdds/hdds/15292b6c-34d6-48d0-bb97-f58043767ade/current/containerDir0/1/chunks/10432037538953.block {code} > s3g multi-upload saved content incorrect when client uses aws java sdk 1.11.* > jar > - > > Key: 
HDDS-3512 > URL: https://issues.apache.org/jira/browse/HDDS-3512 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: S3 >Reporter: Sammi Chen >Assignee: Marton Elek >Priority: Blocker > Labels: TriagePending > > The default multi-part size is 5MB, which is 5242880 bytes, while all the > chunks saved by s3g are 5246566 bytes, which is greater than 5MB. > By looking into ObjectEndpoint.java, it seems the chunk size is retrieved > from the "Content-Length" header. >
[GitHub] [hadoop-ozone] nandakumar131 commented on pull request #1048: HDDS-3481. SCM ask too many datanodes to replicate the same container
nandakumar131 commented on pull request #1048: URL: https://github.com/apache/hadoop-ozone/pull/1048#issuecomment-642077756 @runzhiwang was this PR closed by accident?
[jira] [Commented] (HDDS-3481) SCM ask too many datanodes to replicate the same container
[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130772#comment-17130772 ] Nanda kumar commented on HDDS-3481: --- [~yjxxtd], [~xyao], [~elek], [~sodonnell] I completely agree. We need some kind of balancing and throttling in SCM. Created HDDS-3774 for the same. > SCM ask too many datanodes to replicate the same container > -- > > Key: HDDS-3481 > URL: https://issues.apache.org/jira/browse/HDDS-3481 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Blocker > Labels: TriagePending, pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > > *What's the problem ?* > As the image shows, scm asks 31 datanodes to replicate container 2037 every > 10 minutes starting from 2020-04-17 23:38:51. At 2020-04-18 08:58:52 scm finds the > replica count of container 2037 is 12, then it asks 11 datanodes to delete > container 2037. > !screenshot-1.png! > !screenshot-2.png! > *What's the reason ?* > scm checks whether (container replica count + > inflightReplication.get(containerId).size() - > inflightDeletion.get(containerId).size()) is less than 3. If it is less than 3, scm > will ask some datanode to replicate the container and add the action to > inflightReplication.get(containerId). The replicate action timeout is 10 > minutes; if the action times out, scm deletes the action from > inflightReplication.get(containerId), as the image shows. Then (container > replica count + inflightReplication.get(containerId).size() - > inflightDeletion.get(containerId).size()) is less than 3 again, and scm asks > another datanode to replicate the container. > Because replicating a container takes a long time, it sometimes cannot finish in > 10 minutes, so 31 datanodes end up replicating the container every 10 > minutes.
19 of the 31 datanodes replicate the container from the same source > datanode, which also puts heavy pressure on the source datanode and > makes the replication slower. It actually took 4 hours to finish the > first replication. > !screenshot-4.png!
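The timeout-driven feedback loop described above can be simulated in a few lines (variable names are illustrative, not Ozone's actual ReplicationManager API):

```java
public class ReplicationRunawaySketch {
    public static void main(String[] args) {
        final int replicationFactor = 3;
        int liveReplicas = 1;   // the container is under-replicated by 2
        int commandsSent = 0;

        // Simulate 5 check rounds in which every replicate command times
        // out (10 min) before completing, so it is dropped from the
        // inflight set before the next check runs.
        for (int round = 0; round < 5; round++) {
            int inflight = 0;   // timed-out commands are no longer counted
            int deficit = replicationFactor - (liveReplicas + inflight);
            if (deficit > 0) {
                commandsSent += deficit; // ask `deficit` more datanodes
            }
        }
        // 2 fresh commands per round: for a deficit of 2, ten datanodes get
        // asked, matching the runaway behaviour reported in this issue.
        System.out.println("commands sent: " + commandsSent);
    }
}
```

Tracking inflight commands across rounds (instead of discarding them on timeout) or throttling per container, as HDDS-3774 proposes, breaks this loop.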
[jira] [Created] (HDDS-3774) Throttle replication commands sent to datanode
Nanda kumar created HDDS-3774: - Summary: Throttle replication commands sent to datanode Key: HDDS-3774 URL: https://issues.apache.org/jira/browse/HDDS-3774 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar The Replication/Deletion command sent by SCM to datanode should be throttled and controlled by SCM. * SCM should consider the load on datanode before sending any command. * If network topology is configured, SCM should use it for sorting the source datanode for replication. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3512) s3g multi-upload saved content incorrect when client uses aws java sdk 1.11.* jar
[ https://issues.apache.org/jira/browse/HDDS-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek reassigned HDDS-3512: - Assignee: Marton Elek > s3g multi-upload saved content incorrect when client uses aws java sdk 1.11.* > jar > - > > Key: HDDS-3512 > URL: https://issues.apache.org/jira/browse/HDDS-3512 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: S3 >Reporter: Sammi Chen >Assignee: Marton Elek >Priority: Blocker > Labels: TriagePending > > The default multi-part size is 5MB, which is 5242880 byte, while all the > chunks saved by s3g is 5246566 byte which is greater than 5MB. > By looking into the ObjectEndpoint.java, it seems the chunk size is retrieved > from the "Content-Length" header. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2447) Allow datanodes to operate with simulated containers
[ https://issues.apache.org/jira/browse/HDDS-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2447: -- Target Version/s: 0.7.0 (was: 0.6.0) > Allow datanodes to operate with simulated containers > > > Key: HDDS-2447 > URL: https://issues.apache.org/jira/browse/HDDS-2447 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Stephen O'Donnell >Priority: Major > Labels: TriagePending > > The Storage Container Manager (SCM) generally deals with datanodes and > containers. Datanodes report their containers via container reports and the > SCM keeps track of them, schedules new replicas to be created when needed > etc. SCM does not care about individual blocks within the containers (aside > from deleting them) or keys. Therefore it should be possible to scale test > much of SCM without OM or worrying about writing keys. > In order to scale test SCM and some of its internal features like > decommission, maintenance mode and the replication manager, it would be > helpful to quickly create clusters with many containers, without needing to > go through a data loading exercise. > What I imagine happening is: > * We generate a list of container IDs and container sizes - this could be a > fixed size or configured size for all containers. We could also fix the > number of blocks / chunks inside a 'generated simulated container' so they > are all the same. > * When the Datanode starts, if it has simulated containers enabled, it would > optionally look for this list of containers and load the meta data into > memory. Then it would report the containers to SCM as normal, and the SCM > would believe the containers actually exist. > * If SCM creates a new container, then the datanode should create the > meta-data in memory, but not write anything to disk.
> * If SCM instructs a DN to replicate a container, then we should stream > simulated data over the wire equivalent to the container size, but again > throw away the data at the receiving side and store only the metadata in > datanode memory. > * It would be acceptable for a DN restart to forget all containers and > re-load them from the generated list. A nice-to-have feature would persist > any changes to disk somehow so a DN restart would return to its pre-restart > state. > At this stage, I am not too concerned about OM, or clients trying to read > chunks out of these simulated containers (my focus is on SCM at the moment), > but it would be great if that were possible too. > I believe this feature would let us do scale testing of SCM and benchmark > some dead node / replication / decommission scenarios on clusters with much > reduced hardware requirements. > It would also allow clusters with a large number of containers to be created > quickly, rather than going through a dataload exercise. > This would open the door to a tool similar to > https://github.com/linkedin/dynamometer which uses simulated storage on HDFS > to perform scale tests against the namenode with reduced hardware > requirements. > HDDS-1094 added the ability to have a level of simulated storage on a > datanode. In that Jira, when a client writes data to a chunk the data is > thrown away and nothing is written to disk. If a client later tries to read > the data back, it just gets zeroed byte buffers. Hopefully this Jira could > build on that feature to fully simulate the containers from the SCM point of > view and later we can extend to allowing clients to create keys etc too.
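The in-memory metadata idea from the list above can be sketched roughly as follows (all class and method names are hypothetical, not real Ozone code):

```java
import java.util.HashMap;
import java.util.Map;

public class SimulatedContainerSet {
    // Illustrative sketch of the proposal: container metadata lives only
    // in memory; no chunk data ever touches disk.
    static class ContainerMeta {
        final long id;
        final long usedBytes;
        ContainerMeta(long id, long usedBytes) { this.id = id; this.usedBytes = usedBytes; }
    }

    private final Map<Long, ContainerMeta> containers = new HashMap<>();

    // Load the pre-generated (containerId, size) list at datanode startup.
    void load(long[][] generated) {
        for (long[] g : generated) {
            containers.put(g[0], new ContainerMeta(g[0], g[1]));
        }
    }

    // The container report to SCM would be built purely from this map, so
    // SCM believes the containers exist without a data-load exercise.
    int reportedContainerCount() { return containers.size(); }

    public static void main(String[] args) {
        SimulatedContainerSet set = new SimulatedContainerSet();
        set.load(new long[][] {{1L, 5L << 30}, {2L, 5L << 30}}); // two 5 GB containers
        System.out.println("containers reported: " + set.reportedContainerCount());
    }
}
```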
[jira] [Updated] (HDDS-2449) Delete block command should use a thread pool
[ https://issues.apache.org/jira/browse/HDDS-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2449: -- Target Version/s: 0.7.0 (was: 0.6.0) > Delete block command should use a thread pool > - > > Key: HDDS-2449 > URL: https://issues.apache.org/jira/browse/HDDS-2449 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Affects Versions: 0.6.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: TriagePending > > The datanode receives commands over the heartbeat and queues all commands on > a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a > single thread is used to process this queue (started by the initCommandHandler > thread) and it passes each command to a 'handler'. Each command type has its > own handler. > The delete block command executes immediately on the thread used > to process the command queue. Therefore if the delete is slow for some reason > (it must access disk, so this is possible) it could cause other commands to > back up. > This should be changed to use a thread pool to queue the deleteBlock command, > in a similar way to ReplicateContainerCommand.
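The proposed change can be sketched with a plain ExecutorService (names are illustrative; the real Ozone handler differs): submitting the disk-bound delete to a dedicated pool keeps the single command-dispatch thread free.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeleteBlocksHandlerSketch {
    // Submit `commands` delete tasks to a small pool and wait for them.
    // submit() returns immediately, so a slow delete cannot block the
    // thread that drains the datanode command queue.
    static int runBatch(int commands) {
        ExecutorService deletePool = Executors.newFixedThreadPool(2);
        AtomicInteger deleted = new AtomicInteger();
        for (int i = 0; i < commands; i++) {
            deletePool.submit(() -> {
                // ... disk access to delete the blocks would happen here ...
                deleted.incrementAndGet();
            });
        }
        deletePool.shutdown();
        try {
            deletePool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return deleted.get();
    }

    public static void main(String[] args) {
        System.out.println("deletes executed: " + runBatch(4));
    }
}
```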
[GitHub] [hadoop-ozone] elek commented on pull request #1019: HDDS-3679. Add unit tests for PipelineManagerV2.
elek commented on pull request #1019: URL: https://github.com/apache/hadoop-ozone/pull/1019#issuecomment-642035752 > @elek Shall we make a separate commit to upgrade the rocksdb version? I am open to both approaches, but it seems to be a good idea to do it on `master`, too. I am +1, in advance, if the build is green ;-) (But we can also add it here temporarily, to check if it helps...)
[jira] [Assigned] (HDDS-3773) Add OMDBDefinition to define structure of om.db
[ https://issues.apache.org/jira/browse/HDDS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sadanand Shenoy reassigned HDDS-3773: - Assignee: Sadanand Shenoy > Add OMDBDefinition to define structure of om.db > --- > > Key: HDDS-3773 > URL: https://issues.apache.org/jira/browse/HDDS-3773 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Tools >Reporter: Sadanand Shenoy >Assignee: Sadanand Shenoy >Priority: Minor > > The rocksdb tool that displays data from a db file uses implementations of the > DBDefinition class, which describes the structure and types of the db file. > This class is defined so that the tool can support om.db.
[jira] [Created] (HDDS-3773) Add OMDBDefinition to define structure of om.db
Sadanand Shenoy created HDDS-3773: - Summary: Add OMDBDefinition to define structure of om.db Key: HDDS-3773 URL: https://issues.apache.org/jira/browse/HDDS-3773 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Tools Reporter: Sadanand Shenoy The rocksdb tool that displays data from a db file uses implementations of the DBDefinition class, which describes the structure and types of the db file. This class is defined so that the tool can support om.db.
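A rough sketch of what such a definition captures (table and class names below are hypothetical, not the actual OMDBDefinition API): for each column family in om.db, the key and value types a generic dump tool needs in order to decode raw RocksDB bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OmDbDefinitionSketch {
    // One entry per RocksDB column family, pairing the family name with
    // the types used to decode its keys and values.
    static class TableDef {
        final String columnFamily;
        final Class<?> keyType;
        final Class<?> valueType;
        TableDef(String cf, Class<?> k, Class<?> v) {
            this.columnFamily = cf; this.keyType = k; this.valueType = v;
        }
    }

    // Illustrative table names only; value types are placeholders.
    static final Map<String, TableDef> OM_DB = new LinkedHashMap<>();
    static {
        OM_DB.put("volumeTable", new TableDef("volumeTable", String.class, Object.class));
        OM_DB.put("bucketTable", new TableDef("bucketTable", String.class, Object.class));
        OM_DB.put("keyTable",    new TableDef("keyTable",    String.class, Object.class));
    }

    public static void main(String[] args) {
        // A generic tool can iterate the definition instead of hard-coding
        // per-database knowledge, which is the point of the DBDefinition idea.
        OM_DB.values().forEach(t ->
            System.out.println(t.columnFamily + ": key=" + t.keyType.getSimpleName()));
    }
}
```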
[GitHub] [hadoop-ozone] elek merged pull request #1043: HDDS-3760. Avoid UUID#toString call in Pipeline#getProtobufMessage
elek merged pull request #1043: URL: https://github.com/apache/hadoop-ozone/pull/1043
[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.
lokeshj1703 commented on a change in pull request #1005: URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438144376
## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##
@@ -281,15 +281,33 @@
   public static final String DFS_CONTAINER_RATIS_DATANODE_STORAGE_DIR =
       "dfs.container.ratis.datanode.storage.dir";
-  public static final String DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY;
-  public static final int DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT;
-  public static final String DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY;
+
Review comment: Done.
[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.
lokeshj1703 commented on a change in pull request #1005: URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438144281 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java ## @@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf, return tlsConfig; } Review comment: Done.
[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.
lokeshj1703 commented on a change in pull request #1005: URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438143919

File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java

```diff
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }

-  static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
-    int maxRetryCount =
-        conf.getInt(OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY,
+  public static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
+    ExponentialBackoffRetry exponentialBackoffRetry =
+        createExponentialBackoffPolicy(conf);
+    MultipleLinearRandomRetry multipleLinearRandomRetry =
+        MultipleLinearRandomRetry.parseCommaSeparated(conf.get(
+            OzoneConfigKeys.DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY,
             OzoneConfigKeys.
-                DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT);
-    long retryInterval = conf.getTimeDuration(OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY, OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_DEFAULT
-        .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
-    TimeDuration sleepDuration =
-        TimeDuration.valueOf(retryInterval, TimeUnit.MILLISECONDS);
-    RetryPolicy retryPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(maxRetryCount, sleepDuration);
-    return retryPolicy;
+                DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY_DEFAULT));
+
+    long writeTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long watchTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+
+    return RequestTypeDependentRetryPolicy.newBuilder()
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, exponentialBackoffRetry))
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, RetryPolicies.noRetry()))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            TimeDuration.valueOf(writeTimeout, TimeUnit.MILLISECONDS))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            TimeDuration.valueOf(watchTimeout, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExponentialBackoffRetry createExponentialBackoffPolicy(
+      ConfigurationSource conf) {
+    long exponentialBaseSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long exponentialMaxSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    return ExponentialBackoffRetry.newBuilder()
+        .setBaseSleepTime(
+            TimeDuration.valueOf(exponentialBaseSleep, TimeUnit.MILLISECONDS))
+        .setMaxSleepTime(
+            TimeDuration.valueOf(exponentialMaxSleep, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExceptionDependentRetry createExceptionDependentPolicy(
+      ExponentialBackoffRetry exponentialBackoffRetry,
+      MultipleLinearRandomRetry multipleLinearRandomRetry,
```

Review comment: RaftLogIOException is never received at raft client. I have added AlreadyClosedException.
[jira] [Resolved] (HDDS-3622) Implement rocksdb tool to parse scm db
[ https://issues.apache.org/jira/browse/HDDS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sadanand Shenoy resolved HDDS-3622.
-----------------------------------
Target Version/s: 0.6.0
Resolution: Resolved

> Implement rocksdb tool to parse scm db
> --------------------------------------
>
> Key: HDDS-3622
> URL: https://issues.apache.org/jira/browse/HDDS-3622
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Tools
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> This tool parses content from scm.db file and displays specified table contents.
[jira] [Updated] (HDDS-3622) Implement rocksdb tool to parse scm db
[ https://issues.apache.org/jira/browse/HDDS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sadanand Shenoy updated HDDS-3622:
----------------------------------
Component/s: Tools

> Implement rocksdb tool to parse scm db
> --------------------------------------
>
> Key: HDDS-3622
> URL: https://issues.apache.org/jira/browse/HDDS-3622
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Tools
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> This tool parses content from scm.db file and displays specified table contents.
[GitHub] [hadoop-ozone] sadanand48 closed pull request #864: HDDS-3405. Tool for Listing keys from the OpenKeyTable
sadanand48 closed pull request #864: URL: https://github.com/apache/hadoop-ozone/pull/864
[GitHub] [hadoop-ozone] sadanand48 edited a comment on pull request #864: HDDS-3405. Tool for Listing keys from the OpenKeyTable
sadanand48 edited a comment on pull request #864: URL: https://github.com/apache/hadoop-ozone/pull/864#issuecomment-642019017

> Do we need this patch? It seems to be more easy to extend #945 / [HDDS-3622](https://issues.apache.org/jira/browse/HDDS-3622) with supporting OM...
>
> What do you think?

Yes. I will close the PR.
[GitHub] [hadoop-ozone] sadanand48 commented on pull request #864: HDDS-3405. Tool for Listing keys from the OpenKeyTable
sadanand48 commented on pull request #864: URL: https://github.com/apache/hadoop-ozone/pull/864#issuecomment-642019017

> Do we need this patch? It seems to be more easy to extend #945 / [HDDS-3622](https://issues.apache.org/jira/browse/HDDS-3622) with supporting OM...
>
> What do you think?

Yes. I will close the PR.
[jira] [Updated] (HDDS-3757) Add test coverage of the acceptance tests to overall test coverage
[ https://issues.apache.org/jira/browse/HDDS-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3757:
---------------------------------
Labels: pull-request-available (was: )

> Add test coverage of the acceptance tests to overall test coverage
> ------------------------------------------------------------------
>
> Key: HDDS-3757
> URL: https://issues.apache.org/jira/browse/HDDS-3757
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
> Labels: pull-request-available
>
> Acceptance test coverage should be added to the generic coverage numbers. We have a lot of important tests there...
[GitHub] [hadoop-ozone] elek opened a new pull request #1050: HDDS-3757. Add test coverage of the acceptance tests to overall test coverage
elek opened a new pull request #1050: URL: https://github.com/apache/hadoop-ozone/pull/1050

## What changes were proposed in this pull request?

This patch adds the coverage data from the acceptance tests to the generic coverage measurement.

There was one question during the implementation: I decided to add the required HADOOP_OPTS to all the docker-compose files without using a tricky docker-compose extension. I found that I needed to add a few lines anyway, and I preferred to keep it simple, even if a possible change would require slightly more work (but can be done with an easy search and replace).

## What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-3757

## How was this patch tested?

Pushed the branch to the apache repo and checked Sonar Cloud: https://sonarcloud.io/dashboard?branch=HDDS-3757=hadoop-ozone
[GitHub] [hadoop-ozone] elek commented on pull request #864: HDDS-3405. Tool for Listing keys from the OpenKeyTable
elek commented on pull request #864: URL: https://github.com/apache/hadoop-ozone/pull/864#issuecomment-642015500

Do we need this patch? It seems to be more easy to extend #945 / HDDS-3622 with supporting OM...

What do you think?
[jira] [Created] (HDDS-3772) Add LOG to S3ErrorTable for easier problem locating
Sammi Chen created HDDS-3772:
--------------------------------

Summary: Add LOG to S3ErrorTable for easier problem locating
Key: HDDS-3772
URL: https://issues.apache.org/jira/browse/HDDS-3772
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Sammi Chen
Assignee: Sammi Chen
[GitHub] [hadoop-ozone] elek commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g
elek commented on pull request #1031: URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642010606

Thanks for the patch @runzhiwang. It looks correct to me, but there is also a question about the long-term usage of `getServiceInfo`. Originally it was introduced (as far as I remember) to get the address of the SCM. But over time the client was improved to avoid all the direct calls to the SCM.

I agree that long-term we should use proxy users for S3 and pool the connections. Short-term this patch looks good to me, but why don't we use `getServiceInfo` in the case of secure clusters? Do we need to replace it with something simpler?

I would be interested in the opinion of @nandakumar131. As far as I remember he worked on the original implementation.

Personally I think a generic `getServiceClient` can be useful. For example, active `storage-class`-es can be downloaded from the server at the beginning of the connection. But that's a long-term plan and this patch can help short term. Let's wait for more opinions.
[jira] [Resolved] (HDDS-3750) Improve SCM performance with 3.2% by avoid stream.collect
[ https://issues.apache.org/jira/browse/HDDS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-3750.
-------------------------------
Fix Version/s: 0.6.0
Resolution: Fixed

> Improve SCM performance with 3.2% by avoid stream.collect
> ---------------------------------------------------------
>
> Key: HDDS-3750
> URL: https://issues.apache.org/jira/browse/HDDS-3750
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
> I start a ozone cluster with 1000 datanodes and 10 s3gateway, and run two weeks with heavy workload, and perf scm.
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
[GitHub] [hadoop-ozone] elek merged pull request #1035: HDDS-3750. Improve SCM performance with 3.2% by avoid stream.collect
elek merged pull request #1035: URL: https://github.com/apache/hadoop-ozone/pull/1035
[GitHub] [hadoop-ozone] runzhiwang closed pull request #1048: HDDS-3481. SCM ask too many datanodes to replicate the same container
runzhiwang closed pull request #1048: URL: https://github.com/apache/hadoop-ozone/pull/1048
[jira] [Commented] (HDDS-3481) SCM ask too many datanodes to replicate the same container
[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130640#comment-17130640 ]

Stephen O'Donnell commented on HDDS-3481:
-----------------------------------------

I saw this problem a long time back, in theory, from reading the code. I don't think it is a good idea for SCM to hand out all the replication work immediately. After SCM passes out the commands, it loses the ability to adjust the work later. It is effectively flooding downstream workers who have no ability to provide back pressure and indicate they are overloaded.

E.g., if it needs to replicate 1000 containers, and it gives 500 to node 1 and 500 to node 2: what if node 1 completes its work more quickly (maybe it's under less read load, has faster disks, is on the same rack as the target ...)? Then we cannot just take some of the containers allocated to node 2 and give them to node 1 to complete replication faster, as the commands are fired with no easy way to see their progress or cancel them. It is better for the supervisor (SCM) to hand out the work incrementally as the workers have capacity for it. Even with a longer timeout, I reckon this bad feedback loop will happen.

This is roughly how HDFS does it - there is a replication queue in the namenode, and each datanode has a limit of how many replications it can have. On each heartbeat, it gets given more work up to its maximum. The namenode holds the work back until the workers have capacity to receive it. There isn't a feedback loop for the commands in HDFS, but the limit of work + a relatively short deadline to complete that work results in it working well.
> SCM ask too many datanodes to replicate the same container
> ----------------------------------------------------------
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Blocker
> Labels: TriagePending, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
> *What's the problem ?*
> As the image shows, scm ask 31 datanodes to replicate container 2037 every 10 minutes from 2020-04-17 23:38:51. And at 2020-04-18 08:58:52 scm find the replicate num of container 2037 is 12, then it ask 11 datanodes to delete container 2037.
> !screenshot-1.png!
> !screenshot-2.png!
> *What's the reason ?*
> scm check whether (container replicates num + inflightReplication.get(containerId).size() - inflightDeletion.get(containerId).size()) is less than 3. If less than 3, it will ask some datanode to replicate the container, and add the action into inflightReplication.get(containerId). The replicate action time out is 10 minutes, if action timeout, scm will delete the action from inflightReplication.get(containerId) as the image shows. Then (container replicates num + inflightReplication.get(containerId).size() - inflightDeletion.get(containerId).size()) is less than 3 again, and scm ask another datanode to replicate the container.
> Because replicate container cost a long time, sometimes it cannot finish in 10 minutes, thus 31 datanodes has to replicate the container every 10 minutes. 19 of 31 datanodes replicate container from the same source datanode, it will also cause big pressure on the source datanode and replicate container become slower. Actually it cost 4 hours to finish the first replicate.
> !screenshot-4.png!
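The HDFS-style scheme described in the comment above - a central queue at the supervisor, with work handed out on each heartbeat only up to a per-datanode limit - can be sketched roughly as follows. All class and method names here are illustrative, not Ozone's or HDFS's actual API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: the supervisor (SCM/namenode) keeps the replication
// queue itself and dispenses work incrementally, instead of firing all
// replicate commands at once with no way to rebalance them later.
class ReplicationDispatcher {
  private final Queue<Long> pendingContainers = new ArrayDeque<>();
  private final int maxInflightPerNode;

  ReplicationDispatcher(int maxInflightPerNode) {
    this.maxInflightPerNode = maxInflightPerNode;
  }

  void enqueue(long containerId) {
    pendingContainers.add(containerId);
  }

  // Called on each datanode heartbeat: assign only as much work as the
  // node currently has capacity for; the rest stays queued centrally, so
  // a faster node naturally drains more of the shared queue.
  List<Long> onHeartbeat(int currentInflightOnNode) {
    List<Long> assigned = new ArrayList<>();
    int capacity = maxInflightPerNode - currentInflightOnNode;
    while (capacity > 0 && !pendingContainers.isEmpty()) {
      assigned.add(pendingContainers.poll());
      capacity--;
    }
    return assigned;
  }

  int pending() {
    return pendingContainers.size();
  }
}
```

Because undispatched work remains in the central queue, the supervisor can still reorder or cancel it - the property lost when all commands are handed out up front.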
[jira] [Commented] (HDDS-3755) Storage-class support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130629#comment-17130629 ]

Marton Elek commented on HDDS-3755:
-----------------------------------

[~maobaolong] Would be great to provide an example configuration for the current scheme (Ratis/THREE -> Ratis/ONE). I think we can start a fork branch and create a POC.

Also: the abstraction level can be improved over time. We can start with defining the replication factors for the existing scheme and continue with defining the full state transitions.

> Storage-class support for Ozone
> -------------------------------
>
> Key: HDDS-3755
> URL: https://issues.apache.org/jira/browse/HDDS-3755
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
>
> Use a storage-class as an abstraction which combines replication configuration, container states and transitions.
> See this thread for the detailed design doc:
> [https://lists.apache.org/thread.html/r1e2a5d5581abe9dd09834305ca65a6807f37bd229a07b8b31bda32ad%40%3Cozone-dev.hadoop.apache.org%3E]
> which is also uploaded to here:
> https://hackmd.io/4kxufJBOQNaKn7PKFK_6OQ?edit
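An example configuration of the kind requested above could take roughly the following shape. This is purely a hypothetical sketch - no such file or keys exist in Ozone; every name below is an assumption made only to illustrate binding a storage-class to per-state replication settings for the current scheme (Ratis/THREE while Open, optionally reduced when Closed):

```yaml
# Hypothetical sketch only: illustrative names, not an actual Ozone config.
storage-classes:
  - name: STANDARD
    states:
      open:
        replication-type: RATIS
        replication-factor: THREE
      closed:
        replication-type: RATIS
        replication-factor: ONE   # example of a different Closed-state scheme
```

A definition like this would let admins change the Closed-state replication factor without touching clients, which is use case 2 in the follow-up comment below the design thread.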
[jira] [Commented] (HDDS-3755) Storage-class support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130626#comment-17130626 ]

Marton Elek commented on HDDS-3755:
-----------------------------------

*Decouple EC from this feature:* For me storage-class is a framework which can make it easier to implement some new features (TWO replication, EC). I agree with decoupling the detailed design, but it's important to find an abstraction level which is good enough for all the considered EC implementations.

{quote}Would like to see better defined use-cases, and some discussion on the use-cases before we get into the design.
{quote}
Can you please help me with some questions? For me the important use cases are:

1. EC (I understand that the detailed design discussion is separated, but this framework should be good enough for EC)
2. Defining different Closed replication schemes (currently you couldn't configure a different replication factor for Closed containers)
3. Simplify configuration and hide the implementation details from the user, but keep the flexibility for the admins
4. Make it easy to experiment with different replication schemes (like TWO)

> Storage-class support for Ozone
> -------------------------------
>
> Key: HDDS-3755
> URL: https://issues.apache.org/jira/browse/HDDS-3755
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
>
> Use a storage-class as an abstraction which combines replication configuration, container states and transitions.
> See this thread for the detailed design doc:
> [https://lists.apache.org/thread.html/r1e2a5d5581abe9dd09834305ca65a6807f37bd229a07b8b31bda32ad%40%3Cozone-dev.hadoop.apache.org%3E]
> which is also uploaded to here:
> https://hackmd.io/4kxufJBOQNaKn7PKFK_6OQ?edit
[jira] [Commented] (HDDS-3481) SCM ask too many datanodes to replicate the same container
[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130610#comment-17130610 ]

Marton Elek commented on HDDS-3481:
-----------------------------------

{quote}Should we consider balancing the replication source among datanodes or throttle the replication per datanode?
{quote}
Agree, sooner or later we need this.

A few days ago I learned that using High Density datanodes (datanodes with extreme capacity) can be more and more common. But requesting the replication of ALL containers of a missing datanode with multiple hundreds of terabytes can be a disaster.

> SCM ask too many datanodes to replicate the same container
> ----------------------------------------------------------
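The under-replication check quoted in the HDDS-3481 report - observed replicas plus in-flight replications, minus in-flight deletions, compared against the replication factor - reduces to a small predicate. The class and method names below are illustrative, not the actual ReplicationManager code:

```java
// Hypothetical sketch of the check described in the report: a container is
// scheduled for re-replication when replicas + inflight replications -
// inflight deletions fall below the replication factor. If a timed-out
// replicate command is simply dropped from the in-flight set, the predicate
// becomes true again and yet another datanode is asked to replicate - the
// repeated-replication loop the issue describes.
class ReplicaCheck {
  static boolean needsReplication(int replicaCount, int inflightReplications,
      int inflightDeletions, int replicationFactor) {
    return replicaCount + inflightReplications - inflightDeletions
        < replicationFactor;
  }
}
```

For example, with 2 replicas and 1 in-flight replication the predicate is false; as soon as that in-flight entry expires it flips to true, so every timeout triggers one more replicate command.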
[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1019: HDDS-3679. Add unit tests for PipelineManagerV2.
timmylicheng commented on pull request #1019: URL: https://github.com/apache/hadoop-ozone/pull/1019#issuecomment-641967456

> FYI: can reproduce it on linux, locally.
>
> It seems to have disappeared when I upgraded my rocksdb version in the main `pom.xml`:
>
> ```
> -6.6.4
> +6.8.1
> ```
>
> I think it's good to upgrade as multiple corruption issues are fixed since 6.8.1...

@elek Shall we make a separate commit to upgrade the rocksdb version?
[jira] [Resolved] (HDDS-2880) Rename legacy/current ozonefs to isolated/share
[ https://issues.apache.org/jira/browse/HDDS-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-2880.
-------------------------------
Release Note: They were refactored and renamed during HDDS-3458. We don't have "isolated" any more as we don't have a specific classloader.
Resolution: Won't Fix

> Rename legacy/current ozonefs to isolated/share
> -----------------------------------------------
>
> Key: HDDS-2880
> URL: https://issues.apache.org/jira/browse/HDDS-2880
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Marton Elek
> Priority: Major
> Labels: TriagePending, newbie
>
> When we started to provide two different packagings for ozonefs we named them legacy and current.
> "Legacy" contains all the required hadoop classes and a classloader separation with the help of a specific class loader instance.
> "Current" contains only the shaded (with package relocation) dependencies, but without Hadoop classes and the specific classloader.
> The "current" can be used with the latest Hadoop version; "legacy" can be used with "all" versions of Hadoop.
> As "legacy" is a more generic approach it might be better to use better naming. I suggest naming based on the function:
> * rename "legacy" to "isolated"
> * rename "current" to "shared"
[jira] [Commented] (HDDS-3519) Finalize network and storage protocol of Ozone
[ https://issues.apache.org/jira/browse/HDDS-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130596#comment-17130596 ]

Marton Elek commented on HDDS-3519:
-----------------------------------

I think it's very close to any intra-service protocol (as far as I understood it's used inside the Ratis log entries). I would put it in the hadoop-hdds/interface-server project, but it sounds reasonable to keep it in a separate file.

> Finalize network and storage protocol of Ozone
> ----------------------------------------------
>
> Key: HDDS-3519
> URL: https://issues.apache.org/jira/browse/HDDS-3519
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Marton Elek
> Priority: Critical
> Labels: TriagePending
>
> One of the next releases of Ozone can be named as GA, which means that backward compatibility will be more important.
> Before GA I propose to clean up the current RPC interface and stabilize the storage interface.
> Goals:
> * Clearly define the client / storage interfaces and monitor the changes
> * Separate client RPC from intra-service / admin interfaces (for security reasons)
> * Remove unused / out-of-date messages
> I propose the following steps:
> 1. We should separate client / admin / server calls on the services.
> -> Majority of existing calls are "client" calls, used by the client
> -> Admin calls are supposed to be used by the admin CLI (local only in a secure environment)
> -> Server calls are intra-server calls (like db HB)
> 2. We should use a unified naming convention
> 3. Protocol files can be moved to a separate maven project to make it easier to reuse from language bindings and make it easier to monitor API changes
> 4. We should use the RDBStore interface everywhere instead of the old Metadatastore interface
> 5. We can move all the table definition interfaces to a separate project and monitor the API changes
> This is my previous proposal for the naming convention, which was discussed and accepted during one of the community meetings:
> {quote}My simplified name convention suggests separating only the server (like om2scm), the client (like client2om) and admin (like pipeline list, safe mode administration, etc.) protocol services.
> 1. admin services should be available only from the cluster (use AdminProtocol as name)
> 2. client can be available from inside and outside (use ClientProtocol as name)
> 3. server protocol can be restricted to be used only between the services (use ..ServerProtocol as name)
> Based on this convention:
> --> OMClientProtocol
> Should contain all the client calls (OzoneManagerProtocol)
> --> OMAdminProtocol
> It's a new service, can contain the new omha commands
> --> SCMAdminProtocol
> Can contain all the admin commands from the StorageContainerLocation protocol (QueryNode, InSafeMode, ...)
> --> SCMClientProtocol
> It seems that we don't need it any more as the client doesn't require any connection to the SCM (please confirm)
> --> SCMServerProtocol (server2server calls)
> * Remaining part of the StorageContainerLocation protocol (allocate container, get container)
> * Content of the SCMSecurityProtocol.proto
> * Content of SCMBlockLocationProtocol
> -> SCMHeartbeatProtocol
> Well, it's so specific that we can create a custom postfix instead of Server. This is the HB (StorageContainerDatanodeProtocol)
> -> DatanodeClientProtocol
> Chunks, upload from the DatanodeContainerProtocol
> --> DatanodeServerProtocol
> There is one service here which publishes the container.tar.gz for the other services. As of now it's combined with the DatanodeClientProtocol.
> {quote}
[jira] [Resolved] (HDDS-1328) Add a new API getS3Bucket
[ https://issues.apache.org/jira/browse/HDDS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-1328.
-------------------------------
Resolution: Won't Fix

We don't need it after the recent s3 volume mapping change.

> Add a new API getS3Bucket
> -------------------------
>
> Key: HDDS-1328
> URL: https://issues.apache.org/jira/browse/HDDS-1328
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
>
> Currently, to get an s3bucket, we need 3 RPCs:
> # Get OzoneVolumeName
> # Get OzoneVolume
> # then getBucket.
> With the proposed approach, we can have one RPC call getS3Bucket, with which we can save 2 RPCs for each operation in S3.
[GitHub] [hadoop-ozone] elek commented on pull request #1019: HDDS-3679. Add unit tests for PipelineManagerV2.
elek commented on pull request #1019: URL: https://github.com/apache/hadoop-ozone/pull/1019#issuecomment-641960612 FYI: I can reproduce it on Linux, locally. It seems to have disappeared when I upgraded the rocksdb version in the main `pom.xml`: ``` -6.6.4 +6.8.1 ``` I think it's good to upgrade, as multiple corruption issues have been fixed in 6.8.1... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3761) "ozone fs -get" is way to slow than "ozone sh key get"
[ https://issues.apache.org/jira/browse/HDDS-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130574#comment-17130574 ] Sammi Chen commented on HDDS-3761: -- Hi [~msingh] and [~rakeshr], Please hold on with the investigation; the data was captured in our production environment, and I will verify it with the latest master again. > "ozone fs -get" is way to slow than "ozone sh key get" > -- > > Key: HDDS-3761 > URL: https://issues.apache.org/jira/browse/HDDS-3761 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Priority: Major > > Time spent downloading a 7GB+ object: > time ozone fs -get > o3fs://konajdk-profiler.s325d55ad283aa400af464c76d713c07ad/part-0 > ./part-0-back > 2020-06-09 11:19:47,284 [main] INFO impl.MetricsConfig: Loaded properties > from hadoop-metrics2.properties > 2020-06-09 11:19:47,339 [main] INFO impl.MetricsSystemImpl: Scheduled Metric > snapshot period at 10 second(s). > 2020-06-09 11:19:47,339 [main] INFO impl.MetricsSystemImpl: > XceiverClientMetrics metrics system started > real 45m26.152s > user 0m28.576s > sys 0m13.488s > 222 > time bin/hadoop fs -get s3a://konajdk-profiler/part-0 ./part-0-back-1 > 20/06/09 11:19:57 INFO security.UserGroupInformation: Hadoop UGI > authentication : SIMPLE > real 3m3.542s > user 0m7.644s > sys 0m12.016s > 222 > time bin/ozone sh key get > s325d55ad283aa400af464c76d713c07ad/konajdk-profiler/part-0 > ./part-0-back > real 1m26.900s > user 0m19.604s > sys 0m10.280s -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
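To put the reported wall-clock times above in perspective (45m26.152s for `ozone fs -get`, 3m3.542s for the s3a download, 1m26.900s for `ozone sh key get`), the slowdown factors work out as follows. This is illustrative arithmetic on the reported numbers only, not a new benchmark:

```java
// Illustrative arithmetic on the wall-clock times reported in HDDS-3761 (not a benchmark).
public class ReadTimeRatios {
    static double seconds(int min, double sec) { return min * 60 + sec; }

    public static void main(String[] args) {
        double fsGet = seconds(45, 26.152);  // ozone fs -get
        double s3aGet = seconds(3, 3.542);   // hadoop fs -get via s3a
        double keyGet = seconds(1, 26.900);  // ozone sh key get

        System.out.printf("fs -get vs key get: %.1fx slower%n", fsGet / keyGet);
        System.out.printf("fs -get vs s3a:     %.1fx slower%n", fsGet / s3aGet);
    }
}
```

So `ozone fs -get` was roughly 31x slower than `ozone sh key get` and roughly 15x slower than the s3a path for the same object.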
[GitHub] [hadoop-ozone] timmylicheng edited a comment on pull request #1019: HDDS-3679. Add unit tests for PipelineManagerV2.
timmylicheng edited a comment on pull request #1019: URL: https://github.com/apache/hadoop-ozone/pull/1019#issuecomment-641767465 Dump shows it's related to rocksdb. Not sure if it's related to multiple DBs being merged. Stack looks weird to me. Any ideas? @nandakumar131 @elek @xiaoyuyao ``` JRE version: Java(TM) SE Runtime Environment (8.0_211-b12) (build 1.8.0_211-b12) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode bsd-amd64 compressed oops) # Problematic frame: # C [librocksdbjni2954960755376440018.jnilib+0x602b8] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 See full dump at [https://the-asf.slack.com/files/U0159PV5Z6U/F0152UAJF0S/hs_err_pid90655.log?origin_team=T4S1WH2J3_channel=D014L2URB6E](url) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] elek commented on pull request #1038: HDDS-3754. Rename framework to common-server
elek commented on pull request #1038: URL: https://github.com/apache/hadoop-ozone/pull/1038#issuecomment-641935708 @nandakumar131 Not sure if I understood your proposal. Do you propose creating separate projects for separate frameworks/services, like `framework-eventqueue` or `framework-dbstore`? I am fine with that. But the current project (which is used only to collect server-side utilities and classes) is more like `common` for the server side. I am also fine leaving it as is, if you don't think it's confusing... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3662) decouple finalize and destroy pipeline
[ https://issues.apache.org/jira/browse/HDDS-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3662: - Labels: pull-request-available (was: ) > decouple finalize and destroy pipeline > -- > > Key: HDDS-3662 > URL: https://issues.apache.org/jira/browse/HDDS-3662 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > We have to decouple finalize and destroy pipeline. We should have two > separate calls, closePipeline and destroyPipeline. > Close pipeline should only update the pipeline state, it’s the job of the > caller to issue close container commands to all the containers in the > pipeline. > Destroy pipeline should be called from pipeline scrubber, once a pipeline has > spent enough time in closed state the pipeline scrubber should call destroy > pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] timmylicheng opened a new pull request #1049: HDDS-3662 Decouple finalizeAndDestroyPipeline.
timmylicheng opened a new pull request #1049: URL: https://github.com/apache/hadoop-ozone/pull/1049 ## What changes were proposed in this pull request? Decouple finalizeAndDestroyPipeline. Close pipeline should only update the pipeline state; it's the job of the caller to issue close container commands to all the containers in the pipeline. Destroy pipeline should be called from the pipeline scrubber; once a pipeline has spent enough time in the closed state, the pipeline scrubber should call destroy pipeline. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-3662 ## How was this patch tested? UT This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
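The split described in HDDS-3662 can be sketched as a small state machine: close only flips the pipeline state, and destroy is legal only after the pipeline has spent its dwell time in the closed state. All names and thresholds below are illustrative, not the actual Ozone SCM API:

```java
// Sketch of the proposed closePipeline/destroyPipeline split (HDDS-3662).
// Names and thresholds are illustrative, not the actual Ozone SCM API.
public class PipelineLifecycle {
    enum State { OPEN, CLOSED, DESTROYED }

    State state = State.OPEN;
    long closedAtMillis = -1;

    // Close only flips the state; issuing close-container commands to the
    // containers in the pipeline is the caller's job.
    void closePipeline(long nowMillis) {
        if (state != State.OPEN) throw new IllegalStateException("pipeline is not open");
        state = State.CLOSED;
        closedAtMillis = nowMillis;
    }

    // The pipeline scrubber destroys a pipeline only after it has stayed
    // closed for at least the dwell interval.
    boolean destroyIfExpired(long nowMillis, long dwellMillis) {
        if (state == State.CLOSED && nowMillis - closedAtMillis >= dwellMillis) {
            state = State.DESTROYED;
            return true;
        }
        return false;
    }
}
```

The point of the decoupling is visible in the two methods: no single call performs both transitions, so the scrubber can apply its own policy about when a closed pipeline is actually torn down.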
[jira] [Commented] (HDDS-3764) Spark is failing with no such method exception
[ https://issues.apache.org/jira/browse/HDDS-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130482#comment-17130482 ] Marton Elek commented on HDDS-3764: --- Sure, I merged it. Yes, it seems to be a duplicate (Spark uses Hadoop 2.7.4 by default). I will retest Spark with your patch and close this issue. > Spark is failing with no such method exception > -- > > Key: HDDS-3764 > URL: https://issues.apache.org/jira/browse/HDDS-3764 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: build >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Critical > > When I tested the existing documentation (Spark + Kubernetes) I found that > Spark (default distribution with Hadoop 2.7) is failing with a NoSuchMethod > exception: > {code:java} > Exception in thread "main" org.apache.spark.SparkException: Job aborted due > to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: > Lost task 0.3 in stage 0.0 (TID 3, 10.42.0.169, executor 1): > java.lang.NoSuchMethodError: org.apache.hadoop.util.Time.monotonicNowNanos()J > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:437) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3715) Improvement for OzoneFS client to work with Hadoop 2.7.3
[ https://issues.apache.org/jira/browse/HDDS-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek resolved HDDS-3715. --- Fix Version/s: 0.6.0 Target Version/s: 0.6.0 Resolution: Fixed > Improvement for OzoneFS client to work with Hadoop 2.7.3 > > > Key: HDDS-3715 > URL: https://issues.apache.org/jira/browse/HDDS-3715 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > The background is that the Hadoop production clusters we use internally are based > on Hadoop 2.7.3. Currently we maintain an internal OzoneFS client for Hadoop > 2.7.3. With HDDS-3627 merged, it is the right time to use the community > version instead of an internal one. > The improvement replaces some newly added Hadoop 2.7.7 functions which are not > available in 2.7.3, refactoring the code to use older functions with the same > functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
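One generic way to express the compatibility approach described here (use an older API with the same functionality when a newer one is missing, as in the `Time.monotonicNowNanos()` failure from HDDS-3764) is a reflection guard. This is a hedged sketch of the technique, not the actual change merged in HDDS-3715:

```java
import java.lang.reflect.Method;

// Generic sketch of a Hadoop-version compatibility guard (not the actual
// HDDS-3715 change): prefer a newer API if present, otherwise fall back to
// an equivalent call that exists in older releases such as 2.7.3.
public class MonotonicClock {
    private static final Method MONOTONIC_NANOS = lookup();

    private static Method lookup() {
        try {
            // Time.monotonicNowNanos() exists only in newer Hadoop releases.
            return Class.forName("org.apache.hadoop.util.Time")
                .getMethod("monotonicNowNanos");
        } catch (ReflectiveOperationException e) {
            return null; // running against an older Hadoop (e.g. 2.7.3)
        }
    }

    public static long nowNanos() {
        if (MONOTONIC_NANOS != null) {
            try {
                return (Long) MONOTONIC_NANOS.invoke(null);
            } catch (ReflectiveOperationException ignored) {
                // fall through to the portable clock
            }
        }
        return System.nanoTime(); // available on every JVM
    }
}
```

With this pattern the same client jar runs against both old and new Hadoop versions, at the cost of one reflective lookup at class-load time.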
[jira] [Updated] (HDDS-3715) Improvement for OzoneFS client to work with Hadoop 2.7.3
[ https://issues.apache.org/jira/browse/HDDS-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3715: - Labels: pull-request-available (was: ) > Improvement for OzoneFS client to work with Hadoop 2.7.3 > > > Key: HDDS-3715 > URL: https://issues.apache.org/jira/browse/HDDS-3715 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > > The background is the hadoop production clusters we used internally is based > on Hadoop 2.7.3. Currenlty we maintain an internal OzoneFS client for Hadoop > 2.7.3. With HDDS-3627 merged, it is the right time to use the community > version instead of an internal version. > The improvement are some Hadoop2.7.7 newly added functions which are not > available in 2.7.3. So refact the code to use an older version with same > functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] elek merged pull request #1036: HDDS-3715. Improvement for OzoneFS client to work with Hadoop 2.7.3.
elek merged pull request #1036: URL: https://github.com/apache/hadoop-ozone/pull/1036 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] elek merged pull request #945: HDDS-3622. Implement rocksdb tool to parse scm db
elek merged pull request #945: URL: https://github.com/apache/hadoop-ozone/pull/945 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3755) Storage-class support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130397#comment-17130397 ] maobaolong commented on HDDS-3755: -- [~arp] Yeah, I agree with decoupling EC from this feature. [~elek] I think a storage-class can be a suite of parameters describing how to write a file; for example, it can contain the replication factor and replicationType. Transfer rule: We can define rules describing when (condition or timer) to invoke a conversion action. For example, convert files from one storage-class to another when a file has lived for 7 days. Phase: Files can have many phases; we can define rules to connect the phases. file > phase1(storageClassA) --rule1-> phase2(storageClassB) --rule2-> phase3(storageClassC) --rule3-> phase4(deleted) Transfer chain: A whole chain from the start phase to the end phase. Please correct me if I am wrong. > Storage-class support for Ozone > --- > > Key: HDDS-3755 > URL: https://issues.apache.org/jira/browse/HDDS-3755 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > > Use a storage-class as an abstraction which combines replication > configuration, container states and transitions. > See this thread for the detailed design doc: > > [https://lists.apache.org/thread.html/r1e2a5d5581abe9dd09834305ca65a6807f37bd229a07b8b31bda32ad%40%3Cozone-dev.hadoop.apache.org%3E] > which is also uploaded to here: > https://hackmd.io/4kxufJBOQNaKn7PKFK_6OQ?edit -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
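The rule/phase/chain idea in the comment above can be made concrete as a list of age-based transitions between storage-classes. All class names, rule shapes, and thresholds below are hypothetical; this only illustrates the proposal, not any actual Ozone design:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed transfer chain: phases connected by age-based rules.
// All names and thresholds are hypothetical, illustrating the comment above.
public class TransferChain {
    static final class Rule {
        final String from;
        final String to;
        final long minAgeDays;
        Rule(String from, String to, long minAgeDays) {
            this.from = from;
            this.to = to;
            this.minAgeDays = minAgeDays;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    TransferChain addRule(String from, String to, long minAgeDays) {
        rules.add(new Rule(from, to, minAgeDays));
        return this;
    }

    // Applies the first rule that matches the file's current class and age.
    String nextClass(String currentClass, long ageDays) {
        for (Rule r : rules) {
            if (r.from.equals(currentClass) && ageDays >= r.minAgeDays) {
                return r.to;
            }
        }
        return currentClass; // no transition yet
    }
}
```

A chain such as STANDARD --7 days-> WARM --30 days-> COLD then corresponds to two `addRule` calls, with deletion modeled as one more terminal phase if desired.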
[jira] [Comment Edited] (HDDS-3755) Storage-class support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128866#comment-17128866 ] maobaolong edited comment on HDDS-3755 at 6/10/20, 8:06 AM: [~elek] Thanks for bringing this idea to Ozone; after a concrete discussion with you, I see a great feature in it. Now I would like to do some work to get this started. In my view, we can define some concepts clearly, for example (feel free to add or remove the following items): - Storage-class - Transfer rule - Phase - Transfer chain We can also define the boundaries of each development phase; then we can create some definite sub-tasks, and contributors like me can take tickets to work on. For example: - Use Storage-class to combine a suite of parameters about how to write and store data - Two replication factor support My draft about storage-as-a-framework and the Ozone storage transfer related docs is below; feel free to discuss with me or leave any comments in the docs. I think that after several rounds of discussion, we can reach a consensus. https://docs.google.com/document/d/1gfjiKEpfyEfqXI3aT12dMibc0i14YYv5o09lKm_F-ZA/edit?usp=sharing was (Author: maobaolong): [~elek]Thanks for bring this invention to Ozone, after a concretely discussion with you, i saw a great feature from this. Now i would like to do some works to started this. In my view, we can define some concept clearly, for example(feel free to add or remote the following items) - Storage-class - Transfer rule - Phase - Transfer chain We can also define boundaries of each develop phase, then we can create some definite sub task, some contributors like me can take some tickets to do. For example, - Use Storage-class to combine a suite of parameters about how to write and store data - Two replication factor support My draft about storage-as-a-framework and Ozone-storage-transfer related docs, feel free to discuss with me or left any comments in the docs. 
I think that after several round of discussion, we can reach the consensus. > Storage-class support for Ozone > --- > > Key: HDDS-3755 > URL: https://issues.apache.org/jira/browse/HDDS-3755 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > > Use a storage-class as an abstraction which combines replication > configuration, container states and transitions. > See this thread for the detailed design doc: > > [https://lists.apache.org/thread.html/r1e2a5d5581abe9dd09834305ca65a6807f37bd229a07b8b31bda32ad%40%3Cozone-dev.hadoop.apache.org%3E] > which is also uploaded to here: > https://hackmd.io/4kxufJBOQNaKn7PKFK_6OQ?edit -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3771) Block when using ’ozone fs -cat o3fs://xxxxx.xxxx/xxx‘
[ https://issues.apache.org/jira/browse/HDDS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao resolved HDDS-3771. - Resolution: Cannot Reproduce > Block when using ’ozone fs -cat o3fs://x./xxx‘ > -- > > Key: HDDS-3771 > URL: https://issues.apache.org/jira/browse/HDDS-3771 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Priority: Major > Attachments: image-2020-06-10-11-48-12-299.png > > > Block when I use ’ozone fs -cat o3fs://x./xxx‘. And no logs are seen > in the background. This is normal when I use ’ozone sh key cat > /x//xxx ‘. > !image-2020-06-10-11-48-12-299.png|width=919,height=88! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3771) Block when using ’ozone fs -cat o3fs://xxxxx.xxxx/xxx‘
[ https://issues.apache.org/jira/browse/HDDS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130368#comment-17130368 ] mingchao zhao commented on HDDS-3771: - This problem is no longer present in the latest version of the code. Close this. > Block when using ’ozone fs -cat o3fs://x./xxx‘ > -- > > Key: HDDS-3771 > URL: https://issues.apache.org/jira/browse/HDDS-3771 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Priority: Major > Attachments: image-2020-06-10-11-48-12-299.png > > > Block when I use ’ozone fs -cat o3fs://x./xxx‘. And no logs are seen > in the background. This is normal when I use ’ozone sh key cat > /x//xxx ‘. > !image-2020-06-10-11-48-12-299.png|width=919,height=88! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3683) Ozone fuse support
[ https://issues.apache.org/jira/browse/HDDS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130367#comment-17130367 ] maobaolong commented on HDDS-3683: -- [~msingh] [~aryangupta1998] [~nanda] Thanks for your help getting dfs-fuse running on my test cluster. I ran a bundle of tests for `dfs-fuse` and `hcfsfuse` to measure the read performance; the following are some screenshots of the test results. The test file is 1.6GB. !screenshot-1.png! !screenshot-2.png! !screenshot-3.png! From the above screenshots, I think the write and warm-read performance is almost the same; for cold-read performance, hcfsfuse is better than dfs-fuse. > Ozone fuse support > --- > > Key: HDDS-3683 > URL: https://issues.apache.org/jira/browse/HDDS-3683 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > https://github.com/opendataio/hcfsfuse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3725) Ozone sh volume client support quota option.
[ https://issues.apache.org/jira/browse/HDDS-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-3725: - Target Version/s: 0.7.0 > Ozone sh volume client support quota option. > > > Key: HDDS-3725 > URL: https://issues.apache.org/jira/browse/HDDS-3725 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Simon Su >Assignee: Simon Su >Priority: Major > Labels: pull-request-available > Time Spent: 49h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3683) Ozone fuse support
[ https://issues.apache.org/jira/browse/HDDS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maobaolong updated HDDS-3683: - Attachment: screenshot-2.png > Ozone fuse support > --- > > Key: HDDS-3683 > URL: https://issues.apache.org/jira/browse/HDDS-3683 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > https://github.com/opendataio/hcfsfuse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3683) Ozone fuse support
[ https://issues.apache.org/jira/browse/HDDS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maobaolong updated HDDS-3683: - Attachment: screenshot-3.png > Ozone fuse support > --- > > Key: HDDS-3683 > URL: https://issues.apache.org/jira/browse/HDDS-3683 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > https://github.com/opendataio/hcfsfuse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org