[jira] [Resolved] (HDDS-3074) Make the configuration of container scrub consistent
[ https://issues.apache.org/jira/browse/HDDS-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-3074. -- Fix Version/s: 0.6.0 Resolution: Fixed > Make the configuration of container scrub consistent > > > Key: HDDS-3074 > URL: https://issues.apache.org/jira/browse/HDDS-3074 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Reporter: YiSheng Lien > Assignee: Neo Yang > Priority: Minor > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The prefix of the container scrub configuration in > [ozone-site.xml|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/common/src/main/resources/ozone-default.xml] > is *hdds.container.scrub*, but in > [ContainerScrubberConfiguration|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerScrubberConfiguration.java] > it is *hdds.containerscrub*. > This mismatch means the documented configuration keys have no effect. > For example, when we set *hdds.container.scrub.enabled* to true, the cluster > did not run the container scrub, but when we set *hdds.containerscrub.enable* > to true, it did. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
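The mismatch described in the ticket can be illustrated with a minimal stand-in for the configuration lookup; this sketch uses plain `java.util.Properties` rather than Ozone's real `Configuration` class, and the key names are taken verbatim from the report:

```java
import java.util.Properties;

// Minimal sketch of the HDDS-3074 mismatch: users set the documented key,
// but the code reads a key under a different prefix, so the documented
// setting is silently ignored.
public class ScrubConfigDemo {
    // Key documented in ozone-default.xml (what users set).
    static final String DOCUMENTED_KEY = "hdds.container.scrub.enabled";
    // Key the code actually reads, per the report.
    static final String ACTUAL_KEY = "hdds.containerscrub.enable";

    static boolean scrubEnabled(Properties conf) {
        // Only the key the code asks for is consulted.
        return Boolean.parseBoolean(conf.getProperty(ACTUAL_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DOCUMENTED_KEY, "true"); // following the docs...
        System.out.println(scrubEnabled(conf));   // false: setting ignored

        conf.setProperty(ACTUAL_KEY, "true");     // undocumented prefix works
        System.out.println(scrubEnabled(conf));   // true
    }
}
```

This is why the fix in the PR is to make the prefix in the code match the one in ozone-default.xml.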
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #722: HDDS-3074. Make the configuration of container scrub consistent.
bharatviswa504 commented on issue #722: HDDS-3074. Make the configuration of container scrub consistent. URL: https://github.com/apache/hadoop-ozone/pull/722#issuecomment-604820134 Thank You @cku328 for the contribution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #722: HDDS-3074. Make the configuration of container scrub consistent.
bharatviswa504 commented on issue #722: HDDS-3074. Make the configuration of container scrub consistent. URL: https://github.com/apache/hadoop-ozone/pull/722#issuecomment-604819927 Test failures are unrelated.
[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #722: HDDS-3074. Make the configuration of container scrub consistent.
bharatviswa504 merged pull request #722: HDDS-3074. Make the configuration of container scrub consistent. URL: https://github.com/apache/hadoop-ozone/pull/722
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #727: HDDS-3273. getConf does not return all OM addresses.
bharatviswa504 commented on a change in pull request #727: HDDS-3273. getConf does not return all OM addresses. URL: https://github.com/apache/hadoop-ozone/pull/727#discussion_r399039110

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java

@@ -89,6 +93,31 @@ public static InetSocketAddress getOmAddress(Configuration conf) {
     return NetUtils.createSocketAddr(getOmRpcAddress(conf));
   }

+  /**
+   * Return list of OM addresses by service ids - when HA is enabled.
+   *
+   * @param conf {@link Configuration}
+   * @return {service.id -> [{@link InetSocketAddress}]}
+   */
+  public static Map<String, List<InetSocketAddress>> getOmHAAddressesById(
+      Configuration conf) {
+    Map<String, List<InetSocketAddress>> result = new HashMap<>();
+    for (String serviceId : conf.getTrimmedStringCollection(
+        OZONE_OM_SERVICE_IDS_KEY)) {
+      if (!result.containsKey(serviceId)) {
+        result.put(serviceId, new ArrayList<>());
+      }
+      for (String nodeId : getOMNodeIds(conf, serviceId)) {
+        String rpcAddr = getOmRpcAddress(conf,
+            addKeySuffixes(OZONE_OM_ADDRESS_KEY, serviceId, nodeId));
+        if (rpcAddr != null) {

Review comment: One minor comment: when the address for one of the nodeIds is undefined, do we want to print "unknown address" instead of silently ignoring it?
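The grouping logic in the diff above can be sketched in isolation. This is a simplified, hypothetical model (plain maps stand in for `Configuration` lookups such as `getOmRpcAddress` and `getOMNodeIds`); nodes whose address resolves to null are silently skipped, which is exactly the behavior the review comment questions:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of getOmHAAddressesById: for each OM service id, collect
// the RPC addresses of its nodes; nodes with no configured address are
// silently skipped (the point raised in the review comment).
public class OmHaAddressDemo {
    static Map<String, List<String>> groupAddresses(
            Collection<String> serviceIds,
            Map<String, List<String>> nodesByService,
            Map<String, String> addressByNode) {
        Map<String, List<String>> result = new HashMap<>();
        for (String serviceId : serviceIds) {
            List<String> addrs =
                result.computeIfAbsent(serviceId, k -> new ArrayList<>());
            for (String nodeId : nodesByService.getOrDefault(serviceId, List.of())) {
                // Stand-in for getOmRpcAddress(conf, addKeySuffixes(...)):
                // null means "address not configured for this node".
                String rpcAddr = addressByNode.get(serviceId + "." + nodeId);
                if (rpcAddr != null) {
                    addrs.add(rpcAddr);
                }
            }
        }
        return result;
    }
}
```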
[jira] [Updated] (HDDS-3286) Support batchDelete when deleting path.
[ https://issues.apache.org/jira/browse/HDDS-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3286: Description: Currently, deleting a path fetches all the keys in the directory and then deletes them one by one, which performs poorly. Testing deletion of a path with 100,000 files took 7320.964 sec. We plan to change this part to a batch operation to improve performance. was: Currently, deleting a path fetches all the keys in the directory and then deletes them one by one, which performs poorly. Testing deletion of a path with 100,000 keys took 7320.964 sec. We plan to change this part to a batch operation to improve performance. > Support batchDelete when deleting path. > --- > > Key: HDDS-3286 > URL: https://issues.apache.org/jira/browse/HDDS-3286 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Filesystem > Reporter: mingchao zhao > Priority: Major > > Currently, deleting a path fetches all the keys in the directory and then > deletes them one by one, which performs poorly. Testing deletion of a path > with 100,000 files took 7320.964 sec. > We plan to change this part to a batch operation to improve performance.
[jira] [Created] (HDDS-3286) Support batchDelete when deleting path.
mingchao zhao created HDDS-3286: --- Summary: Support batchDelete when deleting path. Key: HDDS-3286 URL: https://issues.apache.org/jira/browse/HDDS-3286 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Filesystem Reporter: mingchao zhao Currently, deleting a path fetches all the keys in the directory and then deletes them one by one, which performs poorly. Testing deletion of a path with 100,000 keys took 7320.964 sec. We plan to change this part to a batch operation to improve performance.
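The proposed change can be sketched as follows. This is a hypothetical illustration (the `KeyStore` interface and method names are stand-ins, not Ozone APIs): instead of issuing one delete call per key under a path, the keys are collected and deleted in batches, so the number of round trips drops from one per key to one per batch.

```java
import java.util.List;

// Hypothetical sketch of batching deletes: each deleteKeys() call models one
// round trip to the server, so batching turns N round trips into N/batchSize.
public class BatchDeleteDemo {
    interface KeyStore {
        void deleteKeys(List<String> keys); // one round trip per batch
    }

    static int deleteInBatches(List<String> keys, int batchSize, KeyStore store) {
        int roundTrips = 0;
        for (int i = 0; i < keys.size(); i += batchSize) {
            // Delete one contiguous slice of keys per round trip.
            store.deleteKeys(keys.subList(i, Math.min(i + batchSize, keys.size())));
            roundTrips++;
        }
        return roundTrips;
    }
}
```

With 100,000 keys and a batch size of 1,000, this makes 100 delete calls instead of 100,000.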
[jira] [Resolved] (HDDS-3271) The block file is not deleted after the key is deleted
[ https://issues.apache.org/jira/browse/HDDS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao resolved HDDS-3271. - Resolution: Not A Problem > The block file is not deleted after the key is deleted > -- > > Key: HDDS-3271 > URL: https://issues.apache.org/jira/browse/HDDS-3271 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: mingchao zhao > Priority: Major > Attachments: image-2020-03-25-11-41-26-972.png > > > After I successfully deleted the key, I was still able to see the block file > in the chunk directory. The block files were not deleted at all. > !image-2020-03-25-11-41-26-972.png|width=1169,height=143! > This may be an existing bug, and I will confirm the reason.
[GitHub] [hadoop-ozone] swagle commented on issue #718: HDDS-3224. Enforce volume and bucket name rule at create time.
swagle commented on issue #718: HDDS-3224. Enforce volume and bucket name rule at create time. URL: https://github.com/apache/hadoop-ozone/pull/718#issuecomment-604776938 S3BucketCreateRequest checked only for length, but verifyResourceName is more comprehensive. We are still doing the length check in S3BucketDeleteRequest; technically we should not need validation in the delete path, but kept those changes as-is.
[jira] [Updated] (HDDS-3273) OM HA: getconf must return all OMs
[ https://issues.apache.org/jira/browse/HDDS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3273: - Labels: pull-request-available (was: ) > OM HA: getconf must return all OMs > -- > > Key: HDDS-3273 > URL: https://issues.apache.org/jira/browse/HDDS-3273 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Tools > Affects Versions: 0.5.0 > Reporter: Dinesh Chitlangia > Assignee: Siddharth Wagle > Priority: Major > Labels: pull-request-available > > Discovered by [~xyao] when testing 0.5.0-beta rc2: > > ozone getconf -ozonemanagers does not return all the om instances > bash-4.2$ ozone getconf -ozonemanagers > 0.0.0.0
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #625: HDDS-2980. Delete replayed entry from OpenKeyTable during commit
bharatviswa504 commented on a change in pull request #625: HDDS-2980. Delete replayed entry from OpenKeyTable during commit URL: https://github.com/apache/hadoop-ozone/pull/625#discussion_r398972414

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/s3/multipart/S3MultipartUploadCommitPartRequest.java

@@ -147,11 +146,6 @@ public OMClientResponse validateAndUpdateCache(OzoneManager ozoneManager,
       throw new OMException("Failed to commit Multipart Upload key, as " +
           openKey + "entry is not found in the openKey table", KEY_NOT_FOUND);
-    } else {
-      // Check the OpenKeyTable if this transaction is a replay of ratis logs.

Review comment: I don't understand why this check was removed.
[GitHub] [hadoop-ozone] adoroszlai commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
adoroszlai commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725#issuecomment-604678295 Thanks @sodonnel and @vivekratnavel for the review, and @hanishakoneru for merging it.
[GitHub] [hadoop-ozone] hanishakoneru commented on issue #723: HDDS-3281. Add timeouts to all robot tests
hanishakoneru commented on issue #723: HDDS-3281. Add timeouts to all robot tests URL: https://github.com/apache/hadoop-ozone/pull/723#issuecomment-604676324 @adoroszlai, I think even with that limitation the timeout will help us isolate the problem. Even if the acceptance suite is cancelled, we could still tell which test contributed to the timeout.
[GitHub] [hadoop-ozone] hanishakoneru merged pull request #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
hanishakoneru merged pull request #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725
[GitHub] [hadoop-ozone] hanishakoneru commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
hanishakoneru commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725#issuecomment-604623673 Thanks @adoroszlai for working on this. I will go ahead and merge this as the change is only in acceptance tests. Thanks @sodonnel and @vivekratnavel for the reviews.
[GitHub] [hadoop-ozone] esahekmat opened a new pull request #726: HDDS-3267. replace containerCache in blockUtils by LoadingCache
esahekmat opened a new pull request #726: HDDS-3267. replace containerCache in blockUtils by LoadingCache URL: https://github.com/apache/hadoop-ozone/pull/726 ## What changes were proposed in this pull request? ContainerCache is removed and replaced by a LoadingCache in BlockUtils. The count in ReferenceCountedDB is removed and the class is renamed to ReferenceDB. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-3267 ## How was this patch tested? mvn clean package acceptance.sh checkstyle.sh
[jira] [Updated] (HDDS-3267) Replace ContainerCache in BlockUtils by LoadingCache
[ https://issues.apache.org/jira/browse/HDDS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3267: - Labels: pull-request-available (was: ) > Replace ContainerCache in BlockUtils by LoadingCache > > > Key: HDDS-3267 > URL: https://issues.apache.org/jira/browse/HDDS-3267 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Reporter: Isa Hekmatizadeh > Assignee: Isa Hekmatizadeh > Priority: Minor > Labels: pull-request-available > > As discussed [here|https://github.com/apache/hadoop-ozone/pull/705], the > current version of ContainerCache is used only by BlockUtils and has several > architectural issues. For example: > * It uses a ReentrantLock, which could be replaced by synchronized methods > * It has to maintain a referenceCount for each DBHandler > * It extends LRUMap, while it would be better to hide it by composition > and not expose LRUMap-related methods. > As [~pifta] suggests, we could replace all ContainerCache functionality by > using Guava's LoadingCache. > This new LoadingCache could be configured to evict by size; with that > configuration the behavior would differ slightly, as it may evict DBHandlers > while they are in use (referenceCount>0), but we can configure it to use > reference-based eviction via CacheBuilder.weakValues(). > I want to open this discussion here instead of GitHub, so I created this > ticket.
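The "loading cache" idea behind the proposal can be sketched without Guava. This is not the Guava `LoadingCache` API itself but a minimal stand-in built on `ConcurrentHashMap.computeIfAbsent` (class and value names are hypothetical): the cache builds the value on first access, so callers no longer manage puts and a `ReentrantLock` by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stand-in for a loading cache: the loader runs only on a cache
// miss, so repeated lookups of the same container DB reuse one handle.
public class DbHandleCacheDemo {
    static final AtomicInteger OPENS = new AtomicInteger();
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Stand-in loader for "open the container DB at this path".
    static String getDb(String containerPath) {
        return CACHE.computeIfAbsent(containerPath, p -> {
            OPENS.incrementAndGet(); // loader invoked only on a miss
            return "db-handle:" + p;
        });
    }
}
```

Guava's `CacheBuilder` adds what this sketch lacks: size-based or reference-based eviction (e.g. `weakValues()`), which is the trade-off the ticket discusses for handles that are still in use.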
[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server
[ https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067895#comment-17067895 ] Arpit Agarwal commented on HDDS-3088: - cc [~shashikant], does this tie in with the retry settings you are looking at? > maxRetries value is too large while trying to reconnect to SCM server > - > > Key: HDDS-3088 > URL: https://issues.apache.org/jira/browse/HDDS-3088 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM > Reporter: Nilotpal Nandi > Assignee: Nanda kumar > Priority: Major > > The maxRetries value is 2147483647, which is too high; the client keeps > retrying the connection to the SCM server indefinitely. > > {noformat} > 2020-02-27 05:54:43,430 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10535 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:44,431 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10536 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:45,432 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10537 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:46,433 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10538 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS){noformat}
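The semantics of the policy in the log can be sketched as a simple retry check (a hypothetical stand-in for `RetryUpToMaximumCountWithFixedSleep`, not Hadoop's actual class): with `maxRetries = 2147483647` (`Integer.MAX_VALUE`), the client effectively never gives up, which is the problem reported above.

```java
// Sketch of "retry up to maxRetries with a fixed sleep": the only stopping
// condition is the attempt count, so an effectively-infinite maxRetries
// means the client retries forever.
public class RetryPolicyDemo {
    static boolean shouldRetry(int attemptsSoFar, int maxRetries) {
        return attemptsSoFar < maxRetries;
    }

    public static void main(String[] args) {
        // The logged policy: maxRetries = 2147483647 (Integer.MAX_VALUE).
        System.out.println(shouldRetry(10_538, Integer.MAX_VALUE)); // true: keeps retrying
        // A bounded policy would have given up long before attempt 10,538.
        System.out.println(shouldRetry(10_538, 100));               // false
    }
}
```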
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #399: HDDS-2424. Add the recover-trash command server side handling.
bharatviswa504 commented on a change in pull request #399: HDDS-2424. Add the recover-trash command server side handling. URL: https://github.com/apache/hadoop-ozone/pull/399#discussion_r398700964

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMTrashRecoverResponse.java

@@ -0,0 +1,64 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.response.key;
+
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.helpers.RepeatedOmKeyInfo;
+import org.apache.hadoop.ozone.om.response.OMClientResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos
+    .OMResponse;
+import org.apache.hadoop.hdds.utils.db.BatchOperation;
+
+import java.io.IOException;
+import javax.annotation.Nullable;
+import javax.annotation.Nonnull;
+
+/**
+ * Response for RecoverTrash request.
+ */
+public class OMTrashRecoverResponse extends OMClientResponse {
+  private OmKeyInfo omKeyInfo;
+
+  public OMTrashRecoverResponse(@Nullable OmKeyInfo omKeyInfo,
+      @Nonnull OMResponse omResponse) {
+    super(omResponse);
+    this.omKeyInfo = omKeyInfo;
+  }
+
+  @Override
+  public void addToDBBatch(OMMetadataManager omMetadataManager,
+      BatchOperation batchOperation) throws IOException {
+
+    /* TODO: HDDS-2425. HDDS-2426. */
+    String trashKey = omMetadataManager
+        .getOzoneKey(omKeyInfo.getVolumeName(),
+            omKeyInfo.getBucketName(), omKeyInfo.getKeyName());
+    RepeatedOmKeyInfo repeatedOmKeyInfo = omMetadataManager
+        .getDeletedTable().get(trashKey);
+    omKeyInfo = OmUtils.prepareKeyForRecover(omKeyInfo, repeatedOmKeyInfo);
+    omMetadataManager.getDeletedTable()
+        .deleteWithBatch(batchOperation, omKeyInfo.getKeyName());
+    /* TODO: trashKey should be updated to destinationBucket. */

Review comment: I am fine with recovering the last deleted key if that is the expected behavior.

> (And when recovering the latest key, I think we should clear the old deleted key.)

We should not delete the other keys, as those keys will be picked up by the background trash service and their data needs to be deleted. Doing it that way is also not correct, from my understanding. Let us say we put those keys in the delete table and the background delete key service picks them up and sends them to SCM for deletion; at that point we receive a recover-trash command, so there is a chance that we recover a key which has no data, since we already submitted the deletion request to SCM, and SCM, in turn, sends it to the DN. How shall we handle this kind of scenario? Deletion from the delete table happens when a key purge request happens. Code snippet link [#link](https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyDeletingService.java#L167)
[jira] [Updated] (HDDS-3285) MiniOzoneChaosCluster exits because of deadline exceeding
[ https://issues.apache.org/jira/browse/HDDS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-3285: Description:

2020-03-26 21:26:48,869 [pool-326-thread-2] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: ClientCall started after deadline exceeded: -4.330590725s from now
{code}
2020-03-26 21:26:48,866 [pool-326-thread-2] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(64)) - FileSystem LOADGEN: null Exiting due to exception
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: ClientCall started after deadline exceeded: -4.330590725s from now
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:359)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:281)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:259)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:119)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:199)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:133)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:254)
    at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:197)
    at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:63)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.ozone.utils.LoadBucket$ReadOp.doPostOp(LoadBucket.java:205)
    at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:121)
    at org.apache.hadoop.ozone.utils.LoadBucket$ReadOp.execute(LoadBucket.java:180)
    at org.apache.hadoop.ozone.utils.LoadBucket.readKey(LoadBucket.java:82)
    at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:54)
    at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:62)
    at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:78)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: ClientCall started after deadline exceeded: -4.330590725s from now
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:336)
    ... 20 more
Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: ClientCall started after deadline exceeded: -4.330590725s from now
    at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:533)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:442)
    at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
    at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399)
    at
[jira] [Updated] (HDDS-3285) MiniOzoneChaosCluster exits because of deadline exceeding
[ https://issues.apache.org/jira/browse/HDDS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-3285: Description: 2020-03-26 21:26:48,869 [pool-326-thread-2] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: ClientCall started after deadline exceeded: -4.330590725s from now > MiniOzoneChaosCluster exits because of deadline exceeding > - > > Key: HDDS-3285 > URL: https://issues.apache.org/jira/browse/HDDS-3285 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode > Reporter: Mukul Kumar Singh > Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: complete.log.gz > > > 2020-03-26 21:26:48,869 [pool-326-thread-2] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: java.io.IOException: > java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > DEADLINE_EXCEEDED: ClientCall started after > deadline exceeded: -4.330590725s from now
[jira] [Created] (HDDS-3285) MiniOzoneChaosCluster exits because of deadline exceeding
Mukul Kumar Singh created HDDS-3285: --- Summary: MiniOzoneChaosCluster exits because of deadline exceeding Key: HDDS-3285 URL: https://issues.apache.org/jira/browse/HDDS-3285 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Mukul Kumar Singh Attachments: complete.log.gz
[jira] [Commented] (HDDS-3267) Replace ContainerCache in BlockUtils by LoadingCache
[ https://issues.apache.org/jira/browse/HDDS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067774#comment-17067774 ] Mukul Kumar Singh commented on HDDS-3267: - [~esa.hekmat], I have added you as a contributor to the Ozone project and also assigned the jira as well. > Replace ContainerCache in BlockUtils by LoadingCache > > > Key: HDDS-3267 > URL: https://issues.apache.org/jira/browse/HDDS-3267 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Isa Hekmatizadeh >Assignee: Isa Hekmatizadeh >Priority: Minor > > As discussed in [here|https://github.com/apache/hadoop-ozone/pull/705] > current version of ContainerCache is just used by BlockUtils and has several > architectural issues. for example: > * It uses a ReentrantLock which could be replaced by synchronized methods > * It should maintain a referenceCount for each DBHandler > * It extends LRUMap while it would be better to hide it by the composition > and not expose LRUMap related methods. > As [~pifta] suggests, we could replace all ContainerCache functionality by > using Guava LoadingCache. > This new LoadingCache could be configured to evict by size, by this > configuration the functionality would be slightly different as it may evict > DBHandlers while they are in use (referenceCount>0) but we can configure it > to use reference base eviction based on CacheBuilder.weakValues() > I want to open this discussion here instead of Github so I created this > ticket.
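The reference-counting concern raised in the ticket can be illustrated with a small JDK-only sketch (Guava is deliberately not used here; `RefCountedCache` and its method names are hypothetical, not Ozone's ContainerCache API). A purely size-based LRU would evict the eldest handle regardless of whether callers still hold it; the eviction check below refuses to drop an entry whose reference count is non-zero, which is the guarantee `CacheBuilder.weakValues()` approximates by keeping a value alive while it is strongly referenced:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the eviction concern: a size-bounded LRU map may evict
// a handle that callers still hold (referenceCount > 0). All names here
// are illustrative, not the real ContainerCache/DBHandler types.
class RefCountedCache<K, V> {
  static final class Entry<T> {
    final T value;
    int refCount;
    Entry(T v) { value = v; }
  }

  private final int maxSize;
  private final Map<K, Entry<V>> map;

  RefCountedCache(int maxSize) {
    this.maxSize = maxSize;
    // accessOrder=true gives LRU iteration order, like the LRUMap in question.
    this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
        // Only evict when over capacity AND the handle is idle; a plain
        // size-based policy would drop it regardless of refCount.
        return size() > RefCountedCache.this.maxSize
            && eldest.getValue().refCount == 0;
      }
    };
  }

  synchronized V acquire(K key, java.util.function.Function<K, V> loader) {
    Entry<V> e = map.computeIfAbsent(key, k -> new Entry<>(loader.apply(k)));
    e.refCount++;
    return e.value;
  }

  synchronized void release(K key) {
    Entry<V> e = map.get(key);
    if (e != null && e.refCount > 0) {
      e.refCount--;
    }
  }

  synchronized int size() { return map.size(); }
}
```

With synchronized methods standing in for the ReentrantLock, an in-use entry survives capacity pressure and is only reclaimed once released, which is the behavioral difference the ticket flags between size-based and reference-based eviction.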
[jira] [Assigned] (HDDS-3267) Replace ContainerCache in BlockUtils by LoadingCache
[ https://issues.apache.org/jira/browse/HDDS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh reassigned HDDS-3267: --- Assignee: Isa Hekmatizadeh > Replace ContainerCache in BlockUtils by LoadingCache > > > Key: HDDS-3267 > URL: https://issues.apache.org/jira/browse/HDDS-3267 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Isa Hekmatizadeh >Assignee: Isa Hekmatizadeh >Priority: Minor > > As discussed in [here|https://github.com/apache/hadoop-ozone/pull/705] > current version of ContainerCache is just used by BlockUtils and has several > architectural issues. for example: > * It uses a ReentrantLock which could be replaced by synchronized methods > * It should maintain a referenceCount for each DBHandler > * It extends LRUMap while it would be better to hide it by the composition > and not expose LRUMap related methods. > As [~pifta] suggests, we could replace all ContainerCache functionality by > using Guava LoadingCache. > This new LoadingCache could be configured to evict by size, by this > configuration the functionality would be slightly different as it may evict > DBHandlers while they are in use (referenceCount>0) but we can configure it > to use reference base eviction based on CacheBuilder.weakValues() > I want to open this discussion here instead of Github so I created this > ticket.
[GitHub] [hadoop-ozone] sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#issuecomment-604486699 Thanks for quickly addressing the final issues. I am +1 on this now. I will commit it later pending the CI checks looking good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] timmylicheng commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
timmylicheng commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#issuecomment-604466974 > Thanks for the updates here. I think the code looks much cleaner now with the debug statements and refactored block in getResultSet(). > > There are just a couple of minor changes needed to finish this one off. Thanks for the detailed review. It really helps. @sodonnel
[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
timmylicheng commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#discussion_r398622023 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelinePlacementPolicy.java ## @@ -403,7 +408,7 @@ private boolean checkAllNodesAreEqual(NetworkTopology topology) { @VisibleForTesting protected DatanodeDetails chooseNodeFromNetworkTopology( NetworkTopology networkTopology, DatanodeDetails anchor, - List excludedNodes) { + List excludedNodes) throws SCMException { Review comment: Updated.
[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
timmylicheng commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#discussion_r398621936 ## File path: hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelinePlacementPolicy.java ## @@ -43,36 +47,98 @@ private MockNodeManager nodeManager; private OzoneConfiguration conf; private PipelinePlacementPolicy placementPolicy; + private NetworkTopologyImpl cluster; private static final int PIPELINE_PLACEMENT_MAX_NODES_COUNT = 10; + private List nodesWithOutRackAwareness = new ArrayList<>(); + private List nodesWithRackAwareness = new ArrayList<>(); + @Before public void init() throws Exception { -nodeManager = new MockNodeManager(true, -PIPELINE_PLACEMENT_MAX_NODES_COUNT); +cluster = initTopology(); +// start with nodes with rack awareness. +nodeManager = new MockNodeManager(cluster, getNodesWithRackAwareness(), +false, PIPELINE_PLACEMENT_MAX_NODES_COUNT); conf = new OzoneConfiguration(); conf.setInt(OZONE_DATANODE_PIPELINE_LIMIT, 5); placementPolicy = new PipelinePlacementPolicy( nodeManager, new PipelineStateManager(), conf); } + private NetworkTopologyImpl initTopology() { +NodeSchema[] schemas = new NodeSchema[] +{ROOT_SCHEMA, RACK_SCHEMA, LEAF_SCHEMA}; +NodeSchemaManager.getInstance().init(schemas, true); +NetworkTopologyImpl topology = +new NetworkTopologyImpl(NodeSchemaManager.getInstance()); +return topology; + } + + private List getNodesWithRackAwareness() { +List datanodes = new ArrayList<>(); +for (Node node : NODES) { + DatanodeDetails datanode = overwriteLocationInNode( + getNodesWithoutRackAwareness(), node); + nodesWithRackAwareness.add(datanode); + datanodes.add(datanode); +} +return datanodes; + } + + private DatanodeDetails getNodesWithoutRackAwareness() { +DatanodeDetails node = MockDatanodeDetails.randomDatanodeDetails(); +nodesWithOutRackAwareness.add(node); +return node; + } + @Test - public void 
testChooseNodeBasedOnNetworkTopology() { -List healthyNodes = -nodeManager.getNodes(HddsProtos.NodeState.HEALTHY); -DatanodeDetails anchor = placementPolicy.chooseNode(healthyNodes); + public void testChooseNodeBasedOnNetworkTopology() throws SCMException { +DatanodeDetails anchor = placementPolicy.chooseNode(nodesWithRackAwareness); // anchor should be removed from healthyNodes after being chosen. -Assert.assertFalse(healthyNodes.contains(anchor)); +Assert.assertFalse(nodesWithRackAwareness.contains(anchor)); List excludedNodes = new ArrayList<>(PIPELINE_PLACEMENT_MAX_NODES_COUNT); excludedNodes.add(anchor); DatanodeDetails nextNode = placementPolicy.chooseNodeFromNetworkTopology( nodeManager.getClusterNetworkTopologyMap(), anchor, excludedNodes); Assert.assertFalse(excludedNodes.contains(nextNode)); -// nextNode should not be the same as anchor. +// next node should not be the same as anchor. Assert.assertTrue(anchor.getUuid() != nextNode.getUuid()); +// next node should be on the same rack based on topology. +Assert.assertEquals(anchor.getNetworkLocation(), +nextNode.getNetworkLocation()); } + @Test + public void testChooseNodeWithSingleNodeRack() throws SCMException { +// There is only one node on 3 racks altogether. +List datanodes = new ArrayList<>(); +for (Node node : SINGLE_NODE_RACK) { + DatanodeDetails datanode = overwriteLocationInNode( + MockDatanodeDetails.randomDatanodeDetails(), node); + datanodes.add(datanode); +} +MockNodeManager localNodeManager = new MockNodeManager(null, datanodes, Review comment: You are right. I updated this part. Thanks!
[GitHub] [hadoop-ozone] sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#issuecomment-604452924 Thanks for the updates here. I think the code looks much cleaner now with the debug statements and refactored block in getResultSet(). There are just a couple of minor changes needed to finish this one off.
[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
sodonnel commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#discussion_r398601166 ## File path: hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelinePlacementPolicy.java ## @@ -43,36 +47,98 @@ private MockNodeManager nodeManager; private OzoneConfiguration conf; private PipelinePlacementPolicy placementPolicy; + private NetworkTopologyImpl cluster; private static final int PIPELINE_PLACEMENT_MAX_NODES_COUNT = 10; + private List nodesWithOutRackAwareness = new ArrayList<>(); + private List nodesWithRackAwareness = new ArrayList<>(); + @Before public void init() throws Exception { -nodeManager = new MockNodeManager(true, -PIPELINE_PLACEMENT_MAX_NODES_COUNT); +cluster = initTopology(); +// start with nodes with rack awareness. +nodeManager = new MockNodeManager(cluster, getNodesWithRackAwareness(), +false, PIPELINE_PLACEMENT_MAX_NODES_COUNT); conf = new OzoneConfiguration(); conf.setInt(OZONE_DATANODE_PIPELINE_LIMIT, 5); placementPolicy = new PipelinePlacementPolicy( nodeManager, new PipelineStateManager(), conf); } + private NetworkTopologyImpl initTopology() { +NodeSchema[] schemas = new NodeSchema[] +{ROOT_SCHEMA, RACK_SCHEMA, LEAF_SCHEMA}; +NodeSchemaManager.getInstance().init(schemas, true); +NetworkTopologyImpl topology = +new NetworkTopologyImpl(NodeSchemaManager.getInstance()); +return topology; + } + + private List getNodesWithRackAwareness() { +List datanodes = new ArrayList<>(); +for (Node node : NODES) { + DatanodeDetails datanode = overwriteLocationInNode( + getNodesWithoutRackAwareness(), node); + nodesWithRackAwareness.add(datanode); + datanodes.add(datanode); +} +return datanodes; + } + + private DatanodeDetails getNodesWithoutRackAwareness() { +DatanodeDetails node = MockDatanodeDetails.randomDatanodeDetails(); +nodesWithOutRackAwareness.add(node); +return node; + } + @Test - public void 
testChooseNodeBasedOnNetworkTopology() { -List healthyNodes = -nodeManager.getNodes(HddsProtos.NodeState.HEALTHY); -DatanodeDetails anchor = placementPolicy.chooseNode(healthyNodes); + public void testChooseNodeBasedOnNetworkTopology() throws SCMException { +DatanodeDetails anchor = placementPolicy.chooseNode(nodesWithRackAwareness); // anchor should be removed from healthyNodes after being chosen. -Assert.assertFalse(healthyNodes.contains(anchor)); +Assert.assertFalse(nodesWithRackAwareness.contains(anchor)); List excludedNodes = new ArrayList<>(PIPELINE_PLACEMENT_MAX_NODES_COUNT); excludedNodes.add(anchor); DatanodeDetails nextNode = placementPolicy.chooseNodeFromNetworkTopology( nodeManager.getClusterNetworkTopologyMap(), anchor, excludedNodes); Assert.assertFalse(excludedNodes.contains(nextNode)); -// nextNode should not be the same as anchor. +// next node should not be the same as anchor. Assert.assertTrue(anchor.getUuid() != nextNode.getUuid()); +// next node should be on the same rack based on topology. +Assert.assertEquals(anchor.getNetworkLocation(), +nextNode.getNetworkLocation()); } + @Test + public void testChooseNodeWithSingleNodeRack() throws SCMException { +// There is only one node on 3 racks altogether. +List datanodes = new ArrayList<>(); +for (Node node : SINGLE_NODE_RACK) { + DatanodeDetails datanode = overwriteLocationInNode( + MockDatanodeDetails.randomDatanodeDetails(), node); + datanodes.add(datanode); +} +MockNodeManager localNodeManager = new MockNodeManager(null, datanodes, Review comment: This test doesn't reproduce the error if the fix in this PR is removed. I commented out the fix in PipelinePlacement policy and added back in the old logic and ran this and it still passed. The reason, is that PipelinePlacementPolicy did not believe networkTopology was present. 
Changing this line as follows makes the test reproduce the problem: ``` MockNodeManager localNodeManager = new MockNodeManager(initTopology(), datanodes, false, datanodes.size()); ``` Note I added `initTopology()` when constructing the NodeManager. With that, the test fails without the fix, and passes with the fix in this PR, so that is good.
[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
sodonnel commented on a change in pull request #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#discussion_r398582704 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelinePlacementPolicy.java ## @@ -403,7 +408,7 @@ private boolean checkAllNodesAreEqual(NetworkTopology topology) { @VisibleForTesting protected DatanodeDetails chooseNodeFromNetworkTopology( NetworkTopology networkTopology, DatanodeDetails anchor, - List excludedNodes) { + List excludedNodes) throws SCMException { Review comment: You can remove the `throws SCMException` in the method definition now, as it is no longer used.
[GitHub] [hadoop-ozone] sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback
sodonnel commented on issue #678: HDDS-3179 Pipeline placement based on Topology does not have fallback URL: https://github.com/apache/hadoop-ozone/pull/678#issuecomment-604425486 Acceptance is failing with: ``` == Execute PI calculation| FAIL | 1 != 0 -- Execute WordCount | FAIL | 1 != 0 -- ozonesecure-mr-mapreduce :: Execute MR jobs | FAIL | ``` This is known issue and will be fixed soon - HDDS-3284. Integration tests are flaky. it-Freon: ``` [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 395.168 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator [ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator) Time elapsed: 326.297 s <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) ``` One of these issues covers it: https://issues.apache.org/jira/browse/HDDS-3266 and it-freon: https://issues.apache.org/jira/browse/HDDS-3257 it-client I am not sure
[jira] [Commented] (HDDS-3267) Replace ContainerCache in BlockUtils by LoadingCache
[ https://issues.apache.org/jira/browse/HDDS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067652#comment-17067652 ] Isa Hekmatizadeh commented on HDDS-3267: please assign this task to me, I'm currently working on it > Replace ContainerCache in BlockUtils by LoadingCache > > > Key: HDDS-3267 > URL: https://issues.apache.org/jira/browse/HDDS-3267 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Isa Hekmatizadeh >Priority: Minor > > As discussed in [here|https://github.com/apache/hadoop-ozone/pull/705] > current version of ContainerCache is just used by BlockUtils and has several > architectural issues. for example: > * It uses a ReentrantLock which could be replaced by synchronized methods > * It should maintain a referenceCount for each DBHandler > * It extends LRUMap while it would be better to hide it by the composition > and not expose LRUMap related methods. > As [~pifta] suggests, we could replace all ContainerCache functionality by > using Guava LoadingCache. > This new LoadingCache could be configured to evict by size, by this > configuration the functionality would be slightly different as it may evict > DBHandlers while they are in use (referenceCount>0) but we can configure it > to use reference base eviction based on CacheBuilder.weakValues() > I want to open this discussion here instead of Github so I created this > ticket.
[GitHub] [hadoop-ozone] sodonnel commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
sodonnel commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725#issuecomment-604396747 Yea, I have seen random failures on it-client. As this is a yarn only change, I cannot see how it could impact the integration tests, so I think we are good to merge it.
[GitHub] [hadoop-ozone] adoroszlai commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
adoroszlai commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725#issuecomment-604392865 Thanks @sodonnel. Since it only changes acceptance tests, I think we can merge it even if `it-client` happens to fail.
[GitHub] [hadoop-ozone] sodonnel commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
sodonnel commented on issue #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725#issuecomment-604386700 Thanks for looking into this. +1 on this from me, pending CI, but some of the "it-*" tests are a bit flaky.
[jira] [Comment Edited] (HDDS-3271) The block file is not deleted after the key is deleted
[ https://issues.apache.org/jira/browse/HDDS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067551#comment-17067551 ] mingchao zhao edited comment on HDDS-3271 at 3/26/20, 10:37 AM: Hi [~msingh] Thank you for your reply. >The SCM maintains a log of deletes blocks and deletes them after the container >is closed. By testing, I found that after closing the container, the block is actually removed. I found that ozone has a configuration ozone.block.deleting.service.interval (default 60sec).It's going to be executed every minute. It is true that block will be remove [when the container is closed|https://github.com/apache/hadoop-ozone/blob/c8f14a560beb9a83c7d98388614a5ba36d7638f6/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DatanodeDeletedBlockTransactions.java#L66]. was (Author: micahzhao): Hi [~msingh] Thank you for your discussion. >The SCM maintains a log of deletes blocks and deletes them after the container >is closed. By testing, I found that after closing the container, the block is actually removed. I found that ozone has a configuration ozone.block.deleting.service.interval (default 60sec).It's going to be executed every minute. It is true that block will be remove [when the container is closed|https://github.com/apache/hadoop-ozone/blob/c8f14a560beb9a83c7d98388614a5ba36d7638f6/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DatanodeDeletedBlockTransactions.java#L66]. > The block file is not deleted after the key is deleted > -- > > Key: HDDS-3271 > URL: https://issues.apache.org/jira/browse/HDDS-3271 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: mingchao zhao >Priority: Major > Attachments: image-2020-03-25-11-41-26-972.png > > > When I successfully deleted the key, I was still able to see the block file > in the chunk directory. Block files are not deleted altogether. > !image-2020-03-25-11-41-26-972.png|width=1169,height=143! 
> This may be an existing bug, and I will confirm the reason.
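The interval discussed above is an ordinary ozone-site.xml setting. A sketch of how it would be tuned (the 60s value is the default cited in the comment; verify the exact key and default against the ozone-default.xml shipped with your Ozone version):

```
<!-- How often the datanode block-deleting service wakes up to purge
     blocks whose delete transactions have been dispatched by SCM
     (which, per the comment above, happens after container close). -->
<property>
  <name>ozone.block.deleting.service.interval</name>
  <value>60s</value>
</property>
```

Lowering the interval makes deleted block files disappear from the chunk directory sooner, at the cost of more frequent background scans.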
[jira] [Comment Edited] (HDDS-3271) The block file is not deleted after the key is deleted
[ https://issues.apache.org/jira/browse/HDDS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067551#comment-17067551 ] mingchao zhao edited comment on HDDS-3271 at 3/26/20, 10:36 AM: Hi [~msingh] Thank you for your discussion. >The SCM maintains a log of deletes blocks and deletes them after the container >is closed. By testing, I found that after closing the container, the block is actually removed. I found that ozone has a configuration ozone.block.deleting.service.interval (default 60sec).It's going to be executed every minute. It is true that block will be remove [when the container is closed|https://github.com/apache/hadoop-ozone/blob/c8f14a560beb9a83c7d98388614a5ba36d7638f6/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DatanodeDeletedBlockTransactions.java#L66]. was (Author: micahzhao): Hi [~msingh] Thank you for your discussion. >The SCM maintains a log of deletes blocks and deletes them after the container >is closed. By testing, I found that after closing the container, the block is actually removed. Except when the container is closed, there is any other strategies for timed deletion? I found that ozone has a configuration ozone.block.deleting.service.interval (default 60sec).This should be deleted every minute, but it does not take effect. > The block file is not deleted after the key is deleted > -- > > Key: HDDS-3271 > URL: https://issues.apache.org/jira/browse/HDDS-3271 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: mingchao zhao >Priority: Major > Attachments: image-2020-03-25-11-41-26-972.png > > > When I successfully deleted the key, I was still able to see the block file > in the chunk directory. Block files are not deleted altogether. > !image-2020-03-25-11-41-26-972.png|width=1169,height=143! > This may be an existing bug, and I will confirm the reason. 
[jira] [Updated] (HDDS-3284) ozonesecure-mr test fails due to lack of disk space
[ https://issues.apache.org/jira/browse/HDDS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3284: --- Status: Patch Available (was: In Progress) > ozonesecure-mr test fails due to lack of disk space > --- > > Key: HDDS-3284 > URL: https://issues.apache.org/jira/browse/HDDS-3284 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {{ozonesecure-mr}} acceptance test is failing with {{No space available in > any of the local directories.}}
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space
adoroszlai opened a new pull request #725: HDDS-3284. ozonesecure-mr test fails due to lack of disk space URL: https://github.com/apache/hadoop-ozone/pull/725 ## What changes were proposed in this pull request? Disable YARN disk utilization check in `ozonesecure-mr` acceptance test. Plenty of disk space is available in CI, but more than 90% of the disk is used: ``` Filesystem Size Used Avail Use% Mounted on ... /dev/sda1 84G 75G 8.3G 91% / ``` Thus directory checker marks it as invalid: ``` WARN DirectoryCollection:418 - Directory /tmp/hadoop-hadoop/nm-local-dir error, used space above threshold of 90.0%, removing from list of valid directories WARN DirectoryCollection:418 - Directory /opt/hadoop/logs/userlogs error, used space above threshold of 90.0%, removing from list of valid directories ``` https://issues.apache.org/jira/browse/HDDS-3284 ## How was this patch tested? https://github.com/adoroszlai/hadoop-ozone/runs/535925282 ``` == Execute PI calculation| PASS | -- Execute WordCount | PASS | -- ozonesecure-mr-mapreduce :: Execute MR jobs | PASS | ```
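The PR text above does not show the actual diff. One common way to relax the check it describes is the standard NodeManager disk-health property, shown here purely as an illustration (the PR may disable the check differently):

```
<!-- Hypothetical yarn-site.xml fragment: raise the disk-utilization
     cutoff (default 90.0) so nearly-full CI disks are not marked
     invalid by the NodeManager directory checker. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>99.0</value>
</property>
```

With the cutoff above the 91% usage reported in the PR, `/tmp/hadoop-hadoop/nm-local-dir` and the userlogs directory remain in the list of valid directories and the MR jobs can run.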
[jira] [Updated] (HDDS-3284) ozonesecure-mr test fails due to lack of disk space
[ https://issues.apache.org/jira/browse/HDDS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3284: - Labels: pull-request-available (was: ) > ozonesecure-mr test fails due to lack of disk space > --- > > Key: HDDS-3284 > URL: https://issues.apache.org/jira/browse/HDDS-3284 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > {{ozonesecure-mr}} acceptance test is failing with {{No space available in > any of the local directories.}}
[jira] [Commented] (HDDS-3271) The block file is not deleted after the key is deleted
[ https://issues.apache.org/jira/browse/HDDS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067551#comment-17067551 ] mingchao zhao commented on HDDS-3271: - Hi [~msingh] Thank you for the discussion. >The SCM maintains a log of deleted blocks and deletes them after the container >is closed. Through testing, I found that the block is indeed removed after the container is closed. But apart from closing the container, are there any other strategies for timed deletion? I found that Ozone has a configuration ozone.block.deleting.service.interval (default 60sec). This should trigger deletion every minute, but it does not seem to take effect. > The block file is not deleted after the key is deleted > -- > > Key: HDDS-3271 > URL: https://issues.apache.org/jira/browse/HDDS-3271 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: mingchao zhao >Priority: Major > Attachments: image-2020-03-25-11-41-26-972.png > > > When I successfully deleted the key, I was still able to see the block file > in the chunk directory. The block files are not deleted at all. > !image-2020-03-25-11-41-26-972.png|width=1169,height=143! > This may be an existing bug, and I will confirm the reason. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
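The block deleting service interval discussed above is set in ozone-site.xml. A minimal sketch follows; the property name appears in ozone-default.xml, and the 60-second value shown is the default mentioned in the comment:

```xml
<!-- ozone-site.xml: interval at which the block deleting service
     scans for blocks marked for deletion. 60s is the default value
     discussed in this thread. -->
<property>
  <name>ozone.block.deleting.service.interval</name>
  <value>60s</value>
</property>
```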
[GitHub] [hadoop-ozone] timmylicheng edited a comment on issue #720: HDDS-3185 Construct a standalone ratis server for SCM.
timmylicheng edited a comment on issue #720: HDDS-3185 Construct a standalone ratis server for SCM. URL: https://github.com/apache/hadoop-ozone/pull/720#issuecomment-604254744 @elek Hey Marton, sorry, this patch was meant to be merged into HDDS-2823 as a dev branch for SCM HA. So far the doc work is ongoing in parallel with prototyping. The doc still needs improvement and I will find time to work on it. This patch only builds a prototype of SCM HA to get a feel for it. There is no debate about having a Ratis server and state machine for SCM; the current design discussion is about how to handle Raft transactions for all the different types of actions and reports on SCM. Besides analyzing the current code base, I'm also probing by prototyping: I'm starting off by implementing a standalone Ratis server and proceeding in steps. I will try to finalize the design doc and schedule a call to include more of the community. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3001) NFS support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067384#comment-17067384 ] Prashant Pogde commented on HDDS-3001: -- Attaching the design document. Please take a look. > NFS support for Ozone > - > > Key: HDDS-3001 > URL: https://issues.apache.org/jira/browse/HDDS-3001 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Filesystem >Affects Versions: 0.5.0 >Reporter: Prashant Pogde >Assignee: Prashant Pogde >Priority: Major > Attachments: NFS Support for Ozone.pdf > > > Provide NFS support for Ozone -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3001) NFS support for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Pogde updated HDDS-3001: - Attachment: NFS Support for Ozone.pdf > NFS support for Ozone > - > > Key: HDDS-3001 > URL: https://issues.apache.org/jira/browse/HDDS-3001 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Filesystem >Affects Versions: 0.5.0 >Reporter: Prashant Pogde >Assignee: Prashant Pogde >Priority: Major > Attachments: NFS Support for Ozone.pdf > > > Provide NFS support for Ozone -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-3254) Datanode memory increase so much
[ https://issues.apache.org/jira/browse/HDDS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067371#comment-17067371 ] runzhiwang edited comment on HDDS-3254 at 3/26/20, 6:06 AM: [~shashikant] I have stopped sending requests to s3gateway for one day, but the datanode physical memory does not come down, and the CPU stays above 100%. I will try to reproduce it with the following JVM options, and I will check the RetryCache object size. -Xms10g -Xmx10g -Xmn4g -XX:+UseParallelGC -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=1024m -XX:SurvivorRatio=4 -verbose:gc -Xloggc:/var/datanode_gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/datanode_dump.hprof -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly was (Author: yjxxtd): [~shashikant] I have stop send request to s3gateway for one day, but the datanode physical memory does not come down, and the CPU is more than 100%. I will try to reproduce it with some jvm options. And I will check RetryCache object size. -Xms10g -Xmx10g -Xmn4g -XX:+UseParallelGC -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=1024m -XX:SurvivorRatio=4 -verbose:gc -Xloggc:/var/datanode_gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/datanode_dump.hprof -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly > Datanode memory increase so much > > > Key: HDDS-3254 > URL: https://issues.apache.org/jira/browse/HDDS-3254 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: image-2020-03-24-10-05-41-212.png, > image-2020-03-24-16-32-43-973.png, image-2020-03-24-16-33-20-795.png > > > As the image shows, the physical memory of the datanode increased to 11.2GB, > and then it crashed. I will find the root cause. > !image-2020-03-24-10-05-41-212.png! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org