Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on PR #8843: URL: https://github.com/apache/ozone/pull/8843#issuecomment-4216327335 Thanks for the reviews @ChenSammi @adoroszlai @Tejaskriya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on PR #8843: URL: https://github.com/apache/ozone/pull/8843#issuecomment-4203858109 Thanks @ptlrs , and @Tejaskriya @adoroszlai for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi merged PR #8843: URL: https://github.com/apache/ozone/pull/8843 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on PR #8843: URL: https://github.com/apache/ozone/pull/8843#issuecomment-4201301527 I have removed the modifications from DeprecationDelta -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on PR #8843: URL: https://github.com/apache/ozone/pull/8843#issuecomment-4198111549 @ptlrs , it looks like we cannot add "hdds.datanode.disk.check.io.test.count" to DeprecationDelta. It's different property type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3030881349
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +409,19 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
Review Comment:
Ok, I reverted the change which removed the config and updated the
deprecated config list.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3019611363
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +409,19 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
Review Comment:
To not break existing users, it's recommend to add new property
"hdds.datanode.disk.check.io.test.enabled", instead of change the current
property "hdds.datanode.disk.check.io.test.count" to
"hdds.datanode.disk.check.io.test.enabled".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3013230591
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -688,6 +706,23 @@ public void validate() {
diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
}
+if (diskCheckSlidingWindowTimeout.isNegative()) {
+ LOG.warn("{} must be greater than zero and was set to {}. Defaulting to
{}",
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_KEY, diskCheckSlidingWindowTimeout,
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT);
+ diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
+}
+
+// Do not set window timeout <= periodic disk check interval period, or
failures can be missed across sparse checks
+// e.g., every 120m interval with a 60m window rarely accumulates enough
failed events
+if
(diskCheckSlidingWindowTimeout.compareTo(Duration.ofMinutes(periodicDiskCheckIntervalMinutes))
< 0) {
+ LOG.warn("{} must be greater than or equal to {} minutes and was set to
{} minutes. Defaulting to {}",
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_KEY,
periodicDiskCheckIntervalMinutes,
+ diskCheckSlidingWindowTimeout.toMinutes(),
+
DurationFormatUtils.formatDurationHMS(DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT.toMillis()));
+ diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
Review Comment:
Good catch. Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3013229707
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +409,19 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3007843398
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +409,19 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
Review Comment:
With this sliding window introduced,
"hdds.datanode.disk.check.io.test.count" property function is half removed. We
should consider deprecate "hdds.datanode.disk.check.io.test.count" and
introduce a new boolean property with name, like
"hdds.datanode.disk.check.io.test.enabled".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3007781558
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -688,6 +706,23 @@ public void validate() {
diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
}
+if (diskCheckSlidingWindowTimeout.isNegative()) {
+ LOG.warn("{} must be greater than zero and was set to {}. Defaulting to
{}",
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_KEY, diskCheckSlidingWindowTimeout,
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT);
+ diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
+}
+
+// Do not set window timeout <= periodic disk check interval period, or
failures can be missed across sparse checks
+// e.g., every 120m interval with a 60m window rarely accumulates enough
failed events
+if
(diskCheckSlidingWindowTimeout.compareTo(Duration.ofMinutes(periodicDiskCheckIntervalMinutes))
< 0) {
+ LOG.warn("{} must be greater than or equal to {} minutes and was set to
{} minutes. Defaulting to {}",
+ DISK_CHECK_SLIDING_WINDOW_TIMEOUT_KEY,
periodicDiskCheckIntervalMinutes,
+ diskCheckSlidingWindowTimeout.toMinutes(),
+
DurationFormatUtils.formatDurationHMS(DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT.toMillis()));
+ diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
Review Comment:
Instead of set diskCheckSlidingWindowTimeout to static
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT, how about set
diskCheckSlidingWindowTimeout to
Duration.ofMinutes(periodicDiskCheckInterval).plus(diskCheckTimeout)?
Same for above diskCheckSlidingWindowTimeout negative check.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
adoroszlai commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r3002594733
##
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestStorageVolumeHealthChecks.java:
##
@@ -57,19 +60,22 @@ public static Stream volumeBuilders() {
new HddsVolume.Builder(volumePath.toString())
.datanodeUuid(DATANODE_UUID)
.conf(CONF)
-.usageCheckFactory(MockSpaceUsageCheckFactory.NONE);
+.usageCheckFactory(MockSpaceUsageCheckFactory.NONE)
+.clock(TEST_CLOCK);
Review Comment:
No need to update the patch for this, just a note for the future: adding
`.clock...` in a previous line would reduce unnecessary noise in the patch (due
to moving around the `;`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on PR #8843: URL: https://github.com/apache/ozone/pull/8843#issuecomment-418630 Hi @errose28 @ChenSammi @adoroszlai I have updated this PR, could you please take another look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
adoroszlai commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2987432293
##
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/SlidingWindow.java:
##
@@ -149,12 +149,16 @@ private long getCurrentTime() {
return clock.millis();
}
+ public long getExpiryDurationMillis() {
+return expiryDurationMillis;
+ }
+
/**
* A custom monotonic clock implementation.
* Implementation of Clock that uses System.nanoTime() for real usage.
* See {@see org.apache.ozone.test.TestClock}
Review Comment:
```
[WARNING]
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/SlidingWindow.java:159:
warning - invalid usage of tag {@see org.apache.ozone.test.TestClock}
```
```suggestion
* @see org.apache.ozone.test.TestClock
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
adoroszlai commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2987420224
##
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestStorageVolumeHealthChecks.java:
##
@@ -304,13 +308,23 @@ private void
testCheckIOUntilFailure(StorageVolume.Builder builder,
DatanodeConfiguration dnConf = CONF.getObject(DatanodeConfiguration.class);
dnConf.setVolumeIOTestCount(ioTestCount);
dnConf.setVolumeIOFailureTolerance(ioFailureTolerance);
+dnConf.setDiskCheckSlidingWindowTimeout(Duration.ofMillis(ioTestCount));
CONF.setFromObject(dnConf);
builder.conf(CONF);
StorageVolume volume = builder.build();
volume.format(CLUSTER_ID);
volume.createTmpDirs(CLUSTER_ID);
+// Sliding window protocol transitioned from count-based to a time-based
system
+// Update the default failure duration of the window from 60 minutes to a
shorter duration for the test
+long eventRate = 1L;
+TestClock testClock = TestClock.newInstance();
+Field clock = SlidingWindow.class.getDeclaredField("clock");
+clock.setAccessible(true);
+clock.set(volume.getIoTestSlidingWindow(), testClock);
Review Comment:
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ptlrs commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2985324862
##
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestStorageVolumeHealthChecks.java:
##
@@ -304,13 +308,23 @@ private void
testCheckIOUntilFailure(StorageVolume.Builder builder,
DatanodeConfiguration dnConf = CONF.getObject(DatanodeConfiguration.class);
dnConf.setVolumeIOTestCount(ioTestCount);
dnConf.setVolumeIOFailureTolerance(ioFailureTolerance);
+dnConf.setDiskCheckSlidingWindowTimeout(Duration.ofMillis(ioTestCount));
CONF.setFromObject(dnConf);
builder.conf(CONF);
StorageVolume volume = builder.build();
volume.format(CLUSTER_ID);
volume.createTmpDirs(CLUSTER_ID);
+// Sliding window protocol transitioned from count-based to a time-based
system
+// Update the default failure duration of the window from 60 minutes to a
shorter duration for the test
+long eventRate = 1L;
+TestClock testClock = TestClock.newInstance();
+Field clock = SlidingWindow.class.getDeclaredField("clock");
+clock.setAccessible(true);
+clock.set(volume.getIoTestSlidingWindow(), testClock);
Review Comment:
Done
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java:
##
@@ -717,39 +718,25 @@ public synchronized VolumeCheckResult check(@Nullable
Boolean unused)
// We can check again if disk is full. If it is full,
// in this case keep volume as healthy so that READ can still be served
if (!diskChecksPassed && getCurrentUsage().getAvailable() <
minimumDiskSpace) {
- ioTestSlidingWindow.add(true);
return VolumeCheckResult.HEALTHY;
}
-// Move the sliding window of IO test results forward 1 by adding the
-// latest entry and removing the oldest entry from the window.
-// Update the failure counter for the new window.
-ioTestSlidingWindow.add(diskChecksPassed);
if (!diskChecksPassed) {
- currentIOFailureCount.incrementAndGet();
-}
-if (ioTestSlidingWindow.size() > ioTestCount &&
-Objects.equals(ioTestSlidingWindow.poll(), Boolean.FALSE)) {
- currentIOFailureCount.decrementAndGet();
+ ioTestSlidingWindow.add();
}
-// If the failure threshold has been crossed, fail the volume without
-// further scans.
+// If the failure threshold has been crossed, fail the volume without
further scans.
// Once the volume is failed, it will not be checked anymore.
// The failure counts can be left as is.
-if (currentIOFailureCount.get() > ioFailureTolerance) {
- LOG.error("Failed IO test for volume {}: the last {} runs " +
- "encountered {} out of {} tolerated failures.", this,
- ioTestSlidingWindow.size(), currentIOFailureCount,
- ioFailureTolerance);
+if (ioTestSlidingWindow.isExceeded()) {
+ LOG.error("Failed IO test for volume {}: encountered more than the {}
tolerated failures within the past {} ms.",
+ this, ioTestSlidingWindow.getWindowSize(),
ioTestSlidingWindow.getExpiryDurationMillis());
return VolumeCheckResult.FAILED;
-} else if (LOG.isDebugEnabled()) {
- LOG.debug("IO test results for volume {}: the last {} runs encountered "
+
- "{} out of {} tolerated failures", this,
- ioTestSlidingWindow.size(),
- currentIOFailureCount, ioFailureTolerance);
}
+LOG.debug("IO test results for volume {}: encountered {} out of {}
tolerated failures",
+this, ioTestSlidingWindow.getNumEventsInWindow(),
ioTestSlidingWindow.getWindowSize());
Review Comment:
`getNumEventsInWindow` is correct here. It will show how many failures are
in the actual window. The window size is the number of failures that are
allowed.
This log line is logged every time in debug mode. Since our queue is larger
than the window, logging `getNumEvents` may give us the false impression that
we got more failures as it will include the errors outside the window as well.
We are using `getNumEvents` only for logging in tests.
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +407,16 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
+ defaultValue = "60m",
+ type = ConfigType.TIME,
+ tags = {ConfigTag.DATANODE},
+ description = "Time interval after which a disk check"
+ + " failure result stored in the sliding window will expire."
+ + " Unit could be defined with postfix (ns,ms,s,m,h,d)."
+ )
+ private Duration diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
Review Comment:
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2915854722
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java:
##
@@ -717,39 +718,25 @@ public synchronized VolumeCheckResult check(@Nullable
Boolean unused)
// We can check again if disk is full. If it is full,
// in this case keep volume as healthy so that READ can still be served
if (!diskChecksPassed && getCurrentUsage().getAvailable() <
minimumDiskSpace) {
- ioTestSlidingWindow.add(true);
return VolumeCheckResult.HEALTHY;
}
-// Move the sliding window of IO test results forward 1 by adding the
-// latest entry and removing the oldest entry from the window.
-// Update the failure counter for the new window.
-ioTestSlidingWindow.add(diskChecksPassed);
if (!diskChecksPassed) {
- currentIOFailureCount.incrementAndGet();
-}
-if (ioTestSlidingWindow.size() > ioTestCount &&
-Objects.equals(ioTestSlidingWindow.poll(), Boolean.FALSE)) {
- currentIOFailureCount.decrementAndGet();
+ ioTestSlidingWindow.add();
}
-// If the failure threshold has been crossed, fail the volume without
-// further scans.
+// If the failure threshold has been crossed, fail the volume without
further scans.
// Once the volume is failed, it will not be checked anymore.
// The failure counts can be left as is.
-if (currentIOFailureCount.get() > ioFailureTolerance) {
- LOG.error("Failed IO test for volume {}: the last {} runs " +
- "encountered {} out of {} tolerated failures.", this,
- ioTestSlidingWindow.size(), currentIOFailureCount,
- ioFailureTolerance);
+if (ioTestSlidingWindow.isExceeded()) {
+ LOG.error("Failed IO test for volume {}: encountered more than the {}
tolerated failures within the past {} ms.",
+ this, ioTestSlidingWindow.getWindowSize(),
ioTestSlidingWindow.getExpiryDurationMillis());
return VolumeCheckResult.FAILED;
-} else if (LOG.isDebugEnabled()) {
- LOG.debug("IO test results for volume {}: the last {} runs encountered "
+
- "{} out of {} tolerated failures", this,
- ioTestSlidingWindow.size(),
- currentIOFailureCount, ioFailureTolerance);
}
+LOG.debug("IO test results for volume {}: encountered {} out of {}
tolerated failures",
+this, ioTestSlidingWindow.getNumEventsInWindow(),
ioTestSlidingWindow.getWindowSize());
Review Comment:
Shall we call getNumEvents() here, instead of getNumEventsInWindow()?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2915835263
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java:
##
@@ -163,9 +159,8 @@ protected StorageVolume(Builder b) throws IOException {
this.conf = b.conf;
this.dnConf = conf.getObject(DatanodeConfiguration.class);
this.ioTestCount = dnConf.getVolumeIOTestCount();
- this.ioFailureTolerance = dnConf.getVolumeIOFailureTolerance();
- this.ioTestSlidingWindow = new LinkedList<>();
- this.currentIOFailureCount = new AtomicInteger(0);
+ this.ioTestSlidingWindow = new
SlidingWindow(dnConf.getVolumeIOFailureTolerance(),
Review Comment:
Thanks @ptlrs for this improvement. Time window based solution is better
than current accumulated count solution, volume has the chance to survive
through the temporary failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-13108. Refactor StorageVolume to use SlidingWindow [ozone]
ChenSammi commented on code in PR #8843:
URL: https://github.com/apache/ozone/pull/8843#discussion_r2915827798
##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##
@@ -404,6 +407,16 @@ public class DatanodeConfiguration extends
ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
+ defaultValue = "60m",
+ type = ConfigType.TIME,
+ tags = {ConfigTag.DATANODE},
+ description = "Time interval after which a disk check"
+ + " failure result stored in the sliding window will expire."
+ + " Unit could be defined with postfix (ns,ms,s,m,h,d)."
+ )
+ private Duration diskCheckSlidingWindowTimeout =
DISK_CHECK_SLIDING_WINDOW_TIMEOUT_DEFAULT;
Review Comment:
What's the recommendation of value relationship between this new property
and PERIODIC_DISK_CHECK_INTERVAL_MINUTES_DEFAULT? Say if user reconfigured
PERIODIC_DISK_CHECK_INTERVAL_MINUTES_DEFAULT to 2h, or 30m, shall we suggest
user to reconfigure this property too?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
