[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Release Note:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

As part of this change, a new OM checkpoint endpoint /v2/dbCheckpoint is introduced. New clients (2.2.0 and above) use the new endpoint, whereas old clients continue to use the old checkpoint endpoint /dbCheckpoint. The new behavior can be switched off using the configuration property ozone.om.db.checkpoint.use.inode.based.transfer. The default is true.

  was:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is introduced. New clients (2.2.0 and above) use the new endpoint, whereas old clients continue to use the old checkpoint endpoint /dbCheckpoint. The new behavior can be switched off using the configuration property ozone.om.db.checkpoint.use.inode.based.transfer. The default is true.

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
> Fix For: 2.2.0
>
> Currently there is an issue with the existing bootstrapping logic when
> dealing with Snapshotted OM Rocksdb. While bootstrapping no locks are taken
> and the bootstrapping runs along with active transactions happening on the
> snapshot rocksdb which could lead to having a corrupted Rocksdb instance post
> bootstrap on the follower OM.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
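
The endpoint selection described in the release note can be illustrated with a small sketch. This is not Ozone code: the function and constant names are hypothetical, and it assumes that setting ozone.om.db.checkpoint.use.inode.based.transfer to false makes a new client fall back to the old endpoint, which the release note implies but does not spell out. Only the two endpoint paths, the 2.2.0 cutoff, and the default of true come from the release note.

```python
# Hypothetical sketch; none of these names are Ozone APIs.
NEW_ENDPOINT = "/v2/dbCheckpoint"  # endpoint named in the release note
OLD_ENDPOINT = "/dbCheckpoint"     # endpoint named in the release note
FIRST_VERSION_WITH_NEW_ENDPOINT = (2, 2, 0)


def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '2.2.0' into (2, 2, 0).

    Suffixes such as '-SNAPSHOT' are not handled in this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def checkpoint_endpoint(client_version: str,
                        use_inode_based_transfer: bool = True) -> str:
    """Return the checkpoint endpoint a client would request.

    Assumption: disabling the property sends new clients to the old endpoint.
    """
    is_new_client = parse_version(client_version) >= FIRST_VERSION_WITH_NEW_ENDPOINT
    if is_new_client and use_inode_based_transfer:
        return NEW_ENDPOINT
    return OLD_ENDPOINT
```

Comparing versions as integer tuples rather than strings keeps a version such as 2.10.0 ordered after 2.2.0.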
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Release Note:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is introduced. New clients (2.2.0 and above) use the new endpoint, whereas old clients continue to use the old checkpoint endpoint /dbCheckpoint. The new behavior can be switched off using the configuration property ozone.om.db.checkpoint.use.inode.based.transfer. The default is true.

  was:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is introduced. New clients (2.2.0 and above) use the new endpoint, whereas old clients continue to use the old checkpoint endpoint /dbCheckpoint.

There are no configuration changes or special upgrade procedures required for this fix.
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Release Note:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is introduced. New clients (2.2.0 and above) use the new endpoint, whereas old clients continue to use the old checkpoint endpoint /dbCheckpoint.

There are no configuration changes or special upgrade procedures required for this fix.

  was:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

There are no configuration changes or special upgrade procedures required for this fix.
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Epic Link: HDDS-13747  (was: HDDS-12940)
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
       Fix Version/s:     (was: 2.1.0)
    Target Version/s: 2.2.0
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Release Note:
OM Follower Bootstrap Failures with Snapshots

A race condition during the Ozone Manager (OM) bootstrap process could corrupt the database on follower nodes when snapshots were in use. This could cause the OM bootstrap to fail and impact cluster stability.

This release introduces a lock to prevent this race condition, ensuring that OM bootstrapping is reliable and that the database remains consistent.

There are no configuration changes or special upgrade procedures required for this fix.
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-12090:
---
    Epic Link: HDDS-12940
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika updated HDDS-12090:
---
    Issue Type: Bug  (was: Task)
[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
[ https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swaminathan Balachandran updated HDDS-12090:
---
    Description:
Currently there is an issue with the existing bootstrapping logic when dealing with Snapshotted OM Rocksdb. While bootstrapping no locks are taken and the bootstrapping runs along with active transactions happening on the snapshot rocksdb which could lead to having a corrupted Rocksdb instance post bootstrap on the follower OM.

  (was:
Currently there is an issue with the existing bootstrapping logic when dealing with Snapshotted OM Rocksdb. While bootstrapping no locks are taken and the bootstrapping runs along with active transactions happening on the snapshot rocksdb which could lead to having a corrupted Rocksdb instance post bootstrap on the follower OM.

This doc looks into resolving this issue so that the bootstrapping happens on a consistent view of the system.)
