[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2026-04-27 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Release Note: 
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint /v2/dbCheckpoint is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
old clients continue to use the old checkpoint endpoint /dbCheckpoint.

  The new behavior can be switched off by setting the configuration property 
ozone.om.db.checkpoint.use.inode.based.transfer to false. The default is true.
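
  The endpoint choice described above could be sketched roughly as follows. 
This is only an illustration, not Ozone code: the function and constant names 
are hypothetical, versions are assumed to be plain dotted numbers such as 
"2.2.0", and it is an assumption that switching the property off makes a new 
client fall back to the old endpoint.

```python
from typing import Tuple

# Endpoint paths are taken from the release note above.
NEW_ENDPOINT = "/v2/dbCheckpoint"
OLD_ENDPOINT = "/dbCheckpoint"

# First client version that uses the new endpoint, per the release note.
MIN_NEW_CLIENT = (2, 2, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn '2.2.0' into (2, 2, 0)."""
    return tuple(int(part) for part in version.split("."))


def checkpoint_endpoint(client_version: str,
                        use_inode_based_transfer: bool = True) -> str:
    """Pick the checkpoint endpoint for a client (illustrative only).

    use_inode_based_transfer stands in for the property
    ozone.om.db.checkpoint.use.inode.based.transfer (default true).
    Assumption: with the property off, the old endpoint is used.
    """
    is_new_client = parse_version(client_version) >= MIN_NEW_CLIENT
    if use_inode_based_transfer and is_new_client:
        return NEW_ENDPOINT
    return OLD_ENDPOINT
```

  Comparing versions as integer tuples avoids the string-comparison pitfall 
where "2.10.0" would sort before "2.2.0".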



  was:
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
old clients continue to use the old checkpoint endpoint /dbCheckpoint.

  The new behavior can be switched off by setting the configuration property 
ozone.om.db.checkpoint.use.inode.based.transfer to false. The default is true.




> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
> Fix For: 2.2.0
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2026-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Release Note: 
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
old clients continue to use the old checkpoint endpoint /dbCheckpoint.

  The new behavior can be switched off by setting the configuration property 
ozone.om.db.checkpoint.use.inode.based.transfer to false. The default is true.



  was:
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
old clients continue to use the old checkpoint endpoint /dbCheckpoint.

  There are no configuration changes or special upgrade procedures required for 
this fix.




> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2026-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Release Note: 
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint /dbCheckpointv2 is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
old clients continue to use the old checkpoint endpoint /dbCheckpoint.

  There are no configuration changes or special upgrade procedures required for 
this fix.



  was:
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  There are no configuration changes or special upgrade procedures required for 
this fix.


> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2026-02-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Epic Link: HDDS-13747  (was: HDDS-12940)

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2025-11-10 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
   Fix Version/s: (was: 2.1.0)
Target Version/s: 2.2.0

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2025-08-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Release Note: 
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  There are no configuration changes or special upgrade procedures required for 
this fix.

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
> Fix For: 2.1.0
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2025-05-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
---
Epic Link: HDDS-12940

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2025-02-26 Thread Ivan Andika (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-12090:
---
Issue Type: Bug  (was: Task)

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.






[jira] [Updated] (HDDS-12090) Fix Snapshot Bootstrapping race condition to prevent snapshot corruption

2025-01-15 Thread Swaminathan Balachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swaminathan Balachandran updated HDDS-12090:

Description: The existing bootstrapping logic has an issue when dealing with a 
snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
alongside active transactions on the snapshot RocksDB, which could leave a 
corrupted RocksDB instance on the follower OM after bootstrap.  (was: The 
existing bootstrapping logic has an issue when dealing with a snapshotted OM 
RocksDB. No locks are taken while bootstrapping, so it runs alongside active 
transactions on the snapshot RocksDB, which could leave a corrupted RocksDB 
instance on the follower OM after bootstrap. This doc looks into resolving 
this issue so that bootstrapping happens on a consistent view of the system.)

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> 
>
> Key: HDDS-12090
> URL: https://issues.apache.org/jira/browse/HDDS-12090
> Project: Apache Ozone
> Issue Type: Task
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with a 
> snapshotted OM RocksDB. No locks are taken while bootstrapping, so it runs 
> alongside active transactions on the snapshot RocksDB, which could leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.


