Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-04-01 Thread via GitHub


github-actions[bot] commented on PR #9664:
URL: https://github.com/apache/ozone/pull/9664#issuecomment-4173693324

   This PR has been marked as stale due to 21 days of inactivity. Please 
comment or remove the stale label to keep it open. Otherwise, it will be 
automatically closed in 7 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-03-11 Thread via GitHub


errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920908638


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,535 @@
+---
+jira: HDDS-3331
+authors:
+- Stephen O'Donnell
+- Ethan Rose
+- Istvan Fajth
+---
+
+
+
+# Zero Downtime Upgrade (ZDU)
+
+## The Goal
+
+The goal of Zero Downtime Upgrade (ZDU) is to allow the software running an 
existing Ozone cluster to be upgraded while the cluster remains operational. 
There should be no gaps in service and the upgrade should be transparent to 
applications using the cluster.
+
+Ozone is already designed to be fault tolerant, so the rolling restart of SCM, 
OM and Datanodes is already possible without impacting users of the cluster. 
The challenge with ZDU is therefore related to wire and disk compatibility, as 
different components within the cluster can be running different software 
versions concurrently. This design will focus on how we solve the wire and disk 
compatibility issues.
+
+## Component Upgrade Order
+
+To simplify reasoning about components of different types running in different 
versions, we should reduce the number of possible version combinations allowed 
as much as possible. Clients are considered external to the Ozone cluster, 
therefore we cannot control their version. However, we already have a framework 
to handle client/server cross compatibility, so rolling upgrade only needs to 
focus on compatibility of internal components. For internal Ozone components, 
we can define and enforce an order that the components must be upgraded in. 
Consider the following Ozone service diagram:
+
+![Ozone connection diagram](zdu-image1.png)
+
+Here the arrows represent client to server interactions between components, 
with the arrow pointing from the client to the server. The red arrow is 
external clients interacting with Ozone. The shield means that the client needs 
to see a consistent API surface despite leader changes in mixed version 
clusters so that APIs do not seem to disappear and reappear based on the node 
serving the request. The orange lines represent client to server interactions 
for internal Ozone components. For components connected by this internal line, 
**we can control the order that they are upgraded such that the server is 
always newer and handles all compatibility issues**. This greatly reduces the 
matrix of possible versions we may see within Ozone and mostly eliminates the 
need for internal Ozone components to be aware of each other’s versions, as 
long as servers remain backwards compatible. This order is:
+
+1. Upgrade all SCMs to the new version  
+2. Upgrade Recon to the new version  
+3. Upgrade all Datanodes to the new version  
+4. Upgrade all OMs to the new version  
+5. Upgrade all S3 gateways to the new version
+
+Note that in this ordering, Recon will still have a new client/old server 
relationship with OM for a period of time. The OM sync process in Recon is the 
only API that needs to account for this, and it is not on the main data read, 
write, delete, or recovery path. Recon should be upgraded with the SCMs because 
its container report processing from the datanodes shares SCM code, so we do 
not want Recon to handle a different version matrix among datanodes than SCM.
+
+## Software Version Framework
+
+The previous section defines an upgrade order to handle API compatibility 
between internal components of different types without the need for explicit 
versioning. For internal components of the same type, we need to provide 
stronger guarantees when they are in mixed versions:
+
+* Components of the same type must persist the same data  
+* Components of the same type must expose a consistent API surface
+
+To accomplish these goals, we need a versioning framework to track component 
specific versions and ensure components of the same type operate in unison. 
Note that this versioning framework will not extend beyond Ozone into lower 
level libraries like Ratis, Hadoop RPC, gRPC, and protobuf. We are dependent on 
these libraries providing their own cross compatibility guarantees for ZDU to 
function.
+
+### Versioning in the Existing Upgrade Framework
+
+Before discussing versioning in the context of ZDU, we should first review the 
versioning framework currently present which allows for upgrades and downgrades 
within Ozone, and cross compatibility between Ozone and external clients of 
various versions.
+
+Ozone components currently define their version in two classes: 
ComponentVersion and LayoutFeature. Any change to the on-disk format increments 
the Layout Feature/Version, which is internal to the component. You can see 
examples of the Layout Version in classes such as HDDSLayoutFeature, 
OMLayoutFeature and ReconLayoutFeature. Any change to the API layer which may 
affect external clients will increment the ComponentVersion. Component versions 
are defined in classes like OzoneManagerVersion and DatanodeVersion. 

Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-03-11 Thread via GitHub


errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920895756


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,535 @@
+1. Upgrade all SCMs to the new version  
+2. Upgrade Recon to the new version  
+3. Upgrade all Datanodes to the new version  
+4. Upgrade all OMs to the new version  

Review Comment:
   These upgrade/restart steps are done by an admin, possibly with an 
orchestration layer, so Ozone doesn't decide whether or not the decom nodes get 
upgraded. If they do, nothing about the decom/maintenance/recom process is 
expected to change, since ZDU means all existing operations are allowed 
throughout the upgrade and finalization process.
   
   Starting on line 227 we spec out how datanodes are handled relative to SCM, 
including the case where they are offline and come back later. Let me know if 
there are more questions in that area. Note that once SCM is finalized, any 
datanodes that later appear with the old software version will be fenced out 
until the admin upgrades them.
   
   The doc currently doesn't specify whether nodes undergoing decom or 
maintenance will be instructed to finalize by SCM. I think we should still send 
them the finalize commands so they don't block further upgrade steps 
unnecessarily. @sodonnel what do you think?
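
The fencing rule described in this comment can be sketched in a few lines of Java. This is an illustration only, not Ozone code: `ScmAdmission`, `Decision`, and the integer versions are hypothetical names. It assumes just what the comment states, that an old-version datanode is accepted until SCM is finalized and fenced out afterwards, and it assumes (this is not stated in the thread) that decommission or maintenance state plays no part in the decision.

```java
// Hypothetical sketch, not Ozone's API. Plain integers stand in for software
// versions, and decommission/maintenance state is deliberately ignored.
final class ScmAdmission {
  enum Decision { ADMIT, FENCE }

  private final int scmSoftwareVersion;
  private boolean finalized = false;

  ScmAdmission(int scmSoftwareVersion) {
    this.scmSoftwareVersion = scmSoftwareVersion;
  }

  /** Called once the admin has finalized SCM at its software version. */
  void markFinalized() {
    finalized = true;
  }

  /** Decides whether a datanode reporting this software version may join. */
  Decision onRegister(int datanodeSoftwareVersion) {
    // Before finalization, old-version datanodes are still accepted.
    // After finalization, they are fenced out until they are upgraded.
    if (finalized && datanodeSoftwareVersion < scmSoftwareVersion) {
      return Decision.FENCE;
    }
    return Decision.ADMIT;
  }

  public static void main(String[] args) {
    ScmAdmission scm = new ScmAdmission(2);
    System.out.println(scm.onRegister(1));
    scm.markFinalized();
    System.out.println(scm.onRegister(1));
  }
}
```

In this model a datanode that was offline during the upgrade and returns on the old version is treated the same as any other old-version node: it is admitted only if SCM has not yet finalized.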






Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-03-11 Thread via GitHub


errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920779201


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,535 @@
+1. Upgrade all SCMs to the new version  
+2. Upgrade Recon to the new version  
+3. Upgrade all Datanodes to the new version  
+4. Upgrade all OMs to the new version  
+5. Upgrade all S3 gateways to the new version

Review Comment:
   This should not cause an issue, because the apparent versions of the 
components will remain the same in the Ratis ring even as the software is updated. That 
means the components with newer software version will still write data in a way 
that the older components bootstrapping can understand (and vice versa). Check 
out the table around line 107 and the appendix to see how the apparent version 
moves in lock step for a Ratis ring.
   
   Finalization to move the apparent version forward can be done from a Ratis 
snapshot because the version is written to the DB as well as the version file. 
This is already handled in the current upgrade flow because finalization is an 
online operation.
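
The lock-step behavior described above can be modeled roughly as follows. This is an illustrative Java sketch, not Ozone code: `RingMember`, its fields, and the integer versions are hypothetical. It assumes that each member writes data in the format of its apparent version rather than its software version, and that the apparent version only moves forward when a finalize request is applied, whether that request arrives through the log or through a snapshot.

```java
// Hypothetical model, not Ozone's API: integers stand in for versions.
final class RingMember {
  private final int softwareVersion;  // version of the installed binaries
  private int apparentVersion;        // version the member behaves as

  RingMember(int softwareVersion, int apparentVersion) {
    if (apparentVersion > softwareVersion) {
      throw new IllegalArgumentException(
          "apparent version exceeds software version");
    }
    this.softwareVersion = softwareVersion;
    this.apparentVersion = apparentVersion;
  }

  /** Data is written in the apparent version's format, not the software's. */
  int writeFormatVersion() {
    return apparentVersion;
  }

  /** Applies a replicated finalize request, from the log or a snapshot. */
  void applyFinalize(int targetVersion) {
    if (targetVersion > softwareVersion) {
      throw new IllegalStateException(
          "cannot finalize past the software version");
    }
    apparentVersion = Math.max(apparentVersion, targetVersion);
  }

  public static void main(String[] args) {
    RingMember upgraded = new RingMember(2, 1);
    RingMember notYetUpgraded = new RingMember(1, 1);
    // Both members write the same format while the ring is in mixed versions.
    System.out.println(
        upgraded.writeFormatVersion() == notYetUpgraded.writeFormatVersion());
  }
}
```

Because the upgraded member keeps writing at the old apparent version, a member that bootstraps from its data on older software can still read it, which is the property the comment relies on.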






Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-03-08 Thread via GitHub


smengcl commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2902408368


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,535 @@
+1. Upgrade all SCMs to the new version  
+2. Upgrade Recon to the new version  
+3. Upgrade all Datanodes to the new version  
+4. Upgrade all OMs to the new version  

Review Comment:
   Is it correct to assume that decommissioned datanodes would be ignored 
during the upgrade in Step 3? If so, what if they are recommissioned later (say 
after Step 4)?




Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-02-24 Thread via GitHub


errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2848989487


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,523 @@
+---
+jira: HDDS-3331

Review Comment:
   We aren't using Hugo anymore and haven't defined the design doc requirements 
for the new website. For now, as long as the docs are valid markdown I think it 
is ok to commit them here.






Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-01-29 Thread via GitHub


adoroszlai commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2740806273


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,523 @@
+---
+jira: HDDS-3331

Review Comment:
   https://github.com/user-attachments/assets/b6a61e65-85a3-425b-8c91-f92b30a097c5
   






Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-01-29 Thread via GitHub


adoroszlai commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2740724301


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,523 @@
+---
+jira: HDDS-3331

Review Comment:
   https://github.com/user-attachments/assets/f240ca44-8b55-49b9-8218-4b540929bae4
   






Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-01-28 Thread via GitHub


ptlrs commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2739492223


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,535 @@

Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]

2026-01-28 Thread via GitHub


jojochuang commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2733889822


##
hadoop-hdds/docs/content/design/zdu-design.md:
##
@@ -0,0 +1,523 @@
+---
+jira: HDDS-3331

Review Comment:
   My local Hugo failed to start until I added the date tag.
   ```suggestion
   jira: HDDS-3331
   date: 2026-01-23
   ```
   
   It looks like it also needs these two:
   
   ```suggestion
   title: Zero Downtime Upgrade (ZDU)
   summary: New and improved framework to allow rolling upgrade without cluster downtime.
   ```


