Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
github-actions[bot] commented on PR #9664:
URL: https://github.com/apache/ozone/pull/9664#issuecomment-4173693324

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920908638

## hadoop-hdds/docs/content/design/zdu-design.md:
@@ -0,0 +1,535 @@

---
jira: HDDS-3331
authors:
- Stephen O'Donnell
- Ethan Rose
- Istvan Fajth
---

# Zero Downtime Upgrade (ZDU)

## The Goal

The goal of Zero Downtime Upgrade (ZDU) is to allow the software running an existing Ozone cluster to be upgraded while the cluster remains operational. There should be no gaps in service, and the upgrade should be transparent to applications using the cluster.

Ozone is already designed to be fault tolerant, so a rolling restart of SCM, OM and Datanodes is possible without impacting users of the cluster. The challenge with ZDU is therefore wire and disk compatibility, since different components within the cluster can be running different software versions concurrently. This design focuses on how we solve those wire and disk compatibility issues.

## Component Upgrade Order

To simplify reasoning about components of different types running in different versions, we should reduce the number of possible version combinations allowed as much as possible. Clients are considered external to the Ozone cluster, so we cannot control their version. However, we already have a framework to handle client/server cross compatibility, so rolling upgrade only needs to focus on compatibility of internal components. For internal Ozone components, we can define and enforce the order in which the components must be upgraded. Consider the following Ozone service diagram:

Here the arrows represent client-to-server interactions between components, with the arrow pointing from the client to the server. The red arrow is external clients interacting with Ozone. The shield means that the client needs to see a consistent API surface despite leader changes in mixed-version clusters, so that APIs do not seem to disappear and reappear based on the node serving the request. The orange lines represent client-to-server interactions for internal Ozone components. For components connected by this internal line, **we can control the order in which they are upgraded such that the server is always newer and handles all compatibility issues**. This greatly reduces the matrix of possible versions we may see within Ozone and mostly eliminates the need for internal Ozone components to be aware of each other's versions, as long as servers remain backwards compatible. This order is:

1. Upgrade all SCMs to the new version
2. Upgrade Recon to the new version
3. Upgrade all Datanodes to the new version
4. Upgrade all OMs to the new version
5. Upgrade all S3 gateways to the new version

Note that in this ordering, Recon will still have a new-client/old-server relationship with OM for a period of time. The OM sync process in Recon is the only API that needs to account for this, and it is not on the main data read, write, delete, or recovery path. Recon should be upgraded with the SCMs because its container report processing from the datanodes shares SCM code, so we do not want Recon to handle a different version matrix among datanodes than SCM does.

## Software Version Framework

The previous section defines an upgrade order to handle API compatibility between internal components of different types without the need for explicit versioning. For internal components of the same type, we need to provide stronger guarantees when they are running mixed versions:

* Components of the same type must persist the same data
* Components of the same type must expose a consistent API surface

To accomplish these goals, we need a versioning framework to track component-specific versions and ensure components of the same type operate in unison. Note that this versioning framework will not extend beyond Ozone into lower-level libraries like Ratis, Hadoop RPC, gRPC, and protobuf. We depend on those libraries providing their own cross-compatibility guarantees for ZDU to function.

### Versioning in the Existing Upgrade Framework

Before discussing versioning in the context of ZDU, we should first review the versioning framework currently in place, which allows for upgrades and downgrades within Ozone, and for cross compatibility between Ozone and external clients of various versions.

Ozone components currently define their version in two classes: ComponentVersion and LayoutFeature. Any change to the on-disk format increments the Layout Feature/Version, which is internal to the component. You can see examples of the Layout Version in classes such as HDDSLayoutFeature, OMLayoutFeature and ReconLayoutFeature. Any change to the API layer which may affect external clients increments the ComponentVersion. Component versions are defined in classes like OzoneManagerVersion and DatanodeVersion.
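The layout-version gating described above, where an on-disk format change is tied to a layout version that a component only uses once it has been finalized past it, can be sketched as follows. The class and enum names here are hypothetical illustrations, not Ozone's actual HDDSLayoutFeature/OMLayoutFeature code:

```java
// Illustrative sketch of layout-feature gating. Feature and its members are
// made-up names, not the real layout features defined in Ozone.
import java.util.EnumSet;

public class LayoutGateSketch {
    // Each on-disk format change is associated with a layout version number.
    enum Feature {
        INITIAL(0), ERASURE_CODED_CONTAINERS(1), NEW_DB_SCHEMA(2);
        final int layoutVersion;
        Feature(int lv) { this.layoutVersion = lv; }
    }

    // A feature may only be used once the component's persisted metadata
    // layout version has been finalized to at least the feature's version.
    static boolean isAllowed(Feature f, int metadataLayoutVersion) {
        return f.layoutVersion <= metadataLayoutVersion;
    }

    public static void main(String[] args) {
        int metadataLayoutVersion = 1; // cluster finalized up to version 1
        for (Feature f : EnumSet.allOf(Feature.class)) {
            System.out.println(f + " allowed: " + isAllowed(f, metadataLayoutVersion));
        }
    }
}
```

The point of the sketch is only the gating rule: new software can ship with features of a higher layout version, but they stay disabled until finalization advances the persisted metadata layout version.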
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920895756

## hadoop-hdds/docs/content/design/zdu-design.md:
+1. Upgrade all SCMs to the new version
+2. Upgrade Recon to the new version
+3. Upgrade all Datanodes to the new version
+4. Upgrade all OMs to the new version

Review Comment: These upgrade/restart steps are done by an admin, possibly with an orchestration layer, so Ozone doesn't decide whether or not the decommissioned nodes get upgraded. If they are upgraded, nothing about the decommission/maintenance/recommission process is expected to change, since ZDU means all existing operations are allowed throughout the upgrade and finalization process. Starting on line 227 we spec out how datanodes are handled relative to SCM, including the case where they are offline and come back later. Let me know if there are more questions in that area. Note that once SCM is finalized, any datanodes that later appear with the old software version will be fenced out until the admin upgrades them. The doc currently doesn't specify whether nodes undergoing decommission or maintenance will be instructed to finalize by SCM. I think we should still send them the finalize commands so they don't block further upgrade steps unnecessarily. @sodonnel what do you think?
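The fencing behavior mentioned in this comment could look roughly like the following on the SCM side. This is a hedged sketch under the assumptions stated in the comment; ScmState and admitDatanode are made-up names, not Ozone's actual registration API:

```java
// Sketch of post-finalization fencing: while the upgrade is in progress,
// mixed software versions are expected and allowed; once SCM is finalized,
// a datanode reporting an older software version is rejected until the
// admin upgrades it. Illustrative names only.
public class RegistrationFenceSketch {
    static final class ScmState {
        final int softwareVersion;  // version SCM was finalized to
        final boolean finalized;
        ScmState(int softwareVersion, boolean finalized) {
            this.softwareVersion = softwareVersion;
            this.finalized = finalized;
        }
    }

    static boolean admitDatanode(ScmState scm, int datanodeSoftwareVersion) {
        if (!scm.finalized) {
            // Mid-upgrade: old and new datanodes may coexist.
            return true;
        }
        // After finalization, old-software datanodes are fenced out.
        return datanodeSoftwareVersion >= scm.softwareVersion;
    }

    public static void main(String[] args) {
        ScmState preFinalize = new ScmState(8, false);
        ScmState postFinalize = new ScmState(8, true);
        System.out.println(admitDatanode(preFinalize, 7));   // true
        System.out.println(admitDatanode(postFinalize, 7));  // false
        System.out.println(admitDatanode(postFinalize, 8));  // true
    }
}
```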
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920779201

## hadoop-hdds/docs/content/design/zdu-design.md:
+1. Upgrade all SCMs to the new version
+2. Upgrade Recon to the new version
+3. Upgrade all Datanodes to the new version
+4. Upgrade all OMs to the new version
+5. Upgrade all S3 gateways to the new version

Review Comment: This should not cause an issue, because the apparent versions of the components will remain the same in the Ratis ring even as the software is updated. That means components with the newer software version will still write data in a way that older components bootstrapping can understand (and vice versa). Check out the table around line 107 and the appendix to see how the apparent version moves in lock step for a Ratis ring. Finalization to move the apparent version forward can be done from a Ratis snapshot because the version is written to the DB as well as the version file. This is already handled in the current upgrade flow because finalization is an online operation.
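The lock-step "apparent version" behavior described in this comment can be sketched as follows. Replica, encodingVersion, and applyFinalize are illustrative names under the comment's assumptions, not Ozone's real classes:

```java
// Sketch of the apparent-version idea: replicas in a Ratis ring keep writing
// at the persisted (apparent) version even after their software is upgraded,
// so an older replica can still replay the log or bootstrap from a snapshot.
// Finalization is a replicated log entry, so all replicas advance together.
import java.util.List;

public class ApparentVersionSketch {
    static final class Replica {
        int softwareVersion;  // version of the running binary
        int apparentVersion;  // persisted, finalized version used for encoding
        Replica(int sw, int ap) { softwareVersion = sw; apparentVersion = ap; }
    }

    // Data is always encoded at the apparent version, not the software version.
    static int encodingVersion(Replica r) {
        return r.apparentVersion;
    }

    // Applied through the replicated log, so every replica moves its apparent
    // version forward at the same log index.
    static void applyFinalize(List<Replica> ring, int newVersion) {
        for (Replica r : ring) {
            r.apparentVersion = newVersion;
        }
    }

    public static void main(String[] args) {
        // Mixed software versions mid-upgrade, but a single apparent version.
        List<Replica> ring =
            List.of(new Replica(2, 1), new Replica(2, 1), new Replica(1, 1));
        ring.forEach(r -> System.out.println("encodes at v" + encodingVersion(r)));
        applyFinalize(ring, 2); // only after all replicas run the new software
        ring.forEach(r -> System.out.println("encodes at v" + encodingVersion(r)));
    }
}
```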
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
smengcl commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2902408368

## hadoop-hdds/docs/content/design/zdu-design.md:
+1. Upgrade all SCMs to the new version
+2. Upgrade Recon to the new version
+3. Upgrade all Datanodes to the new version
+4. Upgrade all OMs to the new version

Review Comment: Is it correct to assume that decommissioned datanodes would be ignored during the upgrade in Step 3? If so, what if they are recommissioned later (say after Step 4)?
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2848989487

## hadoop-hdds/docs/content/design/zdu-design.md:
+jira: HDDS-3331

Review Comment: We aren't using Hugo anymore and haven't defined the design doc requirements for the new website. For now, as long as the docs are valid markdown, I think it is ok to commit them here.
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
adoroszlai commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2740806273

## hadoop-hdds/docs/content/design/zdu-design.md:
+jira: HDDS-3331

Review Comment: https://github.com/user-attachments/assets/b6a61e65-85a3-425b-8c91-f92b30a097c5
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
adoroszlai commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2740724301

## hadoop-hdds/docs/content/design/zdu-design.md:
+jira: HDDS-3331

Review Comment: https://github.com/user-attachments/assets/f240ca44-8b55-49b9-8218-4b540929bae4
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
ptlrs commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2739492223

## hadoop-hdds/docs/content/design/zdu-design.md:
Re: [PR] HDDS-14498. Zero Downtime Upgrade Design (ZDU) [ozone]
jojochuang commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2733889822

## hadoop-hdds/docs/content/design/zdu-design.md:
+jira: HDDS-3331

Review Comment: My local hugo failed to start until I added the date tag.

```suggestion
jira: HDDS-3331
date: 2026-01-23
```

And it looks like it also needs these two:

```suggestion
title: Zero Downtime Upgrade (ZDU)
summary: New and improved framework to allow rolling upgrade without cluster downtime.
```
