Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4436643699
> @ivandika3, @xichen01, as we have [HDDS-15239](https://issues.apache.org/jira/browse/HDDS-15239) merged and a new branch created. Are we ready to move forward?

Yes, we will continue to submit new PRs after the merge.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4435956576 @ivandika3, @xichen01, as we have HDDS-15239 merged and a new branch created. Are we ready to move forward?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4384175779
> @ivandika3 @greenwich @chungen0126 #10191 Please help to review

Looks good!
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4379351270 @ivandika3 @greenwich @chungen0126 https://github.com/apache/ozone/pull/10191 Please help to review.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4378706326 The new HDDS-11233 branch https://github.com/apache/ozone/tree/HDDS-11233
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4378698452
> @xichen01 Could you help to take over? I'm currently quite busy and my GH account was recently blocked from GH actions (still waiting for support) so no CI can be triggered.

OK, I will
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4378415007 @xichen01 Could you help to take over? My GH account was recently blocked from GH actions (still waiting for support) so no CI can be triggered.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4366105397
> @xichen01 How about we cut a new [HDDS-11233](https://issues.apache.org/jira/browse/HDDS-11233) branch? The current branch can be done without the transition patch ([HDDS-8342](https://issues.apache.org/jira/browse/HDDS-8342)). We can add the transition after [HDDS-8342](https://issues.apache.org/jira/browse/HDDS-8342) is merged to master.

I think we can create a new HDDS-11233 branch in the Apache Ozone repository, just like https://github.com/apache/ozone/tree/HDDS-8342. Then we can merge the related commits into this branch.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4365281668 @xichen01 How about we cut a new HDDS-11233 branch? The current branch can be done without the transition patch ([HDDS-8342](https://issues.apache.org/jira/browse/HDDS-8342)). We can add the transition after HDDS-8342 is merged to master.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4364194566
> @greenwich @xichen01 FYI, test in https://github.com/ivandika3/ozone/tree/refs/heads/backport-storage-policy-storage-class passed. You can refer to the diff in https://github.com/ivandika3/ozone/pull/4

@ivandika3 We can create corresponding sub tasks for these commits and then merge them into the HDDS-8342 branch via MR
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4353095024
> Regarding Phase 2, just thinking ahead — would it make sense to initially target a simpler mover-style implementation (similar in spirit to HDFS Mover) before introducing a separate job worker subsystem? That might allow basic Storage Policy Migration functionality to be delivered earlier and iterated on over time.

@greenwich Yes, we can first implement a standalone StoragePolicySatisfier (similar to HDFS Mover or HDFS StoragePolicySatisfier) without needing to implement a separate subsystem.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4351185647 @greenwich @xichen01 FYI, test in https://github.com/ivandika3/ozone/tree/refs/heads/backport-storage-policy-storage-class passed.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4319936554 Thanks @greenwich. @xichen01, would you mind taking a look at that? I'm currently backporting it in my fork, but the tests are still failing (https://github.com/ivandika3/ozone/tree/refs/heads/backport-storage-policy-storage-class)
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4317800028
> @greenwich Sorry for the late reply. Let me try to use an AI agent to backport the storage policy to my fork.

@ivandika3, I managed to do a similar thing because I wanted to prepare everything up front to make things go faster for this PR.
- I picked the patch attached to this PR (that was branched out from ozone 1.4)
- Merged it to sync with master around two weeks ago, fixed all the conflicts and broken tests
- Yesterday I synced upstream's master with that branch again to be up to date.
- Please have a look, maybe it already has what you were planning to do: https://github.com/greenwich/ozone/tree/refs/heads/HDDS-11233_patch_merged
- All tests are green: https://github.com/greenwich/ozone/actions/runs/24915662451

cc: @xichen01
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4295042380 @greenwich Sorry for the late reply. Let me try to use an AI agent to backport this to my fork.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4234740368 @ivandika3, Have you managed to backport the patch as per the discussion here: https://github.com/apache/ozone/pull/6989#issuecomment-2953554337? Any help required?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r3007348520
## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,607 @@
+---
+title: Ozone Storage Policy Support
+summary: Support storage policy in Ozone to write key data into specified types of storage media.
+date: 2026-03-23
+jira: HDDS-11233
+status: draft
+---
+
+
+
+# Terminology
+
+## Definitions
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The type of each Datanode volume or container replica. Each Datanode volume can be configured with a
+  storage type, including SSD, DISK, and ARCHIVE.
+- Storage Tier: A specific storage tier is composed of all replicas of a container based on their storage type. For
+  example, a 3-replica SSD tier consists of 3 replicas of SSD type.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.
+- Key: In this document, a key refers to an object in Ozone, including entries in both the KeyTable and FileTable.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relationship between Storage Policy, Storage Type, and Storage Tier:
+
+- The storage policy is the property of key/bucket (managed by OM).
+- The storage tier is the property of Pipeline and Container (managed by SCM).
+- The storage type is the property of volume and container replica (managed by DN).
+- Only the storage policy can be modified by the user directly via the ozone command.
+
+Example:
+
+For a keyA, its storage policy is Hot, Its Container tier is SSD tier, the Container has three replicas, all of which
+are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so they create a bucket with the storage policy set to Hot.
+  Data written by User A to the bucket will automatically be distributed across SSD disks in the cluster.
+- User B needs higher IO performance for a specific key. They write a key with the storage policy set to Hot. The
+  key's data will be distributed across SSD disks in the cluster.
+- User C uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class STANDARD` to upload a file
+  to the Ozone SSD tier. The key's data will be distributed across SSD disks in the cluster.
+
+# Goals
+
+- Storage Policy: Introduce storage policy and related concepts. Define multiple storage policies and support S3
+  storage class.
+- Storage Policy Writing: Allow writing keys/files to specified storage tiers based on storage policy. Support S3,
+  API, and shell command interfaces.
+- Storage Policy Update: Enable setting and unsetting storage policies for buckets, and setting storage tiers for
+  containers.
+- Storage Policy Display: Support displaying the storage policy attribute of buckets and keys. Support displaying the
+  storage tier of SCM containers and pipelines. Support displaying Datanode storage type usage information. Support
+  checking whether the key storage policy is satisfied.
+- Container Balancer: Support migrating container replicas between Datanodes to volumes of the matching storage type.
+  For example, SSD type container replicas will be migrated to SSD type volumes, and will not be migrated to DISK
+  type volumes.
+- ReplicationManager: Support managing the storage type of container replicas to ensure that container replicas on
+  Datanodes reside on the correct volumes. Ensure that the storage types of container replicas forming a storage
+  tier are correct. For example, a 3-replica SSD storage tier container in SCM should consist of 3 SSD type container
+  replicas, and each container replica should reside on an SSD type volume.
+- DiskBalancerService: Support migrating container replicas within a Datanode to volumes of the matching storage type.
+  For example, SSD type container replicas will be migrated to SSD type volumes, and will not be migrated to DISK
+  type volumes.
+
+# Design
+
+## Supported Storage Policies
+
+- Supported storage policies: Hot / Warm / Cold
+- Supported storage tiers: SSD / DISK / ARCHIVE / EMPTY
+- Supported storage types: SSD / DISK / ARCHIVE
+- Supported bucket layouts: FILE_SYSTEM_OPTIMIZED, OBJECT_STORE, LEGACY
+- S3 storage classes: STANDARD / STANDARD_IA / GLACIER
+
+### Storage Policy Map to Storage Tier
+
+| Storage Policy | Storage Tier for Write | Fallback Tier for Write |
+|----------------|------------------------|-------------------------|
+| Hot            | SSD                    | DISK                    |
+| Warm           | DISK                   | EMPTY                   |
+| Cold           | ARCHIVE                | EMPTY                   |
+
+- Storage Tier for Write: The primary storage tier where data is written when a storage policy is specified.
+- Fallbac
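
The "Storage Policy Map to Storage Tier" table quoted in this review comment can be illustrated with a small lookup. This is only a sketch in Python, not Ozone's Java code: `StorageTier`, `POLICY_TIERS`, and `resolve_write_tier` are hypothetical names that do not come from the PR. The rule that the fallback tier is tried only when the primary tier is unavailable, and that `EMPTY` means "no fallback", is an assumption, because the quoted definition of the fallback tier is cut off.

```python
from enum import Enum
from typing import Optional, Set


class StorageTier(Enum):
    """Storage tiers listed in the quoted design document."""
    SSD = "SSD"
    DISK = "DISK"
    ARCHIVE = "ARCHIVE"
    EMPTY = "EMPTY"


# policy -> (storage tier for write, fallback tier for write),
# copied from the quoted table.
POLICY_TIERS = {
    "Hot": (StorageTier.SSD, StorageTier.DISK),
    "Warm": (StorageTier.DISK, StorageTier.EMPTY),
    "Cold": (StorageTier.ARCHIVE, StorageTier.EMPTY),
}


def resolve_write_tier(policy: str,
                       available: Set[StorageTier]) -> Optional[StorageTier]:
    """Pick the tier to write to, or None if no tier can be used.

    Assumption (not confirmed by the quoted text): the fallback tier is
    tried only when the primary tier is unavailable, and EMPTY means
    that the policy has no fallback.
    """
    primary, fallback = POLICY_TIERS[policy]
    if primary in available:
        return primary
    if fallback is not StorageTier.EMPTY and fallback in available:
        return fallback
    return None
```

Under these assumptions, a Hot key written to a cluster that currently has only DISK capacity would land on the DISK tier, while a Warm key with no DISK capacity would not be placed at all.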
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r3007344267 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- + + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. 
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. 
+ +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to display
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
chungen0126 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2999375115 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -19,379 +20,588 @@ status: draft # Terminology -## Terminology +## Definitions - Storage Policy: Defines where key data replicas should be stored in specific storage tiers. -- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. -- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. -- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. -- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. +- Storage Type: The type of each Datanode volume or container replica. Each Datanode volume can be configured with a + storage type, including SSD, DISK, and ARCHIVE. +- Storage Tier: A specific storage tier is composed of all replicas of a container based on their storage type. For + example, a 3-replica SSD tier consists of 3 replicas of SSD type. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode. +- Key: In this document, a key refers to an object in Ozone, including entries in both the KeyTable and FileTable. 
## Storage Policy vs Storage Type vs Storage Tier  -The relation of Storage Policy, Storage Type and Storage Tier +The relationship between Storage Policy, Storage Type, and Storage Tier: -- The storage policy is the property of key/bucket/ prefix (Managed by OM); -- The storage tier is the property of Pipeline and Container (Managed by SCM); -- The storage type is the property of volume and Container replicas (Managed by DN); -- Only the storage policy can be modified by the user directly via ozone command; +- The storage policy is the property of key/bucket (managed by OM). +- The storage tier is the property of Pipeline and Container (managed by SCM). +- The storage type is the property of volume and container replica (managed by DN). +- Only the storage policy can be modified by the user directly via the ozone command. Example: -For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. +For a keyA, its storage policy is Hot, Its Container tier is SSD tier, the Container has three replicas, all of which +are of the SSD storage type. # User Scenarios -- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. -- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. -- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. 
-- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier +- User A needs a bucket that supports high-performance IO, so they create a bucket with the storage policy set to Hot. + Data written by User A to the bucket will automatically be distributed across SSD disks in the cluster. +- User B needs higher IO performance for a specific key. They write a key with the storage policy set to Hot. The + key's data will be distributed across SSD disks in the cluster. +- User C uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class STANDARD` to upload a file + to the Ozone SSD tier. The key's data will be distributed across SSD disks in the cluster. + +# Goals + +- Storage Policy: Introduce storage policy and related concepts. Define multiple storage policies and support S3 + storage class. +- Storage Policy Writing: Allow writing keys/files to specified storage tiers based on storage policy. Support S3, + API, and shell command interfaces. +- Storage Policy Update: Enable setting and unsetting storage policies for buckets, and setting storage tiers for + containers. +- Storage Policy Display: Support displaying the storage policy attribute of buckets and keys. Support displaying the + storage tier of SCM containers and pipelines. Support displaying Datanode storage type usage information. Support + checking whether the key storage policy is satisfied. +- Co
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2996045328 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -19,379 +20,588 @@ status: draft # Terminology -## Terminology +## Definitions - Storage Policy: Defines where key data replicas should be stored in specific storage tiers. -- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. -- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. -- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. -- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. +- Storage Type: The type of each Datanode volume or container replica. Each Datanode volume can be configured with a + storage type, including SSD, DISK, and ARCHIVE. +- Storage Tier: A specific storage tier is composed of all replicas of a container based on their storage type. For + example, a 3-replica SSD tier consists of 3 replicas of SSD type. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode. +- Key: In this document, a key refers to an object in Ozone, including entries in both the KeyTable and FileTable. 
## Storage Policy vs Storage Type vs Storage Tier  -The relation of Storage Policy, Storage Type and Storage Tier +The relationship between Storage Policy, Storage Type, and Storage Tier: -- The storage policy is the property of key/bucket/ prefix (Managed by OM); -- The storage tier is the property of Pipeline and Container (Managed by SCM); -- The storage type is the property of volume and Container replicas (Managed by DN); -- Only the storage policy can be modified by the user directly via ozone command; +- The storage policy is the property of key/bucket (managed by OM). +- The storage tier is the property of Pipeline and Container (managed by SCM). +- The storage type is the property of volume and container replica (managed by DN). +- Only the storage policy can be modified by the user directly via the ozone command. Example: -For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. +For a keyA, its storage policy is Hot, Its Container tier is SSD tier, the Container has three replicas, all of which +are of the SSD storage type. # User Scenarios -- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. -- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. -- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. 
-- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier +- User A needs a bucket that supports high-performance IO, so they create a bucket with the storage policy set to Hot. + Data written by User A to the bucket will automatically be distributed across SSD disks in the cluster. +- User B needs higher IO performance for a specific key. They write a key with the storage policy set to Hot. The + key's data will be distributed across SSD disks in the cluster. +- User C uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class STANDARD` to upload a file + to the Ozone SSD tier. The key's data will be distributed across SSD disks in the cluster. + +# Goals + +- Storage Policy: Introduce storage policy and related concepts. Define multiple storage policies and support S3 + storage class. +- Storage Policy Writing: Allow writing keys/files to specified storage tiers based on storage policy. Support S3, + API, and shell command interfaces. +- Storage Policy Update: Enable setting and unsetting storage policies for buckets, and setting storage tiers for + containers. +- Storage Policy Display: Support displaying the storage policy attribute of buckets and keys. Support displaying the + storage tier of SCM containers and pipelines. Support displaying Datanode storage type usage information. Support + checking whether the key storage policy is satisfied. +- Conta
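For readers of this thread, the relationship between storage type, storage tier, and storage policy in the definitions quoted above can be sketched roughly as follows. This is a minimal illustration, not Ozone code: the function names, the `POLICY_TO_STORAGE_TYPE` table, and the "mixed tier" label are assumptions made for the example. Only the Hot to SSD pairing and the "3-replica SSD tier" wording come from the quoted text.

```python
from collections import Counter

# Hypothetical lookup table. Only the Hot -> SSD pairing appears in the
# quoted design text; any other entry would be an assumption.
POLICY_TO_STORAGE_TYPE = {"Hot": "SSD"}


def container_tier(replica_types):
    """Describe a container's tier from the storage types of its replicas."""
    if not replica_types:
        raise ValueError("a container needs at least one replica")
    counts = Counter(replica_types)
    if len(counts) == 1:
        # All replicas share one storage type, e.g. "3-replica SSD tier".
        storage_type = next(iter(counts))
        return f"{len(replica_types)}-replica {storage_type} tier"
    # The "mixed tier" label is illustrative and not taken from the design doc.
    parts = ", ".join(f"{n}x{t}" for t, n in sorted(counts.items()))
    return f"mixed tier ({parts})"


def policy_satisfied(policy, replica_types):
    """Return True when every replica sits on the storage type the policy needs."""
    required = POLICY_TO_STORAGE_TYPE[policy]
    return bool(replica_types) and all(t == required for t in replica_types)
```

With these definitions, `policy_satisfied("Hot", ["SSD", "SSD", "SSD"])` holds for the keyA example in the quoted text, while a container with one DISK replica would not satisfy Hot.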
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
chungen0126 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2993398281 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- + + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. 
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. 
+ +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to displa
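The inheritance rule in the earlier draft quoted above ("longest matching prefix or bucket if no specific policy is set") can be sketched as below. This is a hedged illustration only: `effective_policy` and its dictionary arguments are hypothetical names, plain string prefix matching is assumed, and `"OtherPolicy"` in the usage note is a placeholder rather than a policy named in the design. The later revision quoted elsewhere in this thread drops prefix-level policies, so the sketch applies to the earlier draft only.

```python
def effective_policy(key, key_policies, prefix_policies, bucket_policy):
    """Resolve the storage policy that applies to a key.

    Order assumed from the quoted draft: an explicit key policy wins,
    then the longest matching prefix, then the bucket's policy.
    """
    if key in key_policies:
        return key_policies[key]
    # Collect every configured prefix that the key starts with.
    matches = [prefix for prefix in prefix_policies if key.startswith(prefix)]
    if matches:
        # The longest matching prefix is the most specific one.
        return prefix_policies[max(matches, key=len)]
    return bucket_policy
```

For example, with prefixes `{"/project/": "OtherPolicy", "/project/metadata": "Hot"}`, the key `/project/metadata/file1` resolves to Hot because the longer prefix takes precedence.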
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
chungen0126 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2993177512 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -19,379 +20,588 @@ status: draft # Terminology -## Terminology +## Definitions - Storage Policy: Defines where key data replicas should be stored in specific storage tiers. -- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. -- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. -- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. -- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. +- Storage Type: The type of each Datanode volume or container replica. Each Datanode volume can be configured with a + storage type, including SSD, DISK, and ARCHIVE. +- Storage Tier: A specific storage tier is composed of all replicas of a container based on their storage type. For + example, a 3-replica SSD tier consists of 3 replicas of SSD type. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode. +- Key: In this document, a key refers to an object in Ozone, including entries in both the KeyTable and FileTable. 
## Storage Policy vs Storage Type vs Storage Tier  -The relation of Storage Policy, Storage Type and Storage Tier +The relationship between Storage Policy, Storage Type, and Storage Tier: -- The storage policy is the property of key/bucket/ prefix (Managed by OM); -- The storage tier is the property of Pipeline and Container (Managed by SCM); -- The storage type is the property of volume and Container replicas (Managed by DN); -- Only the storage policy can be modified by the user directly via ozone command; +- The storage policy is the property of key/bucket (managed by OM). +- The storage tier is the property of Pipeline and Container (managed by SCM). +- The storage type is the property of volume and container replica (managed by DN). +- Only the storage policy can be modified by the user directly via the ozone command. Example: -For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. +For a keyA, its storage policy is Hot, Its Container tier is SSD tier, the Container has three replicas, all of which +are of the SSD storage type. # User Scenarios -- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. -- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. -- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. 
-- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier +- User A needs a bucket that supports high-performance IO, so they create a bucket with the storage policy set to Hot. + Data written by User A to the bucket will automatically be distributed across SSD disks in the cluster. +- User B needs higher IO performance for a specific key. They write a key with the storage policy set to Hot. The + key's data will be distributed across SSD disks in the cluster. +- User C uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class STANDARD` to upload a file + to the Ozone SSD tier. The key's data will be distributed across SSD disks in the cluster. + +# Goals + +- Storage Policy: Introduce storage policy and related concepts. Define multiple storage policies and support S3 + storage class. +- Storage Policy Writing: Allow writing keys/files to specified storage tiers based on storage policy. Support S3, + API, and shell command interfaces. +- Storage Policy Update: Enable setting and unsetting storage policies for buckets, and setting storage tiers for + containers. +- Storage Policy Display: Support displaying the storage policy attribute of buckets and keys. Support displaying the + storage tier of SCM containers and pipelines. Support displaying Datanode storage type usage information. Support + checking whether the key storage policy is satisfied. +- Co
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-4109608481 @greenwich @chungen0126 @errose28 @vtutrinov The document has been updated. Please check it. All the content covered in the current document has been implemented in our internal version.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
chungen0126 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2906025795 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- + + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. 
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. 
+ +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to displa
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2875955485 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- + + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. 
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. 
+ +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to display
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2875959913 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- + + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. 
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. 
+ +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to display
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2875955485
## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);
+- The storage type is the property of volume and Container replicas (Managed by DN);
+- Only the storage policy can be modified by the user directly via ozone command;
+
+Example:
+
+For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster.
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster.
+- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks.
+- User D uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` to upload a file to the Ozone SSD tier.
+
+# Current Status
+
+- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented.
+- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks.
+
+# Goal Requirements Specification
+
+### **Support for Storage Policy Writing and Management**
+
+- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies.
+- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set.
+
+### **Support for Data Migration Across Different Storage Policies**
+
+- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers.
+
+### **Adaptation of AWS S3 StorageClass**
+
+- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects).
+
+### **Management and Monitoring Tools**
+
+- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information.
+- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies.
+
+### **Future Enhancements**
+
+- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering.
+- **Bucket StorageClass Lifecycle Rules**: Support setting storage policy Lifecycle Rules at the bucket level.
+- **Recon Support**: Enhance Recon to display
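
The inheritance rule quoted above ("longest matching prefix or bucket if no specific policy is set") can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's implementation: `StoragePolicyResolver`, its methods, and the `WARM`/`COLD` policy names are hypothetical (the quoted doc names only Hot), and prefixes are matched as plain string prefixes, which the design may or may not intend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of policy inheritance; not Ozone's actual API. */
final class StoragePolicyResolver {
    /** Only HOT appears in the quoted doc; WARM and COLD are assumed for illustration. */
    enum StoragePolicy { HOT, WARM, COLD }

    private final Map<String, StoragePolicy> prefixPolicies = new HashMap<>();
    private final StoragePolicy bucketPolicy; // null means the bucket has no policy set

    StoragePolicyResolver(StoragePolicy bucketPolicy) {
        this.bucketPolicy = bucketPolicy;
    }

    void setPrefixPolicy(String prefix, StoragePolicy policy) {
        prefixPolicies.put(prefix, policy);
    }

    /**
     * Resolution order: the key's own policy, then the longest matching
     * prefix, then the bucket. Empty if nothing is set at any level.
     */
    Optional<StoragePolicy> resolve(String keyName, StoragePolicy keyPolicy) {
        if (keyPolicy != null) {
            return Optional.of(keyPolicy);
        }
        String longest = null;
        for (String prefix : prefixPolicies.keySet()) {
            // Assumption: plain string-prefix match, not path-component match.
            if (keyName.startsWith(prefix)
                    && (longest == null || prefix.length() > longest.length())) {
                longest = prefix;
            }
        }
        if (longest != null) {
            return Optional.of(prefixPolicies.get(longest));
        }
        return Optional.ofNullable(bucketPolicy);
    }

    public static void main(String[] args) {
        StoragePolicyResolver resolver = new StoragePolicyResolver(StoragePolicy.COLD);
        resolver.setPrefixPolicy("/project", StoragePolicy.WARM);
        resolver.setPrefixPolicy("/project/metadata", StoragePolicy.HOT);
        System.out.println(resolver.resolve("/project/metadata/file1", null));
        System.out.println(resolver.resolve("/other/file2", null));
    }
}
```

A linear scan over the prefixes keeps the sketch short; a real implementation would presumably use an indexed lookup and would need to decide how a policy set directly on a key interacts with later prefix changes.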
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2873208891 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3985905946 @greenwich @chungen0126 @errose28 @vtutrinov I will update this document, adding more detailed content and incorporating some minor changes. I will try to complete it this week.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2870767609 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2867351940 ## hadoop-hdds/docs/content/design/storage-policy.md: ## @@ -0,0 +1,397 @@
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3956982569
@ivandika3 Thanks for the clarification; that makes sense. I fully understand that contributions are voluntary, and I really appreciate the time and effort everyone is putting into this. I'm very interested in helping move this forward and would be glad to contribute where it makes the most sense.

Regarding Phase 2, just thinking ahead: would it make sense to initially target a simpler mover-style implementation (similar in spirit to HDFS Mover) before introducing a separate job worker subsystem? That might allow basic Storage Policy Migration functionality to be delivered earlier and iterated on over time. Of course, I'm happy to align with the broader design direction; I'm just exploring whether an incremental path could also work here.

Please let me know how I can best contribute.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3956283360
> the diff above has references to the following non-existent files (relative to ozone-1.4.1):

These refer to our internal placement policy for supporting a multi-DC setup and can be ignored, as they do not pertain to any current functionality in community Ozone. The patch diff serves only as an overview of what the changes might look like.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3956078125
@xichen01 Let's reopen this patch and move it forward. I am willing to spend some time on the storage policy support development. Hopefully other community members (@errose28 @chungen0126) are also able to help push the review process.

@greenwich As mentioned in the https://github.com/apache/ozone/discussions/8811#discussioncomment-13928693 discussion last year, there are mainly two phases of Ozone Storage Policy Support:

1. Supporting storage policy and storage types on Ozone
2. Storage Policy Migration support

The Phase 1 goal is to allow clients to upload keys with different storage policies and to integrate storage policies and types into all Ozone pipelines, containers, etc. This will be our short-term focus for now, since we are not introducing any new subsystem.

The Phase 2 goal is to support Storage Policy Migration. However, our current implementation requires a separate job worker subsystem. This will be longer term, since it introduces a new subsystem. If you don't require a separate job worker subsystem, you might need to write your own implementation of a "Storage Policy Satisfier".

That said, I hope you understand that all contributions to Ozone are purely voluntary and made by members with other higher-level priorities, and therefore we cannot 100% guarantee that this will be done in a timely manner.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3955939286
@xichen01, Thanks for the update. That's great news indeed! In that case, there is no point in me reinventing the wheel. I had a quick look at the patch you submitted earlier, and it looks comprehensive. I will have a detailed look today.

Would it be possible to move forward with that PR (and merge it, or at least create a working branch and sync it with master)? Also, as I am very interested in that feature, I can provide assistance on my end.

cc: @ivandika3 @chungen0126 Please let me know.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3953103047
@errose28 Thanks for noticing this PR. @greenwich Thank you for the update.

### Regarding the current status of this PR

The current PR hasn't been updated for a while, mainly because there doesn't seem to be strong demand for this feature from other members of the community, so reviews have stagnated.

However, this feature has been fully implemented internally, including support for StoragePolicy across all S3 and Filesystem write interfaces, as well as support for StoragePolicy in ReplicationManager and ContainerBalancer. We've basically implemented it according to this design document (some parts of the document need updating; I can update the document if needed).

We also support S3 Lifecycle (https://issues.apache.org/jira/browse/HDDS-8342), which allows you to set a Lifecycle on a Bucket to migrate specified keys to a specified StoragePolicy at a specified time (including migrating from SSD to DISK, and from THREE replication to EC), as well as functionality similar to HDFS SPS (Storage Policy Satisfier).

### Follow-up

If you or others in the community are willing, we can continue to move forward with this PR. Of course, you can also move forward with your own proposals, and we can cooperate if you need it.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3947561978
@errose28 Thanks for looking! I am very sorry for the noise my PR caused; it wasn't intended to be public (my bad, I didn't set it to Draft when I created it). Let me explain my motivation.

1. My team **needs** storage policy and tiering support for Ozone. Unfortunately, reality is tough, and if we don't have it this year (ideally H1), then Ozone might be deprioritised because we use other Ozone competitors in the company.
2. The patch attached to this pull request doesn't seem to be up to date with master or even the 2.0 or 2.1 releases.
3. As far as I can see, there has been no work on this PR since Nov 2025.
4. Plus, I might be wrong, but I got the impression that we want a perfect design and implementation here before we start coding.
5. My approach is different: I want to design and build it **incrementally**, because that's the simplest way to adopt it in my team, start using it, and collect feedback from others. So:
   - I analysed the current state, reviewed the contents of the patch in this pull request, and created a small roadmap of the features I want in storage tiering.
   - I prioritised them and split them into small releases. I called them MVP-1, MVP-2, etc. Each MVP should take me around 1 week to implement.
   - Each MVP should have a complete set of features that work e2e. At the moment, I am working on MVP-1.
   - I planned to test the MVPs on our PROD environment, and if they work as expected, then go back to the Ozone community and share what I have.
6. I haven't created any Jira tickets because I planned to implement and test my MVP first. However, each independent feature is pushed as a separate commit, so it can later be retrofitted into separate ASF Jira tickets (if needed).
7. Lastly, using Apache Ozone's PR feature is very useful for me because it triggers the whole CI and lets me see the diff against master to keep my branch in sync.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
errose28 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3946550817 Hi @greenwich I see you have also opened #9807. If you would like to continue work on this, we should start by reaching agreement on a design doc. I'm not sure we finished that process yet. @ivandika3 @xichen01 does this doc need more updates/review? Should we continue work on it here or open a new PR?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
greenwich commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3919009991 Hi, do you know if any work is planned for this ticket? AFAIK, the patch diff wasn't added to the branch and is probably out of date right now. What are the next steps here?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
github-actions[bot] closed pull request #6989: HDDS-11233. Ozone Storage Policy Support. URL: https://github.com/apache/ozone/pull/6989
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
github-actions[bot] commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3549968542 Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
github-actions[bot] commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-3519269594 This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2988190589
@ivandika3 @xichen01 the diff above has references to the following non-existent files (relative to ozone-1.4.1):

```
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementDataCenterAware.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementDataRecovery.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelinePlacementDataCenterAware.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementDataCenterAware.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementDataCenterAwareSpecialCase.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementDataRecovery.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementDcFlow.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelinePlacementDataCenterAware.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/diskbalancer/TestDiskBalancerService.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementStorageTier.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/StorageTierUtil.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeUtils.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestSpecialCloseContainerEventHandler.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/container/TestPeriodicContainerCloser.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/AbstractRootedOzoneFileSystemTest.java
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/shell/UpdateBucketOptions.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/StorageTypeUtils.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestOzoneStoragePolicy.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancerTaskDcFlow.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerUtils.java
hadoop-hdds/common/src/test/java/org/apache/hadoop/hdds/client/StorageTierUtilTest.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/dcflow/ContainerBalancerSelectionCriteriaDcFlow.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/AbstractStorageTypeChoosingPolicy.java
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/storagepolicy/StoragePolicyCommands.java
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/storagepolicy/UsageInfoSubCommand.java
```

Could you provide them too, or point me to the commit where I can fetch them?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2976778259 @vtutrinov Thanks for the reminder. I have attached https://issues.apache.org/jira/secure/attachment/13077025/storage-policy-diff.tar.gz for the list of diffs of the storage policy integration. Please be reminded to attribute @xichen01 for any commits generated from these diffs.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2976630111 @ivandika3 I don't want to rush, but is there any news about the mentioned diff?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2955514606 @ivandika3 it would be great!
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2955463014 @vtutrinov I think the fastest thing we can do is to provide you with the diffs. However, the diffs won't apply cleanly on the master branch, since our branch is based on version 1.4.1 with some internal-specific changes. I can probably provide some of them this weekend. A feature branch in my fork might take a while, since we need to resolve the conflicts.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2955438634 @ivandika3 thanks for the response! Can we glance at the implementation as the first phase (maybe in a custom feature branch)? Or are there too many private details?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2953554337 @vtutrinov The implementation has been worked on internally for the past year. The basic integration of storage policy and storage types with containers, pipelines, volumes, and the S3 storage class, as well as creating a key / file with a storage policy, has been implemented but still needs extensive testing. Currently we are focusing on the storage policy migration implementation. @xichen01 would know more about the approximate timeline, but we hope to have a working implementation in the next quarter (i.e. Q3 2025) or so.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2915981608 @xichen01 @kerneltime @sodonnel, could you help move the review of the design doc forward? The feature is much needed, and I would gladly start on the implementation.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2900322703 @xichen01 is there an estimated time frame for implementing this functionality? I'd like to start creating the JIRA tickets and implementing them.
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2100544578

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);

Review Comment: Storage Tier is more like `ReplicationConfig`; it will be an independent field in `ContainerInfo` and `Pipeline`.
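As a rough illustration of the review comment above (not the PR's actual code), the storage tier can be pictured as a field that sits next to the replication config on a container instead of being folded into it. The Python class and function names below are hypothetical, and the rule that every replica must use the tier's storage type is an assumption drawn from the design doc's single example (a container in the SSD tier whose three replicas are all of the SSD storage type).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class StorageType(Enum):
    # Storage types listed in the design doc's terminology section.
    RAM_DISK = "RAM_DISK"
    SSD = "SSD"
    HDD = "HDD"
    ARCHIVE = "ARCHIVE"


@dataclass(frozen=True)
class StorageTier:
    # Hypothetical model: a tier name plus the storage type each replica should use.
    name: str
    required_type: StorageType


@dataclass(frozen=True)
class ContainerInfo:
    # Hypothetical stand-in for the SCM-side container record.
    container_id: int
    replication: str           # stand-in for ReplicationConfig, e.g. "RATIS/THREE"
    storage_tier: StorageTier  # independent of the replication setting


def satisfies_tier(container: ContainerInfo,
                   replica_types: Sequence[StorageType]) -> bool:
    """Assumed rule: the container matches its tier only if it has replicas
    and every replica sits on the tier's required storage type."""
    required = container.storage_tier.required_type
    return bool(replica_types) and all(t == required for t in replica_types)
```

With this shape, changing the replication setting of a container does not touch its tier, and the tier check only needs the storage types reported for the replicas.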
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2098394743

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);

Review Comment: I meant org.apache.hadoop.hdds.scm.net.NodeSchema. Will the Storage Tier (aka `rack of specific storage volumes`) become a part of the network topology?
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
xichen01 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2098242333

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);
+- The storage type is the property of volume and Container replicas (Managed by DN);
+- Only the storage policy can be modified by the user directly via ozone command;
+
+Example:
+
+For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster.
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster.
+- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks.
+- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier
+
+# Current Status
+
+- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented.
+- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks
+
+# Goal Requirements Specification
+
+### **Support for Storage Policy Writing and Management**
+
+- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies.
+- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set.
+
+### **Support for Data Migration Across Different Storage Policies**
+
+- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers.
+
+### **Adaptation of AWS S3 StorageClass**
+
+- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects).
+
+### **Management and Monitoring Tools**
+
+- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information.
+- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies.
+
+### **Future Enhancements**
+
+- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering.
+- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.**
+- **Recon Support**: Enhance Recon to display r
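
The inheritance rule quoted from the design doc above (longest matching prefix, otherwise the bucket, when no specific policy is set) can be sketched as follows. This is a minimal Python sketch, not Ozone code: the function name, the dictionary of prefix policies, and the precedence order (key first, then prefix, then bucket) are assumptions, and every policy name other than Hot is a placeholder.

```python
from typing import Mapping, Optional


def resolve_storage_policy(key: str,
                           key_policy: Optional[str],
                           prefix_policies: Mapping[str, str],
                           bucket_policy: Optional[str]) -> Optional[str]:
    """Return the effective storage policy for a key (assumed precedence)."""
    # 1. A policy set directly on the key wins.
    if key_policy is not None:
        return key_policy
    # 2. Otherwise inherit from the longest prefix that matches the key.
    matching = [p for p in prefix_policies if key.startswith(p)]
    if matching:
        return prefix_policies[max(matching, key=len)]
    # 3. Otherwise fall back to the bucket's policy (may be None if unset).
    return bucket_policy


# Hypothetical data: "Warm" and "Cold" are placeholder policy names.
prefixes = {"/project/": "Warm", "/project/metadata/": "Hot"}
effective = resolve_storage_policy("/project/metadata/a", None, prefixes, "Cold")
```

Here the key under /project/metadata/ inherits from the longer of the two matching prefixes. Plain string matching is used for brevity; a real implementation would need to respect path-component boundaries.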
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
vtutrinov commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2097266132

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);

Review Comment: Will we deal with the storage tier as an entry of the cluster topology?

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);
+- The storage type is the property of volume and Container replicas (Managed by DN);
+- Only the storage policy can be modified by the user directly via ozone command;
+
+Example:
+
+For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster.
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster.
+- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks.
+- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier
+
+# Current Status
+
+- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented.
+- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks
+
+# Goal Requirements Specification
+
+### **Support for Storage Policy Writing
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
ivandika3 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r1692365363

## hadoop-hdds/docs/content/design/storage-policy.md: ##
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode..
+- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/ prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);
+- The storage type is the property of volume and Container replicas (Managed by DN);
+- Only the storage policy can be modified by the user directly via ozone command;
+
+Example:
+
+For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster.
+- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster.
+- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks.
+- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier
+
+# Current Status
+
+- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented.
+- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks
+
+# Goal Requirements Specification
+
+### **Support for Storage Policy Writing and Management**
+
+- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies.
+- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set.
+
+### **Support for Data Migration Across Different Storage Policies**
+
+- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers.
+
+### **Adaptation of AWS S3 StorageClass**
+
+- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects).
+
+### **Management and Monitoring Tools**
+
+- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information.
+- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies.
+
+### **Future Enhancements**
+
+- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering.
+- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.**
+- **Recon Support**: Enhance Recon to display
Re: [PR] HDDS-11233. Ozone Storage Policy Support. [ozone]
kerneltime commented on PR #6989: URL: https://github.com/apache/ozone/pull/6989#issuecomment-2251173313 cc @sodonnel
