[ https://issues.apache.org/jira/browse/HDFS-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702393#comment-14702393 ]
Sanjay Radia edited comment on HDFS-8888 at 8/19/15 3:58 AM:
-------------------------------------------------------------

There are several motivations for introducing Volumes to HDFS.

*Simplify management and implementation*
* Volumes make the management of some HDFS features simpler: Quotas, Encryption, and Snapshots can become volume properties rather than properties of individual directories. As a unit of management, volumes also offer strong isolation of security settings.
* Volumes can simplify the implementation of some of these features. For example, if we don’t allow renaming across a volume boundary, then the implementation of Snapshots becomes easier. Will customers accept this restriction? Won’t some apps like Hive have to change, since they rename from a temp location to the final destination? Recall that we disallow renames across encryption zones and customers have found that acceptable. Further, we changed Hive to deal with this restriction.
* Volumes can also simplify the management of datasets. For example, one can associate other policies with volumes, such as backup policies across DR zones.

Isn’t it more flexible to have features like encryption and snapshots on arbitrary directories? Having a car with independent steering for each wheel is more flexible, but steering two wheels together makes a car easier to control. Volumes, while restricting the granularity, will simplify both management and implementation.

*Relation to Federation*

How are volumes related to Federation? Currently in federation, each NN has a single volume. This Jira will allow each NN to have multiple volumes. Volumes add to the Federation model: one can distribute and load-balance volumes across NNs. Further, it allows N+K failover, especially once we add partial namespace caching (HDFS-XXXX). (More on this later.)

*Other things to explore with Volumes* (outside the scope of this Jira)
* Each volume could become its own RW lock within the NN.
This would improve parallelism within the NN without much additional effort.
* Each volume could have its own image/journal to allow relocation of a volume to another NN (see federation).
* Associate storage policies with a volume so that, for example, the volume is backed by the same storage. These semantics would allow new features such as co-located data.

> Support volumes in HDFS
> -----------------------
>
>                 Key: HDFS-8888
>                 URL: https://issues.apache.org/jira/browse/HDFS-8888
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>
> There are multiple types of zones (e.g., snapshottable directories,
> encryption zones, directories with quotas) which are conceptually close to
> namespace volumes in traditional file systems.
> This jira proposes to introduce the concept of volume to simplify the
> implementation of snapshots and encryption zones.