[ 
https://issues.apache.org/jira/browse/HDFS-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702393#comment-14702393
 ] 

Sanjay Radia edited comment on HDFS-8888 at 8/19/15 3:58 AM:
-------------------------------------------------------------

There are several motivations for introducing Volumes to HDFS.

*Simplify management and implementation*
* Volumes make the management of some HDFS features simpler: Quotas, 
Encryption, and Snapshots can become volume properties rather than properties 
of individual directories. As a unit of management, volumes also offer strong 
isolation of security settings.
* Volumes can simplify the implementation of some of these features. For 
example, if we don’t allow renaming across a volume boundary, then the 
implementation of Snapshots becomes easier. Will customers accept this 
restriction? Won’t some apps like Hive have to change, since they rename from 
a temp location to the final destination? Recall that we disallow renames 
across encryption zones and customers have found that acceptable. Further, we 
changed Hive to deal with that restriction.
* Volumes can also simplify the management of datasets. For example, one can 
associate other policies with volumes, such as backup policies across DR 
zones that are set up per volume.
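
To make the rename restriction concrete, here is a minimal sketch of how a path could be mapped to its enclosing volume and how a cross-volume rename could be rejected. The volume roots, the function names, and the choice of exception are all assumptions made for illustration; this Jira does not define how volumes are laid out or enforced.

```python
from pathlib import PurePosixPath

# Hypothetical volume roots; the real volume layout is not defined in this Jira.
VOLUME_ROOTS = ["/", "/warehouse", "/tmp"]


def volume_of(path, roots=VOLUME_ROOTS):
    """Return the deepest (longest) volume root that contains `path`."""
    p = PurePosixPath(path)
    best = None
    for root in roots:
        r = PurePosixPath(root)
        # A root contains the path if it is the path itself or one of its ancestors.
        if p == r or r in p.parents:
            if best is None or len(r.parts) > len(best.parts):
                best = r
    if best is None:
        raise ValueError(f"no volume contains {path}")
    return str(best)


def check_rename(src, dst, roots=VOLUME_ROOTS):
    """Return the shared volume, or raise if src and dst are in different volumes."""
    src_vol = volume_of(src, roots)
    dst_vol = volume_of(dst, roots)
    if src_vol != dst_vol:
        raise PermissionError(
            f"rename across volume boundary: {src_vol} -> {dst_vol}"
        )
    return src_vol
```

Under this sketch, an application that stages output in a temp directory would keep that staging directory inside the destination volume (for example `/warehouse/.staging`), much as is already required for encryption zones.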

Isn’t it more flexible to have features like encryption and snapshots on 
arbitrary directories? Having a car with independent steering for each wheel is 
more flexible, but steering 2 wheels together makes a car easier to control. 
Volumes, while restricting the granularity, will simplify both management and 
the implementation.

*Relation to Federation*
How are volumes related to Federation? Currently in federation, each NN has a 
single volume. This Jira will allow each NN to have multiple volumes. Volumes 
add to the Federation model: one can distribute/load balance volumes across 
NNs. Further, this allows N+K failover, especially when we add partial 
namespace caching (HDFS-XXXX). (More on this later.)

*Other things to explore with Volumes* (outside the scope of this Jira)
* Each volume could have its own RW lock within the NN. This would improve 
parallelism within the NN without much additional effort.
* Each volume could have its own image/journal to allow relocation of a volume 
to another NN (see federation).
* Associate storage policies with a volume, such that the volume is backed by 
the same storage. These semantics allow new features like co-located data.
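
As a rough illustration of the per-volume locking idea in the first bullet, the sketch below keeps one lock per volume so that operations on different volumes do not contend. It is only a sketch under stated assumptions: `VolumeLocks` is a hypothetical helper, not an HDFS class, and a plain `threading.Lock` stands in for the read/write lock that the NN would actually need.

```python
import threading


class VolumeLocks:
    """One lock per volume (hypothetical helper, not an HDFS class).

    A plain mutex is used here as a simplification; a real NN would use
    a read/write lock so that readers of the same volume can proceed together.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}

    def lock_for(self, volume):
        """Return the lock for `volume`, creating it on first use."""
        with self._guard:
            if volume not in self._locks:
                self._locks[volume] = threading.Lock()
            return self._locks[volume]


locks = VolumeLocks()
warehouse_lock = locks.lock_for("/warehouse")
tmp_lock = locks.lock_for("/tmp")

with warehouse_lock:
    # While /warehouse is locked, /tmp can still be acquired without blocking.
    other_volume_free = tmp_lock.acquire(blocking=False)
    if other_volume_free:
        tmp_lock.release()
    # The same volume is not available while it is held.
    same_volume_free = warehouse_lock.acquire(blocking=False)
```

The point of the sketch is only that contention is scoped to a volume: a long-running operation in one volume leaves the others available, which is where the improved parallelism would come from.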











> Support volumes in HDFS
> -----------------------
>
>                 Key: HDFS-8888
>                 URL: https://issues.apache.org/jira/browse/HDFS-8888
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>
> There are multiple types of zones (e.g., snapshottable directories, 
> encryption zones, directories with quotas) which are conceptually close to 
> namespace volumes in traditional file systems.
> This jira proposes to introduce the concept of volume to simplify the 
> implementation of snapshots and encryption zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
