arp7 commented on a change in pull request #1009:
URL: https://github.com/apache/hadoop-ozone/pull/1009#discussion_r434011254
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -4,7 +4,7 @@ summary: A simplified version of mapping between S3 buckets and Ozone volume/buc
date: 2020-04-02
jira: HDDS-3331
status: accepted
-author: Marton Elek, Arpit Agarwall, Sunjay Radia
+author: Marton Elek, Arpit Agarwal, Sanjay Radia
Review comment:
You can remove my name, I really cannot claim any authorship credit for this idea. 🙂
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
+ * Existing bucket operations (info, delete, ACL) work on the link object in the same way as they do on regular buckets. No new link-specific RPC is required.
+ * Links are followed for key operations (list, get, put, etc.). Checks for existence of the source bucket, as well as ACL, are performed at this time (similar to symlinks). This avoids the need for reverse checks for each bucket delete or ACL change.
Review comment:
Yeah this should work a lot like symlinks, so we shouldn't perform reverse checks on changes to the target bucket.
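
For illustration only, here is a minimal sketch of that forward-resolution idea — not the OzoneManager code; `BucketMeta`, `resolveForKeyOp`, and the in-memory table are hypothetical. Resolving the link lazily at key-access time is what removes the need for reverse checks when the source bucket is deleted or re-ACLed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: follow link buckets forward at access time, like symlinks. */
class LinkResolutionSketch {

  /** Stand-in for a bucket record; a link carries a source volume/bucket, a regular bucket does not. */
  record BucketMeta(String volume, String name,
                    Optional<String> sourceVolume, Optional<String> sourceBucket) {
    boolean isLink() {
      return sourceVolume.isPresent() && sourceBucket.isPresent();
    }
  }

  private final Map<String, BucketMeta> bucketTable = new HashMap<>();

  void put(BucketMeta meta) {
    bucketTable.put(meta.volume() + "/" + meta.name(), meta);
  }

  /**
   * Resolve the bucket used for a key operation (list/get/put). The source
   * bucket's existence is checked only here, lazily, so deleting or changing
   * ACLs on the source never requires scanning for links that point at it.
   * A dangling link simply fails at access time, like a broken symlink.
   */
  BucketMeta resolveForKeyOp(String volume, String bucket) {
    BucketMeta meta = bucketTable.get(volume + "/" + bucket);
    if (meta == null) {
      throw new IllegalArgumentException("No such bucket: " + volume + "/" + bucket);
    }
    if (!meta.isLink()) {
      return meta;
    }
    // Single hop only: this sketch assumes links point at regular buckets.
    BucketMeta source = bucketTable.get(
        meta.sourceVolume().get() + "/" + meta.sourceBucket().get());
    if (source == null) {
      throw new IllegalArgumentException("Dangling link: " + volume + "/" + bucket);
    }
    return source;
  }
}
```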
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
+ * Existing bucket operations (info, delete, ACL) work on the link object in the same way as they do on regular buckets. No new link-specific RPC is required.
+ * Links are followed for key operations (list, get, put, etc.). Checks for existence of the source bucket, as well as ACL, are performed at this time (similar to symlinks). This avoids the need for reverse checks for each bucket delete or ACL change.
+ * The same permission is required on both the link and the source bucket to be able to perform the operation via the link. This allows finer-grained access control.
Review comment:
We should probably try to match the behavior of Unix symlinks wrt permissions.
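
To make the comparison concrete, a rough sketch only — the `AclCheck` interface and method names here are invented for this example, not Ozone's ACL API. Unix symlinks are effectively transparent and defer to the target's permissions, whereas the proposal as written requires the same permission on both the link and the source:

```java
/** Illustration only: two possible permission models for link buckets. */
class LinkAclSketch {

  /** Hypothetical ACL oracle; a real check would go through Ozone's ACL machinery. */
  interface AclCheck {
    boolean allows(String volume, String bucket, String user, String aclType);
  }

  /** Unix-symlink style: the link itself is transparent, only the source bucket's ACL decides. */
  static boolean symlinkStyle(AclCheck acl, String srcVolume, String srcBucket,
                              String user, String aclType) {
    return acl.allows(srcVolume, srcBucket, user, aclType);
  }

  /** The rule in the quoted design text: the same permission must be granted on both objects. */
  static boolean bothStyle(AclCheck acl, String linkVolume, String linkBucket,
                           String srcVolume, String srcBucket,
                           String user, String aclType) {
    return acl.allows(linkVolume, linkBucket, user, aclType)
        && acl.allows(srcVolume, srcBucket, user, aclType);
  }
}
```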
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
Review comment:
> Let bucket mount operation create a link bucket

Didn't understand this sentence. Does it mean that when you try to mount a bucket in a new volume it silently creates a link under the covers? Is the link reused next time we try to mount again?
Also how do we handle name collisions? Can the user choose any name for the link/mount point?
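
One possible reading of that bullet, sketched only to make the questions concrete — the method and error handling below are guesses, not the PR's actual behavior: `ozone sh mount /vol1/bucket1 /s3v/name` would create a link-bucket entry named `name` under `s3v`, the name would be user-chosen, and a collision with an existing entry would be rejected rather than silently reused:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the "mount creates a link bucket" interpretation. */
class MountSketch {

  record Link(String sourceVolume, String sourceBucket) { }

  /** Link name within the s3 volume -> link record (stands in for the bucket table). */
  private final Map<String, Link> s3vLinks = new HashMap<>();

  /**
   * "Mount" an existing Ozone bucket under a user-chosen name in the s3 volume.
   * Name collisions are rejected explicitly; whether an identical repeated
   * mount should instead be idempotent is exactly the open question above.
   */
  void mount(String sourceVolume, String sourceBucket, String linkName) {
    if (s3vLinks.containsKey(linkName)) {
      throw new IllegalArgumentException("Name already taken in s3v: " + linkName);
    }
    s3vLinks.put(linkName, new Link(sourceVolume, sourceBucket));
  }
}
```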