arp7 commented on a change in pull request #1009:
URL: https://github.com/apache/hadoop-ozone/pull/1009#discussion_r434011254
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -4,7 +4,7 @@ summary: A simplified version of mapping between S3 buckets and Ozone volume/buc
date: 2020-04-02
jira: HDDS-3331
status: accepted
-author: Marton Elek, Arpit Agarwall, Sunjay Radia
+author: Marton Elek, Arpit Agarwal, Sanjay Radia
Review comment:
You can remove my name, I really cannot claim any authorship credit for this idea. 🙂
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
+ * Existing bucket operations (info, delete, ACL) work on the link object in the same way as they do on regular buckets. No new link-specific RPC is required.
+ * Links are followed for key operations (list, get, put, etc.). Checks for existence of the source bucket, as well as ACL, are performed at this time (similar to symlinks). This avoids the need for reverse checks for each bucket delete or ACL change.
Review comment:
Yeah this should work a lot like symlinks, so we shouldn't perform reverse checks on changes to the target bucket.
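
For illustration only, here is a minimal sketch of that forward-resolution idea — not the OzoneManager code; `BucketMeta`, `resolveForKeyOp`, and the in-memory table are hypothetical. Resolving the link lazily at key-access time is what removes the need for reverse checks when the source bucket is deleted or re-ACLed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: follow link buckets forward at access time, like symlinks. */
class LinkResolutionSketch {

  /** Stand-in for a bucket record; a link carries a source volume/bucket, a regular bucket does not. */
  record BucketMeta(String volume, String name,
                    Optional<String> sourceVolume, Optional<String> sourceBucket) {
    boolean isLink() {
      return sourceVolume.isPresent() && sourceBucket.isPresent();
    }
  }

  private final Map<String, BucketMeta> bucketTable = new HashMap<>();

  void put(BucketMeta meta) {
    bucketTable.put(meta.volume() + "/" + meta.name(), meta);
  }

  /**
   * Resolve the bucket used for a key operation (list/get/put). The source
   * bucket's existence is checked only here, lazily, so deleting or changing
   * ACLs on the source never requires scanning for links that point at it.
   * A dangling link simply fails at access time, like a broken symlink.
   */
  BucketMeta resolveForKeyOp(String volume, String bucket) {
    BucketMeta meta = bucketTable.get(volume + "/" + bucket);
    if (meta == null) {
      throw new IllegalArgumentException("No such bucket: " + volume + "/" + bucket);
    }
    if (!meta.isLink()) {
      return meta;
    }
    // Single hop only: this sketch assumes links point at regular buckets.
    BucketMeta source = bucketTable.get(
        meta.sourceVolume().get() + "/" + meta.sourceBucket().get());
    if (source == null) {
      throw new IllegalArgumentException("Dangling link: " + volume + "/" + bucket);
    }
    return source;
  }
}
```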
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
+ * Existing bucket operations (info, delete, ACL) work on the link object in the same way as they do on regular buckets. No new link-specific RPC is required.
+ * Links are followed for key operations (list, get, put, etc.). Checks for existence of the source bucket, as well as ACL, are performed at this time (similar to symlinks). This avoids the need for reverse checks for each bucket delete or ACL change.
+ * The same permission is required on both the link and the source bucket to be able to perform the operation via the link. This allows finer-grained access control.
Review comment:
We should probably try to match the behavior of Unix symlinks wrt permissions.
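
To make the comparison concrete, a rough sketch only — the `AclCheck` interface and method names here are invented for this example, not Ozone's ACL API. Unix symlinks are effectively transparent and defer to the target's permissions, whereas the proposal as written requires the same permission on both the link and the source:

```java
/** Illustration only: two possible permission models for link buckets. */
class LinkAclSketch {

  /** Hypothetical ACL oracle; a real check would go through Ozone's ACL machinery. */
  interface AclCheck {
    boolean allows(String volume, String bucket, String user, String aclType);
  }

  /** Unix-symlink style: the link itself is transparent, only the source bucket's ACL decides. */
  static boolean symlinkStyle(AclCheck acl, String srcVolume, String srcBucket,
                              String user, String aclType) {
    return acl.allows(srcVolume, srcBucket, user, aclType);
  }

  /** The rule in the quoted design text: the same permission must be granted on both objects. */
  static boolean bothStyle(AclCheck acl, String linkVolume, String linkBucket,
                           String srcVolume, String srcBucket,
                           String user, String aclType) {
    return acl.allows(linkVolume, linkBucket, user, aclType)
        && acl.allows(srcVolume, srcBucket, user, aclType);
  }
}
```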
##########
File path: hadoop-hdds/docs/content/design/ozone-volume-management.md
##########
@@ -106,19 +106,27 @@ This is an easy an fast method, but with this approach not all the volumes are a
The first approach required a secondary cache table and it violates the naming hierarchy. The s3 bucket name is a global unique name, therefore it's more than just a single attribute on a specific object. It's more like an element in the hierachy. For this reason the second option is proposed:
-For example if the default s3 volume is `s3`
+For example if the default s3 volume is `s3v`
- 1. Every new buckets created via s3 interface will be placed under the `/s3` volume
- 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3/s3bucketname`
+ 1. Every new buckets created via s3 interface will be placed under the `/s3v` volume
+ 2. Any existing **Ozone** buckets can be exposed with mounting it to s3: `ozone sh mount /vol1/bucket1 /s3v/s3bucketname`
**Lock contention problem**
-One possible problem with using just one volume is using the locks of the same volume for all the D3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
+One possible problem with using just one volume is using the locks of the same volume for all the S3 buckets (thanks Xiaoyu). But this shouldn't be a big problem.
1. We hold only a READ lock. Most of the time it can acquired without any contention (writing lock is required only to change owner / set quota)
2. For symbolic link / bind mounts the read lock is only required for the first read. After that the lock of the referenced volume will be used. In case of any performance problem multiple volumes + bind mounts can be used.
-Note: Sunjay is added to the authors as the original proposal of this approach.
+Note: Sanjay is added to the authors as the original proposal of this approach.
+
+#### Implementation details
+
+ * Let bucket mount operation create a link bucket. Links are like regular buckets, stored in DB the same way, but with two new, optional pieces of information: source volume and bucket.
Review comment:
> Let bucket mount operation create a link bucket

Didn't understand this sentence. Does it mean that when you try to mount a bucket in a new volume it silently creates a link under the covers? Is the link reused next time we try to mount again?
Also how do we handle name collisions? Can the user choose any name for the link/mount point?
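
One possible reading of that bullet, sketched only to make the questions concrete — the method and error handling below are guesses, not the PR's actual behavior: `ozone sh mount /vol1/bucket1 /s3v/name` would create a link-bucket entry named `name` under `s3v`, the name would be user-chosen, and a collision with an existing entry would be rejected rather than silently reused:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the "mount creates a link bucket" interpretation. */
class MountSketch {

  record Link(String sourceVolume, String sourceBucket) { }

  /** Link name within the s3 volume -> link record (stands in for the bucket table). */
  private final Map<String, Link> s3vLinks = new HashMap<>();

  /**
   * "Mount" an existing Ozone bucket under a user-chosen name in the s3 volume.
   * Name collisions are rejected explicitly; whether an identical repeated
   * mount should instead be idempotent is exactly the open question above.
   */
  void mount(String sourceVolume, String sourceBucket, String linkName) {
    if (s3vLinks.containsKey(linkName)) {
      throw new IllegalArgumentException("Name already taken in s3v: " + linkName);
    }
    s3vLinks.put(linkName, new Link(sourceVolume, sourceBucket));
  }
}
```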