Re: Collect feedback for HDFS-15638

2020-10-26 Thread Xinli shang
Hi Stephen, I like your idea of having a Sentry-like plugin without changing the permission model. I will update once we finish it. Xinli On Mon, Oct 26, 2020 at 4:56 AM Stephen O'Donnell wrote: > My concern about this is that it breaks the POSIX compliance that HDFS > otherwise sticks to as

Re: Collect feedback for HDFS-15638

2020-10-26 Thread Stephen O'Donnell
My concern about this is that it breaks the POSIX compliance that HDFS otherwise sticks to as closely as possible. If we do this, then it opens the door for other "non-POSIX" things, which is a slippery slope. I would like to understand where Sentry breaks down performance-wise, as management

Re: Collect feedback for HDFS-15638

2020-10-23 Thread Xinli shang
So far we have feedback regarding the use of default ACLs and Sentry with the NameNode plugin. Are there concerns about the risk of regression if we add this feature? On Mon, Oct 19, 2020 at 7:19 AM Xinli shang wrote: > Thanks Stephen for sharing out! > > I don't have those details as our

Re: Collect feedback for HDFS-15638

2020-10-19 Thread Xinli shang
Thanks Stephen for sharing out! I don't have those details as our benchmarking test was done quite a while ago and I wasn't a participant. But using Sentry does add one more service to the HDFS critical path, which adds some overhead, especially for reliability. In the long run, we consider

Re: Collect feedback for HDFS-15638

2020-10-19 Thread Stephen O'Donnell
Were you able to trace where the Sentry plugin was causing problems? Was it during the initial sync of ACLs, or updates of ACLs, or during permission lookups when accessing files? Some time back, I dealt with a ~230M-file cluster which was using Sentry. Accidentally, all the Sentry-provided ACLs

Re: Collect feedback for HDFS-15638

2020-10-18 Thread Xinli shang
We are using Apache Sentry. At large HDFS scale, which is our case, we see performance degradation when the Sentry plugin is enabled in the NameNode. So we have to disable the plugin in the NN and map Sentry policies to HDFS ACLs. It works great so far. This is the only major issue we see. On Sun,

Re: Collect feedback for HDFS-15638

2020-10-18 Thread Stephen O'Donnell
I agree with Owen on this - I don't think this is a feature we should add to HDFS. If managing the permissions for Hive tables is becoming a big overhead for you, you should look into something like Sentry. It allows you to manage the permissions of all the files and folders under Hive tables in

Re: Collect feedback for HDFS-15638

2020-10-17 Thread Xinli shang
Hi Vinayakumar, The staging tables are dynamic. From the Hadoop security team's perspective, it is unrealistic to force every data writer to do that because there are so many of them and they write in different ways. Rename is just one scenario and there are other scenarios. For example, when permission is

Re: Collect feedback for HDFS-15638

2020-10-17 Thread Vinayakumar B
IIUC, Hive renames go from Hive’s staging directory, used during the write, to the final destination within the table. Why not set the default ACLs of the staging directory to whatever is expected, and then continue writing the remaining files? That way, even after the rename, you will have the expected ACLs on the final files.
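
The behaviour discussed here (default ACLs are copied to a child when it is created, and a rename carries the existing ACLs along without re-evaluating them) can be illustrated with a small toy model. This is only a sketch: `Node`, `create`, and `rename` are hypothetical names, not the HDFS API, and the model ignores the mask and the requested mode that real HDFS applies when deriving a new file's access ACL.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory in a toy namespace (hypothetical, not an HDFS inode)."""
    is_dir: bool
    access_acl: frozenset = frozenset()
    default_acl: frozenset = frozenset()  # only meaningful for directories
    children: dict = field(default_factory=dict)


def create(parent: Node, name: str, is_dir: bool = False) -> Node:
    """Create a child; the parent's default ACL is copied at creation time."""
    child = Node(
        is_dir=is_dir,
        access_acl=parent.default_acl,
        default_acl=parent.default_acl if is_dir else frozenset(),
    )
    parent.children[name] = child
    return child


def rename(src_parent: Node, name: str, dst_parent: Node) -> Node:
    """Move a child; its ACLs travel with it and are not recomputed."""
    node = src_parent.children.pop(name)
    dst_parent.children[name] = node
    return node


root = Node(is_dir=True)
table = create(root, "table", is_dir=True)
table.default_acl = frozenset({"group:analysts:r-x"})  # illustrative entry

# One staging directory without a default ACL, one with the table's default ACL.
staging_plain = create(root, "staging_plain", is_dir=True)
staging_acl = create(root, "staging_acl", is_dir=True)
staging_acl.default_acl = table.default_acl

plain_file = create(staging_plain, "part-0")
acl_file = create(staging_acl, "part-1")

rename(staging_plain, "part-0", table)
rename(staging_acl, "part-1", table)

print(sorted(plain_file.access_acl))  # no entries: rename did not apply the table default
print(sorted(acl_file.access_acl))    # entry inherited when created in staging
```

In this model only the file created under the staging directory that already carried the default ACL ends up with the expected entry after the rename. The suggestion above relies on that, and it also explains why files written before the default ACL is set need separate handling.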

Re: Collect feedback for HDFS-15638

2020-10-16 Thread Xinli shang
Thanks Owen for your reply! As mentioned in the Jira, default ACLs don't apply to rename. Any idea how rename can work without setting ACLs per file? On Fri, Oct 16, 2020 at 7:25 PM Owen O'Malley wrote: > I'm very -1 on adding these semantics. > > When you create the table's directory, set the

Re: Collect feedback for HDFS-15638

2020-10-16 Thread Owen O'Malley
I'm very -1 on adding these semantics. When you create the table's directory, set the default ACL. That will have exactly the effect that you are looking for without creating additional semantics. .. Owen On Fri, Oct 16, 2020 at 7:02 PM Xinli shang wrote: > Hi all, > > I opened

Collect feedback for HDFS-15638

2020-10-16 Thread Xinli shang
Hi all, I opened https://issues.apache.org/jira/browse/HDFS-15638 and want to collect feedback from the community. I know that changing a permission model that follows the POSIX model is never trivial, so please comment if you have concerns. For reading convenience, here is a copy