Our current setup is HW 2.5 with our own group provider for HDFS 
permissions. We did this because we manage permissions for thousands of users 
in thousands of groups, which are recalculated every 15 minutes. We had namenode 
stability problems when we were using an LDAP provider (slow responses caused 
the NN to fall over, resulting in outages). Since we switched to our own group 
provider, we get sub-second response times and it's stable.

My question is about adding HDFS encryption to this cluster. Our 'data lake' 
implementation maintains metadata on all data that we keep in the cluster. 
Right now, we have a single master Kerberos principal which owns every file in 
our lake folder. Every dataset that we store in a folder tree is owned by that 
user and uses a dataset-specific group to control read access.

We are moving to a 'ring' system where, instead of having a single Kerberos 
principal, we have one per ring. Each ring owns an exclusive set of datasets. 
So it's exactly the same as before, except that ownership of the files is 
divided between the rings. The ring principal must be a member of the dataset 
groups in order for chgrp to work.

Now, back to the question. We want to enable encryption. For now, simply using 
a single encryption zone for all rings is sufficient. How do we automatically 
configure the Ranger side of this using the entitlement metadata in our lake? 
We'd basically want to configure N master Kerberos principals with RW access 
and a list of Hadoop groups (thousands) for the readers. All of these would 
need to be able to access encrypted files: the union of the groups needs read 
access, and the masters need write access.
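
To make the target concrete, here is a rough sketch of the kind of policy 
payload we imagine generating from our metadata. It is only an illustration 
under assumptions: the service name "lake_kms", the key name "lake_key", and 
the ring and group names are made up, and the field names and the 
"decrypteek" access type follow my understanding of Ranger's public v2 policy 
JSON for the KMS service, which would need checking against our Ranger 
version. As far as I understand, both readers and writers need decrypt access 
on the zone key, and HDFS permissions still decide who can write.

```python
import json


def build_kms_policy(service, key_name, ring_principals, reader_groups):
    """Build a Ranger-style KMS policy dict for one encryption zone key.

    Assumption: every ring principal and every reader group gets the same
    'decrypteek' access on the key; write control stays with HDFS permissions.
    """
    return {
        "service": service,              # hypothetical Ranger KMS service name
        "name": f"lake-ez-{key_name}",   # hypothetical policy naming scheme
        "isEnabled": True,
        "resources": {"keyname": {"values": [key_name]}},
        "policyItems": [
            {
                "users": sorted(ring_principals),
                "groups": sorted(reader_groups),
                "accesses": [{"type": "decrypteek", "isAllowed": True}],
            }
        ],
    }


# Example with made-up rings and dataset groups taken from lake metadata.
policy = build_kms_policy(
    "lake_kms",
    "lake_key",
    {"ring2", "ring1"},
    {"ds_b_readers", "ds_a_readers"},
)
payload = json.dumps(policy, indent=2)
print(payload)
```

The idea would be to regenerate this payload whenever the entitlements change 
and send it to Ranger's policy REST endpoint, rather than editing thousands of 
groups by hand.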

Is it possible to do this efficiently using REST APIs? As far as I can see, 
Ranger would be maintaining this information on behalf of HDFS. Can Ranger 
use the HDFS group provider as a source of groups, for example? That would 
make this very easy.

Thanks for your help,
Billy
