[Pig Wiki] Update of "HowlSecurity" by AlanGates

Apache Wiki Wed, 01 Sep 2010 16:08:42 -0700

The "HowlSecurity" page has been changed by AlanGates.
http://wiki.apache.org/pig/HowlSecurity

--------------------------------------------------

New page:
This page outlines the design of Howl security.

== Related Hive Work ==
[[https://issues.apache.org/jira/browse/HIVE-78|Jira for authorization support 
in Hive]]

== Authorization ==

Howl will initially implement authorization based on HDFS directory 
permissions. This may be enhanced or replaced by a role-based model in a later 
release.
   
=== Permissions ===
The initial idea for authorization in Howl is to use the HDFS permissions to 
authorize metadata operations. To be able to do this, we would like to extend 
createTable() to add the ability to record a different group from the user's 
primary group and to record the complete Unix permissions on the table 
directory. Also, we would like to have a way for partition directories to 
inherit permissions and group information based on the table directory. To keep 
the metastore backward compatible for use with Hive, I propose having conf 
variables to achieve these objectives:
 * `table.group.name` : value will indicate the name of the Unix group for the 
table directory. This will be used by `createTable()` to perform a chgrp to the 
value provided. This property will provide the user the ability to choose from 
one of the many Unix groups he is part of to associate with the table.
 * `table.permissions` : value will be of the form `rwxrwxrwx` to indicate 
read-write-execute permissions on the table directory. This will be used by 
`createTable()` to perform a chmod to the value provided. This will let the 
user decide what permissions he wants on the table.
 * `partitions.inherit.permissions` : a value of true will indicate that 
partitions inherit the group name and permissions of the table level directory. 
This will be used by `addPartition()` to perform a chgrp and chmod to the same 
values as on the table directory.

Conf properties are preferable to API changes since the complete 
authorization design for Hive is not finalized yet. These properties can be 
deprecated or removed when that is in place. They would also be useful to some 
installations of vanilla Hive, since at least DFS level authorization can then 
be achieved by Hive without the user having to manually perform chgrp and 
chmod operations on DFS.
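
The following is a minimal sketch of how the three conf properties above might translate into chgrp/chmod steps. It is not Howl or Hive code: `parse_permissions` and `plan_create_table` are hypothetical helpers, and the use of `-` for an unset bit is an assumption (the page only gives the `rwxrwxrwx` form).

```python
def parse_permissions(perms):
    """Convert a 9-character string such as 'rwxr-x---' into a Unix mode.

    Assumption: an unset bit is written as '-', as in `ls -l` output.
    """
    if len(perms) != 9:
        raise ValueError("expected 9 characters, got %r" % perms)
    mode = 0
    for i, (ch, expected) in enumerate(zip(perms, "rwxrwxrwx")):
        if ch == expected:
            mode |= 1 << (8 - i)   # leftmost character is the highest bit
        elif ch != "-":
            raise ValueError("unexpected %r at position %d" % (ch, i))
    return mode


def plan_create_table(conf, table_dir):
    """List the steps createTable() would perform for the given conf.

    Hypothetical helper: it only plans the operations, it does not touch DFS.
    """
    steps = []
    group = conf.get("table.group.name")
    if group:
        steps.append(("chgrp", group, table_dir))
    perms = conf.get("table.permissions")
    if perms:
        steps.append(("chmod", oct(parse_permissions(perms)), table_dir))
    return steps
```

When neither property is set, the plan is empty, which matches the goal of keeping the metastore backward compatible for plain Hive use.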

=== Reading data (Select)/Writing data (Insert) ===
This will simply be governed by the DFS permissions at the time of the read or 
write and will result in runtime errors if the user does not have the required 
permissions.

=== Create table ===

==== Internal/External table without location specified ====
If the user has permissions on the directory pointed to by 
`hive.metastore.warehouse.dir` then he can create the table. 

==== Internal/External table with location specified ====
If the user has permissions to the location specified then he can create the 
table.

=== Drop Table ===
A user can drop a table (internal or external) only if he has write permission 
on the table directory. A user can have write permission either as the owner 
of the table or through a group he belongs to. So if the permissions on the 
table directory allow him to write to it, he can drop the table.
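
As an illustration of the owner-or-group rule above, here is a minimal sketch of a Unix-style permission check. The function names and parameters are hypothetical, and the sketch ignores details such as the HDFS superuser.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def has_access(user, user_groups, owner, group, mode, wanted):
    """Return True if `user` holds the `wanted` bits on a directory.

    Exactly one permission class applies: owner, else group, else other.
    """
    if user == owner:
        bits = (mode >> 6) & 7
    elif group in user_groups:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & wanted) == wanted


def can_drop_table(user, user_groups, owner, group, mode):
    """Dropping a table requires write permission on the table directory."""
    return has_access(user, user_groups, owner, group, mode, WRITE)
```

With a table directory owned by `alice`, group `analysts`, and mode `0o750`, `alice` can drop the table but a member of `analysts` cannot; changing the mode to `0o770` lets group members drop it as well.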

=== Partition permissions ===
Partition directories will inherit the permissions/(owner,group) of the table 
directory.

=== Alter table ===
A user can alter a table only if he has write permission on the table 
directory. This applies to all of the following alter table commands:
 * `ALTER TABLE table_name ADD partition_spec [ LOCATION 'location1' ] 
partition_spec [ LOCATION 'location2' ] ...`
 * `ALTER TABLE table_name DROP partition_spec, partition_spec,...`
 * `ALTER TABLE table_name RENAME TO new_table_name`
 * `ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name 
column_type [COMMENT col_comment] [FIRST|AFTER column_name]`
 * `ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT 
col_comment], ...)`
 * `ALTER TABLE table_name SET TBLPROPERTIES table_properties`
 * `ALTER TABLE table_name SET SERDE serde_class_name [WITH SERDEPROPERTIES 
serde_properties]`
 * `ALTER TABLE table_name SET SERDEPROPERTIES serde_properties`
 * `ALTER TABLE table_name SET FILEFORMAT file_format`
 * `ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY 
(col_name, ...)] INTO num_buckets BUCKETS`
 * `ALTER TABLE table_name TOUCH;`
 * `ALTER TABLE table_name TOUCH PARTITION partition_spec;`

=== Show tables ===
Since the top level warehouse dir will have read/write permissions for 
everyone, show tables will show all tables to all users.

=== Show Table/Partitions Extended ===
A user can issue "show table/partitions extended" on a table only if he has 
read permissions on the table directory. This query is of the form:
 * `SHOW TABLE EXTENDED [IN|FROM database_name] LIKE identifier_with_wildcards 
[PARTITION(partition_desc)]`

=== Show partitions ===
A user can issue `show partitions` on a table only if he has read permissions 
on the table directory.

=== Describe table/column/partition ===
A user can issue `describe table/column/partition` on a table only if he has 
read permissions on the table directory.
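
The rules in the sections above can be summarized as a lookup from statement type to the permission required on the table directory. This is only a sketch: the table and the `required_access` helper are hypothetical, and the prefix matching is far cruder than a real semantic analyzer. Select and insert are left out because they are governed by DFS at runtime, and `show tables` needs no check.

```python
# Statement prefix -> permission needed on the table directory.
REQUIRED_ACCESS = {
    "DROP TABLE": "w",
    "ALTER TABLE": "w",
    "SHOW TABLE EXTENDED": "r",
    "SHOW PARTITIONS": "r",
    "DESCRIBE": "r",
}


def required_access(statement):
    """Return 'r', 'w', or None when no table-directory check applies."""
    normalized = " ".join(statement.upper().split())
    for prefix, access in REQUIRED_ACCESS.items():
        if normalized.startswith(prefix):
            return access
    return None
```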

=== Database related operations ===

==== create db ====
Just like `create table`, `create db` will have `db.group.name` and 
`db.permissions` properties which will dictate the group and permissions of the 
db directory.
The Howl CLI will set these properties, and the database directory will be 
updated with the corresponding chgrp and chmod operations. There will be 
'''NO''' inheritance of permissions from the db directory to the table 
directory. The table directory can potentially have a different group and 
permissions from the db directory.

==== use db ====
`use db` will be permitted only if the user has read permission on the db 
directory. Once the `use db` call has been authorized, subsequent operations 
like `create table` will still be authorized based on the rules laid out above. 
So the user would need write permission on the db directory to be able to 
create the table directory under it.

If `db.tablename` syntax is supported (I believe it may not be supported in the 
initial commit), then `create db.tablename` will need to check that the user 
has write permission on the db directory.

=== Implementation Details ===

==== Howl specific semantic Analyzers ====
To implement a CLI, Howl will have Howl specific semantic analyzers in place. 
It will be in these Howl specific semantic analyzers that the checks outlined 
above will be made to implement authorization.

==== Howl CLI ====
The Howl CLI program will take `--group` and `--perms` command-line options 
which will only apply to `create table` DDL queries. The value for `--group` 
will indicate the name of the Unix group for the table directory. The value for 
`--perms` will be of the form `rwxrwxrwx`, which will indicate the Unix 
permissions on the table directory. The CLI program will have to partially 
parse the user supplied query to look for `create table .*` and set these 
values in the !HiveConf (for use by the `createTable()` metastore API). If 
these are not supplied, the CLI program should check what the umask is and warn 
the user that the table will be created with permissions dictated by the umask 
and that, if that is not intended, the user should drop and re-create the table 
with the `--group` and `--perms` options. Similarly, it should warn when the 
perms are too permissive, like `rwx` for others.
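
A minimal sketch of the CLI behaviour described above, under several assumptions: `prepare_conf` is a hypothetical function, a plain dict stands in for !HiveConf, the regular expression is only a rough stand-in for the partial parse, and treating exactly `rwx` for others as "too permissive" is an assumed threshold.

```python
import re

# Rough stand-in for the partial parse that looks for `create table .*`.
CREATE_TABLE = re.compile(r"^\s*create\s+(external\s+)?table\b", re.IGNORECASE)


def prepare_conf(query, group=None, perms=None, umask=0o022):
    """Return (conf, warnings) for one user-supplied query."""
    conf, warnings = {}, []
    if not CREATE_TABLE.match(query):
        return conf, warnings      # the options only apply to create table
    if group is not None:
        conf["table.group.name"] = group
    if perms is not None:
        conf["table.permissions"] = perms
        if perms[6:] == "rwx":     # assumed definition of "too permissive"
            warnings.append("permissions grant rwx to others")
    if group is None and perms is None:
        effective = 0o777 & ~umask
        warnings.append(
            "table will be created with permissions %03o dictated by the "
            "umask; drop and re-create it with --group and --perms if that "
            "is not intended" % effective
        )
    return conf, warnings
```

For example, `prepare_conf("create table t (a int)")` returns an empty conf and a single warning mentioning `755` for the default umask of `022`.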

As described under Permissions, this requires extending `createTable()` to 
record a group other than the user's primary group and the complete Unix 
permissions on the table directory, and giving partition directories a way to 
inherit the group and permissions of the table directory. To keep the metastore 
backward compatible for use with Hive, the conf variables discussed above will 
be used.

The Howl CLI will always set the property `partitions.inherit.permissions` to 
true. `createTable()` should also store these properties as table properties in 
the metastore so that a subsequent `addPartition()` can read them and perform 
the same chgrp and chmod. The corresponding changes in `addPartition()` must 
also be implemented.
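
The interplay between `createTable()` and `addPartition()` can be sketched with a small in-memory model. `DirInfo`, `create_table`, and `add_partition` are hypothetical names, and the model only records what the chgrp and chmod would produce rather than calling DFS.

```python
from dataclasses import dataclass


@dataclass
class DirInfo:
    group: str
    perms: str   # e.g. "rwxr-x---"


def create_table(conf, default):
    """Return the table directory info and the table properties to store."""
    table_dir = DirInfo(
        conf.get("table.group.name", default.group),
        conf.get("table.permissions", default.perms),
    )
    # Persist the conf values so a later add_partition() can consult them.
    props = {
        key: conf[key]
        for key in ("table.group.name", "table.permissions",
                    "partitions.inherit.permissions")
        if key in conf
    }
    return table_dir, props


def add_partition(table_dir, table_props, default):
    """Give the partition the table's group/perms when inheritance is on."""
    if table_props.get("partitions.inherit.permissions") == "true":
        return DirInfo(table_dir.group, table_dir.perms)
    return DirInfo(default.group, default.perms)
```

Keeping the inheritance flag in the table properties means a plain Hive client that never sets it sees the existing behaviour, where partitions get the defaults.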

== Authentication ==

One line of thought is to use HTTP as the transport and Thrift as the 
serialization mechanism. Since in this setup the Howl server would be a Tomcat 
server, the standard means of authentication for a Tomcat server can be used. 
The one challenge is that !HowlOutputFormat will need to connect to this server 
from the cluster nodes. Authenticating those requests is difficult since they 
are made on behalf of the user and not by the user himself.

Design yet to come.

[[https://issues.apache.org/jira/browse/THRIFT-814|Jira on HTTP servlet support 
in Thrift]]

Pradeep
