[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757299#action_12757299 ] Edward Capriolo commented on HIVE-78:

@namit, Yes, I agree/agreed. I was off topic there, describing how we could do it if we wanted to. I will open a separate Jira for that.

At the upcoming Hadoop World NYC someone is going to present the new authentication code in Hadoop. I would like to watch that; then we (I) might better understand what the long-term strategy is for Hadoop.

I will split off authentication and authorization into two separate Jiras to avoid confusion.

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Server Infrastructure
> Reporter: Ashish Thusoo
> Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757252#action_12757252 ] Namit Jain commented on HIVE-78:

Copying an earlier comment from the jira:

I agree that authentication and authorization (much of what I have been talking about in this comment) need to be separated out. While we use the directory infrastructure for authentication, we should store the authorization information in the metastore, as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

I agree with the above - it might be a good idea not to do password handling in Hive in the first step - we can add it later if need be. Let us assume that the user has already been authenticated by some external entity, and proceed from there. What do you think?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757179#action_12757179 ] Edward Capriolo commented on HIVE-78:

@Min (this may be somewhat misstated, but) Hadoop-Core gets the user/group information for a POSIX user by running shell commands like whoami, groups, id, etc. The Hive CLI inherits this information, as do HiveServer and HWI.

The Hive web interface starts as the user who ran the start script. The first screen of the web interface is a de facto log-in screen, which lets the user enter their user and group information in text boxes. When HWI starts the session on behalf of the user it runs "SET hadoop.ugi={what the user entered in the text box}". From that point, if the user initiates a Hive job, the output of that job should be files owned by that user. I am pretty sure the code in QL just chowns the files at job end, or perhaps the entire job runs as that user (I can't remember).

My comment above just references the fact that in some cases Hadoop ACLs and our Hive authorization rules would conflict. For example, if the files were owned by mzhou, saying "grant delete to * user edward" would not give me privileges to drop files you owned. In that case sections of HiveServer would have to run as superuser to elevate privileges, but we punted on that issue too. (We are like a football team with a bad offense, always punting.)

(If we were going to tackle passwords, we could do it this way.) I would think that if we wanted to enforce strong user/password authentication we could do this:
{noformat}
hive.password.insession hive_password empty for no password checking, if defined this is the session variable to look for password
{noformat}
In this way QL would read this value and would not execute any task for the user unless they had run "set hive_password=XYXYXYY". Does that make sense? The session already holds the user. It could hold the password as well.

Do you see anything wrong with that approach? I will trim down some of the stuff I have and upload it for reference.
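The session-password idea in the comment above could be sketched roughly as follows. This is only an illustration in Python, not Hive code: the function may_execute, the plain dicts standing in for the configuration and the session, the "user" session key, and the verify callback are all hypothetical. The comment does not say what the password would be checked against, so that part is left to the caller.

```python
def may_execute(conf, session, verify):
    """Return True if QL should run tasks for this session.

    conf    -- dict of configuration values (hypothetical stand-in for the Hive config)
    session -- dict of session variables, as set with "SET name=value"
    verify  -- assumed callback (user, password) -> bool; the thread does not
               specify where passwords would actually be checked
    """
    var_name = conf.get("hive.password.insession", "")
    if not var_name:
        # Empty value: no password checking, as in the proposed description.
        return True
    password = session.get(var_name)
    if password is None:
        # The user never ran "set hive_password=...".
        return False
    return verify(session.get("user"), password)
```

With hive.password.insession set to hive_password, a session is refused until it has run "set hive_password=..." with a value the callback accepts.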
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756951#action_12756951 ] Min Zhou commented on HIVE-78:

From the words you commented:
{noformat}
Daemons like HiveService and HiveWebInterface will have to run as supergroup or a hive group?
{noformat}
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756949#action_12756949 ] Min Zhou commented on HIVE-78:

I do not think the HiveServer you have in mind is the same as mine, which supports multiple users, not only one.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756936#action_12756936 ] Edward Capriolo commented on HIVE-78:

@Min I would think the code should apply to any client: CLI, HiveServer, or HWI. We should probably also provide a configuration variable:
{noformat}
hive.authorize true
{noformat}
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756904#action_12756904 ] Min Zhou commented on HIVE-78:

Let me guess, you are all talking about the CLI. But we are using HiveServer as a multi-user server, not one that supports only a single user, as mysqld does.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756823#action_12756823 ] Ashish Thusoo commented on HIVE-78:

@Min I agree with Edward's thoughts here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all.

It does make sense to keep authentication separate from authorization, because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication.

Will look at your patch as well.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756817#action_12756817 ] Edward Capriolo commented on HIVE-78:

@namit, I think I can explain why AS made sense at the time. My plan was not to decouple users from a rule. See my little patch:
{noformat}
+struct AccessControl {
+ 1: list user,
+ 2: list group,
+ 3: list database,
+ 4: list table,
+ 5: list partition,
+ 6: list column,
+ 7: list priv,
+ 8: string name
+}
{noformat}
I wanted a rule to be more or less immutable, or to support really simple syntax. Something like this is doable:
{noformat}
GRANT my_permission to USER3;
{noformat}
But it seems to imply that users are decoupled from the rule. This is really not true in my design: a user or group is just another multivalued attribute of the rule. I would like the format to be interchangeable:
{noformat}
ALTER my_permission add db 'db';
ALTER my_permission add table 'db.table';
ALTER my_permission drop table 'db.table';
{noformat}

@Min, see Ashish's comment above in this Jira:
{noformat}
I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here.
{noformat}
The goal here is to trust the user/group information as Hadoop does, and to create a system that grants/revokes privileges. Authentication and authorization are two separate things, so our Jira is misnamed :)

I will review your patch, just to see what you came up with. As I said, you are farther along than I am, and this has been off my radar, so I don't mind passing the baton. But Namit is right: we have to agree on the syntax and on what we are controlling, because down the road it will be an issue.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756662#action_12756662 ] Namit Jain commented on HIVE-78:

I think we should spend some time on finalizing the functionality before implementing it - it is very difficult to change something once it is out, due to all kinds of backward compatibility issues.

For the syntax (AS), won't it be simpler to add permissions to a role, and then assign roles to a user?

GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
ALTER GRANT my_permission add USER 'USER3'

Can I revoke some privileges from my_permission? If yes, how is it different from doing the two things separately?

CREATE ROLE my_permission AS GRANT WITH_GRANT,RC, ON '*' ;
GRANT my_permission to USER1, USER2;

later

GRANT my_permission to USER3;
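To make the role-based alternative above concrete, here is a minimal Python sketch of privileges attached to a role and roles assigned to users. It is an illustration only, not the Hive implementation: the RoleStore class and its method names are hypothetical, and the privilege and target strings are taken from the example statements in the comment.

```python
class RoleStore:
    """Toy model: privileges live on roles, users only hold role names."""

    def __init__(self):
        self.role_privs = {}  # role name -> set of (privilege, target)
        self.user_roles = {}  # user name -> set of role names

    def create_role(self, role, privileges, target):
        # Roughly: CREATE ROLE role AS GRANT privileges ON target
        self.role_privs[role] = {(p, target) for p in privileges}

    def grant_role(self, role, *users):
        # Roughly: GRANT role TO user1, user2
        if role not in self.role_privs:
            raise KeyError(role)
        for user in users:
            self.user_roles.setdefault(user, set()).add(role)

    def revoke_privilege(self, role, privilege, target):
        # Revoking from the role affects every user holding that role.
        self.role_privs[role].discard((privilege, target))

    def privileges_of(self, user):
        result = set()
        for role in self.user_roles.get(user, ()):
            result |= self.role_privs[role]
        return result
```

In this model, adding USER3 later is just another grant_role call, and revoking a privilege from my_permission changes what all of its users can do at once.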
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756335#action_12756335 ] Min Zhou commented on HIVE-78:

@Edward Sorry for my poor choice of words; I hope this will not affect our work. Can you point me to the jiras where it was decided that Hive would not store username/password information and that Hadoop will?

I think most companies are using Hadoop versions from 0.17 to 0.20, which don't have good password security. Once a company adopts a particular version, upgrading is a very important issue for them, and many companies will stay on a more stable version. Moreover, Hadoop still does not have that feature, and it may take a very long time to implement. Why should we wait for it rather than accomplish it ourselves? I think it is necessary for Hive to support user/password, at least for current versions of Hadoop. Many companies that use Hive have reported that the current Hive is inconvenient for multiple users with respect to environment isolation, table sharing, security, etc. We must try to meet the requirements of most of them.

Regarding the syntax, I guess we can do it in two steps.
# support GRANT/REVOKE privileges to users.
# support some sort of server administration privileges, as Ashish mentioned.

The GRANT statement enables system administrators to create Hive user accounts and to grant rights to accounts. To use GRANT, you must have the GRANT OPTION privilege, and you must have the privileges that you are granting. The REVOKE statement is related and enables administrators to remove account privileges.

File hive-78-syntax-v1.patch modifies the syntax. Any comments on that?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756068#action_12756068 ] Edward Capriolo commented on HIVE-78:

Min,

First, let me say you have probably come along much farther than me on this issue.

Your approach is too strong. Hive is an open-community process. Though it is not very detailed, we have loosely agreed on a spec (above), and in that spec we decided not to store username/password information in Hive. Rather, upstream is still going to be responsible for this information. We also agreed on syntax.

You should not throw up a new spec and some code, and say something along the lines of "We are going to take over and do it this way". Imagine if each jira issue you were working on was 20% to 50% done, and then someone jumped in and said "I already finished it a different way". That would be rather annoying. It would be a "first patch wins" system.

First, before you write a line of code you should let someone know your intention to work on it. Otherwise, what is the point of having two people work on something where one version gets thrown away? It is a waste, and this would be the second issue where this has happened to me.

Second, even if you want to start coding it up, it has to be what people agreed on. We agreed not to store user/pass (Hadoop will be doing this upstream soon), and we agreed on syntax. If you want to reopen that issue, you should discuss it before coding it. It has to be good for the community, not just your deployment.

So where do we go from here? Do we go back to the design phase and describe all the syntax we want to support?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755882#action_12755882 ] Min Zhou commented on HIVE-78:

We currently use separate MySQL DBs to achieve an isolated CLI environment, which is not practical. An authentication infrastructure is urgently needed for us.

Almost all statements would be affected, for example:
SELECT
INSERT
SHOW TABLES
SHOW PARTITIONS
DESCRIBE TABLE
MSCK
CREATE TABLE
CREATE FUNCTION -- we are considering how to control people creating UDFs.
DROP TABLE
DROP FUNCTION
LOAD

along with GRANT/REVOKE themselves, and CREATE USER/DROP USER/SET PASSWORD. This even includes some non-SQL commands like set, add file and add jar.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755876#action_12755876 ] Min Zhou commented on HIVE-78:

We will take over this issue; it should be finished in two weeks. Here are the SQL statements that will be added:
{noformat}
CREATE USER, DROP USER;
ALTER USER SET PASSWORD;
GRANT;
REVOKE
{noformat}
Metadata is stored in some sort of persistent medium, such as a MySQL DBMS, through JDO. We will add three tables for this issue: USER, DBS_PRIV and TABLES_PRIV. Privileges can be granted at several levels, and each table above corresponds to a privilege level.

# Global level
Global privileges apply to all databases on a given server. These privileges are stored in the USER table. GRANT ALL ON *.* and REVOKE ALL ON *.* grant and revoke only global privileges.
GRANT ALL ON *.* TO 'someuser';
GRANT SELECT, INSERT ON *.* TO 'someuser';

# Database level
Database privileges apply to all objects in a given database. These privileges are stored in the DBS_PRIV table. GRANT ALL ON db_name.* and REVOKE ALL ON db_name.* grant and revoke only database privileges.
GRANT ALL ON mydb.* TO 'someuser';
GRANT SELECT, INSERT ON mydb.* TO 'someuser';
Although we can't create DBs currently, this level is reserved until Hive supports them.

# Table level
Table privileges apply to all columns in a given table. These privileges are stored in the TABLES_PRIV table. GRANT ALL ON db_name.tbl_name and REVOKE ALL ON db_name.tbl_name grant and revoke only table privileges.
GRANT ALL ON mydb.mytbl TO 'someuser';
GRANT SELECT, INSERT ON mydb.mytbl TO 'someuser';

Hive account information is stored in the USER table, which includes the username, password and the kinds of privileges.

A user who has been granted any privilege on a particular table, such as select/insert/drop, always has the right to show that table.
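The three privilege levels proposed above (global, database, table) can be illustrated with a small Python sketch. It is a hedged illustration, not the patch itself: the dicts are hypothetical in-memory stand-ins for the USER, DBS_PRIV and TABLES_PRIV tables, the function names are invented, and the special handling of ALL is left out.

```python
# Hypothetical in-memory stand-ins for the three proposed metastore tables.
user_priv = {}    # USER:        user -> set of global privileges
dbs_priv = {}     # DBS_PRIV:    (user, db) -> set of privileges
tables_priv = {}  # TABLES_PRIV: (user, db, table) -> set of privileges


def grant(user, privileges, db="*", table="*"):
    """Record a grant at the level implied by the target (*.*, db.* or db.tbl)."""
    if db == "*":
        store, key = user_priv, user
    elif table == "*":
        store, key = dbs_priv, (user, db)
    else:
        store, key = tables_priv, (user, db, table)
    store.setdefault(key, set()).update(privileges)


def has_privilege(user, privilege, db, table):
    """A privilege held at a higher level also applies to the levels below it."""
    return (privilege in user_priv.get(user, ())
            or privilege in dbs_priv.get((user, db), ())
            or privilege in tables_priv.get((user, db, table), ()))


def can_show(user, db, table):
    """Any privilege that reaches the table gives the right to show it."""
    return bool(user_priv.get(user)
                or dbs_priv.get((user, db))
                or tables_priv.get((user, db, table)))
```

For example, grant('someuser', {'SELECT', 'INSERT'}, db='mydb', table='mytbl') only touches the table-level store, while grant('someuser', {'SELECT'}) with the default target behaves like a grant on *.*.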
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699313#action_12699313 ] Edward Capriolo commented on HIVE-78:

All those points make sense.

>> 1. I am not sure what AS is used for.

I am thinking AS is the way to name the PermissionSet. Imagine a rule like this:
{noformat}
GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
{noformat}
At some point 'USER3' might become an administrator. It would be nice to issue a command like:
{noformat}
ALTER GRANT my_permission add USER 'USER3'
{noformat}
It also makes the grant self-documenting.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699306#action_12699306 ] Ashish Thusoo commented on HIVE-78:

I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here.

About the syntax:
1. I am not sure what AS is used for.
2. Column level permissions are good, but they can perhaps be addressed with views, treating permissions on views as we do for tables.
3. I would add the keyword TABLE in the GRANT statement, like mysql, because we may have permissions on user defined functions and types in future... so something like: GRANT SELECT ON TABLE 'cat1' TO 'USER1'
4. Also, maybe in the TO clause make the user and group explicit - TO USERS a, b, c GROUPS g1, g2 - otherwise the reader of the command may not know what is a group and what is a user. I presume this would also make the authorization logic somewhat simpler, as you would know exactly what to look for?

About the blocker that you mentioned, we should perhaps let the hadoop file permissions be independent of Hive ACLs. Of course you need both to be able to do anything on the table. Can be tricky though..

Will spend a bit more time thinking about this - this looks pretty cool...
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699270#action_12699270 ] Edward Capriolo commented on HIVE-78:

>> 1) What would be the syntax to create user/passwd combos and logging in?

Username and password would come externally. I noticed a Hadoop Jira on authenticating via Kerb4 and LDAP. We are best off splitting authentication and authorization, as we spoke of above. User and group are your external POSIX user and groups.

>> 2) Are the permissions stored in metastore are per user or per table or a combo?

They should be stored in the metastore. A rule like GRANT * on '*' TO '*' AS my_permission would have to be stored everywhere, and that would be a PITA.

>> 3) Do we really need groups? I don't think MySQL implements groups

The group is your POSIX login group. Allowing groups is a simple way to reduce the number of per-user rules.

>> 4)

Right again. The separation here is that we let the authentication system carry all the burden of username, groups and password. The metastore is only concerned with what that user can do inside Hive.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699253#action_12699253 ] Prasad Chakka commented on HIVE-78:

This is great. I have a few questions..
1) What would be the syntax for creating user/passwd combos and logging in?
2) Are the permissions stored in the metastore per user, per table, or a combo?
3) Do we really need groups? I don't think MySQL implements groups.
4) I am totally naive about authentication systems, but I am assuming only access details are stored in the metastore and authentication is done by one of the systems discussed. Is that correct?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699247#action_12699247 ] Edward Capriolo commented on HIVE-78:

GRANT
* SELECT
* ALTER
* INSERT
* UPDATE --RESERVED
* DROP
* CREATE

GLOBAL GRANT PERMISSIONS
* PROCESS_LIST - List Query
* PROCESS_KILL - Kill query
* RC - start shutdown
* WITH_GRANT - Give user permission to grant other permissions

SPECIAL
* 'ALL' ALL PERMISSIONS

Target Objects: ALL, DataBase, Table, Partition, Column
* Permissions are additive
* Upper level implies lower level, i.e. select on table implies select on all columns in table

Suggested Syntax
* GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
* GRANT SELECT ON 'cat1','cat2' TO 'USER1' AS my_permission
* GRANT SELECT ON 'cat1.*', 'cat2.homes.name' TO 'USER4', '%GROUP1' AS my_permission
* GRANT SELECT on 'cat1.*', 'cat2.homes.PARTITION="5.5.4".owner' TO 'USER5' AS my_permission

In the metastore we can store the permissions like this:
PERMISSION SET { Vector , Vector , Vector , String Name }
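The two rules stated above (permissions are additive, and an upper level implies the lower levels) can be sketched in Python as follows. This is a simplified illustration under several assumptions: the field names of PermissionSet are guesses, since the element types of the stored structure are not given above; groups are marked with a leading '%' as in the suggested syntax; and partition specifications such as PARTITION="5.5.4" are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionSet:
    name: str
    principals: frozenset  # user names, or group names prefixed with '%'
    targets: frozenset     # dotted object paths such as 'cat1', 'cat1.*', '*'
    privileges: frozenset  # e.g. {'SELECT'}; 'ALL' stands for every privilege


def covers(target, obj):
    """True if a grant on target reaches obj (upper level implies lower level)."""
    if target == "*":
        return True
    if target.endswith(".*"):
        target = target[:-2]
    t = target.split(".")
    return obj.split(".")[:len(t)] == t


def is_allowed(rules, user, groups, privilege, obj):
    """Additive check: access is granted if any single rule grants it."""
    who = {user} | {"%" + g for g in groups}
    for rule in rules:
        if not (who & rule.principals or "*" in rule.principals):
            continue
        if privilege not in rule.privileges and "ALL" not in rule.privileges:
            continue
        if any(covers(t, obj) for t in rule.targets):
            return True
    return False
```

Under this sketch, a SELECT grant on 'cat1.*' reaches the column cat1.homes.name, while a grant on 'cat2.homes.name' does not reach cat2.homes.owner.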
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698771#action_12698771 ] Edward Capriolo commented on HIVE-78:

My last comment is a blocker in my mind. How can we implement complex access controls at the Hive level when we have basic file ownership issues at the file level? Will daemons like HiveService and HiveWebInterface have to run as supergroup or a hive group? How does this affect the CLI, which will run as the individual user?

These are not so much Hive issues as environment/setup issues, but I do not want to assume my environment is the target environment. Will we be assuming users are members of a 'hive' POSIX group, or that all the files in the warehouse are owned by user 'Hive', group 'Hive'? I wanted to get others' opinions on this.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698305#action_12698305 ]

Min Zhou commented on HIVE-78:

Is there any further progress on this issue?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682719#action_12682719 ]

Edward Capriolo commented on HIVE-78:

We also have to look at this on the file system level. For example, files in my warehouse are owned by the user who created the table.

{quote}
/user/hive/warehouse/edward 2008-10-30 17:13 rwxr-xr-x edward supergroup
{quote}

Regardless of what permissions are granted in the metastore (via this jira), the Hadoop ACL governs what a user can do to that file. This is not an issue in mysql. In a typical mysql deployment all of the data files are owned by a mysql user.

I do not see a clear-cut solution for this. In one scenario we make sure all the files in the warehouse are readable and writable by all, or owned by a specific user. A component like HiveServer, CLI, or HWI would decide whether the user action should succeed based on the metadata.

The other option is that an operation like 'GRANT SELECT' would have to physically modify the Hadoop ACL/owner. This method will not help us get the fine-grained control we desire.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652252#action_12652252 ]

Ashish Thusoo commented on HIVE-78:

The roles are actually per object. I would say that these are at least per table, if not per partition. I don't have a use case for the latter, but separation on the basis of table is actually very desirable. Given that, and the fact that we currently have around 5000 tables in our warehouse, do you have some idea of how realms will scale with such a large number of objects?

I agree that a generic recursive role infrastructure does not have a lot of utility, but considering that we have so many permissions, I would think that it would be quite cumbersome for an administrator to enumerate all of them for every user that is created (though some good defaults can surely alleviate some of the concerns here). So I think being able to package permissions into some higher level roles would help. Note that we do not need a generic role within a role, but it would be nice to have a role be a set of permissions on certain objects, and to allow the authorization framework to associate a role or permission with a user. The other way to do this is to define groups which can be assigned a set of permissions and a set of users. That level of indirection would also work in reducing the number of user-to-permission assignments that we would have to make otherwise.

I agree that authentication and authorization (much of what I have been talking about in this comment) need to be separated out, and while we use the directory infrastructure for authentication, we should store the authorization information in the metastore, as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

If we do that separation, then Realms perhaps can take care of just the authentication portion, and once the user is authenticated, the authorization infrastructure looks up the user by ID in the metastore to figure out what capabilities the user has. Is that what you have in mind? In this scenario, I presume that we would have a realm for AD and just have all the users authenticate with that realm. So the number of realms would be a function of the number of directories or user repositories, as opposed to being a function of the number of objects.
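The split described in the comment above (a realm handles authentication only, then the metastore is consulted for capabilities) could look roughly like the following sketch. Everything here is hypothetical: `StaticRealm` stands in for a directory-backed realm such as AD, `MetastoreAcl` stands in for the authorization tables in the metastore, and the in-memory password comparison is a placeholder for a real directory bind.

```python
class StaticRealm:
    """Stand-in for a directory-backed realm; one realm per user repository."""

    def __init__(self, credentials):
        self._credentials = dict(credentials)

    def authenticate(self, user, password):
        # A real realm would bind against the directory instead of
        # comparing against an in-memory table.
        return user in self._credentials and self._credentials[user] == password


class MetastoreAcl:
    """Stand-in for authorization data kept in the metastore, keyed by user ID."""

    def __init__(self, grants):
        self._grants = {user: frozenset(caps) for user, caps in grants.items()}

    def capabilities(self, user):
        return self._grants.get(user, frozenset())


def authorize(realm, acl, user, password, privilege, table):
    """Authenticate against the realm first, then check the metastore."""
    if not realm.authenticate(user, password):
        raise PermissionError("authentication failed for " + user)
    if (privilege, table) not in acl.capabilities(user):
        raise PermissionError(user + " lacks " + privilege + " on " + table)
    return True


realm = StaticRealm({"john": "secret"})
acl = MetastoreAcl({"john": {("SELECT", "tableA")}})
```

In this arrangement the number of realm objects tracks the number of user repositories, while the per-table data (5000 tables in the example from the thread) lives only in the ACL store.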
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652162#action_12652162 ]

Edward Capriolo commented on HIVE-78:

Recursive role processing is probably not possible with JDBCRealm. Recursive role processing is generally difficult to implement. N.I.S. Net Groups is an example of this: because of the recursive nature you have a more complicated implementation. Firstly, you have to check for loops in the group definition: Role1 memberOf -> Role2 memberOf -> Role3 memberOf -> Role1. This needs to be done when the rule is created, or evaluated, or both. I have found (in my experience) that dynamic/recursive groups are less practical than they originally seem. They do have merit, however.

The roles you mentioned were:
* SELECT
* INSERT
* ALTER TABLE
* CREATE
* DROP
* KILL SESSION(QUERY)
* SHUTDOWN
* STARTUP
* VIEW SESSIONS

IMPORTANT: Are roles global or per object? Realms really only make sense with global permissions. Let's look at a scenario:

* Hive
** tableA
** tableB
** tableC
* Users
** john
*** uid 3000
*** gid 3000,4000
** bob
*** uid 3001
*** gid 3001,4000
* Groups
** john
*** gid 3000
** bob
*** gid 3001
** hr
*** gid 4000

Goal to implement: root has full access to all tables, john has access to tableA, and bob has access to tableB. tableC can be read by anyone in hr.

* Realms
** tableA_select
*** root
*** john
** tableA_insert
*** root
*** john
** tableB_select
*** root
*** bob
** tableB_insert
*** root
*** bob
** tableC_select
*** root
*** bob
*** john

Using '_' as a delimiter and constructing several roles per table is slightly non-standard for realms, but it would work. User lists are flat.

About these permissions:
* SELECT
* INSERT
* ALTER TABLE
* CREATE
* DROP

Consider an external table. If my UID has access to the file through HDFS, I would expect to have select access inside Hive. If I could not write the file in HDFS, I would not expect Hive to give me these permissions.

I think we should clearly define the difference between AUTHENTICATION and ACCESS. For example, the AUTHENTICATION information for a user is commonly stored in Active Directory. However, ACCESS information, like what tables a user may run SELECT on, cannot be stored in Active Directory without changing the Active Directory schema. Realm or JAAS gives us a quick way to answer the AUTHENTICATION question. As to the ACCESS, we either have to store that information in the metastore or in an external system.
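The loop check mentioned at the start of this comment (Role1 memberOf Role2 memberOf Role3 memberOf Role1) is a standard cycle search over the membership graph. A minimal sketch, assuming the memberships are available as a plain dictionary; the function name and the data layout are invented for the example and do not come from JDBCRealm or any other realm implementation:

```python
def find_role_cycle(member_of):
    """Return one membership loop as a list of roles, or None if there is none.

    member_of maps a role name to the roles it is a member of.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(role):
        state[role] = IN_PROGRESS
        path.append(role)
        for parent in member_of.get(role, ()):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                # Reached a role that is still on the current path: a loop.
                return path[path.index(parent):] + [parent]
            if parent_state == UNSEEN:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[role] = DONE
        return None

    for role in list(member_of):
        if state.get(role, UNSEEN) == UNSEEN:
            cycle = visit(role)
            if cycle:
                return cycle
    return None


looped = {"Role1": ["Role2"], "Role2": ["Role3"], "Role3": ["Role1"]}
flat = {"hiveadmin": ["hiveuser"], "hiveuser": []}
```

Running the check when a membership is created rejects the bad definition early; running it again at evaluation time guards against data that was edited outside the normal path, which matches the "created, or evaluated, or both" remark.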
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652091#action_12652091 ]

Ashish Thusoo commented on HIVE-78:

At first look, Realms seems to be a nice fit for this problem. One capability that is missing there, and which may become an issue later, is the ability to compose roles into higher level roles. To me it seems that roles are strictly flat and not hierarchical, so I cannot create an admin role that has the basic roles within it. Can this be achieved with Realms? I have not used it before, so I am not sure whether it is achievable.

The other issue that I can think of is whether Realms is generic enough to protect any kind of resource and is not just limited to web resources. We have tables and partitions, servers etc. Could you elaborate on how this would work for the capabilities that I listed in my previous comments?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652083#action_12652083 ]

Edward Capriolo commented on HIVE-78:

I would like to leverage the 'Realm' work that has already been done in Tomcat. This would give us the ability to plug into many standard authentication architectures.

http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/realm/package-tree.html

If we include a jar file in binary format from Tomcat, should it be part of the patch, or should we fork some of the Tomcat source? We should not have to alter the original code; we will be using it directly or extending it.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650407#action_12650407 ]

Ashish Thusoo commented on HIVE-78:

For Active Directory I think JNDI will work as long as we work off GSSAPI, so I think Kerb V should work with JNDI. However, I think the traditional authentication mechanisms of NTLM and NTLMv2 will not work with AD, as they are proprietary protocols and the only public domain implementations of those are present in Samba. They are mostly an issue for old machines and old directory installations. We may as well do JNDI for now and then address these later.

Will check out JDBCRealm; I have not used those in the past.

For query side roles we could just model those on mysql privileges. Some of the basic ones include:
- SELECT
- INSERT
- ALTER TABLE
- CREATE
- DROP

And on the server administration side, things like:
- KILL SESSION(QUERY)
- SHUTDOWN
- STARTUP
- VIEW SESSIONS
are useful...

We could roll these privileges up into role objects, so essentially your hiveuser role would become SELECT, INSERT, CREATE while hiveadmin would become KILL SESSION, SHUTDOWN, STARTUP, VIEW SESSIONS, DROP, ALTER + whatever is in hiveuser.

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Ashish Thusoo
> Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication
> and authorization information.
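The roll-up of privileges into the hiveuser and hiveadmin roles described in this comment can be written down as plain data. The sketch below is an assumption about how such role objects might be represented; the `ROLES` table and `effective_privileges` are made-up names, and inclusion is deliberately limited to one level, since the thread later argues against generic role-within-role nesting.

```python
# Hypothetical role table; the privilege names are taken from the comment,
# the structure is invented for illustration.
ROLES = {
    "hiveuser": {
        "privileges": {"SELECT", "INSERT", "CREATE"},
        "includes": (),
    },
    "hiveadmin": {
        "privileges": {
            "KILL SESSION", "SHUTDOWN", "STARTUP",
            "VIEW SESSIONS", "DROP", "ALTER TABLE",
        },
        "includes": ("hiveuser",),  # "+ whatever is in hiveuser"
    },
}


def effective_privileges(role, roles=ROLES):
    """Privileges of a role plus those of the roles it directly includes."""
    definition = roles[role]
    privileges = set(definition["privileges"])
    for included in definition["includes"]:
        # One level only: included roles are not expanded recursively.
        privileges |= set(roles[included]["privileges"])
    return privileges
```

Keeping the expansion to a single level avoids the loop checking that fully recursive roles would need, at the cost of some repetition when many roles share a base set.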
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650289#action_12650289 ]

Edward Capriolo commented on HIVE-78:

I wanted to mention one more solution: JDBCRealm. This is pretty well established in Tomcat. It should be easy to retrofit. It has support for roles. A password file is a good solution as well.

Q. Active Directory is an LDAP directory at its core. What is a case where you need Samba to get at data in LDAP? It seems like we should be able to support Active Directory and LDAP using JNDI -- http://forums.sun.com/thread.jspa?threadID=581425

I was thinking about 'roles':
* hiveuser - can issue queries and kill their own queries
* hiveadmin - can kill users' queries
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650271#action_12650271 ]

Ashish Thusoo commented on HIVE-78:

+1 on this. I also wanted to integrate this with AD through Kerberos, as that is perhaps the dominant user repository in most enterprises, and at least internally we have some users that do not have unix accounts (mostly analysts). We could use Samba to provide the bridge to AD, as there are certain nuances when it comes to Kerberos with AD, as well as NTLM and NTLMv2 auths, that Samba has already solved.

We should also think of providing integration with unix accounts (those maintained in the passwd db), especially for folks who just want to test authentication-specific features.

In the past, the most dominant directories that I have found in enterprise environments are AD (can be bridged through LDAP and Samba), Sun Java One, Novell and OID (all LDAP directories), and Unix accounts.

Thoughts?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649855#action_12649855 ]

Edward Capriolo commented on HIVE-78:

LDAP seems like a good way to handle this. We have a few alternatives.

Any posixAccount can log into hive. The LDAP search would be:
(&(objectClass=posixAccount)(uid=))

We could enforce that the user must have some other attribute:
(&(objectClass=posixAccount)(uid=)(businessCategory=hiveuser))

We could enforce that the user must be valid and must be inside a specific groupOfUniqueNames (apache mod_ldap can do this):
(&(objectClass=posixAccount)(uid=)(memberOf=hiveGroup))

We can create a supplemental schema attribute that we can add to already existing LDAP users.
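The three search alternatives in this comment differ only in the extra clauses combined with the base posixAccount match, so they can be produced by one small helper. This is a sketch under assumptions: the function names and parameters are hypothetical, the login name is passed in by the caller, and the `memberOf` clause only works on directories that expose that attribute (Active Directory does; other servers may need an overlay or a separate group lookup). The escaping follows the reserved characters listed in RFC 4515.

```python
def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside a filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def hive_user_filter(uid, business_category=None, group_dn=None):
    """Build an AND filter for a Hive login; optional clauses narrow the match."""
    clauses = [
        "(objectClass=posixAccount)",
        "(uid=" + escape_filter_value(uid) + ")",
    ]
    if business_category is not None:
        clauses.append("(businessCategory=" + escape_filter_value(business_category) + ")")
    if group_dn is not None:
        # Assumes the directory exposes memberOf on user entries.
        clauses.append("(memberOf=" + escape_filter_value(group_dn) + ")")
    return "(&" + "".join(clauses) + ")"
```

Escaping the login name matters because it comes from the person logging in: an unescaped `*` or `)` would otherwise let a caller widen or rewrite the search.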