Apache Knox for authentication makes sense. For Hive authorization there are tools such as Apache Ranger or Apache Sentry, which can themselves connect to LDAP.
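To make the Knox suggestion a bit more concrete, here is a rough sketch of what a HiveServer2 JDBC URL routed through a Knox gateway tends to look like. The host name, the port 8443, the `gateway` path and the `default` topology name are placeholders I picked for illustration, and `knox_hive_jdbc_url` is a hypothetical helper, not part of any library. Please check the URL shape against the Knox documentation for your version before relying on it.

```python
def knox_hive_jdbc_url(gateway_host, gateway_port=8443, topology="default",
                       gateway_path="gateway", trust_store=None,
                       trust_store_password=None):
    """Build a HiveServer2 JDBC URL that goes through a Knox gateway.

    Hypothetical helper: all default values are illustrative assumptions.
    """
    # Knox fronts HiveServer2 over HTTPS, so the client uses HTTP transport.
    params = ["ssl=true"]
    if trust_store:
        params.append(f"sslTrustStore={trust_store}")
    if trust_store_password:
        params.append(f"trustStorePassword={trust_store_password}")
    params.append("transportMode=http")
    params.append(f"httpPath={gateway_path}/{topology}/hive")
    return f"jdbc:hive2://{gateway_host}:{gateway_port}/;" + ";".join(params)


if __name__ == "__main__":
    print(knox_hive_jdbc_url("knox.example.com"))
```

The user name and password passed on the JDBC connection would then be the LDAP credentials that Knox checks, while Ranger or Sentry decide what that user may do in Hive.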
> On 09 Mar 2016, at 16:58, Alan Gates <alanfga...@gmail.com> wrote:
>
> One way people have gotten around the lack of LDAP connectivity in HS2 has
> been to use Apache Knox. That project’s goal is to provide a single login
> capability for Hadoop related projects so that users can tie their LDAP or
> Active Directory servers into Hadoop.
>
> Alan.
>
>> On Mar 8, 2016, at 16:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> The current scenario resembles a three-tier architecture but without the
>> security of the second tier. In a typical three-tier setup, users connecting
>> to the application server (read HiveServer2) are independently
>> authenticated and, if OK, the second tier creates new .NET-type or JDBC
>> threads to connect to the database, much like multi-threading. The problem,
>> I believe, is that HiveServer2 does not yet have that concept of handling
>> individual logins. HiveServer2 should be able to handle LDAP logins
>> as well. It is a useful layer to have.
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 8 March 2016 at 23:28, Alex <this.side.of.confus...@gmail.com> wrote:
>> Yes, when creating a Hive Context, a Hive Metastore client should be created
>> with a user that the Spark application will talk to the *remote* Hive
>> Metastore with. We would like to add a custom authorization plugin to our
>> remote Hive Metastore to authorize the query requests that the Spark
>> application is submitting, which would also add authorization for any other
>> applications hitting the Hive Metastore. Furthermore, we would like to extend
>> this so that we can submit "jobs" to our Spark application that will allow
>> us to run against the metastore as different users while leveraging the
>> abilities of our Spark cluster.
>> But as you mentioned, only one login connects to the Hive Metastore and it
>> is shared among all HiveContext sessions.
>>
>> Likely the authentication would have to be completed either through a
>> secured Hive Metastore (Kerberos) or by having the requests go through
>> HiveServer2.
>>
>> --Alex
>>
>>
>>> On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
>>> Hi,
>>>
>>> What do you mean by Hive Metastore Client? Are you referring to a Hive
>>> server login, much like beeline?
>>>
>>> Spark uses hive-site.xml to get the details of the Hive metastore and the
>>> login to the metastore, which could be any database. Mine is Oracle and, as
>>> far as I know, even in Hive 2 hive-site.xml has an entry for
>>> javax.jdo.option.ConnectionUserName that specifies the username to use
>>> against the metastore database. These are all multi-threaded JDBC
>>> connections to the database using the same login, as shown below:
>>>
>>> LOGIN     SID/serial#  LOGGED IN S  HOST     OS PID        Client PID   PROGRAM          MEM/KB  Logical I/O  Physical I/O  ACT  INFO
>>> --------  -----------  -----------  -------  ------------  -----------  ---------------  ------  -----------  ------------  ---  -------
>>> HIVEUSER  67,6160      08/03 08:11  rhes564  oracle/20539  hduser/1234  JDBC Thin Clien  1,017   37           0             N
>>> HIVEUSER  89,6421      08/03 08:11  rhes564  oracle/20541  hduser/1234  JDBC Thin Clien  1,081   528          0             N
>>> HIVEUSER  112,561      08/03 10:45  rhes564  oracle/24624  hduser/1234  JDBC Thin Clien  889     37           0             N
>>> HIVEUSER  131,8811     08/03 08:11  rhes564  oracle/20543  hduser/1234  JDBC Thin Clien  1,017   37           0             N
>>> HIVEUSER  47,30114     08/03 10:45  rhes564  oracle/24626  hduser/1234  JDBC Thin Clien  1,017   37           0             N
>>> HIVEUSER  170,8955     08/03 08:11  rhes564  oracle/20545  hduser/1234  JDBC Thin Clien  1,017   323          0             N
>>>
>>> As I understand it, what you are suggesting is that each Spark user uses a
>>> different login to connect to the Hive metastore.
>>> As of now there is only one login that connects to Hive metastore shared
>>> among all
>>>
>>> 2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
>>> 2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
>>>
>>> And this is an entry in the Hive log when a connection is made through the
>>> Zeppelin UI:
>>>
>>> 2016-03-08T23:20:13,546 INFO [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
>>> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
>>> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
>>>
>>> I am not sure there is currently any plan to allow different logins to the
>>> Hive Metastore. It would add another level of security, though I am not
>>> sure how this would be authenticated.
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> On 8 March 2016 at 22:23, Alex F <this.side.of.confus...@gmail.com> wrote:
>>> As of Spark 1.6.0 it is now possible to create new Hive Context sessions
>>> sharing various components, but right now the Hive Metastore Client is
>>> shared amongst each new Hive Context Session.
>>>
>>> Are there any plans to create individual Metastore Clients for each Hive
>>> Context?
>>>
>>> Related to the question above are there any plans to create an interface
>>> for customizing the username that the Metastore Client uses to connect to
>>> the Hive Metastore? Right now it either uses the user specified in an
>>> environment variable or the application's process owner.
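
For readers following the thread, the user-name fallback described in the original question can be pictured roughly as below. This is only a sketch of the behaviour as described, not the actual Spark or Hive code. The thread does not name the environment variable; I am assuming it is HADOOP_USER_NAME, which Hadoop honours when Kerberos is not enabled. `effective_metastore_user` and its parameters are hypothetical names used for illustration.

```python
import getpass
import os


def effective_metastore_user(env=None, process_owner=None):
    """Return the user name an unsecured metastore client would present.

    Assumption: HADOOP_USER_NAME is the environment variable meant in the
    thread. If it is unset or empty, fall back to the process owner.
    """
    env = os.environ if env is None else env
    override = env.get("HADOOP_USER_NAME")
    if override:
        return override
    # No override: use the OS user that owns the running application.
    return process_owner if process_owner is not None else getpass.getuser()


if __name__ == "__main__":
    print(effective_metastore_user())
```

Since this value is simply asserted by the client and not verified, running as different users per job would still need the metastore to authenticate the caller, for example via Kerberos or by routing requests through HiveServer2 as suggested earlier in the thread.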