Apache Knox for authentication makes sense. For Hive authorization there are 
tools such as Apache Ranger or Sentry, which can themselves connect via LDAP.
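
As a rough illustration of the Knox route: clients usually reach HiveServer2 
through the gateway over HTTP transport rather than connecting to it directly. 
The sketch below only builds such a JDBC URL. The host name, port 8443, the 
"gateway" path and the "default" topology name are assumptions for 
illustration, and the exact URL parameters may differ between Hive and Knox 
versions, so check them against your own setup.

```python
def knox_hive_jdbc_url(gateway_host, gateway_port=8443,
                       gateway_path="gateway", topology="default"):
    """Build a HiveServer2 JDBC URL that is routed through a Knox gateway.

    All defaults here are illustrative assumptions, not values from this thread.
    """
    # Knox exposes Hive under <gateway_path>/<topology>/hive over HTTPS.
    http_path = f"{gateway_path}/{topology}/hive"
    return (
        f"jdbc:hive2://{gateway_host}:{gateway_port}/;"
        f"ssl=true;transportMode=http;httpPath={http_path}"
    )


if __name__ == "__main__":
    # knox.example.com is a placeholder host name.
    print(knox_hive_jdbc_url("knox.example.com"))
```

The LDAP or Active Directory credentials are then supplied as the JDBC user 
and password, and Knox checks them before forwarding the request.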

> On 09 Mar 2016, at 16:58, Alan Gates <alanfga...@gmail.com> wrote:
> 
> One way people have gotten around the lack of LDAP connectivity in HS2 has 
> been to use Apache Knox.  That project’s goal is to provide a single login 
> capability for Hadoop-related projects so that users can tie their LDAP or 
> Active Directory servers into Hadoop.
> 
> Alan.
> 
>> On Mar 8, 2016, at 16:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> The current scenario resembles a three-tier architecture but without the 
>> security of the second tier. In a typical three-tier setup, users connecting 
>> to the application server (read HiveServer2) are independently 
>> authenticated and, if OK, the second tier creates new .NET-type or JDBC 
>> threads to connect to the database, much like multi-threading. The problem, I 
>> believe, is that HiveServer2 does not yet have the concept of handling 
>> individual logins. HiveServer2 should be able to handle LDAP logins 
>> as well. It is a useful layer to have.
>> 
>> Dr Mich Talebzadeh
>> 
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> 
>> http://talebzadehmich.wordpress.com
>> 
>> 
>> On 8 March 2016 at 23:28, Alex <this.side.of.confus...@gmail.com> wrote:
>> Yes, when creating a Hive Context, a Hive Metastore client should be created 
>> with the user that the Spark application will use to talk to the *remote* 
>> Hive Metastore. We would like to add a custom authorization plugin to our 
>> remote Hive Metastore to authorize the query requests that the Spark 
>> application is submitting, which would also add authorization for any other 
>> applications hitting the Hive Metastore. Furthermore, we would like to extend 
>> this so that we can submit "jobs" to our Spark application that will allow 
>> us to run against the metastore as different users while leveraging the 
>> abilities of our Spark cluster. But as you mentioned, only one login connects 
>> to the Hive Metastore, and it is shared among all HiveContext sessions.
>> 
>> Likely the authentication would have to be completed either through a 
>> secured Hive Metastore (Kerberos) or by having the requests go through 
>> HiveServer2.
>> 
>> --Alex
>> 
>> 
>>> On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
>>> Hi,
>>> 
>>> What do you mean by Hive Metastore Client? Are you referring to Hive server 
>>> login much like beeline?
>>> 
>>> Spark uses hive-site.xml to get the details of the Hive metastore and the 
>>> login to the metastore database, which could be any database. Mine is 
>>> Oracle and, as far as I know, even in Hive 2 hive-site.xml has an entry for 
>>> javax.jdo.option.ConnectionUserName that specifies the username to use 
>>> against the metastore database. These are all multi-threaded JDBC 
>>> connections to the database, using the same login as shown below:
>>> 
>>> LOGIN    SID/serial# LOGGED IN S HOST       OS PID         Client PID     PROGRAM               MEM/KB      Logical I/O Physical I/O ACT INFO
>>> -------- ----------- ----------- ---------- -------------- -------------- --------------- ------------ ---------------- ------------ --- -------
>>> HIVEUSER 67,6160     08/03 08:11 rhes564    oracle/20539   hduser/1234    JDBC Thin Clien        1,017               37            0 N
>>> HIVEUSER 89,6421     08/03 08:11 rhes564    oracle/20541   hduser/1234    JDBC Thin Clien        1,081              528            0 N
>>> HIVEUSER 112,561     08/03 10:45 rhes564    oracle/24624   hduser/1234    JDBC Thin Clien          889               37            0 N
>>> HIVEUSER 131,8811    08/03 08:11 rhes564    oracle/20543   hduser/1234    JDBC Thin Clien        1,017               37            0 N
>>> HIVEUSER 47,30114    08/03 10:45 rhes564    oracle/24626   hduser/1234    JDBC Thin Clien        1,017               37            0 N
>>> HIVEUSER 170,8955    08/03 08:11 rhes564    oracle/20545   hduser/1234    JDBC Thin Clien        1,017              323            0 N
>>> 
>>> As I understand it, you are suggesting that each Spark user should use a 
>>> different login to connect to the Hive metastore. As of now there is only 
>>> one login that connects to the Hive metastore, shared among all
>>> 
>>> 2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit 
>>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      ip=50.140.197.217 
>>>       cmd=source:50.140.197.217 get_table : db=test tbl=t
>>> 2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit 
>>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      ip=50.140.197.216 
>>>       cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
>>> 
>>> And this is an entry in the Hive log when a connection is made through the Zeppelin UI:
>>> 
>>> 2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore 
>>> (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with 
>>> implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: metastore.ObjectStore 
>>> (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
>>> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: 
>>> metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using 
>>> direct SQL, underlying DB is ORACLE
>>> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.ObjectStore 
>>> (ObjectStore.java:setConf(301)) - Initialized ObjectStore
>>> 
>>> I am not sure there is currently any plan to allow different logins 
>>> to the Hive Metastore. It would add another level of security, though I am 
>>> not sure how this would be authenticated.
>>> 
>>> HTH
>>> 
>>> 
>>> 
>>> Dr Mich Talebzadeh
>>> 
>>> On 8 March 2016 at 22:23, Alex F <this.side.of.confus...@gmail.com> wrote:
>>> As of Spark 1.6.0 it is possible to create new Hive Context sessions 
>>> that share various components, but right now the Hive Metastore Client is 
>>> shared among all new Hive Context sessions.
>>> 
>>> Are there any plans to create individual Metastore Clients for each Hive 
>>> Context?
>>> 
>>> Related to the question above, are there any plans to create an interface 
>>> for customizing the username that the Metastore Client uses to connect to 
>>> the Hive Metastore? Right now it either uses the user specified in an 
>>> environment variable or the application's process owner.
> 

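To check the point made in the quoted thread that every metastore request 
arrives under one user, the audit entries can be grouped by their ugi field. 
A minimal Python sketch, assuming the log layout matches the two 
HiveMetaStore.audit lines quoted above (re-joined here onto single lines); the 
regular expression and the function names are my own, not part of Hive.

```python
import re

# The two audit entries quoted in the thread, re-joined onto single lines.
AUDIT_LINES = [
    "2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      "
    "ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      "
    "ip=50.140.197.216       cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]

# Assumed layout: "ugi=<user>  ip=<address>  cmd=<rest of line>".
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")


def parse_audit(line):
    """Return the ugi, ip and cmd fields of one audit line, or None if absent."""
    match = AUDIT_RE.search(line)
    return match.groupdict() if match else None


def client_ips_by_user(lines):
    """Map each ugi value to the set of client IPs seen for it."""
    seen = {}
    for line in lines:
        record = parse_audit(line)
        if record is not None:
            seen.setdefault(record["ugi"], set()).add(record["ip"])
    return seen


if __name__ == "__main__":
    for user, ips in client_ips_by_user(AUDIT_LINES).items():
        print(user, sorted(ips))
```

If requests from different client machines all map to a single ugi, as they do 
for the two lines here, the metastore cannot tell the end users apart, which 
is the gap a per-session Metastore Client would close.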