Yes. When a Hive Context is created, a Hive Metastore client should be created with the user that the Spark application will use to talk to the *remote* Hive Metastore. We would like to add a custom authorization plugin to our remote Hive Metastore to authorize the query requests that the Spark application submits; this would also add authorization for any other applications hitting the Hive Metastore. Furthermore, we would like to extend this so that we can submit "jobs" to our Spark application that run against the metastore as different users while leveraging the abilities of our Spark cluster. But, as you mentioned, only one login connects to the Hive Metastore, and it is shared among all HiveContext sessions.

Likely the authentication would have to be completed either through a secured Hive Metastore (Kerberos) or by having the requests go through HiveServer2.

--Alex

On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
Hi,

What do you mean by Hive Metastore Client? Are you referring to Hive server login much like beeline?

Spark uses hive-site.xml to get the details of the Hive metastore and the login to the metastore database, which could be any database. Mine is Oracle, and as far as I know, even in Hive 2, hive-site.xml has an entry for javax.jdo.option.ConnectionUserName that specifies the username to use against the metastore database. These are all multi-threaded JDBC connections to the database using the same login, as shown below:

LOGIN    SID/serial# LOGGED IN S HOST    OS PID       Client PID  PROGRAM         MEM/KB Logical I/O Physical I/O ACT INFO
-------- ----------- ----------- ------- ------------ ----------- --------------- ------ ----------- ------------ --- -------
HIVEUSER 67,6160     08/03 08:11 rhes564 oracle/20539 hduser/1234 JDBC Thin Clien 1,017  37          0            N
HIVEUSER 89,6421     08/03 08:11 rhes564 oracle/20541 hduser/1234 JDBC Thin Clien 1,081  528         0            N
HIVEUSER 112,561     08/03 10:45 rhes564 oracle/24624 hduser/1234 JDBC Thin Clien 889    37          0            N
HIVEUSER 131,8811    08/03 08:11 rhes564 oracle/20543 hduser/1234 JDBC Thin Clien 1,017  37          0            N
HIVEUSER 47,30114    08/03 10:45 rhes564 oracle/24626 hduser/1234 JDBC Thin Clien 1,017  37          0            N
HIVEUSER 170,8955    08/03 08:11 rhes564 oracle/20545 hduser/1234 JDBC Thin Clien 1,017  323         0            N
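
As a rough illustration of the hive-site.xml lookup described above, the sketch below reads javax.jdo.option.ConnectionUserName from a Hadoop-style configuration document. The sample XML, the value hiveuser, and the helper name get_property are assumptions made for illustration; they are not taken from the actual configuration on this system.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content; the value "hiveuser" is an assumption.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>
"""


def get_property(xml_text, key):
    """Return the value of a Hadoop-style <property> by name, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    # Prints the single metastore database login configured in the sample.
    print(get_property(SAMPLE_HIVE_SITE, "javax.jdo.option.ConnectionUserName"))
```

Because the file holds a single value for this property, every session that reads it ends up with the same database login, which is consistent with the session list above.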

As I understand it, you are suggesting that each Spark user use a different login to connect to the Hive metastore. As of now there is only one login that connects to the Hive metastore, shared among all

2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
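
Both audit entries above carry ugi=hduser even though they come from different client IPs. A minimal sketch for extracting those fields and listing the distinct users, assuming every audit line follows the same "ugi=... ip=... cmd=..." layout as the two entries quoted here (the function names are hypothetical):

```python
import re

# Matches the trailing "ugi=<user> ip=<addr> cmd=<command>" part of an audit line.
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")


def parse_audit_line(line):
    """Return a dict with ugi, ip and cmd, or None if the line is not an audit entry."""
    match = AUDIT_RE.search(line.strip())
    return match.groupdict() if match else None


def distinct_users(lines):
    """Collect the set of ugi values seen across the audit lines."""
    users = set()
    for line in lines:
        entry = parse_audit_line(line)
        if entry is not None:
            users.add(entry["ugi"])
    return users


# The two audit entries quoted in this message.
SAMPLE_LINES = [
    "2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 "
    "cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 "
    "cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]

if __name__ == "__main__":
    print(distinct_users(SAMPLE_LINES))
```

If per-user metastore logins were ever supported, a check like this on the audit log would be one way to confirm that requests arrive under different ugi values.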

And this is an entry in the Hive log when a connection is made through the Zeppelin UI:

2016-03-08T23:20:13,546 INFO [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore

I am not sure there is currently any plan to allow different logins to the Hive Metastore. It would add another level of security, though I am not sure how this would be authenticated.

HTH



Dr Mich Talebzadeh

LinkedIn /https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/

http://talebzadehmich.wordpress.com


On 8 March 2016 at 22:23, Alex F <this.side.of.confus...@gmail.com> wrote:

    As of Spark 1.6.0 it is now possible to create new Hive Context
    sessions sharing various components but right now the Hive
    Metastore Client is shared amongst each new Hive Context Session.

    Are there any plans to create individual Metastore Clients for
    each Hive Context?

    Related to the question above are there any plans to create an
    interface for customizing the username that the Metastore Client
    uses to connect to the Hive Metastore? Right now it either uses
    the user specified in an environment variable or the application's
    process owner.


