I agree it is a useful layer. During my investigation into individual
user connections from a Spark application, I ran some tests with
HiveServer2. Using Beeline, I was able to authenticate the users passed
in correctly, but when it came to authorizing the queries on the
metastore, they all used the initial connection that HiveServer2 had
made with the Hive Metastore.
My intention is that, should we get access to the Hive Metastore Client
and its configuration through the HiveContext, we could create new
HiveContext sessions, each with its own connection to the Hive
Metastore. Authorization for each query would then be completed on the
Metastore itself, and we would handle authentication of the users,
acting as the second tier.
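
To make the proposed split concrete, here is a minimal toy model in
Python. It is only a sketch: every class and method name below
(ToyMetastore, ToyMetastoreClient, ToyApplication) is hypothetical and
none of it is a Spark or Hive API. It shows the application
authenticating users as the second tier, then opening one metastore
connection per user so that the metastore can authorize each request
against the connecting identity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for a remote metastore that authorizes
    requests based on the user a connection was opened with."""
    grants: dict  # user name -> set of table names the user may read

    def connect(self, user):
        return ToyMetastoreClient(self, user)


@dataclass
class ToyMetastoreClient:
    metastore: ToyMetastore
    user: str

    def get_table(self, table):
        # Authorization is decided on the metastore side, keyed on the
        # identity of this connection rather than on a shared login.
        if table not in self.metastore.grants.get(self.user, set()):
            raise PermissionError(f"{self.user} may not read {table}")
        return f"metadata for {table}"


@dataclass
class ToyApplication:
    """Hypothetical second tier: authenticates users, then opens one
    metastore client per authenticated user."""
    metastore: ToyMetastore
    passwords: dict
    sessions: dict = field(default_factory=dict)

    def login(self, user, password):
        if self.passwords.get(user) != password:
            raise PermissionError("authentication failed")
        # One connection per user instead of a single shared connection.
        self.sessions[user] = self.metastore.connect(user)
        return self.sessions[user]
```

With a shared client, the metastore would only ever see one identity; in
this sketch a user who authenticates successfully but has no grant is
still rejected by the metastore itself.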
It sounds like this functionality is not likely to be implemented any
time soon, though, so we will have to find a solution in the meantime.
Thanks,
Alex
On 3/8/2016 4:00 PM, Mich Talebzadeh wrote:
The current scenario resembles a three-tier architecture but without
the security of the second tier. In a typical three-tier setup, users
connecting to the application server (read HiveServer2) are
independently authenticated and, if OK, the second tier creates new
.NET-type or JDBC threads to connect to the database, much like
multi-threading. The problem, I believe, is that HiveServer2 does not
yet have that concept of handling individual logins. HiveServer2
should be able to handle LDAP logins as well. It is a useful layer
to have.
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 8 March 2016 at 23:28, Alex <this.side.of.confus...@gmail.com> wrote:
Yes, when creating a HiveContext, a Hive Metastore client should be
created with the user that the Spark application will use to talk to
the *remote* Hive Metastore. We would like to add a custom
authorization plugin to our remote Hive Metastore to authorize the
query requests that the Spark application submits, which would also
add authorization for any other applications hitting the Hive
Metastore. Furthermore, we would like to extend this so that we can
submit "jobs" to our Spark application that run against the metastore
as different users while leveraging the abilities of our Spark
cluster. But, as you mentioned, only one login connects to the Hive
Metastore, and it is shared among all HiveContext sessions.
Likely the authentication would have to be completed either
through a secured Hive Metastore (Kerberos) or by having the
requests go through HiveServer2.
--Alex
On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
Hi,
What do you mean by Hive Metastore Client? Are you referring to a
Hive server login, much like Beeline?
Spark uses hive-site.xml to get the details of the Hive metastore and
the login to the metastore, which could be any database. Mine is
Oracle and, as far as I know, even in Hive 2, hive-site.xml has an
entry for javax.jdo.option.ConnectionUserName that specifies the
username to use against the metastore database. These are all
multi-threaded JDBC connections to the database using the same login,
as shown below:
LOGIN    SID/serial# LOGGED IN S HOST    OS PID       Client PID  PROGRAM         MEM/KB Logical I/O Physical I/O ACT INFO
-------- ----------- ----------- ------- ------------ ----------- --------------- ------ ----------- ------------ --- ----
HIVEUSER 67,6160     08/03 08:11 rhes564 oracle/20539 hduser/1234 JDBC Thin Clien  1,017          37            0 N
HIVEUSER 89,6421     08/03 08:11 rhes564 oracle/20541 hduser/1234 JDBC Thin Clien  1,081         528            0 N
HIVEUSER 112,561     08/03 10:45 rhes564 oracle/24624 hduser/1234 JDBC Thin Clien    889          37            0 N
HIVEUSER 131,8811    08/03 08:11 rhes564 oracle/20543 hduser/1234 JDBC Thin Clien  1,017          37            0 N
HIVEUSER 47,30114    08/03 10:45 rhes564 oracle/24626 hduser/1234 JDBC Thin Clien  1,017          37            0 N
HIVEUSER 170,8955    08/03 08:11 rhes564 oracle/20545 hduser/1234 JDBC Thin Clien  1,017         323            0 N
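
As an illustration of where that single login comes from, the following
Python sketch reads javax.jdo.option.ConnectionUserName out of a
Hadoop-style configuration document, which is the property layout
hive-site.xml uses. The helper name read_hive_property and the sample
fragment are hypothetical, and the value "hiveuser" is only an
assumption inferred from the HIVEUSER login in the session listing
above, not a copy of the real file.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the value of one <property> entry from Hadoop-style
    configuration XML, or None if the property is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Illustrative fragment only; the value is an assumption, not the
# actual hive-site.xml from this thread.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>"""

print(read_hive_property(SAMPLE, "javax.jdo.option.ConnectionUserName"))
```

Because the property holds exactly one username, every JDBC connection
the metastore opens to its backing database uses that same login,
whichever end user issued the request.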
As I understand it, what you are suggesting is that each Spark user
uses a different login to connect to the Hive metastore. As of now
there is only one login that connects to the Hive metastore, shared
among all of them:
2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
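
One way to check how many distinct identities actually reach the
metastore is to tally the ugi field of the audit entries. The Python
sketch below does this for the two entries quoted above. The function
name count_audit_users and the regular expression are hypothetical
helpers written for this illustration, and the pattern assumes the
ugi/ip/cmd fields are separated by whitespace, as they appear here.

```python
import re
from collections import Counter

# Matches the ugi=..., ip=... and cmd=... fields of an audit entry.
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)")


def count_audit_users(lines):
    """Count audit entries per ugi value (the user seen by the metastore)."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            counts[match.group("ugi")] += 1
    return counts


SAMPLE = [
    "2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser "
    "ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser "
    "ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]

print(count_audit_users(SAMPLE))
```

Both requests come from different client addresses but resolve to the
same ugi, hduser, which is the shared-login behaviour described in this
thread.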
And this is an entry in the Hive log when a connection is made through
the Zeppelin UI:
2016-03-08T23:20:13,546 INFO [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
I am not sure there is currently any plan to allow different logins
to the Hive Metastore. It would add another level of security, though
I am not sure how this would be authenticated.
HTH
Dr Mich Talebzadeh
On 8 March 2016 at 22:23, Alex F <this.side.of.confus...@gmail.com> wrote:
As of Spark 1.6.0 it is now possible to create new HiveContext
sessions that share various components, but right now the Hive
Metastore Client is shared among all new HiveContext sessions.
Are there any plans to create individual Metastore Clients for each
HiveContext?
Related to the question above, are there any plans to create an
interface for customizing the username that the Metastore Client uses
to connect to the Hive Metastore? Right now it either uses the user
specified in an environment variable or the application's process
owner.
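
The fallback described above can be sketched in a few lines of Python.
This is a simplified illustration, not the actual Hadoop or Spark code
path. It assumes the environment variable in question is
HADOOP_USER_NAME, which the message does not name, and that the cluster
uses simple (non-Kerberos) authentication. The function name
effective_metastore_user is hypothetical.

```python
import getpass
import os


def effective_metastore_user(environ=None):
    """Return the user name a client would present to the metastore:
    the HADOOP_USER_NAME override if set (an assumption, see above),
    otherwise the owner of the current process."""
    env = os.environ if environ is None else environ
    return env.get("HADOOP_USER_NAME") or getpass.getuser()
```

Since both inputs are fixed for the lifetime of the process, every
HiveContext session created inside one application ends up with the same
identity, which is why a per-session override would need a new
interface.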