Re: Hive Context: Hive Metastore Client

2016-03-08 Thread Alex
I agree it is a useful layer. During my investigation into individual 
user connections from a Spark application, I ran some tests with 
HiveServer2. Using Beeline, I was able to authenticate the users passed 
in correctly, but when it came to authorizing the queries on the 
metastore, they all used the initial connection that HiveServer2 had 
made to the Hive Metastore.


My intention is that, should we get access to the Hive Metastore Client 
and its configuration through the Hive Context, we could create new 
HiveContext sessions, each with its own connection to the Hive 
Metastore. Authorization for each query would then be completed on the 
Metastore itself, and we, acting as the second tier, would handle 
authentication of the users.
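
A minimal sketch of the arrangement described above, in plain Python with 
no Spark or Hive dependency. The names SessionBroker, MetastoreConnection 
and connection_for are hypothetical and do not correspond to any Spark or 
Hive API. The sketch only illustrates a second tier that authenticates 
each caller and then keeps one metastore connection per user, so that the 
metastore sees the real user and can authorize each request itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MetastoreConnection:
    """Stand-in for a metastore client bound to a single user (hypothetical)."""
    user: str


@dataclass
class SessionBroker:
    """Second tier: authenticates callers, then hands out per-user connections."""
    authenticate: Callable[[str, str], bool]
    _connections: Dict[str, MetastoreConnection] = field(default_factory=dict)

    def connection_for(self, user: str, password: str) -> MetastoreConnection:
        if not self.authenticate(user, password):
            raise PermissionError(f"authentication failed for {user!r}")
        # One connection per authenticated user, instead of one shared login.
        if user not in self._connections:
            self._connections[user] = MetastoreConnection(user)
        return self._connections[user]


# Toy credential check, for illustration only.
users = {"alice": "a-secret", "bob": "b-secret"}
broker = SessionBroker(lambda u, p: users.get(u) == p)

alice = broker.connection_for("alice", "a-secret")
bob = broker.connection_for("bob", "b-secret")
print(alice.user, bob.user)
```

In a real deployment the credential check would be delegated to something 
like LDAP or Kerberos, and the connection object would carry the user 
identity through to the metastore.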


It sounds like this functionality is not likely to be implemented any 
time soon though so we will have to find a solution in the meantime.


Thanks,
Alex

On 3/8/2016 4:00 PM, Mich Talebzadeh wrote:
The current scenario resembles a three-tier architecture, but without 
the security of the second tier. In a typical three-tier setup, users 
connecting to the application server (read HiveServer2) are 
independently authenticated and, if OK, the second tier creates new 
.NET-type or JDBC threads to connect to the database, much like 
multi-threading. The problem, I believe, is that HiveServer2 does not 
yet have that concept of handling individual logins. HiveServer2 
should be able to handle LDAP logins as well. It is a useful layer 
to have.


Dr Mich Talebzadeh

LinkedIn 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com 



Re: Hive Context: Hive Metastore Client

2016-03-08 Thread Mich Talebzadeh
The current scenario resembles a three-tier architecture, but without the
security of the second tier. In a typical three-tier setup, users connecting
to the application server (read HiveServer2) are independently
authenticated and, if OK, the second tier creates new .NET-type or JDBC
threads to connect to the database, much like multi-threading. The problem,
I believe, is that HiveServer2 does not yet have that concept of handling
individual logins. HiveServer2 should be able to handle LDAP logins as
well. It is a useful layer to have.

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 8 March 2016 at 23:28, Alex wrote:

> Yes, when a Hive Context is created, a Hive Metastore client should be
> created with the user that the Spark application will use to talk to the
> *remote* Hive Metastore. We would like to add a custom authorization
> plugin to our remote Hive Metastore to authorize the query requests that
> the Spark application submits, which would also add authorization for any
> other applications hitting the Hive Metastore. Furthermore, we would like
> to extend this so that we can submit "jobs" to our Spark application that
> run against the metastore as different users while leveraging the
> abilities of our Spark cluster. But, as you mentioned, only one login
> connects to the Hive Metastore, and it is shared among all HiveContext
> sessions.
>
> Likely the authentication would have to be completed either through a
> secured Hive Metastore (Kerberos) or by having the requests go through
> HiveServer2.
>
> --Alex

Re: Hive Context: Hive Metastore Client

2016-03-08 Thread Alex
Yes, when a Hive Context is created, a Hive Metastore client should be 
created with the user that the Spark application will use to talk to the 
*remote* Hive Metastore. We would like to add a custom authorization 
plugin to our remote Hive Metastore to authorize the query requests that 
the Spark application submits, which would also add authorization for 
any other applications hitting the Hive Metastore. Furthermore, we would 
like to extend this so that we can submit "jobs" to our Spark 
application that run against the metastore as different users while 
leveraging the abilities of our Spark cluster. But, as you mentioned, 
only one login connects to the Hive Metastore, and it is shared among 
all HiveContext sessions.


Likely the authentication would have to be completed either through a 
secured Hive Metastore (Kerberos) or by having the requests go through 
HiveServer2.


--Alex

On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:

Hi,

What do you mean by Hive Metastore Client? Are you referring to a Hive 
server login, much like Beeline?


Spark uses hive-site.xml to get the details of the Hive metastore and 
the login to the metastore, which could be any database. Mine is Oracle 
and, as far as I know, even in Hive 2, hive-site.xml has an entry for 
javax.jdo.option.ConnectionUserName that specifies the username to use 
against the metastore database. These are all multi-threaded JDBC 
connections to the database using the same login, as shown below:


LOGIN     SID/serial#  LOGGED IN S  HOST     OS PID        Client PID   PROGRAM          MEM/KB  Logical I/O  Physical I/O  ACT  INFO
--------  -----------  -----------  -------  ------------  -----------  ---------------  ------  -----------  ------------  ---  ----
HIVEUSER  67,6160      08/03 08:11  rhes564  oracle/20539  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  89,6421      08/03 08:11  rhes564  oracle/20541  hduser/1234  JDBC Thin Clien   1,081          528             0  N
HIVEUSER  112,561      08/03 10:45  rhes564  oracle/24624  hduser/1234  JDBC Thin Clien     889           37             0  N
HIVEUSER  131,8811     08/03 08:11  rhes564  oracle/20543  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  47,30114     08/03 10:45  rhes564  oracle/24626  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  170,8955     08/03 08:11  rhes564  oracle/20545  hduser/1234  JDBC Thin Clien   1,017          323             0  N


As I understand it, you are suggesting that each Spark user uses a 
different login to connect to the Hive metastore. As of now there is 
only one login that connects to the Hive metastore, shared among all:


2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser 
ip=50.140.197.217   cmd=source:50.140.197.217 get_table : db=test 
tbl=t
2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser 
ip=50.140.197.216   cmd=source:50.140.197.216 get_tables: 
db=asehadoop pat=.*


And this is an entry in the Hive log when a connection is made through 
the Zeppelin UI:


2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: 
metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: 
Opening raw store with implementation 
class:org.apache.hadoop.hive.metastore.ObjectStore
2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: 
metastore.ObjectStore (ObjectStore.java:initialize(318)) - 
ObjectStore, initialize called
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: 
metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - 
Using direct SQL, underlying DB is ORACLE
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: 
metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized 
ObjectStore


I am not sure there is currently any plan to allow different logins to 
the Hive Metastore, but it would add another level of security. I am 
not sure how this would be authenticated, though.


HTH



Dr Mich Talebzadeh

LinkedIn 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com 


On 8 March 2016 at 22:23, Alex F wrote:


As of Spark 1.6.0 it is now possible to create new Hive Context
sessions sharing various components but right now the Hive
Metastore Client is shared amongst each new Hive Context Session.

Are there any plans to create individual Metastore Clients for
each Hive Context?

Related to the question above are there any plans to create an
interface for customizing the username that the Metastore Client
uses to connect to the Hive Metastore? Right now it either uses
the user specified in an environment variable or the application's
process owner.
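
A sketch of that lookup order in Python, under an assumption the message 
does not state: that the environment variable in question is Hadoop's 
HADOOP_USER_NAME, which Hadoop consults under simple (non-Kerberos) 
authentication. The function name effective_user is hypothetical.

```python
import getpass
import os


def effective_user(env=None):
    """Return the user name a client would connect as.

    Assumption: HADOOP_USER_NAME is the override variable; otherwise fall
    back to the owner of the current process.
    """
    env = os.environ if env is None else env
    return env.get("HADOOP_USER_NAME") or getpass.getuser()


# With the variable set, the override wins regardless of the process owner.
print(effective_user({"HADOOP_USER_NAME": "alice"}))
```

Because the name is resolved once per process, every session created 
inside that process ends up with the same identity, which is the 
limitation the question is about.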






Re: Hive Context: Hive Metastore Client

2016-03-08 Thread Mich Talebzadeh
Hi,

What do you mean by Hive Metastore Client? Are you referring to a Hive
server login, much like Beeline?

Spark uses hive-site.xml to get the details of the Hive metastore and the
login to the metastore, which could be any database. Mine is Oracle and, as
far as I know, even in Hive 2, hive-site.xml has an entry for
javax.jdo.option.ConnectionUserName that specifies the username to use
against the metastore database. These are all multi-threaded JDBC
connections to the database using the same login, as shown below:

LOGIN     SID/serial#  LOGGED IN S  HOST     OS PID        Client PID   PROGRAM          MEM/KB  Logical I/O  Physical I/O  ACT  INFO
--------  -----------  -----------  -------  ------------  -----------  ---------------  ------  -----------  ------------  ---  ----
HIVEUSER  67,6160      08/03 08:11  rhes564  oracle/20539  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  89,6421      08/03 08:11  rhes564  oracle/20541  hduser/1234  JDBC Thin Clien   1,081          528             0  N
HIVEUSER  112,561      08/03 10:45  rhes564  oracle/24624  hduser/1234  JDBC Thin Clien     889           37             0  N
HIVEUSER  131,8811     08/03 08:11  rhes564  oracle/20543  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  47,30114     08/03 10:45  rhes564  oracle/24626  hduser/1234  JDBC Thin Clien   1,017           37             0  N
HIVEUSER  170,8955     08/03 08:11  rhes564  oracle/20545  hduser/1234  JDBC Thin Clien   1,017          323             0  N
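
For reference, the hive-site.xml entry mentioned above can be read with a
few lines of Python. This is only a sketch: the XML content and the value
hiveuser are made up for illustration, and only the property name
javax.jdo.option.ConnectionUserName comes from this thread. The helper
name get_property is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up hive-site.xml content. Only the property name comes from
# the thread; the value "hiveuser" is a placeholder.
HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>
"""


def get_property(xml_text, key):
    """Return the value of a Hadoop-style <property> entry, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


metastore_login = get_property(HIVE_SITE, "javax.jdo.option.ConnectionUserName")
print(metastore_login)
```

There is a single such entry per configuration file, which matches the
single database login seen in the session list.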

As I understand it, you are suggesting that each Spark user uses a
different login to connect to the Hive metastore. As of now there is only
one login that connects to the Hive metastore, shared among all:

2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217   cmd=source:50.140.197.217 get_table : db=test tbl=t
2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.216   cmd=source:50.140.197.216 get_tables: db=asehadoop
pat=.*
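
The audit entries above can be checked mechanically. The following sketch
extracts the ugi, ip and cmd fields and counts requests per user. The
regular expression and the helper name parse_audit are assumptions based
only on the two lines shown here, not on a documented log format. Both
requests arrive as hduser even though they come from different client
addresses.

```python
import re
from collections import Counter

# The two audit entries from this message, each joined onto one line.
AUDIT_LINES = [
    "2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser "
    "ip=50.140.197.217   cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser "
    "ip=50.140.197.216   cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]

# Assumed layout: "ugi=<user> ip=<address> cmd=<rest of line>".
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")


def parse_audit(line):
    """Return the ugi/ip/cmd fields of one audit line, or None if absent."""
    match = AUDIT_RE.search(line)
    return match.groupdict() if match else None


events = [e for e in map(parse_audit, AUDIT_LINES) if e is not None]
requests_per_user = Counter(e["ugi"] for e in events)
print(requests_per_user)
```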

And this is an entry in the Hive log when a connection is made through the
Zeppelin UI:

2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore
(HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with
implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: metastore.ObjectStore
(ObjectStore.java:initialize(318)) - ObjectStore, initialize called
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]:
metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using
direct SQL, underlying DB is ORACLE
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.ObjectStore
(ObjectStore.java:setConf(301)) - Initialized ObjectStore

I am not sure there is currently any plan to allow different logins to the
Hive Metastore, but it would add another level of security. I am not sure
how this would be authenticated, though.

HTH



Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 8 March 2016 at 22:23, Alex F wrote:

> As of Spark 1.6.0 it is now possible to create new Hive Context sessions
> sharing various components but right now the Hive Metastore Client is
> shared amongst each new Hive Context Session.
>
> Are there any plans to create individual Metastore Clients for each Hive
> Context?
>
> Related to the question above are there any plans to create an interface
> for customizing the username that the Metastore Client uses to connect to
> the Hive Metastore? Right now it either uses the user specified in an
> environment variable or the application's process owner.
>