I agree it is a useful layer. While investigating individual user connections from a Spark application, I ran some tests with HiveServer2. Using Beeline, I was able to authenticate the users passed in correctly, but when it came to authorizing the queries on the metastore, they all used the initial connection that HiveServer2 had made to the Hive Metastore.

My intention is that, if we can get access to the Hive Metastore Client and its configuration through the Hive Context, we could create new HiveContext sessions, each with its own connection to the Hive Metastore. Authorization for each query would then be completed on the Metastore itself, and we would handle the authentication of the users ourselves, acting as the second tier.

It sounds like this functionality is unlikely to be implemented any time soon, though, so we will have to find a solution in the meantime.

Thanks,
Alex

On 3/8/2016 4:00 PM, Mich Talebzadeh wrote:
The current scenario resembles a three-tier architecture but without the security of the second tier. In a typical three-tier setup, users connecting to the application server (read HiveServer2) are independently authenticated and, if OK, the second tier creates new .NET-type or JDBC threads to connect to the database, much like multi-threading. The problem, I believe, is that HiveServer2 does not yet have that concept of handling individual logins. HiveServer2 should be able to handle LDAP logins as well. It is a useful layer to have.

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 8 March 2016 at 23:28, Alex <this.side.of.confus...@gmail.com> wrote:

    Yes, when creating a Hive Context, a Hive Metastore client should
    be created with the user that the Spark application will use to
    talk to the *remote* Hive Metastore. We would like to add a custom
    authorization plugin to our remote Hive Metastore to authorize the
    query requests that the Spark application submits, which
    would also add authorization for any other applications hitting
    the Hive Metastore. Furthermore, we would like to extend this so
    that we can submit "jobs" to our Spark application that will allow
    us to run against the metastore as different users while
    leveraging the abilities of our Spark cluster. But as you
    mentioned, only one login to the Hive Metastore is shared
    among all HiveContext sessions.

    Likely the authentication would have to be completed either
    through a secured Hive Metastore (Kerberos) or by having the
    requests go through HiveServer2.

    --Alex


    On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
    Hi,

    What do you mean by Hive Metastore Client? Are you referring to
    a Hive server login, much like Beeline?

    Spark uses hive-site.xml to get the details of the Hive metastore and
    the login to the metastore, which could be any database. Mine is
    Oracle and, as far as I know, even in Hive 2, hive-site.xml has an
    entry for javax.jdo.option.ConnectionUserName that specifies the
    username to use against the metastore database. These are all
    multi-threaded JDBC connections to the database using the same login,
    as shown below:

    LOGIN     SID/serial#  LOGGED IN S  HOST     OS PID        Client PID   PROGRAM          MEM/KB  Logical I/O  Physical I/O  ACT  INFO
    --------  -----------  -----------  -------  ------------  -----------  ---------------  ------  -----------  ------------  ---  ----
    HIVEUSER  67,6160      08/03 08:11  rhes564  oracle/20539  hduser/1234  JDBC Thin Clien  1,017   37           0             N
    HIVEUSER  89,6421      08/03 08:11  rhes564  oracle/20541  hduser/1234  JDBC Thin Clien  1,081   528          0             N
    HIVEUSER  112,561      08/03 10:45  rhes564  oracle/24624  hduser/1234  JDBC Thin Clien  889     37           0             N
    HIVEUSER  131,8811     08/03 08:11  rhes564  oracle/20543  hduser/1234  JDBC Thin Clien  1,017   37           0             N
    HIVEUSER  47,30114     08/03 10:45  rhes564  oracle/24626  hduser/1234  JDBC Thin Clien  1,017   37           0             N
    HIVEUSER  170,8955     08/03 08:11  rhes564  oracle/20545  hduser/1234  JDBC Thin Clien  1,017   323          0             N
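
To illustrate the hive-site.xml entry mentioned above, here is a minimal Python sketch that reads javax.jdo.option.ConnectionUserName out of a configuration fragment. The XML fragment and the value "hiveuser" are made up for illustration, and get_property is a hypothetical helper, not part of Hive or Spark.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml fragment; the value "hiveuser" is illustrative only.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# The single database login that every metastore JDBC connection would use.
metastore_user = get_property(HIVE_SITE, "javax.jdo.option.ConnectionUserName")
print(metastore_user)
```

Because this one property supplies the database login, every pooled JDBC connection shows up under the same user on the database side.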

    As I understand it, what you are suggesting is that each Spark user
    uses a different login to connect to the Hive metastore. As of now
    there is only one login that connects to the Hive metastore, shared
    among all

    2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
    2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
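
One way to check which user the metastore actually sees is to tally the ugi field across audit entries. The following is a minimal Python sketch, assuming audit lines follow the `ugi=... ip=... cmd=...` layout of the two entries above. The regular expression and the audit_users helper are illustrative assumptions, not a Hive tool.

```python
import re
from collections import Counter

# Matches the "ugi=<user> ip=<addr> cmd=<command>" tail of an audit entry.
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")

# The two audit entries quoted above, each joined onto a single line.
LINES = [
    "2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      "
    "ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser      "
    "ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]


def audit_users(lines):
    """Count audit entries per ugi value, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            counts[match.group("ugi")] += 1
    return counts


users = audit_users(LINES)
print(dict(users))
```

If requests arriving from different client hosts all tally under a single ugi, the metastore is not distinguishing between the end users behind them.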

    And this is an entry in the Hive log when a connection is made through
    the Zeppelin UI:

    2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
    2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
    2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
    2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore

    I am not sure there is currently any plan to allow different
    logins to the Hive Metastore. It would add another level
    of security, though I am not sure how this would be authenticated.

    HTH



    Dr Mich Talebzadeh

    LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

    http://talebzadehmich.wordpress.com


    On 8 March 2016 at 22:23, Alex F <this.side.of.confus...@gmail.com> wrote:

        As of Spark 1.6.0 it is now possible to create new Hive
        Context sessions sharing various components, but right now the
        Hive Metastore Client is shared among all new Hive Context
        sessions.

        Are there any plans to create individual Metastore Clients
        for each Hive Context?

        Related to the question above, are there any plans to create
        an interface for customizing the username that the Metastore
        Client uses to connect to the Hive Metastore? Right now it
        either uses the user specified in an environment variable or
        the application's process owner.




