[ https://issues.apache.org/jira/browse/SPARK-31514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanchay Javeria updated SPARK-31514:
------------------------------------
    Description: 
I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run 
a query via the {{spark-sql}} shell.

The simplified setup looks like this: a spark-sql shell running on one 
host in a YARN cluster -> an external Hive metastore running on another host -> 
S3 to store the table data.

When I launch the {{spark-sql}} shell with DEBUG logging enabled, this is what 
I see in the logs:
{code:java}
> bin/spark-sql --proxy-user proxy_user 

...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user 
against hive/_h...@realm.com at thrift://hive-metastore:9083 

DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_h...@realm.com 
(auth:KERBEROS) 
from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130){code}
This means that Spark made a call to fetch the delegation token from the Hive 
metastore and then added it to the list of credentials for the UGI. [This is 
the piece of 
code|https://github.com/apache/spark/blob/branch-2.4/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L129]
 that does that. I also verified in the metastore logs that the 
{{get_delegation_token()}} call was being made.
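For context, here is roughly what I understand that code path to be doing, as a 
simplified standalone Java sketch (the owner/renewer strings and the bare 
{{HiveConf}} setup are placeholders of mine, not Spark's actual code):
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.thrift.DelegationTokenIdentifier;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenFetchSketch {
  public static void main(String[] args) throws Exception {
    // Connect to the metastore and issue the get_delegation_token() call
    // that shows up in the metastore logs.
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    String tokenStr = client.getDelegationToken("proxy_user", "proxy_user");

    // Decode the token and attach it to the current UGI's credentials,
    // under the same alias Spark uses.
    Token<DelegationTokenIdentifier> token = new Token<>();
    token.decodeFromUrlString(tokenStr);
    UserGroupInformation.getCurrentUser()
        .addToken(new Text("hive.server2.delegation.token"), token);
    client.close();
  }
}
{code}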

Now when I run a simple query like {{create table test_table (id int) location 
"s3://some/prefix";}}, I get hit with an AWS credentials error. I modified the 
Hive metastore code and added the following right before the Hadoop file system 
is initialized (in {{org/apache/hadoop/hive/metastore/Warehouse.java}}, around 
line 116):
{code:java}
public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
    try {
      // Log which UGI the metastore is executing as, along with any
      // delegation tokens attached to its credentials.
      UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
      LOG.info("UGI information: " + ugi);
      Collection<Token<? extends TokenIdentifier>> tokens =
          ugi.getCredentials().getAllTokens();
      for (Token<? extends TokenIdentifier> token : tokens) {
        LOG.info("Token: " + token);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    ...
{code}
In the metastore logs, this does print the correct UGI information:
{code:java}
UGI information: proxy_user (auth:PROXY) via hive/hive-metast...@realm.com 
(auth:KERBEROS){code}
but there are no tokens present in the UGI. It looks like the [Spark 
code|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala#L101]
 adds the token under the alias {{hive.server2.delegation.token}}, but I don't 
see it in the UGI on the metastore side. This makes me suspect that the UGI 
scope is somehow isolated and not shared between the spark-sql process and the 
Hive metastore. How do I go about solving this? Any help would be really 
appreciated!
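
For what it's worth, here is a minimal standalone sketch (hypothetical, just 
exercising the Hadoop security API) of why I think the tokens can't be visible 
server-side: the metastore builds its own proxy UGI for the incoming 
connection, and a freshly created proxy UGI carries no tokens:
{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUgiSketch {
  public static void main(String[] args) throws Exception {
    // The service's own Kerberos login, e.g. hive/hive-metast...@REALM.COM.
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();

    // What the server does for a proxied connection: wrap the end user
    // around the service login, yielding "proxy_user (auth:PROXY) via ...".
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser("proxy_user", realUser);

    // Prints 0: the new UGI's Credentials start empty, so tokens added in
    // the spark-sql JVM never appear here.
    System.out.println(proxyUgi.getCredentials().getAllTokens().size());
  }
}
{code}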

> Kerberos: Spark UGI credentials are not getting passed down to Hive
> -------------------------------------------------------------------
>
>                 Key: SPARK-31514
>                 URL: https://issues.apache.org/jira/browse/SPARK-31514
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Sanchay Javeria
>            Priority: Major
>