Fwd: [Spark Standalone Mode] How to read from kerberised HDFS in spark standalone mode

2023-01-31 Thread Wei Yan
Glad to hear that!
And I hope it can help anyone else facing the same problem.

-- Forwarded message -
From: Bansal, Jaimita 
Date: Wed, Feb 1, 2023 at 03:15
Subject: RE: [Spark Standalone Mode] How to read from kerberised HDFS in
spark standalone mode
To: Wei Yan 
Cc: Chittajallu, Rajiv ,
abner.espin...@ny.email.gs.com 


Hey Wei,

This worked!  Thank you so much.

Thanks,
Jaimita



*From:* Wei Yan 
*Sent:* Thursday, January 19, 2023 7:08 PM
*To:* Bansal, Jaimita [Engineering] 
*Subject:* Re: [Spark Standalone Mode] How to read from kerberised HDFS in
spark standalone mode



Hi!

You can use the Delegation Token.

In Spark standalone mode, the simple way to use the Delegation Token is to
set an environment variable on every node, including both master and worker
nodes, whose value is the path of the Delegation Token file (Hadoop's
UserGroupInformation reads this path from HADOOP_TOKEN_FILE_LOCATION).
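For example (a sketch: HADOOP_TOKEN_FILE_LOCATION is the variable Hadoop's UserGroupInformation checks at login time, and the token path below is only an illustrative location), the export could go in conf/spark-env.sh on every node:

```shell
# conf/spark-env.sh -- on every master and worker node.
# HADOOP_TOKEN_FILE_LOCATION is the environment variable Hadoop's
# UserGroupInformation reads when creating the login user; its value
# must be the path of the delegation token file. The path is an example.
export HADOOP_TOKEN_FILE_LOCATION=/opt/spark/conf/delegation.token
```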

You should renew this file at a fixed time interval.

hdfs fetchdt --renewer hive /opt/spark/conf/delegation.token
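One way to automate the renewal is a small script run from cron. This is only a sketch under assumptions: the host names, the renewer principal (hive), and the token path are illustrative, and the machine running it must already hold a valid Kerberos TGT (kinit done beforehand):

```shell
#!/usr/bin/env bash
# renew-token.sh -- fetch a fresh HDFS delegation token and push it to
# every Spark node, at the path that HADOOP_TOKEN_FILE_LOCATION points to.
# Run periodically (e.g. from cron, well within the token lifetime) on a
# host that has a valid Kerberos ticket.
set -euo pipefail

TOKEN=/opt/spark/conf/delegation.token
NODES="spark-master spark-worker-1 spark-worker-2"   # assumed host names

# Fetch a new delegation token from the NameNode.
hdfs fetchdt --renewer hive "$TOKEN"

# Distribute the refreshed token file to all nodes.
for node in $NODES; do
  scp "$TOKEN" "$node:$TOKEN"
done
```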



Bansal, Jaimita  wrote on Fri, Jan 20, 2023 at 07:46:

Hi Spark Team,



We are facing an issue when trying to read from HDFS via Spark running in a
standalone cluster.  The issue comes from the executor node not being able to
authenticate: it is using auth:SIMPLE when we have actually set up auth as
Kerberos.  Could you please help in resolving this?



Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:778) ~[hadoop-common-3.1.1.7.1.7.1000-141.jar:na]

18:57:44.726 [main] DEBUG o.a.spark.deploy.SparkHadoopUtil - creating UGI for user: 
18:57:45.045 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login
18:57:45.046 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit
18:57:45.047 [main] DEBUG o.a.h.security.UserGroupInformation - using kerberos user: @GS.COM
18:57:45.047 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "@GS.COM" with name @GS.COM
18:57:45.047 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: " @GS.COM"
18:57:45.047 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:@GS.COM (auth:KERBEROS)
18:57:45.056 [main] DEBUG o.a.h.security.UserGroupInformation - PrivilegedAction as: (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
18:57:45.078 [TGT Renewer for @GS.COM] DEBUG o.a.h.security.UserGroupInformation - Current time is 1674068265078
18:57:45.079 [TGT Renewer for @GS.COM] DEBUG o.a.h.security.UserGroupInformation - Next refresh is 1674136785000
18:57:45.092 [main] INFO  org.apache.spark.SecurityManager - Changing view acls to: root,
18:57:45.092 [main] INFO  org.apache.spark.SecurityManager - Changing modify acls to: root,
18:57:45.093 [main] INFO  org.apache.spark.SecurityManager - Changing view acls groups to:
18:57:45.093 [main] INFO  org.apache.spark.SecurityManager - Changing modify acls groups to:



Thanks,
Jaimita

*Vice President, Data Lake Engineering*
*Goldman Sachs*


--


Your Personal Data: We may collect and process information about you that
may be subject to data protection laws. For more information about how we
use and disclose your personal data, how we protect your information, our
legal basis to use your information, your rights and who you can contact,
please refer to: www.gs.com/privacy-notices



