On the server where I have Beam installed, from the command line I can run
kinit -kt <some keytab> <user> to generate a Kerberos ticket and have it in the
cache.
Then from the command line I can run
curl -sS --negotiate -u : http://${namenode_host}:50070/webhdfs/v1${path}?op=LISTSTATUS
and get data back from HDFS. So I know Kerberos is working, my environment is
correct (I have HADOOP_CONFIG etc. set up correctly), and I have no issues with
firewalls or networks. Curling from the server is just a test.
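
For reference, the curl check above can be scripted. This is only a rough sketch of how the WebHDFS LISTSTATUS URL and the curl arguments fit together. The function names and the example host are made up for illustration, and -sS (silent, but still print errors) is an assumption about the intended curl flags.

```python
import shlex
from urllib.parse import quote


def webhdfs_liststatus_url(namenode_host, path, port=50070):
    """Build the WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    # quote() percent-encodes unsafe characters but leaves "/" untouched.
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"


def curl_command(url):
    """Return the curl argument list for a SPNEGO (Kerberos) request."""
    # "--negotiate -u :" tells curl to authenticate with the ticket in the
    # Kerberos cache (created beforehand with kinit) instead of a password.
    return ["curl", "-sS", "--negotiate", "-u", ":", url]


if __name__ == "__main__":
    # "namenode.example.com" and "/tmp" are placeholder values.
    url = webhdfs_liststatus_url("namenode.example.com", "/tmp")
    # Print a copy-pasteable shell command; pass the list to subprocess.run
    # instead if you want to execute it directly.
    print(shlex.join(curl_command(url)))
```

This only rebuilds the command line; it still relies on a valid ticket being in the cache.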

From Beam, running in the same environment as above, I can set up
from apache_beam.options.pipeline_options import HadoopFileSystemOptions

hdfs_client_options = HadoopFileSystemOptions(["--hdfs_host=namenode_host",
                                               "--hdfs_user=user",
                                               "--hdfs_port=50070"
                                               ])
But this generates an authentication error when I try to access HDFS. I can’t 
see a way to tell it to negotiate with Kerberos.

So the question is: using the Python SDK, how do you set up a connection to an
HDFS cluster that has Kerberos enabled?

From: Udi Meiri <[email protected]>
Sent: 03 February 2021 22:56
To: user <[email protected]>
Subject: Re: Python SDK and Kerberos

Beam only uses the InsecureClient and the host and port need to be given in the 
command line. The other client type uses "Hadoop token delegation security". Is 
that what you mean?
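
To illustrate the difference between the client types: the sketch below uses the standalone `hdfs` (HdfsCLI) Python package, outside of Beam. It assumes that package and its Kerberos extension (which needs requests-kerberos) are installed. `make_client`, `namenode_url` and the host name are hypothetical, and nothing here shows how to make Beam itself use the Kerberos client.

```python
def namenode_url(host, port=50070):
    """Base WebHDFS URL for the namenode (no path, no operation)."""
    return f"http://{host}:{port}"


def make_client(host, port=50070, user=None, use_kerberos=False):
    """Return an HdfsCLI client for the given namenode.

    Imports are done lazily so this module loads even when the
    optional `hdfs` package is not installed.
    """
    url = namenode_url(host, port)
    if use_kerberos:
        # Authenticates via SPNEGO using the ticket already in the
        # Kerberos cache (e.g. created with kinit).
        from hdfs.ext.kerberos import KerberosClient
        return KerberosClient(url)
    # No authentication; the user name is simply passed along as-is.
    from hdfs import InsecureClient
    return InsecureClient(url, user=user)


# Example (needs a reachable cluster and a valid ticket):
#   client = make_client("namenode.example.com", use_kerberos=True)
#   print(client.list("/tmp"))
```

With `use_kerberos=False` this builds the same kind of insecure client described above, which is why a kerberized cluster rejects it.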

Can you give an example of the URL you're using with curl?

On Wed, Feb 3, 2021 at 11:58 AM Ahmet Altay <[email protected]> wrote:
/cc +Udi Meiri <[email protected]>

On Wed, Feb 3, 2021 at 9:23 AM Doutre, Mark <[email protected]> wrote:
Hi,
  I’m trying to use the Python SDK to write data to HDFS. However, our cluster
is kerberized. Is it possible to do this with the current SDK? If it is, how do
you get it to authenticate?

  Everything works fine from the command line on the box I am using. I can curl
to the WebHDFS server and port, provided I tell curl to negotiate. I can’t find
any way to do that from inside Beam.

Thanks
Mark
