Re: unable to read kerberized HDFS using dataflow

2020-06-16 Thread Luke Cwik
Posted comments on your SO question.
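
For anyone finding this thread later: one likely cause of the SIMPLE/KERBEROS error is that the loginUserFromKeytab() call in main() runs only on the machine that submits the job, while Dataflow workers are separate JVMs that never execute it. A minimal sketch of a worker-side login (assuming the runner honors Beam's JvmInitializer hook, and using a hypothetical principal and local keytab path — the keytab would still need to be staged onto each worker first):

```java
import com.google.auto.service.AutoService;
import org.apache.beam.sdk.harness.JvmInitializer;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Discovered via ServiceLoader, so this runs in every worker JVM.
@AutoService(JvmInitializer.class)
public class KerberosJvmInitializer implements JvmInitializer {
  @Override
  public void beforeProcessing(PipelineOptions options) {
    try {
      Configuration conf = new Configuration();
      conf.set("hadoop.security.authentication", "kerberos");
      // setConfiguration must run before the login, otherwise
      // Hadoop stays in SIMPLE mode and the login is a no-op.
      UserGroupInformation.setConfiguration(conf);
      // Hypothetical principal and path, not from the original post.
      UserGroupInformation.loginUserFromKeytab(
          "principal@MY_REALM", "/tmp/worker.keytab");
    } catch (java.io.IOException e) {
      throw new RuntimeException("Kerberos login failed on worker", e);
    }
  }
}
```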

On Tue, Jun 16, 2020 at 4:32 AM Vince Gonzalez wrote:



unable to read kerberized HDFS using dataflow

2020-06-16 Thread Vince Gonzalez
Is there specific configuration required to ensure that workers get access
to the UserGroupInformation when using TextIO? I am using Beam 2.22.0 on
the Dataflow runner.

My main method is shown below. My HdfsTextIOOptions interface extends
HadoopFileSystemOptions, and I set the HdfsConfiguration on the options
instance. I am using a keytab to authenticate. I'm not sure whether calling
UserGroupInformation.setConfiguration() is sufficient to ensure the UGI
makes it to all of the workers. My pipeline fails with this exception:

Error message from worker:
org.apache.hadoop.security.AccessControlException: SIMPLE
authentication is not enabled. Available:[TOKEN, KERBEROS]


  public static void main(String[] args) throws IOException {
    System.setProperty("java.security.krb5.realm", "MY_REALM");
    System.setProperty("java.security.krb5.kdc", "my.kdc.hostname");

    HdfsTextIOOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(HdfsTextIOOptions.class);

    // Download the keytab from GCS to the local filesystem.
    Storage storage = StorageOptions.getDefaultInstance().getService();
    URI uri = URI.create(options.getGcsKeytabPath());
    System.err.println(String.format(
        "URI: %s, filesystem: %s, bucket: %s, filename: %s",
        uri, uri.getScheme(), uri.getAuthority(), uri.getPath()));
    Blob keytabBlob = storage.get(BlobId.of(
        uri.getAuthority(),
        uri.getPath().startsWith("/") ? uri.getPath().substring(1)
                                      : uri.getPath()));
    Path localKeytabPath = Paths.get("/tmp", uri.getPath());
    System.err.println(localKeytabPath);

    keytabBlob.downloadTo(localKeytabPath);

    // Log in with the keytab and hand the Hadoop config to Beam.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    conf.set("hadoop.security.authentication", "kerberos");

    UserGroupInformation.loginUserFromKeytab(
        options.getUserPrincipal(), localKeytabPath.toString());
    UserGroupInformation.setConfiguration(conf);

    options.setHdfsConfiguration(ImmutableList.of(conf));

    Pipeline p = Pipeline.create(options);

    p.apply(TextIO.read().from(options.getInputFile()))
    ...
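
As an aside, the bucket/object split in the listing relies on java.net.URI, where getAuthority() yields the bucket and getPath() yields the object name with a leading slash that GCS does not use. A standalone check of that parsing, with a made-up gs:// path, looks like this:

```java
import java.net.URI;

public class GcsUriSplit {
  public static void main(String[] args) {
    // Hypothetical path for illustration only.
    URI uri = URI.create("gs://example-bucket/keys/worker.keytab");
    String bucket = uri.getAuthority();       // "example-bucket"
    String path = uri.getPath();              // "/keys/worker.keytab"
    // GCS object names do not include the leading slash.
    String object = path.startsWith("/") ? path.substring(1) : path;
    System.out.println(bucket + " " + object);
  }
}
```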

I also posted this to Stack Overflow:
https://stackoverflow.com/questions/62397379/google-cloud-dataflow-textio-and-kerberized-hdfs

Thanks for any leads!

--vince