Hello:
I see the exception below when I submit a MapReduce job from a standalone Java
application to a remote Hadoop cluster. The cluster's authentication mechanism
is Kerberos.
Below is the code. I am using user impersonation, since I need to submit the job
as a Hadoop cluster user (userx) from my machine, on which I am logged in as
user99. So:
userx -- the user that is set up on the Hadoop cluster.
user99 -- the user on whose machine the standalone Java application code is
executing.
System.setProperty("HADOOP_USER_NAME", "userx");
final Configuration conf = new Configuration();
conf.set("hadoop.security.auth_to_local",
    "RULE:[1:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//" +
    "RULE:[2:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//" +
    "DEFAULT");
conf.set("mapred.job.tracker", "abcde.yyyy.com:9921");
conf.set("fs.defaultFS", "hdfs://xxxxx.yyyy.com:9920");
UserGroupInformation.setConfiguration(conf);
System.out.println("here ::::: " + UserGroupInformation.getCurrentUser());
UserGroupInformation ugi = UserGroupInformation.createProxyUser("user99",
    UserGroupInformation.getCurrentUser());
ugi.setAuthenticationMethod(AuthenticationMethod.KERBEROS);
final Path inPath = new Path("/user/userx/test.txt");
DateFormat df = new SimpleDateFormat("dd_MM_yyyy_hh_mm");
StringBuilder sb = new StringBuilder();
sb.append("wordcount_result_").append(df.format(new Date()));
// out
final Path outPath = new Path(sb.toString());
ugi.doAs(new PrivilegedExceptionAction<UserGroupInformation>() { // <<<< throws exception here!!!
    public UserGroupInformation run() throws Exception {
        // Submit a job: create a new job based on the configuration
        Job job = new Job(conf, "word count remote");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inPath);
        FileOutputFormat.setOutputPath(job, outPath);
        // this waits until the job completes
        job.waitForCompletion(true);
        if (job.isSuccessful()) {
            System.out.println("Job completed successfully");
        } else {
            System.out.println("Job Failed");
        }
        return UserGroupInformation.getCurrentUser();
    }
});
When the above code is executed, I get the exception below on the line marked
in the code above:
***************
12/11/28 09:43:51 ERROR security.UserGroupInformation:
PriviledgedActionException as: user99 (auth:KERBEROS) via userx (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication
(hadoop.security.authentication) is configured as simple. Please configure
another method like kerberos or digest.
Exception in thread "Main Thread"
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication
(hadoop.security.authentication) is configured as simple. Please configure
another method like kerberos or digest.
***************
Can someone tell me, or point me in the right direction on, what is going on
here, and how do I get past this exception? Any help will be greatly appreciated.
thanks!
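For anyone unfamiliar with the auth_to_local syntax used above: each RULE first flattens an n-component principal into a short string via the [n:$1@$0] pattern, then filters it against a regex, then applies a sed-style s/…// substitution. As a rough, self-contained illustration of how I understand the matching step to behave (using a placeholder realm EXAMPLE.COM and plain java.util.regex, not the actual Hadoop implementation):

```java
import java.util.regex.Pattern;

public class AuthToLocalDemo {
    // Illustrative sketch of one auth_to_local rule such as
    //   RULE:[2:$1@$0](.*@\QEXAMPLE.COM\E$)s/@\QEXAMPLE.COM\E$//
    // [2:$1@$0] flattens a two-component principal "svc/host@REALM" into
    // "svc@REALM"; the (regex) filter then accepts it, and the s/…//
    // substitution strips the realm, leaving the short name.
    public static String applyRule(String principal) {
        // split "svc/host@REALM" into name components ($1) and realm ($0)
        String[] atParts = principal.split("@", 2);
        String realm = atParts[1];                        // $0
        String firstComponent = atParts[0].split("/")[0]; // $1
        String flattened = firstComponent + "@" + realm;  // [2:$1@$0]
        // filter regex: (.*@\QEXAMPLE.COM\E$)
        Pattern filter = Pattern.compile(".*@\\QEXAMPLE.COM\\E$");
        if (!filter.matcher(flattened).matches()) {
            return null; // rule does not apply to this principal
        }
        // substitution: s/@\QEXAMPLE.COM\E$//
        return flattened.replaceAll("@\\QEXAMPLE.COM\\E$", "");
    }

    public static void main(String[] args) {
        // maps the service principal to the short name "hdfs"
        System.out.println(applyRule("hdfs/namenode.yyyy.com@EXAMPLE.COM"));
    }
}
```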
Below are the Hadoop cluster configuration files:
***************
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://xxxxx.yyyy.com:9920</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>io.compression.codecs</name>
<value></value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>RULE:[1:$1@$0](.*@\Q\E$)s/@\Q\E$//
RULE:[2:$1@$0](.*@\Q\E$)s/@\Q\E$//
DEFAULT</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.467Z-->
<configuration>
<property>
<name>dfs.https.address</name>
<value>xxxxx.yyyy.com:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>xxxxx.yyyy.com:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/[email protected]</value<mailto:hdfs/[email protected]%3c/value>>
</property>
<property>
<name>dfs.namenode.kerberos.https.principal</name>
<value>host/[email protected]</value<mailto:host/[email protected]%3c/value>>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/[email protected]</value<mailto:HTTP/[email protected]%3c/value>>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>abcde.yyyy.com:9921</value>
</property>
<property>
<name>mapred.output.compress</name>
<value>false</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>io.sort.factor</name>
<value>64</value>
</property>
<property>
<name>io.sort.record.percent</name>
<value>0.05</value>
</property>
<property>
<name>io.sort.spill.percent</name>
<value>0.8</value>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>10</value>
</property>
<property>
<name>mapred.submit.replication</name>
<value>10</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>72</value>
</property>
<property>
<name>io.sort.mb</name>
<value>256</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value> -Xmx1073741824</value>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>1.0</value>
</property>
<property>
<name>mapreduce.jobtracker.kerberos.principal</name>
<value>mapred/[email protected]</value<mailto:mapred/[email protected]%3c/value>>
</property>
<property>
<name>mapreduce.jobtracker.kerberos.https.principal</name>
<value>host/[email protected]</value<mailto:host/[email protected]%3c/value>>
</property>
</configuration>
***************
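One thing I notice comparing the configs with my code: core-site.xml above sets hadoop.security.authentication to kerberos on the cluster, but the client-side Configuration in my code never sets it, so the client presumably defaults to simple, which is what the error message complains about. My guess (not yet verified) is that the client Configuration would need to mirror the server's setting before calling UserGroupInformation.setConfiguration(conf), roughly:

```java
// Client side: mirror the cluster's security settings
// (my guess, not verified; conf is the same Configuration as in the code above)
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hadoop.security.authorization", "true");
UserGroupInformation.setConfiguration(conf);
```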