Hello:
I see the exception below when I submit a MapReduce job from a standalone Java
application to a remote Hadoop cluster. The cluster's authentication mechanism
is Kerberos.
Below is the code. I am using user impersonation, since I need to submit the job
as a Hadoop cluster user (userx) from my machine, on which I am logged in as
user99. So:
userx -- the user that is set up on the Hadoop cluster.
user99 -- the user on whose machine the standalone Java application code is
executing.
System.setProperty("HADOOP_USER_NAME", "userx");
final Configuration conf = new Configuration();
conf.set("hadoop.security.auth_to_local",
    "RULE:[1:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//" +
    "RULE:[2:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//" +
    "DEFAULT");
conf.set("mapred.job.tracker", "abcde.yyyy.com:9921");
conf.set("fs.defaultFS", "hdfs://xxxxx.yyyy.com:9920");
UserGroupInformation.setConfiguration(conf);
System.out.println("here ::::: " + UserGroupInformation.getCurrentUser());
UserGroupInformation ugi = UserGroupInformation.createProxyUser("user99",
    UserGroupInformation.getCurrentUser());
ugi.setAuthenticationMethod(AuthenticationMethod.KERBEROS);
final Path inPath = new Path("/user/userx/test.txt");
DateFormat df = new SimpleDateFormat("dd_MM_yyyy_hh_mm");
StringBuilder sb = new StringBuilder();
sb.append("wordcount_result_").append(df.format(new Date()));
// out
final Path outPath = new Path(sb.toString());
ugi.doAs(new PrivilegedExceptionAction<UserGroupInformation>() { // <<<< throws exception here!!!
    public UserGroupInformation run() throws Exception {
        // Submit a job: create a new job based on the configuration
        Job job = new Job(conf, "word count remote");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inPath);
        FileOutputFormat.setOutputPath(job, outPath);
        // this waits until the job completes
        job.waitForCompletion(true);
        if (job.isSuccessful()) {
            System.out.println("Job completed successfully");
        } else {
            System.out.println("Job Failed");
        }
        return UserGroupInformation.getCurrentUser();
    }
});
When the above code is executed, I get the exception below on the line marked
in the code above:
***************
12/11/28 09:43:51 ERROR security.UserGroupInformation:
PriviledgedActionException as: user99 (auth:KERBEROS) via userx (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication
(hadoop.security.authentication) is configured as simple. Please configure
another method like kerberos or digest.
Exception in thread "Main Thread"
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication
(hadoop.security.authentication) is configured as simple. Please configure
another method like kerberos or digest.
***************
Can someone tell me, or point me in the right direction on, what is going on
here, and how do I get past this exception? Any help will be greatly appreciated.
thanks!
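For anyone unfamiliar with the auth_to_local syntax used above: each RULE first flattens an n-component principal into a short string via the [n:$1@$0] pattern, then filters it against a regex, then applies a sed-style s/…// substitution. As a rough, self-contained illustration of how I understand the matching step to behave (using a placeholder realm EXAMPLE.COM and plain java.util.regex, not the actual Hadoop implementation):

```java
import java.util.regex.Pattern;

public class AuthToLocalDemo {
    // Illustrative sketch of one auth_to_local rule such as
    //   RULE:[2:$1@$0](.*@\QEXAMPLE.COM\E$)s/@\QEXAMPLE.COM\E$//
    // [2:$1@$0] flattens a two-component principal "svc/host@REALM" into
    // "svc@REALM"; the (regex) filter then accepts it, and the s/…//
    // substitution strips the realm, leaving the short name.
    public static String applyRule(String principal) {
        // split "svc/host@REALM" into name components ($1) and realm ($0)
        String[] atParts = principal.split("@", 2);
        String realm = atParts[1];                        // $0
        String firstComponent = atParts[0].split("/")[0]; // $1
        String flattened = firstComponent + "@" + realm;  // [2:$1@$0]
        // filter regex: (.*@\QEXAMPLE.COM\E$)
        Pattern filter = Pattern.compile(".*@\\QEXAMPLE.COM\\E$");
        if (!filter.matcher(flattened).matches()) {
            return null; // rule does not apply to this principal
        }
        // substitution: s/@\QEXAMPLE.COM\E$//
        return flattened.replaceAll("@\\QEXAMPLE.COM\\E$", "");
    }

    public static void main(String[] args) {
        // maps the service principal to the short name "hdfs"
        System.out.println(applyRule("hdfs/namenode.yyyy.com@EXAMPLE.COM"));
    }
}
```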
Below are the Hadoop cluster configuration files:
***************
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://xxxxx.yyyy.com:9920</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>io.compression.codecs</name>
<value></value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>RULE:[1:$1@$0](.*@\Q\E$)s/@\Q\E$//
RULE:[2:$1@$0](.*@\Q\E$)s/@\Q\E$//
DEFAULT</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.467Z-->
<configuration>
<property>
<name>dfs.https.address</name>
<value>xxxxx.yyyy.com:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>xxxxx.yyyy.com:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/[email protected]</value<mailto:hdfs/[email protected]%3c/value>>
</property>
<property>
<name>dfs.namenode.kerberos.https.principal</name>
<value>host/[email protected]</value<mailto:host/[email protected]%3c/value>>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/[email protected]</value<mailto:HTTP/[email protected]%3c/value>>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>abcde.yyyy.com:9921</value>
</property>
<property>
<name>mapred.output.compress</name>
<value>false</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>io.sort.factor</name>
<value>64</value>
</property>
<property>
<name>io.sort.record.percent</name>
<value>0.05</value>
</property>
<property>
<name>io.sort.spill.percent</name>
<value>0.8</value>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>10</value>
</property>
<property>
<name>mapred.submit.replication</name>
<value>10</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>72</value>
</property>
<property>
<name>io.sort.mb</name>
<value>256</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value> -Xmx1073741824</value>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>1.0</value>
</property>
<property>
<name>mapreduce.jobtracker.kerberos.principal</name>
<value>mapred/[email protected]</value<mailto:mapred/[email protected]%3c/value>>
</property>
<property>
<name>mapreduce.jobtracker.kerberos.https.principal</name>
<value>host/[email protected]</value<mailto:host/[email protected]%3c/value>>
</property>
</configuration>
***************
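One thing I notice comparing the configs with my code: core-site.xml above sets hadoop.security.authentication to kerberos on the cluster, but the client-side Configuration in my code never sets it, so the client presumably defaults to simple, which is what the error message complains about. My guess (not yet verified) is that the client Configuration would need to mirror the server's setting before calling UserGroupInformation.setConfiguration(conf), roughly:

```java
// Client side: mirror the cluster's security settings
// (my guess, not verified; conf is the same Configuration as in the code above)
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hadoop.security.authorization", "true");
UserGroupInformation.setConfiguration(conf);
```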