Tried the below:
conf.set("hadoop.security.authentication", "kerberos");  // <-- added this line
UserGroupInformation.setConfiguration(conf);             // <-- now fails on this line with the exception below
Exception in thread "Main Thread" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:268)
    at com.ard.WordCountJob.process2(WordCountJob.java:147)
    at com.ard.WordCountJob.main(WordCountJob.java:198)
Caused by: java.lang.IllegalArgumentException: Can't get Kerberos configuration
    at org.apache.hadoop.security.HadoopKerberosName.<clinit>(HadoopKerberosName.java:44)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:267)
    at com.ard.WordCountJob.process2(WordCountJob.java:142)
    at com.ard.WordCountJob.main(WordCountJob.java:197)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:63)
    at org.apache.hadoop.security.HadoopKerberosName.<clinit>(HadoopKerberosName.java:41)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:268)
    at com.ard.WordCountJob.process2(WordCountJob.java:147)
    at com.ard.WordCountJob.main(WordCountJob.java:198)
Caused by: KrbException: Could not load configuration file C:\WINNT\krb5.ini (The system cannot find the file specified)
    at sun.security.krb5.Config.<init>(Config.java:147)
    at sun.security.krb5.Config.getInstance(Config.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:63)
    at org.apache.hadoop.security.HadoopKerberosName.<clinit>(HadoopKerberosName.java:41)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:267)
    at com.ard.WordCountJob.process2(WordCountJob.java:142)
    at com.ard.WordCountJob.main(WordCountJob.java:197)
Caused by: java.io.FileNotFoundException: C:\WINNT\krb5.ini (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at java.io.FileInputStream.<init>(FileInputStream.java:66)
    at sun.security.krb5.Config$1.run(Config.java:539)
    at sun.security.krb5.Config.loadConfigFile(Config.java:535)
    at sun.security.krb5.Config.<init>(Config.java:144)
    ... 11 more
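The root cause is in the last two frames: on Windows, the JDK's Kerberos code falls back to looking for C:\WINNT\krb5.ini, which does not exist on this machine. A common fix is to supply a Kerberos configuration file and point the JVM at it with -Djava.security.krb5.conf=<path> (the JDK checks that system property first, then %JAVA_HOME%\lib\security\krb5.conf, then the C:\WINNT\krb5.ini default). A minimal sketch of such a file, where the realm and KDC host are placeholder assumptions, not values from this thread:

```ini
; Hypothetical krb5.ini sketch -- substitute the cluster's actual realm and KDC.
[libdefaults]
    default_realm = YYYY.COM

[realms]
    YYYY.COM = {
        kdc = kdc.yyyy.com
        admin_server = kdc.yyyy.com
    }

[domain_realm]
    .yyyy.com = YYYY.COM
    yyyy.com = YYYY.COM
```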
-----Original Message-----
From: Harsh J [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 11:35 AM
To: <[email protected]>
Subject: Re: submitting a mapreduce job to remote cluster
Are you positive that your cluster/client configuration files'
directory is on the classpath when you run this job? Only then would its
values be read automatically when you instantiate the Configuration class.
Alternatively, you may try setting "hadoop.security.authentication" to
"kerberos" manually in your Configuration (conf) object.
On Wed, Nov 28, 2012 at 9:23 PM, Erravelli, Venkat <[email protected]>
wrote:
> Hello :
>
>
>
> I see the below exception when I submit a MapReduce Job from
> standalone java application to a remote Hadoop cluster. Cluster
> authentication mechanism is Kerberos.
>
>
>
> Below is the code. I am using user impersonation since I need to
> submit the job as a hadoop cluster user (userx) from my machine, on
> which I am logged in as user99. So:
>
> userx -- the user that is set up on the hadoop cluster.
> user99 -- the user on whose machine the standalone Java application
> is executing.
>
>
>
> System.setProperty("HADOOP_USER_NAME", "userx");
>
> final Configuration conf = new Configuration();
>
> conf.set("hadoop.security.auth_to_local",
>     "RULE:[1:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//"
>     + "RULE:[2:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//" + "DEFAULT");
>
> conf.set("mapred.job.tracker", "abcde.yyyy.com:9921");
> conf.set("fs.defaultFS", "hdfs://xxxxx.yyyy.com:9920");
>
> UserGroupInformation.setConfiguration(conf);
>
> System.out.println("here ::::: " + UserGroupInformation.getCurrentUser());
>
> UserGroupInformation ugi = UserGroupInformation.createProxyUser("user99",
>     UserGroupInformation.getCurrentUser());
> AuthenticationMethod am = AuthenticationMethod.KERBEROS;
> ugi.setAuthenticationMethod(am);
>
> final Path inPath = new Path("/user/userx/test.txt");
>
> DateFormat df = new SimpleDateFormat("dd_MM_yyyy_hh_mm");
> StringBuilder sb = new StringBuilder();
> sb.append("wordcount_result_").append(df.format(new Date()));
>
> // out
> final Path outPath = new Path(sb.toString());
>
> ugi.doAs(new PrivilegedExceptionAction<UserGroupInformation>() { // <---- throws the exception here!!!
>
>     public UserGroupInformation run() throws Exception {
>         // Submit a job: create a new job based on the configuration
>         Job job = new Job(conf, "word count remote");
>
>         job.setJarByClass(WordCountJob.class);
>         job.setMapperClass(TokenizerMapper.class);
>         job.setCombinerClass(IntSumReducer.class);
>         job.setReducerClass(IntSumReducer.class);
>         job.setOutputKeyClass(Text.class);
>         job.setOutputValueClass(IntWritable.class);
>         FileInputFormat.addInputPath(job, inPath);
>         FileOutputFormat.setOutputPath(job, outPath);
>
>         // this waits until the job completes
>         job.waitForCompletion(true);
>
>         if (job.isSuccessful()) {
>             System.out.println("Job completed successfully");
>         } else {
>             System.out.println("Job Failed");
>         }
>
>         return UserGroupInformation.getCurrentUser();
>     }
> });
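One thing worth noting about the code above: ugi.setAuthenticationMethod(KERBEROS) only labels the UGI object; it does not acquire any Kerberos credentials. On a kerberized cluster the client has to actually log in, typically from a keytab, before proxying another user. A hedged sketch of that pattern, where the principal, realm, and keytab path are placeholders rather than values from this thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginSketch {
    public static UserGroupInformation login() throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate with real credentials first; the principal and
        // keytab path here are hypothetical placeholders:
        UserGroupInformation.loginUserFromKeytab(
                "userx@YYYY.COM", "/path/to/userx.keytab");

        // If impersonation is still needed, proxy on top of the
        // actually-logged-in user:
        return UserGroupInformation.createProxyUser(
                "user99", UserGroupInformation.getLoginUser());
    }
}
```

For the proxy call to be accepted, the cluster side must also allow it via the hadoop.proxyuser.*.hosts/groups settings for the logged-in principal.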
>
>
>
> When the above code is executed, I get the below exception on the line
> mentioned in the code above:
>
> ***************
>
> 12/11/28 09:43:51 ERROR security.UserGroupInformation: PriviledgedActionException
> as: user99 (auth:KERBEROS) via userx (auth:SIMPLE)
> cause: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
> Authorization (hadoop.security.authorization) is enabled but authentication
> (hadoop.security.authentication) is configured as simple. Please configure
> another method like kerberos or digest.
>
> Exception in thread "Main Thread"
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
> Authorization (hadoop.security.authorization) is enabled but authentication
> (hadoop.security.authentication) is configured as simple. Please configure
> another method like kerberos or digest.
>
> ***************
>
> Can someone tell me/point me in the right direction on what is going
> on here, and how do I get past this exception? Any help will be
> greatly appreciated. Thanks!
>
>
>
> Below are the hadoop cluster configuration files:
>
>
>
> ***************
>
> core-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://xxxxx.yyyy.com:9920</value>
>   </property>
>   <property>
>     <name>io.file.buffer.size</name>
>     <value>65536</value>
>   </property>
>   <property>
>     <name>io.compression.codecs</name>
>     <value></value>
>   </property>
>   <property>
>     <name>hadoop.security.authentication</name>
>     <value>kerberos</value>
>   </property>
>   <property>
>     <name>hadoop.security.auth_to_local</name>
>     <value>RULE:[1:$1@$0](.*@\Q\E$)s/@\Q\E$//
> RULE:[2:$1@$0](.*@\Q\E$)s/@\Q\E$//
> DEFAULT</value>
>   </property>
> </configuration>
>
>
>
>
>
> hdfs-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.467Z-->
> <configuration>
>   <property>
>     <name>dfs.https.address</name>
>     <value>xxxxx.yyyy.com:50470</value>
>   </property>
>   <property>
>     <name>dfs.https.port</name>
>     <value>50470</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address</name>
>     <value>xxxxx.yyyy.com:50070</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.blocksize</name>
>     <value>134217728</value>
>   </property>
>   <property>
>     <name>dfs.client.use.datanode.hostname</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>dfs.block.access.token.enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.namenode.kerberos.principal</name>
>     <value>hdfs/[email protected]</value>
>   </property>
>   <property>
>     <name>dfs.namenode.kerberos.https.principal</name>
>     <value>host/[email protected]</value>
>   </property>
>   <property>
>     <name>dfs.namenode.kerberos.internal.spnego.principal</name>
>     <value>HTTP/[email protected]</value>
>   </property>
> </configuration>
>
>
>
>
>
> mapred-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>abcde.yyyy.com:9921</value>
>   </property>
>   <property>
>     <name>mapred.output.compress</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.type</name>
>     <value>BLOCK</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.DefaultCodec</value>
>   </property>
>   <property>
>     <name>mapred.map.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>mapred.compress.map.output</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>io.sort.factor</name>
>     <value>64</value>
>   </property>
>   <property>
>     <name>io.sort.record.percent</name>
>     <value>0.05</value>
>   </property>
>   <property>
>     <name>io.sort.spill.percent</name>
>     <value>0.8</value>
>   </property>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>mapred.submit.replication</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>72</value>
>   </property>
>   <property>
>     <name>io.sort.mb</name>
>     <value>256</value>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value> -Xmx1073741824</value>
>   </property>
>   <property>
>     <name>mapred.job.reuse.jvm.num.tasks</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.map.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.slowstart.completed.maps</name>
>     <value>1.0</value>
>   </property>
>   <property>
>     <name>mapreduce.jobtracker.kerberos.principal</name>
>     <value>mapred/[email protected]</value>
>   </property>
>   <property>
>     <name>mapreduce.jobtracker.kerberos.https.principal</name>
>     <value>host/[email protected]</value>
>   </property>
> </configuration>
>
>
>
>
>
> ***************
>
>
>
--
Harsh J