Re: Access_Remote_Kerberized_Cluster_Through_Spark
Hi Ajay,

I was able to resolve it by adding the yarn user principal. Here is the complete code (a minimal sketch of just the settings that mattered follows the quoted message below):

def main(args: Array[String]) {
  // create Spark context with Spark configuration
  val cmdLine = Parse.commandLine(args)
  val configFile = cmdLine.getOptionValue("c")
  val propertyConfiguration = new PropertyConfiguration()
  val props = propertyConfiguration.get(configFile)
  val fs = com.yourcompany.telematics.fs.FileSystem.getHdfsFileSystem(props)

  val sparkConfig = propertyConfiguration.initConfiguration()
  val sc = new SparkContext(sparkConfig)
  val configuration: org.apache.hadoop.conf.Configuration = sc.hadoopConfiguration

  java.lang.System.setProperty("javax.security.auth.useSubjectCredsOnly", "true")
  java.lang.System.setProperty("java.security.krb5.conf", "C:\\krb5.conf")
  System.setProperty("sun.security.krb5.debug", "true")

  configuration.set("hadoop.security.authentication", "kerberos")
  configuration.set("hdfs.namenode.kerberos.principal", "hdfs/_ h...@ad.yourcompany.com")
  configuration.set("hdfs.datanode.kerberos.principal.pattern", "hdfs/*@AD.yourcompany.COM")
  configuration.set("hdfs.master.kerberos.principal", "hdfs/*@AD.yourcompany.COM")
  configuration.set("yarn.nodemanager.principal", "yarn/*@AD.yourcompany.COM")
  configuration.set("yarn.resourcemanager.principal", "yarn/*@AD.yourcompany.COM")

  // note the trailing separator, so the file names below resolve correctly
  val hadoopConf = "C:\\devtools\\hadoop\\hadoop-2.2.0\\hadoop-2.2.0\\conf\\"
  configuration.addResource(new Path(hadoopConf + "core-site.xml"))
  configuration.addResource(new Path(hadoopConf + "hdfs-site.xml"))
  configuration.addResource(new Path(hadoopConf + "mapred-site.xml"))
  configuration.addResource(new Path(hadoopConf + "yarn-site.xml"))
  configuration.addResource(new Path(hadoopConf + "hadoop-policy.xml"))

  UserGroupInformation.setConfiguration(configuration)
  UserGroupInformation.loginUserFromKeytab("va_d...@ad.yourcompany.com", "C:\\va_dflt.keytab")

  // get threshold
  // val threshold = args(1).toInt

  // read in text file and split each document into words
  val lineRdd = sc.textFile("hdfs://XX:8020/user/yyy1k78/vehscanxmltext")
  val tokenized = lineRdd.flatMap(_.split(" "))

  // count the occurrence of each word
  val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)

  // filter out words with fewer than threshold occurrences
  // val filtered = wordCounts.filter(_._2 >= threshold)

  // count characters
  // val charCounts = filtered.flatMap(_._1.toCharArray).map((_, 1)).reduceByKey(_ + _)

  System.out.println(wordCounts.collect().mkString(", "))
}
}

Thanks,
Asmath.

On Wed, Nov 9, 2016 at 7:44 PM, Ajay Chander wrote:
> Hi Everyone,
>
> I am still trying to figure this one out. I am stuck with this error
> "java.io.IOException: Can't get Master Kerberos principal for use as
> renewer". Below is my code. Can any of you please provide any insights
> on this? Thanks for your time.
>
> import java.io.{BufferedInputStream, File, FileInputStream}
> import java.net.URI
>
> import org.apache.hadoop.fs.FileSystem
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.io.IOUtils
> import org.apache.hadoop.security.UserGroupInformation
> import org.apache.spark.deploy.SparkHadoopUtil
> import org.apache.spark.{SparkConf, SparkContext}
>
> object SparkHdfs {
>
>   def main(args: Array[String]): Unit = {
>
>     System.setProperty("java.security.krb5.conf",
>       new File("src\\main\\files\\krb5.conf").getAbsolutePath)
>     System.setProperty("sun.security.krb5.debug", "true")
>
>     val sparkConf = new SparkConf().setAppName("SparkHdfs").setMaster("local")
>     val sc = new SparkContext(sparkConf)
>     // Loading remote cluster configurations
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\core-site.xml").getAbsolutePath)
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath)
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\mapred-site.xml").getAbsolutePath)
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath)
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\ssl-client.xml").getAbsolutePath)
>     sc.hadoopConfiguration.addResource(new File("src\\main\\files\\topology.map").getAbsolutePath)
>
>     val conf = new Configuration()
>     // Loading remote cluster configurations
>     conf.addResource(new Path(new File("src\\main\\files\\core-site.xml").getAbsolutePath))
>     conf.addResource(new Path(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath))
>     conf.addResource(new Path(new
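In short, the settings above that resolved the "Can't get Master Kerberos principal for use as renewer" error are the YARN principal entries applied before the keytab login: Hadoop's TokenCache looks up yarn.resourcemanager.principal to pick a delegation-token renewer and raises exactly that IOException when it is absent. A minimal sketch, assuming placeholder realm, principal, and keytab values:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    val configuration = new Configuration()
    configuration.set("hadoop.security.authentication", "kerberos")
    // Without these, TokenCache cannot resolve a renewer principal and throws
    // "java.io.IOException: Can't get Master Kerberos principal for use as renewer".
    configuration.set("yarn.resourcemanager.principal", "yarn/_HOST@AD.EXAMPLE.COM") // placeholder
    configuration.set("yarn.nodemanager.principal", "yarn/_HOST@AD.EXAMPLE.COM")     // placeholder
    UserGroupInformation.setConfiguration(configuration)
    UserGroupInformation.loginUserFromKeytab("user@AD.EXAMPLE.COM", "C:\\user.keytab") // placeholders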
Re: Access_Remote_Kerberized_Cluster_Through_Spark
Hi Everyone,

I am still trying to figure this one out. I am stuck with this error "java.io.IOException: Can't get Master Kerberos principal for use as renewer". Below is my code; see also the note following this message. Can any of you please provide any insights on this? Thanks for your time.

import java.io.{BufferedInputStream, File, FileInputStream}
import java.net.URI

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IOUtils
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.{SparkConf, SparkContext}

object SparkHdfs {

  def main(args: Array[String]): Unit = {

    System.setProperty("java.security.krb5.conf",
      new File("src\\main\\files\\krb5.conf").getAbsolutePath)
    System.setProperty("sun.security.krb5.debug", "true")

    val sparkConf = new SparkConf().setAppName("SparkHdfs").setMaster("local")
    val sc = new SparkContext(sparkConf)

    // Loading remote cluster configurations
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\core-site.xml").getAbsolutePath)
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath)
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\mapred-site.xml").getAbsolutePath)
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath)
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\ssl-client.xml").getAbsolutePath)
    sc.hadoopConfiguration.addResource(new File("src\\main\\files\\topology.map").getAbsolutePath)

    val conf = new Configuration()
    // Loading remote cluster configurations
    conf.addResource(new Path(new File("src\\main\\files\\core-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\mapred-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\ssl-client.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\topology.map").getAbsolutePath))

    conf.set("hadoop.security.authentication", "Kerberos")
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab("my...@internal.company.com",
      new File("src\\main\\files\\myusr.keytab").getAbsolutePath)
    // SparkHadoopUtil.get.loginUserFromKeytab("tsad...@internal.imsglobal.com",
    //   new File("src\\main\\files\\tsadusr.keytab").getAbsolutePath)

    // Getting this error: java.io.IOException: Can't get Master Kerberos
    // principal for use as renewer
    sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1")
      .collect().foreach(println)
  }
}

On Mon, Nov 7, 2016 at 9:42 PM, Ajay Chander wrote:
> Did anyone use https://www.codatlas.com/github.com/apache/spark/HEAD/
> core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala to
> interact with secured Hadoop from Spark ?
>
> Thanks,
> Ajay
>
> On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote:
>
>> Hi Everyone,
>>
>> I am trying to develop a simple codebase on my machine to read data from
>> a secured Hadoop cluster. We have a development cluster which is secured
>> through Kerberos and I want to run a Spark job from my IntelliJ to read
>> some sample data from the cluster. Has anyone done this before? Can you
>> point me to some sample examples?
>>
>> I understand that if we want to talk to a secured cluster, we need to
>> have a keytab and principal. I tried using it through
>> UserGroupInformation.loginUserFromKeytab and
>> SparkHadoopUtil.get.loginUserFromKeytab, but so far no luck.
>>
>> I have been trying to do this for quite a while. Please let me know
>> if you need more info. Thanks
>>
>> Regards,
>> Ajay
>
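One note on the code in the message above (an observation about Hadoop's API, not something stated in the thread): Configuration.addResource(String) treats its argument as a classpath resource name, while addResource(Path) reads the file from the local filesystem. Passing getAbsolutePath as a plain String, as done with sc.hadoopConfiguration above, is therefore silently ignored unless a matching classpath resource exists, which would leave yarn.resourcemanager.principal unset and produce exactly the renewer error. A small sketch of the difference, reusing the file names from the message:

    import java.io.File
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    val conf = new Configuration()
    // String overload: the name is looked up on the classpath; an absolute
    // filesystem path is silently skipped if nothing on the classpath matches.
    conf.addResource(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath)
    // Path overload: the file is read directly from the local filesystem.
    conf.addResource(new Path(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath))
    // Hypothetical sanity check: if this prints null, yarn-site.xml was not
    // loaded and the renewer lookup will fail.
    println(conf.get("yarn.resourcemanager.principal"))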
Re: Access_Remote_Kerberized_Cluster_Through_Spark
Did anyone use https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala to interact with secured Hadoop from Spark?

Thanks,
Ajay

On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote:
>
> Hi Everyone,
>
> I am trying to develop a simple codebase on my machine to read data from
> a secured Hadoop cluster. We have a development cluster which is secured
> through Kerberos and I want to run a Spark job from my IntelliJ to read
> some sample data from the cluster. Has anyone done this before? Can you
> point me to some sample examples?
>
> I understand that if we want to talk to a secured cluster, we need to
> have a keytab and principal. I tried using it through
> UserGroupInformation.loginUserFromKeytab and
> SparkHadoopUtil.get.loginUserFromKeytab, but so far no luck.
>
> I have been trying to do this for quite a while. Please let me know
> if you need more info. Thanks
>
> Regards,
> Ajay
>
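For reference, a minimal sketch of the two login approaches mentioned above; the principal and keytab path are placeholders, and both calls delegate to Hadoop's UserGroupInformation:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.deploy.SparkHadoopUtil

    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)

    // Option 1: Hadoop's UGI directly (placeholder principal and keytab path).
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")

    // Option 2: Spark's helper, which wraps the same UGI call.
    SparkHadoopUtil.get.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")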