Hi,

As we are not yet on HBase 2.0, we are using SparkOnHBase.
Dependency:
<dependency>
  <groupId>com.cloudera</groupId>
  <artifactId>spark-hbase</artifactId>
  <version>0.0.2-clabs</version>
</dependency>
The code itself is quite a small snippet. For a general scan using a start and
stop time as the scan time range:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import com.cloudera.spark.hbase.HBaseContext

val conf = new SparkConf().
  set("spark.shuffle.consolidateFiles", "true").
  set("spark.kryo.registrationRequired", "false").
  set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
  set("spark.kryoserializer.buffer", "30m").
  set("spark.shuffle.spill", "true").
  set("spark.shuffle.memoryFraction", "0.4")
val sc = new SparkContext(conf)

// HBaseContext (from the SparkOnHBase module) carries the HBase configuration out to the executors
val hc = new HBaseContext(sc, HBaseConfiguration.create())

// columnName (the column family bytes), scanRowStartTs, scanRowStopTs, inputTableName
// and filter are placeholders; Scan.addColumn takes (family, qualifier) as byte arrays
val scan = new Scan()
scan.addColumn(columnName, Bytes.toBytes("column1"))
scan.setTimeRange(scanRowStartTs, scanRowStopTs)
val rdd = hc.hbaseRDD(inputTableName, scan, filter)
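The result is an ordinary Spark RDD, so a quick sanity check can be as simple as:

// count the rows returned by the scan
println("Rows returned by scan: " + rdd.count())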
To run it, use something along these lines (the --keytab and --principal options handle
the Kerberos login; note the application jar goes at the end):

spark-submit --class ClassName --master yarn-client --driver-memory 2000M \
  --executor-memory 5G --keytab <location of keytab> --principal <principal name> \
  <path to application jar>
That should work in the general case. You can of course use the other scan / put / get
methods in the same way; a put, for example, could look like the sketch below.
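As a rough sketch of a write with the same HBaseContext (the input RDD, the "cf" column
family and the "column1" qualifier are placeholders, it assumes an HBase 1.x client, and
you should check the bulkPut signature against the SparkOnHBase version you are on):

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

// rdd is a placeholder RDD[(String, String)] of (rowKey, value) pairs
hc.bulkPut[(String, String)](
  rdd,
  inputTableName,
  (record) => {
    // build one Put per record: row key, then family / qualifier / value
    val put = new Put(Bytes.toBytes(record._1))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("column1"), Bytes.toBytes(record._2))
    put
  },
  true) // autoFlush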
Thanks,
Nkechi
On 9 August 2016 at 15:20, Aneela Saleem <[email protected]> wrote:
> Thanks Nkechi,
>
> Can you please direct me to some code snippet using the hbase on spark module?
> I've been trying that for the last few days but have not found a workaround.
>
>
>
> On Tue, Aug 9, 2016 at 6:13 PM, Nkechi Achara <[email protected]>
> wrote:
>
> > Hey,
> >
> > Have you tried hbase on spark module, or the spark-hbase module to
> connect?
> > The principal and keytab options should work out of the box for
> kerberized
> > access. I can attempt your code if you don't have the ability to use
> those
> > modules.
> >
> > Thanks
> > K
> >
> > On 9 Aug 2016 2:25 p.m., "Aneela Saleem" <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm trying to connect to HBase with security enabled from a Spark job. I
> > > have kinit'd from the command line. When I run the following job, i.e.,
> > >
> > > /usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab
> > > --principal spark/hadoop-master@platalyticsrealm --class
> > > com.platalytics.example.spark.App --master yarn --driver-class-path
> > > /root/hbase-1.2.2/conf /home/vm6/project-1-jar-with-dependencies.jar
> > >
> > > I get the error:
> > >
> > > 2016-08-07 20:43:57,617 WARN [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1]
> > > ipc.RpcClientImpl: Exception encountered while connecting to the server :
> > > javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > > GSSException: No valid credentials provided (Mechanism level: Failed to
> > > find any Kerberos tgt)]
> > >
> > > 2016-08-07 20:43:57,619 ERROR [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1]
> > > ipc.RpcClientImpl: SASL authentication failed. The most likely cause is
> > > missing or invalid credentials. Consider 'kinit'.
> > > javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > > GSSException: No valid credentials provided (Mechanism level: Failed to
> > > find any Kerberos tgt)]
> > >   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> > >   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
> > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
> > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
> > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
> > > Following is my code:
> > >
> > > System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> > > System.setProperty("java.security.auth.login.config", "/etc/hbase/conf/zk-jaas.conf");
> > >
> > > val hconf = HBaseConfiguration.create()
> > > val tableName = "emp"
> > > hconf.set("hbase.zookeeper.quorum", "hadoop-master")
> > > hconf.set(TableInputFormat.INPUT_TABLE, tableName)
> > > hconf.set("hbase.zookeeper.property.clientPort", "2181")
> > > hconf.set("hadoop.security.authentication", "kerberos")
> > > hconf.set("hbase.security.authentication", "kerberos")
> > > hconf.addResource(new Path("/etc/hbase/conf/core-site.xml"))
> > > hconf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
> > > UserGroupInformation.setConfiguration(hconf)
> > > val keyTab = "/etc/hadoop/conf/spark.keytab"
> > > val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
> > >   "spark/hadoop-master@platalyticsrealm", keyTab)
> > > UserGroupInformation.setLoginUser(ugi)
> > > ugi.doAs(new PrivilegedExceptionAction[Void]() {
> > >   override def run(): Void = {
> > >     val conf = new SparkConf
> > >     val sc = new SparkContext(conf)
> > >     sc.addFile(keyTab)
> > >     var hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
> > >       classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> > >       classOf[org.apache.hadoop.hbase.client.Result])
> > >     println("Number of Records found : " + hBaseRDD.count())
> > >     hBaseRDD.foreach(x => {
> > >       println(new String(x._2.getRow()))
> > >     })
> > >     sc.stop()
> > >     return null
> > >   }
> > > })
> > >
> > > Please have a look and help me find the issue.
> > >
> > > Thanks
> > >
> >
>