Yes that was very helpful… ☺

Here are a few more I found on my quest to get HBase working with Spark –

This one covers HBase dependencies and Spark classpaths –

http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html

These two have code overviews –

http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html
http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase

All of them were very helpful…



From: Nicholas Chammas [mailto:nicholas.cham...@gmail.com]
Sent: Tuesday, September 16, 2014 10:30 AM
To: Jacob, Abraham (Financial&Risk)
Cc: tq00...@gmail.com; user
Subject: Re: HBase and non-existent TableInputFormat

Btw, there are some examples in the Spark GitHub repo that you may find
helpful. Here's one related to HBase:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala

On Tue, Sep 16, 2014 at 1:22 PM, <abraham.ja...@thomsonreuters.com> wrote:
Hi,

I had a similar situation in which I needed to read data from HBase and work 
with it inside a Spark context. After much googling, I finally got mine to 
work. There are a bunch of steps you need to do to get this working –

The problem is that the Spark context does not know anything about HBase, so 
you have to provide all the information about the HBase classes to both the 
driver code and the executor code…


SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");

// Tell the executors where to find the HBase classes. Note that
// "$(hbase classpath)" is shell syntax and is NOT expanded inside a Java
// string – hbaseClasspath below is a placeholder for the actual output of
// `hbase classpath`. Set it before the JavaSparkContext is created, or it
// will not take effect.
sconf.set("spark.executor.extraClassPath", hbaseClasspath);

JavaSparkContext sc = new JavaSparkContext(sconf);


// Standard HBase configuration; picks up hbase-site.xml from the classpath.
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, "Article");
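
If you only need a slice of the table, you can also narrow the scan through 
the same Configuration before building the RDD. A minimal sketch – these 
constants are defined on TableInputFormat in HBase 0.98, and the family and 
row keys here are placeholders for your own schema:

// Optional: narrow the scan instead of reading the whole table.
conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf");   // placeholder family
conf.set(TableInputFormat.SCAN_ROW_START, "row-0000"); // placeholder start key
conf.set(TableInputFormat.SCAN_ROW_STOP, "row-9999");  // placeholder stop key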


JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(
        conf,
        TableInputFormat.class,
        org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
        org.apache.hadoop.hbase.client.Result.class);
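
From there hBaseRDD behaves like any other pair RDD. As a quick sanity check, 
here is a minimal sketch that counts the rows and pulls one column out of each 
Result – the family "cf" and qualifier "title" are placeholders, and it 
assumes the usual imports (JavaRDD, Function, scala.Tuple2, and 
org.apache.hadoop.hbase.util.Bytes):

// Force the scan and count the rows that came back.
long rowCount = hBaseRDD.count();

// Extract one column from each Result ("cf"/"title" are placeholders).
JavaRDD<String> titles = hBaseRDD.map(
        new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
            @Override
            public String call(Tuple2<ImmutableBytesWritable, Result> row) {
                byte[] value = row._2().getValue(
                        Bytes.toBytes("cf"), Bytes.toBytes("title"));
                return value == null ? null : Bytes.toString(value);
            }
        });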


Then when you submit the Spark job (here the shell really does expand $(hbase classpath)) –


spark-submit --driver-class-path $(hbase classpath) \
  --jars /usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/protobuf-java-2.5.0.jar,/usr/lib/hbase/lib/htrace-core.jar \
  --class YourClassName --master local App.jar


Try this and see if it works for you.


From: Y. Dong [mailto:tq00...@gmail.com]
Sent: Tuesday, September 16, 2014 8:18 AM
To: user@spark.apache.org
Subject: HBase and non-existent TableInputFormat

Hello,

I’m currently using spark-core 1.1 and HBase 0.98.5, and I simply want to read 
from HBase. The Java code is attached. However, the problem is that 
TableInputFormat does not even exist in the hbase-client API. Is there any 
other way I can read from HBase? Thanks

SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sconf);


Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, "Article");


JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(
        conf,
        TableInputFormat.class,
        org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
        org.apache.hadoop.hbase.client.Result.class);
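
For reference, TableInputFormat lives in the org.apache.hadoop.hbase.mapreduce 
package, which HBase 0.98 ships in the hbase-server artifact rather than 
hbase-client (hence hbase-server.jar in the --jars list earlier in this 
thread). The snippet above needs imports along these lines:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat; // hbase-server, not hbase-client
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;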



