RE: HBase and non-existent TableInputFormat
Yes, that was very helpful… ☺ Here are a few more I found on my quest to get HBase working with Spark.

This one covers HBase dependencies and Spark classpaths:
http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html

This one has a code overview:
http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html
http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase

All of them were very helpful.

From: Nicholas Chammas [mailto:nicholas.cham...@gmail.com]
Sent: Tuesday, September 16, 2014 10:30 AM
To: Jacob, Abraham (Financial&Risk)
Cc: tq00...@gmail.com; user
Subject: Re: HBase and non-existent TableInputFormat

Btw, there are some examples in the Spark GitHub repo that you may find helpful. Here's one related to HBase:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala

On Tue, Sep 16, 2014 at 1:22 PM, <abraham.ja...@thomsonreuters.com> wrote:

Hi,

I had a similar situation in which I needed to read data from HBase and work with the data inside of a Spark context. After much googling, I finally got mine to work. There are a bunch of steps you need to do to get this working.

The problem is that the Spark context does not know anything about HBase, so you have to provide all the information about the HBase classes to both the driver code and the executor code:

    SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");

    // <=== you will need to add this to tell the executors about the
    // classpath for HBase. Set it on the same conf ("sconf") before creating
    // the context, and replace "$(hbase classpath)" with the actual output
    // of `hbase classpath`.
    sconf.set("spark.executor.extraClassPath", "$(hbase classpath)");

    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "Article");

    JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
                org.apache.hadoop.hbase.client.Result> hBaseRDD =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);

Then, when you submit the Spark job:

    spark-submit --driver-class-path $(hbase classpath) --jars /usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/protobuf-java-2.5.0.jar,/usr/lib/hbase/lib/htrace-core.jar --class YourClassName --master local App.jar

Try this and see if it works for you.

From: Y. Dong [mailto:tq00...@gmail.com]
Sent: Tuesday, September 16, 2014 8:18 AM
To: user@spark.apache.org
Subject: HBase and non-existent TableInputFormat

Hello,

I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply read from HBase. The Java code is attached. However, the problem is that TableInputFormat does not even exist in the hbase-client API. Is there any other way I can read from HBase? Thanks

    SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "Article");

    JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
                org.apache.hadoop.hbase.client.Result> hBaseRDD =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);
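A note on the "$(hbase classpath)" trick: it is shell command substitution, so it only expands if the shell performs it. One way to keep it out of the Java source entirely is to expand it once at submit time and pass it to both the driver and the executors. This is an untested sketch, assuming a packaged-HBase layout where the `hbase` script is on the PATH:

```shell
# Expand the HBase classpath once in the shell, then hand it to spark-submit
# for both the driver and the executors. Paths and flags follow the command
# shown earlier in this thread; adjust for your install.
HBASE_CP=$(hbase classpath)

spark-submit \
  --driver-class-path "$HBASE_CP" \
  --conf spark.executor.extraClassPath="$HBASE_CP" \
  --class YourClassName \
  --master local \
  App.jar
```

Setting `spark.executor.extraClassPath` via `--conf` (or in conf/spark-defaults.conf) avoids hard-coding a literal "$(hbase classpath)" string in the application, which Spark would not expand on its own.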
Re: HBase and non-existent TableInputFormat
Btw, there are some examples in the Spark GitHub repo that you may find helpful. Here's one related to HBase:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala

On Tue, Sep 16, 2014 at 1:22 PM, <abraham.ja...@thomsonreuters.com> wrote:

> Hi,
>
> I had a similar situation in which I needed to read data from HBase and
> work with the data inside of a Spark context. After much googling, I
> finally got mine to work. There are a bunch of steps you need to do to
> get this working.
>
> The problem is that the Spark context does not know anything about HBase,
> so you have to provide all the information about the HBase classes to
> both the driver code and the executor code:
>
>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>
>     // <=== you will need to add this to tell the executors about the
>     // classpath for HBase (set it on "sconf" before creating the context).
>     sconf.set("spark.executor.extraClassPath", "$(hbase classpath)");
>
>     JavaSparkContext sc = new JavaSparkContext(sconf);
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>
>     JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
>                 org.apache.hadoop.hbase.client.Result> hBaseRDD =
>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>             org.apache.hadoop.hbase.client.Result.class);
>
> Then, when you submit the Spark job:
>
>     spark-submit --driver-class-path $(hbase classpath) --jars /usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/protobuf-java-2.5.0.jar,/usr/lib/hbase/lib/htrace-core.jar --class YourClassName --master local App.jar
>
> Try this and see if it works for you.
>
> From: Y. Dong [mailto:tq00...@gmail.com]
> Sent: Tuesday, September 16, 2014 8:18 AM
> To: user@spark.apache.org
> Subject: HBase and non-existent TableInputFormat
>
> Hello,
>
> I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply
> read from HBase. The Java code is attached. However, the problem is that
> TableInputFormat does not even exist in the hbase-client API. Is there
> any other way I can read from HBase? Thanks
>
>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>     JavaSparkContext sc = new JavaSparkContext(sconf);
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>
>     JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
>                 org.apache.hadoop.hbase.client.Result> hBaseRDD =
>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>             org.apache.hadoop.hbase.client.Result.class);
RE: HBase and non-existent TableInputFormat
Hi,

I had a similar situation in which I needed to read data from HBase and work with the data inside of a Spark context. After much googling, I finally got mine to work. There are a bunch of steps you need to do to get this working.

The problem is that the Spark context does not know anything about HBase, so you have to provide all the information about the HBase classes to both the driver code and the executor code:

    SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");

    // <=== you will need to add this to tell the executors about the
    // classpath for HBase. Set it on the same conf ("sconf") before creating
    // the context.
    sconf.set("spark.executor.extraClassPath", "$(hbase classpath)");

    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "Article");

    JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
                org.apache.hadoop.hbase.client.Result> hBaseRDD =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);

Then, when you submit the Spark job:

    spark-submit --driver-class-path $(hbase classpath) --jars /usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/protobuf-java-2.5.0.jar,/usr/lib/hbase/lib/htrace-core.jar --class YourClassName --master local App.jar

Try this and see if it works for you.

From: Y. Dong [mailto:tq00...@gmail.com]
Sent: Tuesday, September 16, 2014 8:18 AM
To: user@spark.apache.org
Subject: HBase and non-existent TableInputFormat

Hello,

I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply read from HBase. The Java code is attached. However, the problem is that TableInputFormat does not even exist in the hbase-client API. Is there any other way I can read from HBase? Thanks

    SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "Article");

    JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
                org.apache.hadoop.hbase.client.Result> hBaseRDD =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);
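The long --jars value in the spark-submit command is just a comma-joined list of jar paths. A small helper like the one below can assemble it rather than hand-maintaining the string; this is a sketch, and the /usr/lib/hbase layout and jar names are taken from the command in this thread, not guaranteed for every install:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JarList {
    // Build the comma-separated value expected by spark-submit's --jars flag
    // from a base directory and a list of jar file names.
    static String buildJarsArg(String baseDir, List<String> jars) {
        return jars.stream()
                   .map(j -> baseDir + "/" + j)
                   .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String arg = buildJarsArg("/usr/lib/hbase",
                Arrays.asList("hbase-server.jar", "hbase-client.jar",
                              "hbase-common.jar", "hbase-protocol.jar"));
        // prints the four paths joined by commas, no trailing comma
        System.out.println(arg);
    }
}
```

The same string can then be passed as `--jars` when invoking spark-submit, e.g. from a launcher script.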
Re: HBase and non-existent TableInputFormat
The hbase-client module serves client-facing APIs, while the hbase-server module is supposed to host the classes used on the server side. There is still some work to be done before that separation is fully achieved.

On Tue, Sep 16, 2014 at 9:06 AM, Y. Dong wrote:

> Thanks Ted. It is indeed in hbase-server. Just curious, what's the
> difference between hbase-client and hbase-server?
>
> On 16 Sep 2014, at 17:01, Ted Yu wrote:
>
> bq. TableInputFormat does not even exist in hbase-client API
>
> It is in the hbase-server module.
>
> Take a look at http://hbase.apache.org/book.html#mapreduce.example.read
>
> On Tue, Sep 16, 2014 at 8:18 AM, Y. Dong wrote:
>
>> Hello,
>>
>> I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply
>> read from HBase. The Java code is attached. However, the problem is that
>> TableInputFormat does not even exist in the hbase-client API. Is there
>> any other way I can read from HBase? Thanks
>>
>>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>>     JavaSparkContext sc = new JavaSparkContext(sconf);
>>
>>     Configuration conf = HBaseConfiguration.create();
>>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>>
>>     JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
>>                 org.apache.hadoop.hbase.client.Result> hBaseRDD =
>>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>>             org.apache.hadoop.hbase.client.Result.class);
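In build terms, the module split means a Spark job using TableInputFormat needs hbase-server on its compile classpath in addition to hbase-client. A hypothetical Maven fragment, assuming the HBase 0.98.5 version discussed in this thread (the exact artifact version string, e.g. a -hadoop2 suffix, depends on your Hadoop line and should be checked against your repository):

```xml
<!-- hbase-client alone does not provide TableInputFormat; in HBase 0.98.x
     the MapReduce input/output formats still live in hbase-server. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.5-hadoop2</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>0.98.5-hadoop2</version>
</dependency>
```

This only solves compilation; at run time the jars still have to reach the driver and executors, which is what the --jars and classpath flags earlier in the thread are for.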
Re: HBase and non-existent TableInputFormat
bq. TableInputFormat does not even exist in hbase-client API

It is in the hbase-server module.

Take a look at http://hbase.apache.org/book.html#mapreduce.example.read

On Tue, Sep 16, 2014 at 8:18 AM, Y. Dong wrote:

> Hello,
>
> I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply
> read from HBase. The Java code is attached. However, the problem is that
> TableInputFormat does not even exist in the hbase-client API. Is there
> any other way I can read from HBase? Thanks
>
>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>     JavaSparkContext sc = new JavaSparkContext(sconf);
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>
>     JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
>                 org.apache.hadoop.hbase.client.Result> hBaseRDD =
>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>             org.apache.hadoop.hbase.client.Result.class);
HBase and non-existent TableInputFormat
Hello,

I'm currently using spark-core 1.1 and hbase 0.98.5 and I want to simply read from HBase. The Java code is attached. However, the problem is that TableInputFormat does not even exist in the hbase-client API. Is there any other way I can read from HBase? Thanks

    SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "Article");

    JavaPairRDD<org.apache.hadoop.hbase.io.ImmutableBytesWritable,
                org.apache.hadoop.hbase.client.Result> hBaseRDD =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);