Hi there,

I am using SparkSQL to read from HBase, but I have two questions:

1. Some APIs are not available in my dependencies. Which dependency do I need to add to get them?

    org.apache.hadoop.hbase.spark.example.hbasecontext
    org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
    org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

2. Is there a complete code example showing how to read from and write to HBase with SparkSQL?

The document I referred to is this: http://hbase.apache.org/book.html#_sparksql_dataframes. It seems to be a snapshot for 2.0, while I am using HBase 1.2.1 + Spark 1.6.1 + Hadoop 2.7.1.

In my app, I want to load the entire HBase table into SparkSQL.

My code:

    import org.apache.spark._
    import org.apache.hadoop.hbase._
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.spark.example.hbasecontext
    import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
    import org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

    object HbaseConnector {
      def main(args: Array[String]) {
        val tableName = args(0)
        val sparkMasterUrlDev = "spark://hadoopmaster:7077"
        val sparkMasterUrlLocal = "local[2]"

        val sparkConf = new SparkConf()
          .setAppName("HbaseConnector for table " + tableName)
          .setMaster(sparkMasterUrlDev)
          .set("spark.executor.memory", "10g")
        val sc = new SparkContext(sparkConf)
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)

        val conf = new HBaseConfiguration()
        conf.set("hbase.zookeeper.quorum", "z1,z2,z3")
        conf.set("hbase.zookeeper.property.clientPort", "2181")
        conf.set("hbase.rootdir", "hdfs://hadoopmaster:8020/hbase")

        // val hbaseContext = new HBaseContext(sc, conf)

        val pv = sqlContext.read
          .options(Map(
            HBaseTableCatalog.tableCatalog -> writeCatalog,
            HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
          .format("org.apache.hadoop.hbase.spark")
          .load()

        pv.write.saveAsTable(tableName)
      }
    }

My POM file is attached as well.

Thanks for any help.

San.Luo
