Thanks Ted. So basically this involves Java-style programming, much like column retrieval through a JDBC connection.
Writing to HBase is pretty fast. I now have views in both Phoenix and Hive on
the underlying HBase tables. I am looking for flexibility here, so I gather I
should use Spark on Hive tables with a view on the HBase table. I also like
tools like Zeppelin that work with both SQL and Spark functional programming.
It sounds like reading data from an HBase table is best done through some form
of SQL. What are your views on this approach?

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss,
damage or destruction of data or any other property which may arise from
relying on this email's technical content is explicitly disclaimed. The author
will in no case be liable for any monetary damages arising from such loss,
damage or destruction.

On 10 October 2016 at 22:13, Ted Yu <[email protected]> wrote:

> For org.apache.hadoop.hbase.client.Result, there is this method:
>
>   public byte[] getValue(byte[] family, byte[] qualifier)
>
> which allows you to retrieve the value for a designated column.
>
> FYI
>
> On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh
> <[email protected]> wrote:
>
> > Hi,
> >
> > I am trying to do some operations on an HBase table that is being
> > populated by Spark Streaming.
> >
> > Now this is just Spark on HBase, as opposed to Spark on Hive -> view on
> > HBase etc. I also have a Phoenix view on this HBase table.
> >
> > This is sample code:
> >
> > scala> val tableName = "marketDataHbase"
> > scala> val conf = HBaseConfiguration.create()
> > conf: org.apache.hadoop.conf.Configuration = Configuration:
> > core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> > yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
> > hbase-default.xml, hbase-site.xml
> > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> >
> > scala> // create the RDD
> > scala> val hBaseRDD = sc.newAPIHadoopRDD(conf,
> >   classOf[TableInputFormat],
> >   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> >   classOf[org.apache.hadoop.hbase.client.Result])
> > hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
> > org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
> > newAPIHadoopRDD at <console>:64
> >
> > scala> hBaseRDD.count
> > res11: Long = 22272
> >
> > scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD
> > of Results
> > scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> > resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result]
> > = MapPartitionsRDD[8] at map at <console>:41
> >
> > scala> // transform into an RDD of (RowKey, ColumnValue); the RowKey has
> > the time removed
> > scala> val keyValueRDD = resultRDD.map(result =>
> >   (Bytes.toString(result.getRow()).split(" ")(0),
> >    Bytes.toString(result.value)))
> > keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
> > MapPartitionsRDD[9] at map at <console>:43
> >
> > scala> keyValueRDD.take(2).foreach(kv => println(kv))
> > (000055e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> > (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
> >
> > OK, above I am only getting the rowkey (the UUID) and a single
> > attribute (price). However, the HBase table has the rowkey and 3 more
> > columns!
> >
> > scan 'marketDataHbase', "LIMIT" => 1
> > ROW                                    COLUMN+CELL
> >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:price,
> >    timestamp=1476133232864, value=43.89760813529593664528
> >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:ticker,
> >    timestamp=1476133232864, value=S08
> >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:timecreated,
> >    timestamp=1476133232864, value=2016-10-10T17:12:22
> > 1 row(s) in 0.0100 seconds
> >
> > So how can I get the other columns?
> >
> > Thanks
