In that case I suggest polling user@hive to see if someone has done this. Thanks
On Mon, Oct 10, 2016 at 2:56 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks. I am on Spark 2, so this may not be feasible.
>
> As a matter of interest, how about using Hive on top of the Hbase table?
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 10 October 2016 at 22:49, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > In the hbase master branch there is an hbase-spark module which would
> > allow you to integrate with Spark seamlessly.
> >
> > Note: support for Spark 2.0 is pending. For details, see HBASE-16179.
> >
> > Cheers
> >
> > On Mon, Oct 10, 2016 at 2:46 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> >
> > > Thanks Ted,
> > >
> > > So basically this involves Java programming, much like JDBC connection
> > > retrieval etc.
> > >
> > > Writing to Hbase is pretty fast. Now I have views in both Phoenix and
> > > Hive on the underlying Hbase tables.
> > >
> > > I am looking for flexibility here, so I guess I should use Spark on
> > > Hive tables with a view on the Hbase table.
> > >
> > > Also I like tools like Zeppelin that work with both SQL and Spark
> > > functional programming.
> > >
> > > Sounds like reading data from an Hbase table is best done through some
> > > form of SQL.
> > >
> > > What are your views on this approach?
> > > On 10 October 2016 at 22:13, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > For org.apache.hadoop.hbase.client.Result, there is this method:
> > > >
> > > >     public byte[] getValue(byte[] family, byte[] qualifier)
> > > >
> > > > which allows you to retrieve the value for a designated column.
> > > >
> > > > FYI
> > > >
> > > > On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to do some operations on an Hbase table that is being
> > > > > populated by Spark Streaming.
> > > > >
> > > > > Now this is just Spark on Hbase, as opposed to Spark on Hive ->
> > > > > view on Hbase etc. I also have a Phoenix view on this Hbase table.
> > > > > This is sample code:
> > > > >
> > > > > scala> val tableName = "marketDataHbase"
> > > > >
> > > > > scala> val conf = HBaseConfiguration.create()
> > > > > conf: org.apache.hadoop.conf.Configuration = Configuration:
> > > > > core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> > > > > yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
> > > > > hbase-default.xml, hbase-site.xml
> > > > >
> > > > > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> > > > >
> > > > > scala> // create rdd
> > > > > scala> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
> > > > >          classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> > > > >          classOf[org.apache.hadoop.hbase.client.Result])
> > > > > hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
> > > > > org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at newAPIHadoopRDD at <console>:64
> > > > >
> > > > > scala> hBaseRDD.count
> > > > > res11: Long = 22272
> > > > >
> > > > > scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD of Results
> > > > > scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> > > > > resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result] =
> > > > > MapPartitionsRDD[8] at map at <console>:41
> > > > >
> > > > > scala> // transform into an RDD of (RowKey, ColumnValue)s; the RowKey has the time removed
> > > > > scala> val keyValueRDD = resultRDD.map(result =>
> > > > >          (Bytes.toString(result.getRow()).split(" ")(0),
> > > > >           Bytes.toString(result.value)))
> > > > > keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
> > > > > MapPartitionsRDD[9] at map at <console>:43
> > > > >
> > > > > scala> keyValueRDD.take(2).foreach(kv => println(kv))
> > > > > (000055e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> > > > > (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
> > > > >
> > > > > OK, above I am only getting the rowkey (the UUID) and the last
> > > > > attribute (price). However, I have the rowkey and 3 more columns in
> > > > > the Hbase table!
> > > > >
> > > > > scan 'marketDataHbase', "LIMIT" => 1
> > > > > ROW                                   COLUMN+CELL
> > > > > 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:price, timestamp=1476133232864,
> > > > >                                       value=43.89760813529593664528
> > > > > 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:ticker, timestamp=1476133232864,
> > > > >                                       value=S08
> > > > > 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:timecreated, timestamp=1476133232864,
> > > > >                                       value=2016-10-10T17:12:22
> > > > > 1 row(s) in 0.0100 seconds
> > > > >
> > > > > So how can I get the other columns?
> > > > >
> > > > > Thanks
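[Editor's note: the `getValue(family, qualifier)` method Ted points at above answers the "other columns" question directly. As a sketch only (untested; it assumes the same shell session, the `resultRDD` defined above, and the `price_info` family/qualifiers shown in the `scan` output), the per-column extraction would look something like:

    import org.apache.hadoop.hbase.util.Bytes

    // Column family and qualifiers as shown by: scan 'marketDataHbase'
    val cf = Bytes.toBytes("price_info")

    // Map each Result to (rowkey-without-time, ticker, timecreated, price),
    // pulling each cell out with getValue(family, qualifier)
    val rowRDD = resultRDD.map { result =>
      val key         = Bytes.toString(result.getRow).split(" ")(0)
      val ticker      = Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker")))
      val timeCreated = Bytes.toString(result.getValue(cf, Bytes.toBytes("timecreated")))
      val price       = Bytes.toString(result.getValue(cf, Bytes.toBytes("price")))
      (key, ticker, timeCreated, price)
    }

    rowRDD.take(2).foreach(println)

This replaces the `result.value` call in the thread (which returns only the value of the first cell in the row) with one `getValue` per qualifier; running it requires a live HBase cluster and the HBase client jars on the Spark classpath.]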