Re: reading Hbase table in Spark

Mich Talebzadeh Mon, 10 Oct 2016 15:14:26 -0700

I have already done it with Hive and Phoenix, thanks.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 10 October 2016 at 22:58, Ted Yu <[email protected]> wrote:

> In that case I suggest polling user@hive to see if someone has done this.
>
> Thanks
>
> On Mon, Oct 10, 2016 at 2:56 PM, Mich Talebzadeh <[email protected]> wrote:
>
> > Thanks. I am on Spark 2, so that may not be feasible.
> >
> > As a matter of interest, how about using Hive on top of the HBase table?
> >
> > On 10 October 2016 at 22:49, Ted Yu <[email protected]> wrote:
> >
> > > In the HBase master branch, there is an hbase-spark module which would
> > > allow you to integrate with Spark seamlessly.
> > >
> > > Note: support for Spark 2.0 is pending. For details, see HBASE-16179
> > >
> > > Cheers
> > >
> > > On Mon, Oct 10, 2016 at 2:46 PM, Mich Talebzadeh <[email protected]> wrote:
> > >
> > > > Thanks Ted,
> > > >
> > > > So it basically involves Java programming, much like JDBC connection
> > > > retrieval etc.
> > > >
> > > > Writing to HBase is pretty fast. Now I have views in both Phoenix and
> > > > Hive on the underlying HBase tables.
> > > >
> > > > I am looking for flexibility here, so I guess I should use Spark on Hive
> > > > tables with a view on the HBase table.
> > > >
> > > > Also, I like tools such as Zeppelin that work with both SQL and Spark
> > > > functional programming.
> > > >
> > > > It sounds like reading data from an HBase table is best done through
> > > > some form of SQL.
> > > >
> > > > What are your views on this approach?
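A minimal sketch of the Spark-on-Hive route described above, assuming a Hive external table has already been defined over `marketDataHbase` with Hive's HBase storage handler, and that the hive-hbase-handler and HBase client jars are on Spark's classpath. The table name `marketdatahbase_hive` and its columns `ticker` and `price` are hypothetical. Spark SQL resolves the table through the Hive metastore; the aggregation is kept in a separate function so it works on any DataFrame with those two columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

object QueryHiveOverHBase {
  // Average price per ticker; works on any DataFrame with ticker and price columns
  def avgPriceByTicker(df: DataFrame): DataFrame =
    df.groupBy("ticker").agg(avg("price").as("avg_price"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("QueryHiveOverHBase")
      .enableHiveSupport() // resolve table names through the Hive metastore
      .getOrCreate()

    // Hypothetical Hive external table mapped onto the HBase table marketDataHbase
    val prices = spark.sql(
      "SELECT ticker, CAST(price AS DOUBLE) AS price FROM marketdatahbase_hive")

    avgPriceByTicker(prices).show(10)
    spark.stop()
  }
}
```

The same `prices` DataFrame can also be queried from Zeppelin with plain SQL, which keeps the SQL and functional styles on one data source.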
> > > >
> > > > On 10 October 2016 at 22:13, Ted Yu <[email protected]> wrote:
> > > >
> > > > > For org.apache.hadoop.hbase.client.Result, there is this method:
> > > > >
> > > > >   public byte[] getValue(byte[] family, byte[] qualifier)
> > > > >
> > > > > which allows you to retrieve the value for a designated column.
> > > > >
> > > > >
> > > > > FYI
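A sketch of the `getValue` suggestion applied to the `marketDataHbase` table from the original question: each column is fetched by family and qualifier, using the `price_info` family and the `price`, `ticker` and `timecreated` qualifiers shown by the shell scan further down the thread. It assumes Spark and the HBase client and mapreduce jars (for `TableInputFormat`) are on the classpath; `ReadMarketData`, `cell` and `toRow` are illustrative names.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object ReadMarketData {
  // Column family and qualifiers as shown by the HBase shell scan in this thread
  private val Family      = Bytes.toBytes("price_info")
  private val Price       = Bytes.toBytes("price")
  private val Ticker      = Bytes.toBytes("ticker")
  private val TimeCreated = Bytes.toBytes("timecreated")

  // getValue returns null when the cell is missing, so wrap it in an Option
  private def cell(r: Result, qualifier: Array[Byte]): Option[String] =
    Option(r.getValue(Family, qualifier)).map(b => Bytes.toString(b))

  // (rowkey, ticker, timecreated, price) for one HBase row
  def toRow(r: Result): (String, Option[String], Option[String], Option[String]) =
    (Bytes.toString(r.getRow), cell(r, Ticker), cell(r, TimeCreated), cell(r, Price))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadMarketData"))
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "marketDataHbase")

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // One tuple per row instead of only the first cell's value
    hBaseRDD.map { case (_, r) => toRow(r) }.take(2).foreach(println)
    sc.stop()
  }
}
```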
> > > > >
> > > > > On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to do some operations on an HBase table that is being
> > > > > > populated by Spark Streaming.
> > > > > >
> > > > > > Now this is just Spark on HBase, as opposed to Spark on Hive -> view on
> > > > > > HBase etc. I also have a Phoenix view on this HBase table.
> > > > > >
> > > > > > This is sample code:
> > > > > >
> > > > > > scala> val tableName = "marketDataHbase"
> > > > > > scala> val conf = HBaseConfiguration.create()
> > > > > > conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, hbase-default.xml, hbase-site.xml
> > > > > > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> > > > > > scala> // create rdd
> > > > > > scala> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result])
> > > > > > hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable, org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at newAPIHadoopRDD at <console>:64
> > > > > > scala> hBaseRDD.count
> > > > > > res11: Long = 22272
> > > > > >
> > > > > > scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD of Result's
> > > > > > scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> > > > > > resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result] = MapPartitionsRDD[8] at map at <console>:41
> > > > > >
> > > > > > scala> // transform into an RDD of (RowKey, ColumnValue)s, the RowKey has the time removed
> > > > > > scala> val keyValueRDD = resultRDD.map(result => (Bytes.toString(result.getRow()).split(" ")(0), Bytes.toString(result.value)))
> > > > > > keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[9] at map at <console>:43
> > > > > >
> > > > > > scala> keyValueRDD.take(2).foreach(kv => println(kv))
> > > > > > (000055e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> > > > > > (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
> > > > > >
> > > > > > OK, above I am only getting the rowkey (the UUID) and the last
> > > > > > attribute (price). However, I have the rowkey and 3 more columns in
> > > > > > the HBase table!
> > > > > >
> > > > > > scan 'marketDataHbase', "LIMIT" => 1
> > > > > > ROW                                    COLUMN+CELL
> > > > > >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:price, timestamp=1476133232864, value=43.89760813529593664528
> > > > > >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:ticker, timestamp=1476133232864, value=S08
> > > > > >  000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:timecreated, timestamp=1476133232864, value=2016-10-10T17:12:22
> > > > > > 1 row(s) in 0.0100 seconds
> > > > > >
> > > > > > So how can I get the other columns?
> > > > > >
> > > > > > Thanks
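`Result.value()` returns only the value of the first cell in a row, and cells are sorted by family and then qualifier, so in the transcript above it yields `price_info:price` and nothing else. If the column names should not be hard-coded, one option is `Result.getFamilyMap`, which maps every qualifier in a family to its latest value. A sketch, assuming the single `price_info` family shown by the scan; `AllColumns`, `familyAsStrings` and `rowsWithColumns` are illustrative names.

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

object AllColumns {
  // Column family shown by the HBase shell scan in this thread
  private val Family = Bytes.toBytes("price_info")

  // qualifier -> latest value for every column in the family;
  // getFamilyMap returns null for an empty Result, hence the Option
  def familyAsStrings(r: Result): Map[String, String] =
    Option(r.getFamilyMap(Family)) match {
      case Some(m) =>
        m.asScala.map { case (q, v) => Bytes.toString(q) -> Bytes.toString(v) }.toMap
      case None => Map.empty
    }

  // Pair each rowkey with all of its columns, e.g. applied to hBaseRDD
  def rowsWithColumns(rdd: RDD[(ImmutableBytesWritable, Result)]): RDD[(String, Map[String, String])] =
    rdd.map { case (_, r) => (Bytes.toString(r.getRow), familyAsStrings(r)) }
}
```

In the spark-shell session above, `AllColumns.rowsWithColumns(hBaseRDD).take(2)` would then return each UUID together with its `price`, `ticker` and `timecreated` values.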
