Re: reading Hbase table in Spark

2016-12-11 Thread Mich Talebzadeh
Hi Asher,

As mentioned before, the Phoenix Spark plugin does not work with Spark 2.
However, you can use Spark 2 on top of Phoenix directly.
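
To illustrate that second point, here is a minimal, untested sketch of reading a
Phoenix view from Spark 2 over plain JDBC. The table name, ZooKeeper quorum and
znode below are assumed placeholders, not values from this thread, and the
phoenix-client jar needs to be on the Spark classpath:

// Hypothetical sketch: Spark 2 reading a Phoenix view through the Phoenix JDBC driver.
val phoenixDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:phoenix:localhost:2181:/hbase")       // zkQuorum:port:znode (assumed)
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("dbtable", "MARKETDATAHBASE")                      // assumed Phoenix view name
  .load()

phoenixDF.printSchema()
phoenixDF.show(5, false)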

Does that answer your point?

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: reading Hbase table in Spark

2016-12-08 Thread Asher
Hi Mich,

Can you describe in detail how you used Phoenix to read and write the HBase
table in Spark for RDD processing?

Thanks
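
For reference, the Phoenix Spark plugin (which at the time of this thread
supported Spark 1.x only; see the reply above regarding Spark 2) exposes
Phoenix tables over HBase as RDDs roughly as sketched below. The table name,
column names and ZooKeeper URL are assumed placeholders, not details from this
thread:

import org.apache.phoenix.spark._   // brings phoenixTableAsRDD / saveToPhoenix into scope

// Read an assumed Phoenix table as an RDD of column-name -> value maps.
val phoenixRDD = sc.phoenixTableAsRDD(
  "MARKETDATAHBASE",                        // assumed Phoenix table/view name
  Seq("ROWKEY", "TICKER", "PRICE"),         // assumed column names
  zkUrl = Some("localhost:2181"))

phoenixRDD.map(row => (row("ROWKEY"), row("PRICE"))).take(2).foreach(println)

// Write an RDD of tuples back to an assumed Phoenix table.
sc.parallelize(Seq(("id-1", "S08", 43.89), ("id-2", "S09", 57.68)))
  .saveToPhoenix("MARKETDATA_OUT", Seq("ROWKEY", "TICKER", "PRICE"),
    zkUrl = Some("localhost:2181"))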





Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
I have already done it with Hive and Phoenix thanks





Re: reading Hbase table in Spark

2016-10-10 Thread Ted Yu
In that case I suggest polling user@hive to see if someone has done this.

Thanks


Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
Thanks. I am on Spark 2, so that may not be feasible.

As a matter of interest, how about using Hive on top of the HBase table?
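
For illustration only, an untested sketch of what that might look like from
Spark 2, assuming a Hive external table (the name marketdata_hive and its
column names are assumptions) has already been created over the HBase table
with the Hive HBase storage handler, and that the handler and HBase client
jars are visible to Spark:

// Untested sketch: querying an assumed Hive-on-HBase external table from Spark 2.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveOnHBase")
  .enableHiveSupport()   // needed so Spark can see the Hive metastore table
  .getOrCreate()

val df = spark.sql(
  "SELECT rowkey, ticker, timecreated, price FROM marketdata_hive LIMIT 10")
df.show(false)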





Re: reading Hbase table in Spark

2016-10-10 Thread Ted Yu
In the hbase master branch, there is an hbase-spark module which would allow
you to integrate with Spark seamlessly.

Note: support for Spark 2.0 is pending. For details, see HBASE-16179
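
As a rough, untested illustration of the module (API details vary between HBase
versions, and this is only a sketch of the RDD-level entry point, using the
marketDataHbase table and price_info column family from the original post):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes

// Sketch: HBaseContext ships the HBase configuration to the executors and
// runs a Scan as an RDD of (rowkey, Result) pairs.
val conf = HBaseConfiguration.create()
val hbaseContext = new HBaseContext(sc, conf)

val scan = new Scan()
scan.addFamily(Bytes.toBytes("price_info"))   // column family from the thread's table

val scanRDD = hbaseContext.hbaseRDD(TableName.valueOf("marketDataHbase"), scan)
scanRDD.take(2).foreach { case (_, result) =>
  println(Bytes.toString(result.getRow) + " -> " + result.size() + " cells")
}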

Cheers


Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
Thanks Ted,

So basically it involves Java-style programming, much like JDBC connection and
retrieval etc.

Writing to HBase is pretty fast. I now have views in both Phoenix and Hive on
the underlying HBase tables.

I am looking for flexibility here, so I gather I should use Spark on Hive
tables with a view on the HBase table.

I also like tools like Zeppelin that work with both SQL and Spark functional
programming.

It sounds like reading data from the HBase table is best done through some
form of SQL.
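
One hedged illustration of that last point, assuming a Spark 2 session called
spark and the keyValueRDD of (rowkey, price) strings built in the original
post (quoted at the bottom of this thread):

// Sketch only: register the HBase-sourced RDD as a temporary view so it can
// be queried with SQL, e.g. from a Zeppelin %sql paragraph.
import spark.implicits._

val priceDF = keyValueRDD.toDF("rowkey", "price")
priceDF.createOrReplaceTempView("marketdata")

spark.sql(
  "SELECT rowkey, CAST(price AS DOUBLE) AS price FROM marketdata ORDER BY price DESC LIMIT 10"
).show(false)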

What are your views on this approach?








Re: reading Hbase table in Spark

2016-10-10 Thread Ted Yu
For org.apache.hadoop.hbase.client.Result, there is this method:

  public byte[] getValue(byte [] family, byte [] qualifier) {

which allows you to retrieve the value for a designated column.


FYI
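
Applied to the hBaseRDD from the original post below, a minimal sketch
(assuming the price_info column family and the ticker, timecreated and price
qualifiers shown in the scan output) would be:

import org.apache.hadoop.hbase.util.Bytes

// Minimal sketch: pull the row key plus each column of interest out of every
// Result via getValue(family, qualifier). Family/qualifier names are taken
// from the scan output quoted below; getValue returns null for a missing cell,
// so the columns are assumed to be present on every row here.
val cf = Bytes.toBytes("price_info")

val rowsRDD = hBaseRDD.map { case (_, result) =>
  val rowKey      = Bytes.toString(result.getRow)
  val ticker      = Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker")))
  val timeCreated = Bytes.toString(result.getValue(cf, Bytes.toBytes("timecreated")))
  val price       = Bytes.toString(result.getValue(cf, Bytes.toBytes("price")))
  (rowKey, ticker, timeCreated, price)
}

rowsRDD.take(2).foreach(println)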

On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh 
wrote:

> Hi,
>
> I am trying to do some operation on an Hbase table that is being populated
> by Spark Streaming.
>
> Now this is just Spark on Hbase as opposed to Spark on Hive -> view on
> Hbase etc. I also have Phoenix view on this Hbase table.
>
> This is sample code
>
> scala> // requires: import org.apache.hadoop.hbase.HBaseConfiguration
> scala> //           import org.apache.hadoop.hbase.mapreduce.TableInputFormat
> scala> //           import org.apache.hadoop.hbase.util.Bytes
> scala> val tableName = "marketDataHbase"
> scala> val conf = HBaseConfiguration.create()
> conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml,
> core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml,
> yarn-site.xml, hdfs-default.xml, hdfs-site.xml, hbase-default.xml,
> hbase-site.xml
> scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> scala> // create the RDD of (ImmutableBytesWritable, Result) pairs
> scala> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>          classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>          classOf[org.apache.hadoop.hbase.client.Result])
> hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
> org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at newAPIHadoopRDD at <console>:64
> scala> hBaseRDD.count
> res11: Long = 22272
>
> scala> // transform the (ImmutableBytesWritable, Result) tuples into an RDD of Results
> scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result] =
> MapPartitionsRDD[8] at map at <console>:41
>
> scala> // transform into an RDD of (RowKey, ColumnValue); the RowKey has the time removed
> scala> val keyValueRDD = resultRDD.map(result =>
>          (Bytes.toString(result.getRow()).split(" ")(0),
>           Bytes.toString(result.value)))
> keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
> MapPartitionsRDD[9] at map at <console>:43
>
> scala> keyValueRDD.take(2).foreach(kv => println(kv))
> (55e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
>
> OK, above I am only getting the rowkey (the UUID) and the last attribute
> (price). However, I have the rowkey and three more columns in the HBase table!
>
> scan 'marketDataHbase', "LIMIT" => 1
> ROW   COLUMN+CELL
>  55e2-63f1-4def-b625-e73f0ac36271
> column=price_info:price, timestamp=1476133232864,
> value=43.89760813529593664528
>  55e2-63f1-4def-b625-e73f0ac36271
> column=price_info:ticker, timestamp=1476133232864, value=S08
>  55e2-63f1-4def-b625-e73f0ac36271
> column=price_info:timecreated, timestamp=1476133232864,
> value=2016-10-10T17:12:22
> 1 row(s) in 0.0100 seconds
> So how can I get the other columns?
>
> Thanks
>
>