Thanks, John.

How about using Phoenix, or Spark RDDs on top of HBase?

Also, do many people think Phoenix is not a good choice?
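For context, the external-table setup I described in my earlier message below can be sketched roughly like this. The table name `marketdata`, column family `price_info` and the columns are purely illustrative, not the actual schema:

```sql
-- In the HBase shell, the underlying table would already exist, e.g.:
--   create 'marketdata', 'price_info'

-- In Hive, map that existing HBase table as EXTERNAL, so that dropping
-- the Hive table removes only the Hive metadata, not the HBase data:
CREATE EXTERNAL TABLE marketdata_hbase (
  rowkey STRING,
  ticker STRING,
  price  DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,price_info:ticker,price_info:price"
)
TBLPROPERTIES ("hbase.table.name" = "marketdata");
```

The `:key` entry in `hbase.columns.mapping` binds the first Hive column to the HBase row key; the remaining entries map `columnfamily:qualifier` pairs in order.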



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 November 2016 at 14:24, John Leach <[email protected]> wrote:

> Mich,
>
> I have not found too many happy users of Hive on top of HBase in my
> experience.  For every query in Hive, you have to read the data from
> the filesystem into HBase and then serialize it via an HBase scanner
> into Hive.  Throughput through this mechanism is pretty poor, and when
> you read 1 million records you actually read 1 million records in
> HBase and 1 million records in Hive.  There are significant resource
> management issues with this approach as well.
>
> At Splice Machine (open source), we have written an implementation to read
> the store files directly from the file system (via embedded Spark) and then
> we do incremental deltas with HBase to maintain consistency.  When we read
> 1 million records, Spark reads most of them directly from the filesystem.
> Spark provides resource management and fair scheduling of those queries as
> well.
>
> We released some of our performance results at HBaseCon East in NYC.  Here
> is the video: https://www.youtube.com/watch?v=cgIz-cjehJ0
>
> Regards,
> John Leach
>
> > On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh <[email protected]>
> wrote:
> >
> > Hi,
> >
> > My approach to having a SQL engine on top of HBase (excluding Spark
> > & Phoenix for now) is to create the HBase table as is, then create an
> > EXTERNAL Hive table on top of it using
> > org.apache.hadoop.hive.hbase.HBaseStorageHandler to interface with
> > the HBase table.
> >
> > My reasoning for creating the Hive table as EXTERNAL is to avoid
> > accidentally dropping the HBase table when the Hive table is dropped.
> > Is this a reasonable approach?
> >
> > That Hive table can then be used by a variety of tools like Spark,
> > Tableau and Zeppelin.
> >
> > Is this a viable solution, as Hive seems to be preferred on top of
> > HBase compared to Phoenix etc.?
> >
> > Thanks,
> >
> > Dr Mich Talebzadeh
> >
> >
> >
>
>
