Re: Hive on Hbase

John Leach Thu, 17 Nov 2016 10:44:48 -0800

Mich,

Please see slide 9 for architectural differences between Splice Machine, 
Trafodion, and Phoenix.


https://docs.google.com/presentation/d/111t2QSVaI-CPwE_ejPHZMFhKJVe5yghCMPLfR3zh9hQ/edit?ts=582def5b#slide=id.g5fcdef5a7_09

The performance differences are in the later slides.

Hope this helps.  

Regards,
John Leach

> On Nov 17, 2016, at 10:41 AM, Gunnar Tapper <[email protected]> wrote:
> 
> Hi,
> 
> Trafodion's native storage engine is HBase.
> 
> You can find its documentation at: trafodion.apache.org/documentation.html
> 
> Since this is an HBase user mailing list, I suggest that we discuss your
> other questions on [email protected].
> 
> Thanks,
> 
> Gunnar
> 
> 
> 
> On Thu, Nov 17, 2016 at 8:19 AM, Mich Talebzadeh <[email protected]>
> wrote:
> 
>> Thanks, Gunnar.
>>
>> Have you tested the performance of this product on HBase? There are a
>> number of options available, so what makes this product better than Hive
>> on HBase?
>> 
>> regards
>> 
>> Dr Mich Talebzadeh
>> 
>> 
>> 
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> 
>> 
>> 
>> http://talebzadehmich.wordpress.com
>> 
>> 
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>> 
>> 
>> 
>> On 17 November 2016 at 15:04, Gunnar Tapper <[email protected]>
>> wrote:
>> 
>>> Apache Trafodion provides SQL on top of HBase.
>>> 
>>> On Thu, Nov 17, 2016 at 7:40 AM, Mich Talebzadeh <
>>> [email protected]>
>>> wrote:
>>> 
>>>> Thanks, John.
>>>>
>>>> How about using Phoenix, or Spark RDDs, on top of HBase?
>>>>
>>>> Many people think Phoenix is not a good choice. Is that right?
>>>> 
>>>> 
>>>> 
>>>> Dr Mich Talebzadeh
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 17 November 2016 at 14:24, John Leach <[email protected]> wrote:
>>>> 
>>>>> Mich,
>>>>> 
>>>>> I have not found many happy users of Hive on top of HBase in my
>>>>> experience. For every query in Hive, you have to read the data from
>>>>> the filesystem into HBase and then serialize the data via an HBase
>>>>> scanner into Hive. The throughput through this mechanism is pretty
>>>>> poor, and when you read 1 million records you actually read 1 million
>>>>> records in HBase and 1 million records in Hive. There are significant
>>>>> resource management issues with this approach as well.
>>>>> 
>>>>> At Splice Machine (open source), we have written an implementation to
>>>>> read the store files directly from the file system (via embedded Spark)
>>>>> and then we do incremental deltas with HBase to maintain consistency.
>>>>> When we read 1 million records, Spark reads most of them directly from
>>>>> the filesystem. Spark provides resource management and fair scheduling
>>>>> of those queries as well.
>>>>> 
>>>>> We released some of our performance results at HBaseCon East in NYC.
>>>>> Here is the video: https://www.youtube.com/watch?v=cgIz-cjehJ0
>>>>> 
>>>>> Regards,
>>>>> John Leach
>>>>> 
>>>>>> On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh <[email protected]> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>>
>>>>>> My approach to having a SQL engine on top of HBase (excluding Spark
>>>>>> and Phoenix for now) has been to create the HBase table as is, then
>>>>>> create an EXTERNAL Hive table on HBase using
>>>>>> org.apache.hadoop.hive.hbase.HBaseStorageHandler to interface with
>>>>>> the HBase table.
>>>>>>
>>>>>> My reasoning for creating a Hive external table is to avoid
>>>>>> accidentally dropping the HBase table, etc. Is this a reasonable
>>>>>> approach?
>>>>>>
>>>>>> That Hive table can then be used by a variety of tools such as Spark,
>>>>>> Tableau, and Zeppelin.
>>>>>>
>>>>>> Is this a viable solution, given that Hive seems to be preferred on
>>>>>> top of HBase compared to Phoenix etc.?
>>>>>>
>>>>>> Thanks
>>>>>> 
>>>>>> Dr Mich Talebzadeh
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Thanks,
>>> 
>>> Gunnar
>>> *If you think you can you can, if you think you can't you're right.*
>>> 
>> 
> 
> 
> 
> -- 
> Thanks,
> 
> Gunnar
> *If you think you can you can, if you think you can't you're right.*
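
For reference, the EXTERNAL Hive table approach described in the original
question can be sketched roughly as follows. This is a minimal sketch and not
taken from the thread: the table names, column names, and the "cf" column
family are hypothetical, and the helper only builds the HiveQL text (it does
not connect to Hive). The statement shape follows the Hive HBase integration
syntax (STORED BY with the HBase storage handler, "hbase.columns.mapping",
and "hbase.table.name").

```python
# Storage handler class shipped with Hive's HBase integration.
STORAGE_HANDLER = "org.apache.hadoop.hive.hbase.HBaseStorageHandler"


def external_hbase_table_ddl(hive_table, hbase_table, key, columns):
    """Build HiveQL for an EXTERNAL Hive table over an existing HBase table.

    key     -- (hive_column, hive_type) mapped to the HBase row key
    columns -- list of (hive_column, hive_type, "family:qualifier")
    """
    hive_cols = [f"{key[0]} {key[1]}"]
    hive_cols += [f"{name} {hive_type}" for name, hive_type, _ in columns]
    # ":key" maps the first Hive column to the HBase row key; the remaining
    # entries must appear in the same order as the Hive columns.
    mapping = ",".join([":key"] + [hbase_col for _, _, hbase_col in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(hive_cols)})\n"
        f"STORED BY '{STORAGE_HANDLER}'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}');"
    )


# Hypothetical names, for illustration only.
ddl = external_hbase_table_ddl(
    hive_table="trades_hive",
    hbase_table="trades",
    key=("rowkey", "string"),
    columns=[("ticker", "string", "cf:ticker"), ("price", "double", "cf:price")],
)
print(ddl)
```

Because the table is declared EXTERNAL, dropping it in Hive should remove only
the Hive metadata and leave the underlying HBase table in place, which matches
the reasoning given in the original question. Queries against it still go
through HBase scans, so the throughput concerns raised in the reply apply.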
