Re: Spark and HBase

Nicholas Chammas Tue, 08 Apr 2014 10:14:58 -0700

Just took a quick look at the overview
here <http://phoenix.incubator.apache.org/> and
the quick start guide
here <http://phoenix.incubator.apache.org/Phoenix-in-15-minutes-or-less.html>.


It looks like Apache Phoenix aims to provide flexible SQL access to data,
both for transactional and analytic purposes, and at interactive speeds.

Nick


On Tue, Apr 8, 2014 at 12:38 PM, Bin Wang <binwang...@gmail.com> wrote:

> First, I have not tried it myself. However, from what I have heard, it has some
> basic SQL features, so you can query your HBase table the way you query content
> on HDFS using Hive.
> So it is not just "query a simple column"; I believe you can do joins and other
> SQL queries. Maybe you can set up an EMR cluster with HBase preconfigured
> and give it a try.
>
> Sorry I cannot provide a more detailed explanation or help.
>
>
>
> On Tue, Apr 8, 2014 at 10:17 AM, Flavio Pompermaier
> <pomperma...@okkam.it> wrote:
>
>> Thanks for the quick reply, Bin. Phoenix is something I'm going to try for
>> sure, but it seems somewhat useless if I can use Spark.
>> Probably, as you said, since Phoenix uses a dedicated data structure
>> within each HBase table, it has more effective memory usage, but if I need to
>> deserialize data stored in an HBase cell I still have to read that object into
>> memory, and thus I need Spark. From what I understood, Phoenix is good if I
>> have to query a simple column of HBase, but things get really complicated if
>> I have to add an index for each column in my table and I store complex
>> objects within the cells. Is that correct?
>>
>> Best,
>> Flavio
>>
>>
>>
>>
>> On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang <binwang...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> I am currently attending the 2014 Apache Conf, where I heard about a
>>> project called "Apache Phoenix", which fully leverages HBase and is supposed
>>> to be 1000x faster than Hive. It is also not memory bound, whereas memory
>>> sets a limit for Spark. It is still in the incubator, and the "stats"
>>> functions Spark has already implemented are still on its roadmap. I am not
>>> sure whether it will be good, but it might be something interesting to check
>>> out.
>>>
>>> /usr/bin
>>>
>>>
>>> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier
>>> <pomperma...@okkam.it> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> Lately I have been looking at the recent evolution of the big data
>>>> stacks, and it seems that HBase is somehow fading away in favour of
>>>> Spark+HDFS. Am I correct?
>>>> Do you think that Spark and HBase should work together or not?
>>>>
>>>> Best regards,
>>>> Flavio
>>>>
>>>
>
