Re: Hive or Phoenix

anil gupta Wed, 10 Sep 2014 09:20:30 -0700

Hi Prakash,

Here is the URL for the performance comparison:
http://phoenix.apache.org/performance.html


Thanks,
Anil Gupta

On Wed, Sep 10, 2014 at 9:16 AM, anil gupta <anilgupt...@gmail.com> wrote:

> Hi Prakash,
>
> Please find my reply inline.
>
> On Tue, Sep 9, 2014 at 11:28 PM, Prakash Hosalli <
> prakash.hosa...@syncoms.com> wrote:
>
>> Hi James/Anil,
>>
>>
>>         Regarding the questions you put forward,
>>
>> 1.      Yes, we will store the data in HBase.
>> 2.      Hive will run over HBase.
>>
> Anil: I do not know enough about your use case to say how much you can do
> with the OOTB (out-of-the-box) Hive-HBase integration. When I tried to use
> Hive with HBase, I could not, because Hive does not support querying a
> table that has composite rowkeys. In a production environment, users have
> composite rowkeys most of the time. You can, of course, patch the
> Hive-HBase integration to make it better. Please keep in mind that Hive is
> not designed around HBase (HBase integration is just a small feature of
> Hive). In contrast, Phoenix is designed "on top of HBase", so you will get
> much better integration and optimization of HBase queries.
>
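As a rough illustration of what a composite rowkey is, here is a toy Python sketch that packs two fields into one byte string whose byte-wise sort order matches the order of the (customer, day) pair. The function names and the two fields are hypothetical, and the layout (a NUL separator plus a big-endian integer) is an assumption chosen for the example; it is not claimed to be the exact encoding used by Phoenix or Hive.

```python
import struct


def encode_rowkey(customer: str, day: int) -> bytes:
    """Pack (customer, day) into one key that sorts like the tuple does."""
    if "\x00" in customer:
        raise ValueError("customer must not contain a NUL character")
    if not 0 <= day < 2**32:
        raise ValueError("day must fit in an unsigned 32-bit integer")
    # NUL is smaller than any byte of the name, so "al" sorts before "alice".
    # Big-endian unsigned packing makes byte order equal numeric order.
    return customer.encode("utf-8") + b"\x00" + struct.pack(">I", day)


def decode_rowkey(key: bytes):
    """Split a key produced by encode_rowkey back into (customer, day)."""
    name, _, packed_day = key.partition(b"\x00")  # splits at the first NUL only
    (day,) = struct.unpack(">I", packed_day)
    return name.decode("utf-8"), day
```

A tool that only sees the rowkey as one opaque value cannot filter on `customer` or `day` separately, which is the limitation described above.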
>> 3.      We will be using a large amount of data (approximately 10 million
>> rows to be processed daily).
>>
> Anil: What kind of processing will you be doing? If you are doing simple
> aggregates, those are already supported by Phoenix. You can also have a
> look at the Phoenix-Pig integration to leverage the analytical power of Pig
> (Pig is a data flow language whereas Hive is declarative, but you get the
> Pig integration OOTB).
>
>> 4.      Right now we have both options open, but primarily we plan to use
>> Hive tables to serve client requests/queries on aggregated data.
>>
> Anil: People primarily use Hive for SQL querying. The same can be achieved
> in a better way with Phoenix (especially when HBase is your storage).
>
>> 5.      We plan to run all types of queries, and we aim to achieve very
>> low latency.
>>
> Anil: Phoenix will provide you much better performance on HBase.
>
>>
>>         If I understand correctly, Phoenix will just connect to HBase
>> securely and rely on the HBase API to retrieve query results. Phoenix
>> will therefore depend on the security mechanisms employed by the HBase API
>> and will not provide any security feature by itself.
>>
> Anil: Yes, that is true. At present, Phoenix does not provide a mechanism
> to grant/revoke/create/add users. The same can be done using the HBase
> shell, and Phoenix will honor those changes. Phoenix is open source, so a
> patch for new features is always appreciated.
>
>>
>>         Kindly correct me if my understanding is wrong.
>>
>>
>> Thanks & Regards,
>> Prakash Hosalli
>>
>>
>> -----Original Message-----
>> From: James Taylor [mailto:jamestay...@apache.org]
>> Sent: Tuesday, September 09, 2014 11:56 PM
>> To: user; anil gupta
>> Subject: Re: Hive or Phoenix
>>
>> Hi Prakash,
>> If possible, it'd be helpful if you could describe your use case a bit.
>>
>> Some questions I'd have for you: is the data over which you'd query
>> stored in HBase? And if so, would the Hive run over the HBase data? Is the
>> data read-only or does it mutate? How much data are we talking about
>> (approximately) and what would your typical queries be: point look-ups,
>> range scans, or full table scans?
>>
>> As far as security, HBase provides some more fine grained mechanisms as
>> well which you could leverage through HBase APIs. Other than the ability to
>> connect to a secure cluster through the connection URL, Phoenix doesn't yet
>> provide a SQL wrapper on these HBase APIs. This is how Intuit is leveraging
>> Phoenix + security in HBase. Anil Gupta can likely tell you more.
>>
>> Thanks,
>> James
>>
>> On Tue, Sep 9, 2014 at 9:28 AM, Nicolas Maillard <
>> nmaill...@hortonworks.com> wrote:
>> > Hello Prakash
>> >
>> > Framing this as Hive or Phoenix is a little misleading, as they do
>> > serve different needs. Let me break it down as best I can.
>> >
>> > You mention security:
>> > Phoenix and Hive both work on a secured Hadoop cluster, but Hive with
>> > Hive Atz has a more fine-grained authorization model. So from that
>> > perspective Hive has more features.
>> >
>> > Query performance
>> > On the performance side, Phoenix has random read/write access, whereas
>> > Hive reads the full data, so there is no way to read a particular entry
>> > unless you read the whole associated file.
>> > So Hive is batch or interactive, meaning a couple of tens of seconds
>> > to get your answer, whereas Phoenix can be sub-second. The response time
>> > will depend greatly on whether part of the Phoenix key is in your
>> > query. If you do a full table scan, response time will suffer. Granted,
>> > secondary indexes could help you there.
>> >
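The difference between a query that includes the leading part of the key and one that does not can be sketched with a toy model in Python. This is only an in-memory sorted list, not Phoenix or HBase; the helper names and the (customer, day, value) row shape are hypothetical. It counts how many rows each kind of lookup has to examine.

```python
from bisect import bisect_left


def build_table(rows):
    """Keep rows sorted by the composite key (customer, day)."""
    return sorted(rows, key=lambda r: (r[0], r[1]))


def scan_by_customer(table, customer):
    """Leading key column is known: seek to the first match, stop when the prefix changes."""
    start = bisect_left(table, (customer,))  # (customer,) sorts before every (customer, day, ...)
    hits, examined = [], 0
    for i in range(start, len(table)):
        examined += 1
        if table[i][0] != customer:
            break
        hits.append(table[i])
    return hits, examined


def scan_by_day(table, day):
    """Only a non-leading column is known: every row has to be examined."""
    hits = [row for row in table if row[1] == day]
    return hits, len(table)
```

In this model, looking up one customer examines only that customer's rows plus at most one more, while filtering on `day` alone examines the whole table. A secondary index would amount to keeping a second copy sorted by `day`.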
>> > SQL Semantics
>> > Hive currently has richer SQL semantics, with analytic functions,
>> > complex types, etc.
>> > Phoenix is also more limited than Hive in joins and UDFs.
>> >
>> > So I would use Hive for large data, random analysis and ETL, and pay
>> > the price in response time a little.
>> > Phoenix, on the other hand, is great for large volumes of data where you
>> > can set up your schema, and especially your keys, according to specific
>> > needs and query patterns. In this situation you would get great query
>> > performance.
>> >
>> > To sum up, in all honesty, both are needed.
>> >
>> > Hope this helps
>> >
>> > On Tue, Sep 9, 2014 at 4:19 PM, Prakash Hosalli
>> > <prakash.hosa...@syncoms.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>                 Does Phoenix have any security layer in it, as we have
>> >> in Hive?
>> >>
>> >>                 I am confused about whether to go forward with Phoenix
>> >> or Hive in the production environment at my company.
>> >>
>> >> Thanks  & Regards,
>> >>
>> >> Prakash Hosalli
>> >>
>> >> Syncoms Bangalore India.
>> >>
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Thanks & Regards,
Anil Gupta
