Re: Hive or Phoenix

James Taylor Thu, 11 Sep 2014 09:00:51 -0700

Hi Siddharth,
If your data fits into memory, then I'd recommend using an RDBMS. They work
great when they can meet your scaling requirements.
Thanks,
James


On Thursday, September 11, 2014, Siddharth Ubale <
siddharth.ub...@syncoms.com> wrote:

>  Hi Anil,
>
>
>
> Thanks for the concise reply.
>
> Just wanted to take the conversation further and understand what benefits
> Phoenix would offer in a scenario where we can employ an in-memory system
> like Apache Spark or Impala on top of Hive to reduce latency.
>
> I am asking because security could then be handled better.
>
> Please do share your views.
>
>
>
> Thanks,
>
> Siddharth Ubale
>
>
>
>
>
> *From:* anil gupta [mailto:anilgupt...@gmail.com]
> *Sent:* Wednesday, September 10, 2014 9:50 PM
> *To:* Prakash Hosalli
> *Cc:* user@phoenix.apache.org
> *Subject:* Re: Hive or Phoenix
>
>
>
> Hi Prakash,
>
> Here is the url for performance comparison:
> http://phoenix.apache.org/performance.html
>
> Thanks,
> Anil Gupta
>
>
>
> On Wed, Sep 10, 2014 at 9:16 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
>  Hi Prakash,
>
> Please find my reply inline.
>
>
>
> On Tue, Sep 9, 2014 at 11:28 PM, Prakash Hosalli <
> prakash.hosa...@syncoms.com> wrote:
>
> Hi James/Anil,
>
>
>         Regarding the questions you put forward,
>
> 1.      Yes, we will store the data in HBase.
> 2.      Hive will run over HBase.
>
>  Anil: I don't know your use case well enough to say how much you can do
> with the OOTB (out-of-the-box) features of the Hive and HBase integration.
> But when I tried to use Hive with HBase, I could not, because Hive does not
> support querying a table that has composite rowkeys. In a production
> environment, users have composite rowkeys most of the time. Obviously, you
> can patch the Hive-HBase integration to make it better. Please keep in mind
> that Hive is not designed to support HBase (HBase integration is just a
> small feature of Hive). In contrast, Phoenix is designed on top of HBase,
> so you will get much better integration and optimization of HBase
> queries.
>
> 3.      We will be using a large amount of data (approximately 10 million
> rows to be processed daily).
>
>  Anil: What kind of processing will you be doing? If you are doing simple
> aggregates, those are already supported by Phoenix. You can also have a look
> at the Phoenix-Pig integration to leverage the analytical power of Pig
> (although Pig is a data flow language and Hive is declarative, you get the
> Pig integration OOTB).
>
> 4.      Right now we have both options open, but primarily we plan to use
> a Hive table to serve client requests/queries on aggregated data.
>
>  Anil: People primarily use Hive for SQL querying; the same can be achieved
> in a better way with Phoenix (especially when HBase is your storage).
>
> 5.      We plan to employ all types of queries, and we aim for very
> low latency.
>
>  Anil: Phoenix will provide you much better performance on HBase.
>
>
>         If I understand correctly, Phoenix will just connect to HBase
> securely and rely on the HBase API to answer queries; therefore Phoenix
> will depend on the security mechanisms employed by the HBase API and will
> not provide any security features by itself.
>
>  Anil: Yes, that is true. At present, Phoenix does not provide a mechanism
> to grant/revoke/create/add users. The same can be done using the HBase
> shell, and Phoenix will honor those changes. Phoenix is open source, so a
> patch is always appreciated for new features.
>
>
>         Kindly correct me if my understanding is wrong.
>
>
> Thanks & Regards,
> Prakash Hosalli
>
>   -----Original Message-----
> From: James Taylor [mailto:jamestay...@apache.org]
> Sent: Tuesday, September 09, 2014 11:56 PM
> To: user; anil gupta
> Subject: Re: Hive or Phoenix
>
> Hi Prakash,
> If possible, it'd be helpful if you could describe your use case a bit.
>
> Some questions I'd have for you: is the data over which you'd query stored
> in HBase? And if so, would Hive run over the HBase data? Is the data
> read-only or does it mutate? How much data are we talking about
> (approximately) and what would your typical queries be: point look-ups,
> range scans, or full table scans?
>
> As far as security, HBase provides some more fine grained mechanisms as
> well which you could leverage through HBase APIs. Other than the ability to
> connect to a secure cluster through the connection URL, Phoenix doesn't yet
> provide a SQL wrapper on these HBase APIs. This is how Intuit is leveraging
> Phoenix + security in HBase. Anil Gupta can likely tell you more.
>
> Thanks,
> James
>
> On Tue, Sep 9, 2014 at 9:28 AM, Nicolas Maillard <
> nmaill...@hortonworks.com> wrote:
> > Hello Prakash
> >
> > Framing this as Hive or Phoenix is a little misleading; they do serve
> > different needs. Let me break it down as best I can.
> >
> > You mention security:
> > Phoenix and Hive both work on a secured Hadoop cluster, but Hive with
> > Hive Atz has a more fine grained authorization model. So from that
> > perspective Hive has more features.
> >
> > Query performance
> > On the performance side, Phoenix has random read/write access, whereas
> > Hive is full data access, so there is no way to read a particular entry
> > unless you read the whole associated file.
> > So Hive is batch or interactive, meaning a couple of tens of seconds
> > to get your answer, whereas Phoenix can be sub-second. The response time
> > will depend greatly on whether part of the Phoenix key is in your
> > query. If you do a full table scan, response time will suffer. Granted,
> > secondary indexes could help you there.
> >
> > SQL Semantics
> > Hive currently has richer SQL semantics, with analytic functions,
> > complex types, etc.
> > Phoenix is also more limited than Hive in joins and UDFs.
> >
> > So I would use Hive for large data, random analysis and ETL, and pay
> > the price of the response time a little.
> > Phoenix, on the other hand, is great for large volumes of data where you
> > can set up your schema, and especially keys, according to specific needs
> > and query patterns; in this situation you would get great query
> > performance.
> >
> > To sum up, in all honesty, both are needed.
> >
> > Hope this helps
> >
> > On Tue, Sep 9, 2014 at 4:19 PM, Prakash Hosalli
> > <prakash.hosa...@syncoms.com> wrote:
> >>
> >>
> >>
> >> Hi,
> >>
> >>
> >>
> >>
> >>
> >>                 Does Phoenix have any security layer in it, as we have in
> >> Hive?
> >>
> >>
> >>
> >>                 I am confused about whether to go forward with Phoenix or
> >> Hive in the production environment at my company.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Thanks  & Regards,
> >>
> >> Prakash Hosalli
> >>
> >> Syncoms Bangalore India.
> >>
> >>
> >
> >
> >
>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
>
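
A minimal sketch of the point made in the quoted thread that response time
depends greatly on whether part of the row key appears in the query. This is
plain Python, not Phoenix or HBase code: the table contents, the composite key
layout (customer_id, day), and both function names are hypothetical, and a
sorted list merely stands in for the way HBase keeps rows ordered by row key.

```python
from bisect import bisect_left

# Hypothetical table: rows sorted by a composite row key (customer_id, day),
# standing in for HBase's row-key ordering. Values are made-up numbers.
rows = sorted(
    ((f"cust{c:03d}", f"2014-09-{d:02d}"), c * d)
    for c in range(100)
    for d in range(1, 11)
)
keys = [key for key, _ in rows]


def scan_by_key_prefix(customer_id):
    """Range scan: seek to the first key for customer_id, stop just past it."""
    # A 1-tuple sorts before every 2-tuple sharing its first element,
    # so this finds the first row for the given customer.
    start = bisect_left(keys, (customer_id,))
    matched = []
    examined = 0
    for key, value in rows[start:]:
        examined += 1
        if key[0] != customer_id:
            break  # left the key range, nothing further can match
        matched.append((key, value))
    return matched, examined


def scan_by_non_leading_column(day):
    """Full scan: day is not a key prefix, so every row must be examined."""
    matched = [(key, value) for key, value in rows if key[1] == day]
    return matched, len(rows)
```

With these 1,000 made-up rows, scan_by_key_prefix("cust005") examines 11 rows
to return 10, while scan_by_non_leading_column("2014-09-03") examines all
1,000 rows to return 100. A secondary index, as mentioned in the thread, would
roughly correspond to keeping a second sorted copy keyed on the day.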
