I agree with Bejoy's assessment: Hive is good for processing large volumes of
data in batch mode. But for real-time or complex SQL-based analysis you would
typically want some type of RDBMS in the mix alongside Hadoop/Hive.

In terms of what's missing in Hive today: on the query side, Hive doesn't yet
support all flavors of subqueries (correlated subqueries, specifically; there
are potential workarounds for the non-correlated ones). It also doesn't support
inserting data from a stream, i.e. INSERT INTO TABLE VALUES (...) type syntax,
and its query optimizer is mostly rule-based at this time, although there's a
push to move toward a cost-based one. On the administration side, there's no
workload management/job prioritization scheme like a typical RDBMS has, Hive
Server isn't thread-safe, and it doesn't yet have any kind of
security/authentication scheme.



From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Monday, June 04, 2012 7:20 AM
To: user@hive.apache.org
Subject: Re: Front end visualization tool with Hive (when using as a warehouse)

Hi Sreenath

First of all, don't treat Hive like an RDBMS while designing your solution. It
is an awesome tool for processing huge volumes of data in non-real-time mode.
If any of your use cases involve row-level 'updates', those are not supported
in Hive, and working around them is pretty expensive as well. (You can
implement updates by overwriting data at the partition level, with partitions
as granular as possible, but it is still expensive.)
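The partition-overwrite workaround for updates can be sketched roughly as
follows. In Hive itself this would be done with something like
INSERT OVERWRITE TABLE ... PARTITION (...) SELECT ..., which rewrites the whole
partition. The Python below is only an illustrative in-memory model of that
idea, not Hive code: the table layout, the function name
update_via_partition_overwrite, and the "dt=..." partition values are all
hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory model of a partitioned table:
# partition spec (e.g. "dt=2012-06-04") -> list of rows.
Row = Dict[str, object]
Table = Dict[str, List[Row]]


def update_via_partition_overwrite(
    table: Table,
    partition: str,
    predicate: Callable[[Row], bool],
    change: Callable[[Row], Row],
) -> int:
    """Emulate a row-level UPDATE by rewriting one whole partition.

    Returns the number of rows written, which is always the full size of
    the partition, no matter how few rows matched the predicate.
    """
    old_rows = table.get(partition, [])
    # Rebuild the complete partition: changed rows plus untouched copies.
    new_rows = [change(dict(row)) if predicate(row) else dict(row) for row in old_rows]
    # Replace the partition wholesale, the way an overwrite would.
    table[partition] = new_rows
    return len(new_rows)


sales: Table = {
    "dt=2012-06-03": [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
    "dt=2012-06-04": [
        {"id": 3, "amount": 30},
        {"id": 4, "amount": 40},
        {"id": 5, "amount": 50},
    ],
}

# Change one row's amount; every row in its partition is rewritten.
written = update_via_partition_overwrite(
    sales,
    "dt=2012-06-04",
    predicate=lambda row: row["id"] == 4,
    change=lambda row: {**row, "amount": 45},
)
print(written)  # 3
```

Changing a single row still rewrites all three rows of its partition, which is
why finer-grained partitions keep the cost down but never make it free.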

By the way, I'm not a DWH guy; maybe others can add their experience on these
points.

Regards
Bejoy KS

________________________________
From: Sreenath Menon <sreenathmen...@gmail.com>
To: user@hive.apache.org; Bejoy Ks <bejoy...@yahoo.com>
Sent: Monday, June 4, 2012 4:25 PM
Subject: Re: Front end visualization tool with Hive (when using as a warehouse)


Hi Bejoy

I am not looking for just a UI for queries (even though, at first, when working
on Twitter data, that was what interested me). Now I am planning on using Hive
as a warehouse with a front-end in-memory processing engine; MicroStrategy or
Tableau would be a good choice.

To refine the problem further: how does Hive's warehousing capability compare
with a traditional warehouse? Can Hive perform all the operations required in a
warehouse? If not, what are the shortcomings I need to deal with?

Always thankful for your apt assistance.

On Mon, Jun 4, 2012 at 3:49 PM, Bejoy Ks <bejoy...@yahoo.com> wrote:
Hi Sreenath

If you are looking for a UI for queries, then Cloudera's Hue is the best
choice. There are also ODBC connectors that integrate BI tools like
MicroStrategy, Tableau, etc. with Hive.

Regards
Bejoy KS

________________________________
From: Sreenath Menon <sreenathmen...@gmail.com>
To: user@hive.apache.org
Sent: Monday, June 4, 2012 2:42 PM
Subject: Front end visualization tool with Hive (when using as a warehouse)

Hi all

I am new to Hive and am working on analysis of Twitter data with Hive and
Hadoop on a 27-node cluster.
At present I am using Microsoft PowerPivot as the visualization tool for
representing the analysis done with Hive. I have got some really good results,
and I am stunned by the scalability of the Hadoop system.

As a next step, I would like to evaluate the warehousing capabilities of Hive
for business data.
Any insights into this are welcome. I am also facing the problem of deciding
which jobs to delegate to Hive and which to PowerPivot, since PowerPivot itself
can act as a warehouse tool. Suggestions for other good visualization tools to
use with Hive are also welcome.

For analyzing Twitter data, I just ran complex Hive queries for each analysis.
But for a warehouse, this does not sound like a good solution.

Any help is greatly appreciated.

Thanks.


