RE: Hive and Impala

Mich Talebzadeh Mon, 27 Apr 2015 08:32:38 -0700

Hi,


I agree with Douglas's sentiment here.

 

The main attraction of Hive in general is in the data ingestion process.
Hive is great at getting raw data in, in a rough-and-ready format (disregard
schema on write and store the data as is), and making sense of the data
later (schema on read: turn raw data into cooked data while still keeping
the raw data stored for reference). The economies of scale help contribute
to a much lower TCO.
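
As a rough illustration of the schema-on-read idea described above, here is a minimal sketch in plain Python. It is not Hive code; the names RAW_STORE, SCHEMA, ingest and read_with_schema are hypothetical, and skipping lines that do not fit the schema is just one possible choice for the sketch.

```python
import csv

# Raw data is stored exactly as it arrived; nothing is validated on write.
RAW_STORE = []


def ingest(line):
    """Schema on write is skipped: append the raw line untouched."""
    RAW_STORE.append(line)


# Hypothetical schema (column name, cast function), applied only at read time.
SCHEMA = [("trade_id", int), ("symbol", str), ("price", float)]


def read_with_schema(raw_lines, schema):
    """Yield one dict per line that fits the schema.

    Lines with the wrong number of fields, or with a value that cannot be
    cast, are skipped here. The raw lines themselves are never modified,
    so a different schema can be applied to the same data later.
    """
    for fields in csv.reader(raw_lines):
        if len(fields) != len(schema):
            continue
        try:
            yield {name: cast(value)
                   for (name, cast), value in zip(schema, fields)}
        except ValueError:
            continue
```

The point of the sketch is only that ingestion stays cheap and lossless, while the interpretation of the data (the "cooked" view) is decided by whoever reads it.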

 

The major challenge I see as of now with Hive is its locking and
concurrency. It would be great if we could introduce row-level locking
(implemented by a simple serialisation mechanism such as a latch) and make
it work with the MapReduce engine. The argument has always been (as I
understand it) that MapReduce is not suitable for real-time queries because
of its overhead compared to what you get from an RDBMS (OLTP or DW), given
the advantages that an RDBMS optimiser offers and its multiple access paths
(indexes etc.). Having said that, I am not convinced that Impala is
anywhere near what an RDBMS offers (or that Impala has cracked it). Again,
my assertion here is based on the understanding that Impala does not use
MapReduce. But then again, that breaks the axiom of Hadoop as HDFS +
MapReduce.
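
To make the "row-level locking via a latch" idea above a little more concrete, here is a minimal in-process sketch in plain Python. It says nothing about how Hive's lock manager works or could be changed; the class RowLatchTable and its methods are hypothetical names used only for illustration.

```python
import threading
from collections import defaultdict


class RowLatchTable:
    """Toy key/value table with one latch (a short-lived lock) per row."""

    def __init__(self):
        self._rows = {}
        self._latches = defaultdict(threading.Lock)
        # Protects the latch dictionary itself, not the row data.
        self._guard = threading.Lock()

    def _latch_for(self, key):
        # Look up (or create) the latch for this row under the guard.
        with self._guard:
            return self._latches[key]

    def update(self, key, fn, default=0):
        # Only writers touching the same row serialise on its latch;
        # updates to different rows do not wait on each other's latch.
        with self._latch_for(key):
            self._rows[key] = fn(self._rows.get(key, default))

    def get(self, key):
        with self._latch_for(key):
            return self._rows.get(key)
```

The design choice being illustrated is granularity: a single table-wide lock would serialise every writer, whereas a latch per row only serialises writers that collide on the same row.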

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15",
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN
978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume
one out shortly

 


 

From: Moore, Douglas [mailto:douglas.mo...@thinkbiganalytics.com] 
Sent: 27 April 2015 14:10
To: user@hive.apache.org
Subject: Re: Hive and Impala

 

Hive is great for the massive transformations needed in ETL-type processing
and full data set analytics. Impala is better suited for fast analytical
queries returning a tiny subset of the original data set. Both are improving
in terms of concurrency and latency; however, they have a long way to go to
beat commercial MPP solutions in terms of performance and stability. Their
key advantages are storage economics and flexibility (schema on read).

Sent from my iPhone


On Apr 27, 2015, at 6:27 AM, Anilkumar Kalshetti <anilkalshe...@gmail.com>
wrote:

Hi Ashok,

 

Also, you can now use Spark as the execution engine for Hive. Please check
the Hive on Spark (HoS) project.

 

Ref Link
<https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>.

 

Thanks

 

On 27 April 2015 at 15:22, Fabio C. <anyte...@gmail.com> wrote:

If the comparison mentions just MR, then it is probably outdated. Hive can
now run on Tez with a great improvement in performance.

However, I don't know how Hive+Tez compares with Impala.

 

On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar <nitinpawar...@gmail.com>
wrote:

What use case are you trying to solve? 

 

On Mon, Apr 27, 2015 at 2:16 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:


Hi gurus,

Kindly help me understand the advantage that Impala has over Hive.

I read a note that Impala does not use the MapReduce engine and is therefore
very fast for queries compared to Hive. However, Hive, as I understand it,
is widely used everywhere!

Thank you





 

-- 

Nitin Pawar
