Hi,
I agree with Douglas's sentiment here. The main attraction of Hive in general is in the data ingestion process. Hive is great at getting raw data in, in a rough-and-ready format (disregard schema on write and store the data as is), and making sense of the data later (schema on read: turn raw data into cooked data while still keeping the raw data stored for reference). The economies of scale contribute to a much lower TCO.

The major challenge I see as of now with Hive is its locking and concurrency. It would be great if we could introduce row-level locking (implemented by a simple serialisation mechanism such as a latch) and make it work with the MapReduce engine.

The argument has always been (as I have understood it) that MapReduce is not suitable for real-time queries because of its overhead compared to what you get from an RDBMS (OLTP or DW), due to the advantages that the RDBMS optimiser offers and its multiple access paths (indexes etc.). Having said that, I am not convinced that Impala is anywhere near what an RDBMS offers (or that Impala has cracked it). Again, my assertion here is based on the understanding that Impala does not use MapReduce. But then that breaks the axiom of Hadoop as HDFS + MapReduce.

HTH

Mich Talebzadeh

http://talebzadehmich.wordpress.com

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4.
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

From: Moore, Douglas [mailto:douglas.mo...@thinkbiganalytics.com]
Sent: 27 April 2015 14:10
To: user@hive.apache.org
Subject: Re: Hive and Impala

Hive is great for the massive transformations needed in ETL-type processing and for full data set analytics. Impala is better suited for fast analytical queries returning a tiny subset of the original data set. Both are improving in terms of concurrency and latency; however, they have a long way to go to beat commercial MPP solutions in terms of performance and stability. Their key advantages are storage economics and flexibility (schema on read).

On Apr 27, 2015, at 6:27 AM, Anilkumar Kalshetti <anilkalshe...@gmail.com> wrote:

Hi Ashok,
You can now also use Spark as the execution engine for Hive. Please check the Hive on Spark (HoS) project.
Ref link: <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>
Thanks

On 27 April 2015 at 15:22, Fabio C. <anyte...@gmail.com> wrote:

If the comparison mentions just MR, then it is probably outdated. Hive can now run on Tez, with a great improvement in performance. However, I don't know how Hive+Tez compares with Impala.

On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

What use case are you trying to solve?

On Mon, Apr 27, 2015 at 2:16 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

Hi gurus,
Kindly help me understand the advantage that Impala has over Hive. I read a note that Impala does not use the MapReduce engine and is therefore very fast for queries compared to Hive. However, Hive, as I understand it, is widely used everywhere!
Thank you

--
Nitin Pawar