Hi, on this topic,
All along, the conclusion seems to be (quote):

1. Hive is batch oriented (aka slow). It translates the SQL query into MapReduce jobs and is mostly used in offline data processing.
2. Phoenix is a SQL layer between applications and HBase. It provides ad-hoc queries in real time.

Fine, those were notes from 2014, in here <https://www.quora.com/How-is-Apache-Phoenix-different-from-Hive-Hbase-integration>.

The Phoenix web page says:

"The Phoenix query engine transforms your SQL query into one or more HBase scans, and orchestrates their execution to produce standard JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows."

But things have changed now. Map-reduce (MR) is deprecated in Hive 2, and one can more or less get the same in-memory speed with the Hive on Spark engine. With that engine Hive does not convert the query to MR; it uses Spark, which combines DAG execution with in-memory processing.

Sure, with Phoenix we are talking about OLTP speed and single-row DML, but Hive offers a variety of table formats.

Bottom line: have there been any recent comparative studies that gauge the performance of Hive on Spark vs Phoenix?

It sounds like the distinguishing feature is that Phoenix does an asynchronous write to HBase and leaves HBase to handle completion of the work through its API. As expected, Phoenix relies on memory (what else) to speed up this process. Hive on a newer engine can do most of this these days.

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com