Re: Which [open-source] SQL engine atop Hadoop?

Daniel Haviv Tue, 27 Jan 2015 01:17:00 -0800

Can you elaborate on why you prefer Tajo?

Daniel


> On 27 Jan 2015, at 10:35, Azuryy Yu <[email protected]> wrote:
> 
> You have listed almost all of the open-source MPP real-time SQL-on-Hadoop 
> engines.
> 
> I prefer Tajo, which recently released 0.9.0 and is still a work in 
> progress toward 1.0.
> 
> 
>> On Mon, Jan 26, 2015 at 10:19 PM, Samuel Marks <[email protected]> wrote:
>> Since Hadoop came out, there have been various commercial and/or open-source 
>> attempts to expose some compatibility with SQL.
>> 
>> I am seeking one which is good for low-latency querying, and supports the 
>> most common CRUD, including [the basics!] along these lines: CREATE TABLE, 
>> INSERT INTO, SELECT * FROM, UPDATE Table SET C1=2 WHERE, DELETE FROM, and 
>> DROP TABLE.
>> 
>> I will be using it from Python, though there does seem to be a Python 
>> JDBC wrapper. Additionally, it needs to scale for both big and small data 
>> (starting on a single-node "cluster").
>> 
>> Here is what I've found thus far:
>> 
>> - Apache Hive (SQL-like, with interactive SQL thanks to the Stinger 
>>   initiative)
>> - Apache Drill (ANSI SQL support)
>> - Apache Spark (Spark SQL; queries only, add data via Hive, RDD or Parquet)
>> - Apache Phoenix (built atop Apache HBase; lacks full transaction support, 
>>   relational operators and some built-in functions)
>> - Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. 
>>   Doesn't seem to be designed for low-latency responses across small 
>>   clusters, or to support UPDATE operations. It is optimized for data 
>>   warehousing or analytics¹)
>> - SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, 
>>   HP Vertica, SparkSQL, Drill and a native ODBC wrapper)
>> - Apache Kylin from eBay (provides an SQL interface and multi-dimensional 
>>   analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL 
>>   query functions". It depends on HDFS, MapReduce, Hive and HBase, and 
>>   seems targeted at very large data sets while maintaining low query 
>>   latency)
>> - Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support 
>>   [benchmarks against Hive and Impala])
>> - Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command 
>>   shell, and a catalog manager for publishing files [or any resource] as 
>>   schemas and tables.")
>> 
>> Which—from this list or elsewhere—would you recommend, and why?
>> 
>> Thanks for all suggestions,
>> 
>> Samuel Marks
>> http://linkedin.com/in/samuelmarks
>
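
One way to compare the candidates against the CRUD list in the original question is a small smoke test that runs each statement and records which ones the engine accepts. The sketch below is only an illustration under stated assumptions: it relies on the engine being reachable through a Python DB-API 2.0 connection, the table name "t" and column "c1" are hypothetical, and the built-in sqlite3 module stands in for a real SQL-on-Hadoop engine so that the script runs anywhere. Actual engines may need different DDL (storage formats, column types) and may reject some statements outright.

```python
import sqlite3

# The six statements from the original question, with a hypothetical
# table "t" and column "c1" filled in for illustration.
STATEMENTS = [
    ("CREATE TABLE", "CREATE TABLE t (c1 INTEGER)"),
    ("INSERT INTO", "INSERT INTO t (c1) VALUES (1)"),
    ("SELECT", "SELECT * FROM t"),
    ("UPDATE", "UPDATE t SET c1 = 2 WHERE c1 = 1"),
    ("DELETE FROM", "DELETE FROM t"),
    ("DROP TABLE", "DROP TABLE t"),
]


def crud_smoke_test(conn):
    """Run each statement on a DB-API 2.0 connection.

    Returns a dict mapping the statement label to True (accepted)
    or False (the driver raised an exception).
    """
    results = {}
    for label, sql in STATEMENTS:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            if label == "SELECT":
                cur.fetchall()  # force the query to actually run
            results[label] = True
        except Exception:
            # Treat any driver error as "unsupported or failed".
            results[label] = False
        finally:
            cur.close()
    return results


if __name__ == "__main__":
    # sqlite3 is only a stand-in; swap in a connection to the engine
    # under evaluation (for example, one obtained via a JDBC wrapper).
    conn = sqlite3.connect(":memory:")
    for label, ok in crud_smoke_test(conn).items():
        print(f"{label}: {'ok' if ok else 'unsupported or failed'}")
    conn.close()
```

Running the same function against each engine gives a quick first filter; engines that fail UPDATE or DELETE here would not meet the requirements as stated, whatever their query latency.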
