Can you elaborate on why you prefer Tajo?

Daniel

> On 27 Jan 2015, at 10:35, Azuryy Yu <[email protected]> wrote:
>
> You have listed almost all of the open-source MPP real-time SQL-on-Hadoop
> engines.
>
> I prefer Tajo, which recently released 0.9.0 and is still a work in
> progress toward 1.0.
>
>
>> On Mon, Jan 26, 2015 at 10:19 PM, Samuel Marks <[email protected]> wrote:
>> Since Hadoop came out, there have been various commercial and/or open-source
>> attempts to expose some compatibility with SQL.
>>
>> I am seeking one which is good for low-latency querying, and supports the
>> most common CRUD, including [the basics!] along these lines: CREATE TABLE,
>> INSERT INTO, SELECT * FROM, UPDATE Table SET C1=2 WHERE, DELETE FROM, and
>> DROP TABLE.
>>
>> I will be utilising them from Python; however, there does seem to be a Python
>> JDBC wrapper. Additionally it needs to be scalable for big and small data
>> (starting on a single-node "cluster").
>>
>> Here is what I've found thus far:
>>
>> Apache Hive (SQL-like, with interactive SQL thanks to the Stinger initiative)
>> Apache Drill (ANSI SQL support)
>> Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
>> Apache Phoenix (built atop Apache HBase, lacks full transaction support, relational operators and some built-in functions)
>> Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. Doesn't seem to be designed for low-latency responses across small clusters, or support UPDATE operations. It is optimized for data warehousing or analytics¹)
>> SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, HP Vertica, SparkSQL, Drill and a native ODBC wrapper)
>> Apache Kylin from eBay (provides an SQL interface and multi-dimensional analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL query functions". It depends on HDFS, MapReduce, Hive and HBase; and seems targeted at very large data-sets though maintains low query latency)
>> Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support [benchmarks against Hive and Impala])
>> Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command shell, and a catalog manager for publishing files [or any resource] as schemas and tables.")
>>
>> Which, from this list or elsewhere, would you recommend, and why?
>>
>> Thanks for all suggestions,
>>
>> Samuel Marks
>> http://linkedin.com/in/samuelmarks
>
