Re: Q4A Project
Andrew,

Have you considered leveraging existing SQL query layers like Hive or Spark's SQL/DataFrames API? There are some pretty massive optimizations involved in that API, which make push-down predicates / selections pretty easy to adapt for Accumulo.

On Mon, Apr 27, 2015 at 8:37 PM, Andrew Wells awe...@clearedgeit.com wrote:

I have been working on a project, tentatively called Q4A (Query for Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].

This is a streaming query: the query is completed via a stream and should never group data in memory. To batch, intermediate results would be written back to Accumulo temporarily.

The *primary goal* is to have a complete SQL implementation native to Accumulo.

*Why do this?*
I am getting tired of writing bad Java code to query a database. I would rather write bad SQL code. Also, people should be able to get queries out faster, and it shouldn't take a developer.

*Native To Accumulo*:
- There should be no special format to read a database created by Q4A
- There should be no special format for Q4A to query a table
- All tables are tables available to Q4A
- Any special tables are stored away from the users' databases (indexes, column definitions, etc.)

*Other Goals*:
- Implement the entire SQL definition (currently all of SQLite)
- Create JDBC Driver/Server
- Push down Expressions to the Tablet Servers
- Install-less queries: use the Q4A jar directly against any Accumulo Cluster (less push-down expressions)
- documentation :o
- testing ;)

*Does it work?*
Not yet; the project is still a work in progress, and I will be working on it at the Accumulo Summit this year. Progress is slow as I am getting married in about a month and some change.

*Questions:*
If you have questions about Q4A, ask here. I will be at the Accumulo Summit @ ClearEdgeIT Table and Hackathon.

*WHERE IS TEH LINK?!1!*
Oh here: https://github.com/agwells0714/q4a

--
*Andrew George Wells*
*Software Engineer*
*awe...@clearedgeit.com*
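For readers unfamiliar with the push-down idea raised in this exchange, here is a minimal sketch of why it matters. It is not Q4A code and does not use the Accumulo API. `ToyTablet`, `client_side`, and `pushed_down` are hypothetical names, and the "tablet server" is just a sorted in-memory list. The only assumption borrowed from Accumulo is that entries are kept sorted by row key, so a row range can be located without reading every row.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Entry = Tuple[str, Dict[str, int]]  # (row key, column values): toy stand-in for key/value pairs


@dataclass
class ToyTablet:
    """Hypothetical stand-in for a tablet server holding entries sorted by row key."""
    entries: List[Entry]
    rows_read: int = 0   # rows the "server" had to inspect
    rows_sent: int = 0   # rows shipped back to the client

    def scan(self,
             start: Optional[str] = None,
             stop: Optional[str] = None,
             server_filter: Optional[Callable[[Dict[str, int]], bool]] = None
             ) -> Iterator[Entry]:
        keys = [key for key, _ in self.entries]
        lo = 0 if start is None else bisect_left(keys, start)
        hi = len(keys) if stop is None else bisect_right(keys, stop)
        # Lazily yield entries in key order, applying the filter "server-side".
        for key, cols in self.entries[lo:hi]:
            self.rows_read += 1
            if server_filter is None or server_filter(cols):
                self.rows_sent += 1
                yield key, cols


def client_side(tablet: ToyTablet, start: str, stop: str, pred) -> List[Entry]:
    # No push-down: every row crosses the wire and the client filters.
    return [(k, c) for k, c in tablet.scan() if start <= k <= stop and pred(c)]


def pushed_down(tablet: ToyTablet, start: str, stop: str, pred) -> List[Entry]:
    # Push-down: the range and the predicate are both evaluated by the tablet.
    return list(tablet.scan(start, stop, pred))


def make_tablet() -> ToyTablet:
    return ToyTablet([(f"row{i:03d}", {"age": i}) for i in range(100)])


def is_even_age(cols: Dict[str, int]) -> bool:
    return cols["age"] % 2 == 0


naive, smart = make_tablet(), make_tablet()
naive_rows = client_side(naive, "row010", "row019", is_even_age)
smart_rows = pushed_down(smart, "row010", "row019", is_even_age)
print(naive_rows == smart_rows, naive.rows_sent, smart.rows_sent)
```

Both functions return the same five rows, but the naive version ships all 100 rows to the client while the pushed-down version ships only the five that match. A query layer with a mature planner does this rewriting automatically, which is the point of the suggestion above.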
Re: Q4A Project
I'm always looking for places to help out and integrate/share design ideas. I look forward to chatting with you about Q4A at the hackathon tomorrow!

Have you, by chance, seen the Spark SQL adapter for the Accumulo Recipes Event/Entity Stores [1]? At the very least, it's a good example of using Spark's SQL abstraction over Accumulo. As Mike Drob pointed out, Spark SQL has a pretty robust query planning / optimization layer. The Event/Entity stores in Accumulo Recipes also have a pluggable query planning/optimization layer.

[1] https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/test/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalystTest.scala

On Mon, Apr 27, 2015 at 9:38 PM, Mike Drob mad...@cloudera.com wrote:

Andrew,

This is a cool thing to work on, I hope you have great success! A couple of questions about the motivations behind this, if you don't mind:

- There are several SQL implementations already in the Hadoop ecosystem. In what ways do you expect this to improve upon Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it is quite possible you're already using one of those technologies.
- In a conversation with some HP engineers earlier this year, they mentioned that building a SQL-92 layer is the easy part, and that a mature optimization engine is the really hard part. This is where Oracle may still be leaps and bounds ahead of its nearest competitors. Do you have plans for a query planner? If not, you might be back to writing MapReduce jobs sooner than you think.

Look forward to seeing more!

Mike

On Mon, Apr 27, 2015 at 7:37 PM, Andrew Wells awe...@clearedgeit.com wrote:

I have been working on a project, tentatively called Q4A (Query for Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].

This is a streaming query: the query is completed via a stream and should never group data in memory. To batch, intermediate results would be written back to Accumulo temporarily.
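As an illustration of the "never group data in memory" idea quoted above, here is a minimal sketch of a streaming aggregate. It is not taken from Q4A. `sorted_scan` and `streaming_count_by_row` are hypothetical names, and the sketch assumes only that a scan returns entries already sorted by row, so each group is contiguous and can be reduced as it streams past.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def sorted_scan() -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for a scan: yields (row, column) pairs in row order."""
    yield ("alice", "email")
    yield ("alice", "phone")
    yield ("bob", "email")
    yield ("carol", "email")
    yield ("carol", "phone")
    yield ("carol", "zip")


def streaming_count_by_row(entries: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Roughly SELECT row, COUNT(*) ... GROUP BY row over a row-sorted stream.

    groupby only merges *consecutive* equal keys, which is sufficient because
    the input is sorted. Only one running count is held at a time.
    """
    for row, group in groupby(entries, key=itemgetter(0)):
        yield row, sum(1 for _ in group)


counts = list(streaming_count_by_row(sorted_scan()))
print(counts)
```

Grouping on anything other than the sort key would break the contiguity assumption. That is presumably where writing intermediate results back to a temporary table, as the announcement describes, would come in.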
Re: Q4A Project
Andrew,

This is a cool thing to work on, I hope you have great success! A couple of questions about the motivations behind this, if you don't mind:

- There are several SQL implementations already in the Hadoop ecosystem. In what ways do you expect this to improve upon Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it is quite possible you're already using one of those technologies.
- In a conversation with some HP engineers earlier this year, they mentioned that building a SQL-92 layer is the easy part, and that a mature optimization engine is the really hard part. This is where Oracle may still be leaps and bounds ahead of its nearest competitors. Do you have plans for a query planner? If not, you might be back to writing MapReduce jobs sooner than you think.

Look forward to seeing more!

Mike

On Mon, Apr 27, 2015 at 7:37 PM, Andrew Wells awe...@clearedgeit.com wrote:

I have been working on a project, tentatively called Q4A (Query for Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].

This is a streaming query: the query is completed via a stream and should never group data in memory. To batch, intermediate results would be written back to Accumulo temporarily.

The *primary goal* is to have a complete SQL implementation native to Accumulo.

*Why do this?*
I am getting tired of writing bad Java code to query a database. I would rather write bad SQL code. Also, people should be able to get queries out faster, and it shouldn't take a developer.