Re: Q4A Project

2015-04-27 Thread Corey Nolet
Andrew,

Have you considered leveraging existing SQL query layers like Hive or
Spark's SQL/DataFrames API? There are some pretty massive optimizations
involved in that API making the push-down predicates / selections pretty
easy to adapt for Accumulo.
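
To illustrate what pushing a predicate down buys, here is a minimal sketch, assuming a sorted in-memory map as a stand-in for an Accumulo table. The class and method names (PushDownSketch, fullScanThenFilter, rangeScan) are hypothetical, and none of this is the Accumulo, Hive, or Spark API; it only contrasts filtering after a full scan with turning the row predicate into the scan range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PushDownSketch {

    // Stand-in for a sorted key/value table (assumption: NOT the Accumulo client API).
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row_001", "alice");
        table.put("row_002", "bob");
        table.put("row_003", "carol");
        table.put("row_004", "dave");
        return table;
    }

    // No push-down: every entry is read, then filtered by the query layer.
    static List<String> fullScanThenFilter(NavigableMap<String, String> table,
                                           String startRow, String endRow) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            String row = e.getKey();
            if (row.compareTo(startRow) >= 0 && row.compareTo(endRow) <= 0) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    // Push-down: the row predicate becomes the scan range itself, so only
    // entries inside [startRow, endRow] are ever visited.
    static List<String> rangeScan(NavigableMap<String, String> table,
                                  String startRow, String endRow) {
        return new ArrayList<>(table.subMap(startRow, true, endRow, true).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(fullScanThenFilter(table, "row_002", "row_003"));
        System.out.println(rangeScan(table, "row_002", "row_003"));
    }
}
```

Both methods return the same rows; the difference is how much of the table has to be read to get them, which is the part an existing planner would work out for you.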

On Mon, Apr 27, 2015 at 8:37 PM, Andrew Wells awe...@clearedgeit.com
wrote:

 I have been working on a project, tentatively called Q4A (Query for
 Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].

 This is a streaming query: the query is completed via a stream and should
 never group data in memory. To batch, intermediate results would be written
 back to Accumulo temporarily.
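
 A minimal sketch of the streaming idea, assuming rows arrive as a plain Java Iterator. The class StreamingQuerySketch and the helper select are hypothetical and are not taken from the Q4A code; the point is only that a WHERE-style filter and a SELECT-style projection can be applied one row at a time without collecting results in memory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

public class StreamingQuerySketch {

    // Lazily applies a predicate and a projection to an upstream iterator,
    // holding at most one pending result at any time.
    static <T, R> Iterator<R> select(Iterator<T> source,
                                     Predicate<T> where,
                                     Function<T, R> projection) {
        return new Iterator<R>() {
            private R pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Pull from upstream only until one matching row is found.
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (where.test(candidate)) {
                        pending = projection.apply(candidate);
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public R next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                R result = pending;
                pending = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical "key:value" rows standing in for scanned entries.
        Iterator<String> rows = Arrays.asList("a:1", "b:22", "c:3", "d:44").iterator();
        Iterator<Integer> results = select(rows,
                row -> row.length() > 3,
                row -> Integer.parseInt(row.substring(2)));
        while (results.hasNext()) {
            System.out.println(results.next()); // prints 22, then 44
        }
    }
}
```

 Operations that cannot work row by row (sorting, grouping) are where the "write intermediate results back to Accumulo" step would come in; this sketch does not cover that.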


 The *primary goal* is to have a complete SQL implementation native to
 Accumulo.

 *Why do this?*
 I am getting tired of writing bad Java code to query a database. I would
 rather write bad SQL code. Also, people should be able to get queries out
 faster and it shouldn't take a developer.


 *Native To Accumulo*:

- There should be no special format to read a database created by Q4A
- There should be no special format for Q4A to query a table
- All tables are tables available to Q4A
- Any special tables are stored away from the user's databases
(indexes, column definitions, etc.)

 *Other Goals*:

- Implement the entire SQL definition (currently all of SQLite)
- Create JDBC Driver/Server
- Push down Expressions to the Tablet Servers
- Install-less queries: use the Q4A jar directly against any Accumulo
cluster (minus push-down expressions)
- documentation :o
- testing ;)

 *Does it work?*
 Not yet; the project is still a work in progress, and I will be working on
 it at the Accumulo Summit this year. Progress is slow as I am getting
 married in about a month and some change.

 *Questions:*
 If you have questions about Q4A, ask here. I will also be at the Accumulo
 Summit at the ClearEdgeIT table and the Hackathon.

 *WHERE IS TEH LINK?!1!*
 Oh here: https://github.com/agwells0714/q4a

 --
 *Andrew George Wells*
 *Software Engineer*
 *awe...@clearedgeit.com*




Re: Q4A Project

2015-04-27 Thread Corey Nolet
I'm always looking for places to help out and integrate/share designs &
ideas. I look forward to chatting with you about Q4A at the hackathon
tomorrow!

Have you, by chance, seen the Spark SQL adapter for the Accumulo Recipes
Event & Entity Stores [1]? At the very least, it's a good example of using
Spark's SQL abstraction over Accumulo. As Mike Drob pointed out, Spark SQL
has a pretty robust query planning / optimization layer. The Event/Entity
stores in Accumulo Recipes also have a pluggable query
planning/optimization layer.


[1]
https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/test/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalystTest.scala

On Mon, Apr 27, 2015 at 9:38 PM, Mike Drob mad...@cloudera.com wrote:

 Andrew,

 This is a cool thing to work on, I hope you have great success!

 A couple of questions about the motivations behind this, if you don't mind
 -
 - There are several SQL implementations already in the Hadoop ecosystem.
 In what ways do you expect this to improve upon
 Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it
 is quite possible you're already using one of those technologies.
 - In a conversation with some HP engineers earlier this year, they
 mentioned that building a SQL-92 layer is the easy part, and that a mature
 optimization engine is the really hard part. This is where Oracle may still
 be leaps and bounds ahead of its nearest competitors. Do you have plans for
 a query planner? If not, you might be back to writing MapReduce jobs sooner
 than you think.

 Look forward to seeing more!

 Mike






Re: Q4A Project

2015-04-27 Thread Mike Drob
Andrew,

This is a cool thing to work on, I hope you have great success!

A couple of questions about the motivations behind this, if you don't mind -
- There are several SQL implementations already in the Hadoop ecosystem. In
what ways do you expect this to improve upon
Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it
is quite possible you're already using one of those technologies.
- In a conversation with some HP engineers earlier this year, they
mentioned that building a SQL-92 layer is the easy part, and that a mature
optimization engine is the really hard part. This is where Oracle may still
be leaps and bounds ahead of its nearest competitors. Do you have plans for
a query planner? If not, you might be back to writing MapReduce jobs sooner
than you think.

Look forward to seeing more!

Mike
