[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702457#comment-15702457 ]

Julian Hyde commented on SOLR-8593:

Calcite rewrites {{SELECT DISTINCT ...}} to {{SELECT ... GROUP BY ...}}. So if you just deal with {{GROUP BY}} (i.e. Calcite's Aggregate operator) you should be fine.

> Integrate Apache Calcite into the SQLHandler
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
> Issue Type: Improvement
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was
> nicely split off from the larger Presto project and it did everything that
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where
> Apache Calcite comes into play. It has a battle tested cost based optimizer
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans
> will continue to be translated to Streaming API objects (TupleStreams), so
> continued work on the JDBC driver should plug in nicely with the Calcite work.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
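To make the SELECT DISTINCT to GROUP BY rewrite from the comment above concrete, here is a minimal sketch in plain Java with no Calcite dependency. The class DistinctAsGroupBy and both method names are hypothetical and exist only for illustration; the sketch shows the equivalence of the two query forms on an in-memory list, not how Calcite performs the rewrite internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not Solr or Calcite code.
final class DistinctAsGroupBy {

  // SELECT DISTINCT a, b FROM rows: keep each distinct tuple once, in first-seen order.
  static List<List<String>> selectDistinct(List<List<String>> rows) {
    return new ArrayList<>(new LinkedHashSet<>(rows));
  }

  // SELECT a, b FROM rows GROUP BY a, b: group on every selected column,
  // use no aggregate functions, and emit one row per group key.
  static List<List<String>> groupByAllColumns(List<List<String>> rows) {
    Map<List<String>, Integer> groups = new LinkedHashMap<>();
    for (List<String> row : rows) {
      groups.merge(row, 1, Integer::sum); // group size is tracked but never projected
    }
    return new ArrayList<>(groups.keySet());
  }

  public static void main(String[] args) {
    List<List<String>> rows = Arrays.asList(
        Arrays.asList("a", "x"),
        Arrays.asList("b", "y"),
        Arrays.asList("a", "x"));
    System.out.println(selectDistinct(rows));
    System.out.println(groupByAllColumns(rows));
  }
}
```

Both methods return the same rows, which is why handling only the Aggregate operator is enough to cover DISTINCT as well.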
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700714#comment-15700714 ]

Cao Manh Dat commented on SOLR-8593:

What I mean by "run this filter faster than Calcite" is that Solr has something called parallel streaming expressions, which will distribute that filter across many nodes, whereas, as far as I know, Calcite will run the second Filter (the HAVING clause) on a single node.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700686#comment-15700686 ]

Joel Bernstein commented on SOLR-8593:

Hi [~risdenk] and [~caomanhdat]. I've reviewed the latest work on this ticket and it's looking really good!

A couple of pieces of functionality that appear to be missing are:

1) Handling of SELECT DISTINCT queries. In the current SQLHandler we can do MapReduce SELECT DISTINCT queries in parallel on worker nodes. And we can also push down the distinct logic to the JSON Facet API.

2) The pushing down of GROUP BY aggregations to the JSON Facet API.

Both of these currently require the aggregationMode parameter to be passed in with the query, which I think is fine for the initial Calcite release.

I'd be happy to add these capabilities to this branch. That will also give me an opportunity to work with the code and feel comfortable working with Calcite.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664701#comment-15664701 ]

Julian Hyde commented on SOLR-8593:

CALCITE-1306 covers this. It's not standard SQL but could be enabled via an extension.

I disagree that "Solr will run this filter faster than Calcite". With query optimization, both queries will produce identical plans. This issue is not about performance. It is about syntactic sugar (not that there's anything wrong with that).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664613#comment-15664613 ]

Cao Manh Dat commented on SOLR-8593:

But we also want to handle a HAVING clause that does not use a function, like

{code}
having field_i = 19
{code}

Is there a better way to do this kind of filter?
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664595#comment-15664595 ]

Julian Hyde commented on SOLR-8593:

You're making a mistake I see a lot of people making: trying to do complex semantic transformations on the AST (SqlNode). That's an anti-pattern, because SQL's complex rules for name-resolution make the AST very brittle. You should do those kinds of transformations on the relational algebra tree (RelNode).

In fact, Calcite will convert the query into a {{Scan -> Filter -> Aggregate -> Filter -> Project}} logical plan (the first Filter is the WHERE clause, the second Filter is the HAVING clause), so I don't think you need to do any tricky processing looking for aliases.
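The {{Scan -> Filter -> Aggregate -> Filter -> Project}} shape described above can be mimicked in plain Java to show where each clause runs. This is only a sketch under stated assumptions: the class PlanShapeSketch, its Row type, and the WHERE condition field_i > 0 are hypothetical, and nothing here uses Calcite's API. The column names str_s and field_i are borrowed from the queries in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it mimics the shape of the plan, not Calcite's API.
final class PlanShapeSketch {

  // One scanned row: a grouping key and a numeric value.
  static final class Row {
    final String strS;
    final int fieldI;

    Row(String strS, int fieldI) {
      this.strS = strS;
      this.fieldI = fieldI;
    }
  }

  // Roughly: SELECT str_s FROM rows WHERE field_i > 0
  //          GROUP BY str_s HAVING sum(field_i) = 19
  static List<String> run(List<Row> scan) {
    Map<String, Integer> sums = new LinkedHashMap<>();
    for (Row r : scan) {                              // Scan
      if (r.fieldI > 0) {                             // Filter #1 (WHERE): raw rows
        sums.merge(r.strS, r.fieldI, Integer::sum);   // Aggregate: sum per group
      }
    }
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : sums.entrySet()) {
      if (e.getValue() == 19) {                       // Filter #2 (HAVING): aggregated rows
        out.add(e.getKey());                          // Project: grouping column only
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("a", 10), new Row("a", 9),
        new Row("b", 19), new Row("b", -5),
        new Row("c", 19), new Row("c", 1));
    System.out.println(run(rows)); // prints [a, b]
  }
}
```

The second filter reads the aggregate's output directly, so it never needs to know whether the user wrote an alias for sum(field_i).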
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664576#comment-15664576 ]

Julian Hyde commented on SOLR-8593:

Regarding the alias for "count(\*)": I guess one approach is to extend Calcite to allow a pluggable alias derivation (it has to be pluggable because you can't please everyone).

Another approach is to leave the aliases as they are but generate field names for the JSON result set. Note that if you call SqlNode.getParserPosition() on each item in the select clause it will tell you the start and end point of that expression in the original SQL string, so you can extract the "count(\*)" using that information.

I don't think the following should be valid, but under your proposed change it would be:

{code}
SELECT deptno
FROM (
  SELECT deptno, count(*)
  FROM emp
  GROUP BY deptno) AS t
WHERE t."count(*)" > 3
{code}

Note that "count(\*)" is not an expression; it is a reference to a "column" produced by the sub-query. In my opinion, using a textual expression is very confusing, and we should not do it. The derived alias of {{count(\*)}} should be something not easily guessable, which will encourage users to use an alias:

{code}
SELECT deptno
FROM (
  SELECT deptno, count(*) AS c
  FROM emp
  GROUP BY deptno) AS t
WHERE t.c > 3
{code}
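The second approach above (recovering the original text of a select item from its start and end position) boils down to a substring computation. The following is a minimal sketch in plain Java; the class SelectItemText and its Span type are hypothetical stand-ins, and the sketch assumes positions are 1-based line and column numbers with an inclusive end, which is my understanding of what Calcite's parser position reports and should be verified against the Calcite Javadoc before relying on it.

```java
// Hypothetical illustration class; Span is a stand-in for a parser position, not a Calcite type.
final class SelectItemText {

  // 1-based line/column of the first and last character of an expression (end inclusive).
  static final class Span {
    final int line;
    final int column;
    final int endLine;
    final int endColumn;

    Span(int line, int column, int endLine, int endColumn) {
      this.line = line;
      this.column = column;
      this.endLine = endLine;
      this.endColumn = endColumn;
    }
  }

  // Converts a 1-based (line, column) pair into a 0-based offset into sql.
  // Assumes the position is valid for the given string and lines end with '\n'.
  static int offset(String sql, int line, int column) {
    int pos = 0;
    for (int l = 1; l < line; l++) {
      pos = sql.indexOf('\n', pos) + 1; // jump to the start of the next line
    }
    return pos + column - 1;
  }

  // Returns the original text covered by the span, e.g. to use as a JSON field name.
  static String textAt(String sql, Span span) {
    int start = offset(sql, span.line, span.column);
    int end = offset(sql, span.endLine, span.endColumn); // inclusive
    return sql.substring(start, end + 1);
  }

  public static void main(String[] args) {
    String sql = "SELECT to, count(*)\nFROM collection4 GROUP BY to";
    System.out.println(textAt(sql, new Span(1, 12, 1, 19))); // prints count(*)
  }
}
```

With this, the planner can keep its generated alias internally while the response writer labels the column with the text the user typed.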
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664525#comment-15664525 ]

Cao Manh Dat commented on SOLR-8593:

Thanks [~risdenk]. Yeah, I will definitely do that in the future.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664433#comment-15664433 ]

Kevin Risden commented on SOLR-8593:

[~caomanhdat] - I merged your changes with the jira/solr-8593 branch. PR 104 was updated automatically. In the future, I think you can open a PR against the existing jira/solr-8593 branch. This would probably be easier than a patch for merging changes.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662327#comment-15662327 ]

Cao Manh Dat commented on SOLR-8593:

Thanks [~julianhyde], that makes sense, but will Calcite have an option for that in the future? Because along with the JDBC API, we also support queries through the REST API. For example:

{code}
curl --data-urlencode 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10' http://localhost:8983/solr/collection4/sql?aggregationMode=facet
{code}

Below is a sample result set:

{code}
{"result-set":{"docs":[
  {"count(*)":9158,"to":"pete.da...@enron.com"},
  {"count(*)":6244,"to":"tana.jo...@enron.com"},
  {"count(*)":5874,"to":"jeff.dasov...@enron.com"},
  {"count(*)":5867,"to":"sara.shackle...@enron.com"},
  {"count(*)":5595,"to":"steven.k...@enron.com"},
  {"count(*)":4904,"to":"vkamin...@aol.com"},
  {"count(*)":4622,"to":"mark.tay...@enron.com"},
  {"count(*)":3819,"to":"kay.m...@enron.com"},
  {"count(*)":3678,"to":"richard.shap...@enron.com"},
  {"count(*)":3653,"to":"kate.sy...@enron.com"},
  {"EOF":"true","RESPONSE_TIME":10}]}
}
{code}

So "EXPR$1" is kind of weird as a field name.

I also have another question. Right now we rely on Calcite's in-memory filter to evaluate expressions in the HAVING clause (such as {{having sum(field_i) = 19}} in the query below):

{code}
select str_s, count(*) as myCount, sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float)
from collection1
where text=''
group by str_s
having sum(field_i) = 19
order by sum(field_i) asc
{code}

So I created a filter predicate for converting a LogicalFilter to a SolrFilter, like this:

{code}
// Returns true only if no operand refers to a column whose name was
// generated by Calcite (EXPR$n), i.e. to an unaliased expression.
private static final boolean isNotFilterByExpr(List<RexNode> rexNodes, List<String> fieldNames) {
  // We don't have a way to filter by the result of an aggregator now
  boolean result = true;
  for (RexNode rexNode : rexNodes) {
    if (rexNode instanceof RexCall) {
      // Recurse into nested calls such as AND/OR/comparison operators
      result = result && isNotFilterByExpr(((RexCall) rexNode).getOperands(), fieldNames);
    } else if (rexNode instanceof RexInputRef) {
      result = result && !fieldNames.get(((RexInputRef) rexNode).getIndex()).startsWith("EXPR$");
    }
  }
  return result;
}

private static final Predicate<RelNode> FILTER_PREDICATE = relNode -> {
  List<RexNode> filterOperands = ((RexCall) ((LogicalFilter) relNode).getCondition()).getOperands();
  return isNotFilterByExpr(filterOperands, SolrRules.solrFieldNames(relNode.getRowType()));
};
{code}

But the above code does not work when I add an alias for sum(field_i), like this:

{code}
select str_s, count(*) as myCount, sum(field_i) as mySum, min(field_i), max(field_i), cast(avg(1.0 * field_i) as float)
from collection1
where text=''
group by str_s
having sum(field_i) = 19
order by sum(field_i) asc
{code}

So does Calcite have a way to check whether a RexInputRef refers to an alias or not?
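The effect of the {{startsWith("EXPR$")}} test in the snippet above can be modeled without Calcite. This is a minimal sketch in plain Java: the class GeneratedNameCheck, its method, and the two field-name lists are assumptions made for illustration. In particular it assumes that the row type seen by the predicate carries the user's alias (mySum) in place of a generated name, which is one plausible reading of the behaviour reported here and has not been confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it only models field names, not RexNode trees.
final class GeneratedNameCheck {

  // Same idea as the quoted predicate: a referenced column is treated as the
  // result of an expression only if its name looks generated (EXPR$n).
  static boolean looksGenerated(List<String> fieldNames, int index) {
    return fieldNames.get(index).startsWith("EXPR$");
  }

  public static void main(String[] args) {
    // Assumed field names for: select str_s, count(*) as myCount, sum(field_i) ...
    List<String> withoutAlias = Arrays.asList("str_s", "myCount", "EXPR$2");
    // Assumed field names for: select str_s, count(*) as myCount, sum(field_i) as mySum ...
    List<String> withAlias = Arrays.asList("str_s", "myCount", "mySum");

    // The HAVING condition refers to column index 2 in both cases.
    System.out.println(looksGenerated(withoutAlias, 2)); // prints true
    System.out.println(looksGenerated(withAlias, 2));    // prints false
  }
}
```

Under that assumption the aliased aggregate is indistinguishable from an ordinary field by name alone, so a name-based test cannot tell the two apart.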
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662280#comment-15662280 ]

Julian Hyde commented on SOLR-8593:

"count(\*)" is not a good derived column name, because it contains non-alphanumeric characters and is therefore not a valid identifier unless you enclose it in double-quotes. Therefore Calcite generates an alias that is a valid identifier. I believe quite a few other databases do this.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661016#comment-15661016 ]

Cao Manh Dat commented on SOLR-8593:

Thanks [~risdenk] [~julianhyde]

By the way, how can we get rid of the EXPR$1 returned for

{code}
select str_s, count(*) from collection1
{code}

The result set returns the name of the second column as {{EXPR$1}}, not {{count(*)}} as expected.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659950#comment-15659950 ]

Kevin Risden commented on SOLR-8593:

Thanks [~caomanhdat]! I'll integrate the fixes into the jira/solr-8593 branch and pull/104.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658025#comment-15658025 ]

Julian Hyde commented on SOLR-8593:

By the way, when you're ready, please add Solr to the [powered by|https://calcite.apache.org/docs/powered_by.html] page; see CALCITE-1112 for details.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652093#comment-15652093 ]

Joel Bernstein commented on SOLR-8593:

Very excited about this patch. After a day of review I think I understand how this comes together. The next step is for me to get it running. The initial run of the test cases fails, probably due to the classpath issue that [~risdenk] mentions. I'll work on getting it running as the next step and compare the feature set with the current release.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625888#comment-15625888 ]

ASF GitHub Bot commented on SOLR-8593:

Github user risdenk commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/104#discussion_r85965892

--- Diff: lucene/default-nested-ivy-settings.xml ---
@@ -32,6 +32,7 @@
+https://repository.apache.org/content/repositories/snapshots; m2compatible="true" />
--- End diff --

Avatica 1.9 released and PR updated. Waiting on Calcite 1.11.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622553#comment-15622553 ]

ASF GitHub Bot commented on SOLR-8593:

Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104

@joel-bernstein if you push to the `jira/solr-8593` branch it should update this PR.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622328#comment-15622328 ]

Kevin Risden commented on SOLR-8593:

Avatica 1.9 (and Calcite 1.11) is needed because this requires an avatica-core that doesn't do shading. The shading was causing problems integrating into Solr (I forget the exact errors now). I saw that Calcite 1.10 requires Avatica 1.8, so I just copied the one offending file for now. Since Avatica 1.9 was just released, I'll switch from the SNAPSHOT to the release, and I should be able to use the Calcite 1.11 SNAPSHOT since the compatibility fixes for Avatica 1.9 are now included in Calcite.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616986#comment-15616986 ] Julian Hyde commented on SOLR-8593: --- Ah, I think I see what's going on. You're using avatica-1.9-SNAPSHOT with calcite-1.10. calcite-1.10 requires avatica-1.8, so you should use that. (Or is there a good reason why you need avatica-1.9?) By the way, avatica-1.9 is less than a week from release. calcite-1.11 is maybe a month to six weeks away. The exact compatibility issues you describe are covered in CALCITE-1270 (and see the PR attached to that case).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616963#comment-15616963 ] Julian Hyde commented on SOLR-8593: --- Is there a Calcite issue logged for the AbstractMethodError relating to CalciteConnectionProperty? I see [others are running into the same problem|http://stackoverflow.com/questions/39318653/create-a-streaming-example-with-calcite-using-csv] and I want to document the solution (or fix the bug in Calcite/Avatica if it is a bug).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616458#comment-15616458 ] Yonik Seeley commented on SOLR-8593: bq. (This should include joins) Awesome! I know many who have been waiting for that!
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616318#comment-15616318 ] Kevin Risden commented on SOLR-8593: One thing that you may have to do to make the Solr server happy until Calcite 1.11 gets released is put the solr-core jar in front of calcite-core on the classpath. They are alphabetical right now (I fixed this by renaming solr-core to asolr-core). This is to make sure the CalciteConnectionProperty fix is loaded before the original version. Otherwise you get an abstract method error. select count(distinct) ... might actually work right now with that patch. If it doesn't, then comment out the rules in the static block in SolrRules. That should give the full power of just regular SQL on top of Solr. It wouldn't really push anything down, but the SQL still works. (This should include joins) You should be able to add/change the SolrRules to see what gets pushed down. The classes from the branch are exactly the same as the code here: https://github.com/risdenk/solr-calcite-example The code there has some more tests that show the explain plan. It was easier for me to iterate on the implementation of the rules that way.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616304#comment-15616304 ] Joel Bernstein commented on SOLR-8593: -- [~risdenk], I'll pull the branch and begin working with it. My plan is to run the tests and see how the various Calcite pieces get triggered. Perhaps for the 6.4 release we should shoot for the same functionality we currently have, just with Calcite swapped in. If we want to add one new thing, I think the SELECT COUNT(DISTINCT) ... query would be a great thing to add.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616284#comment-15616284 ] ASF GitHub Bot commented on SOLR-8593: -- Github user joel-bernstein commented on the issue: https://github.com/apache/lucene-solr/pull/104 Ok, got it. I'll pull the branch and start working with it.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513607#comment-15513607 ] Kevin Risden commented on SOLR-8593: Adding some resources that may be helpful: * http://www.slideshare.net/HadoopSummit/costbased-query-optimization * https://medium.com/@mpathirage/query-planning-with-apache-calcite-part-1-fe957b011c36#.ywd9ouxmv
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378775#comment-15378775 ] Joel Bernstein commented on SOLR-8593: -- Ah, I see. I think we can pull off CoGroup with a merge() and reduce(). {code} reduce(merge(search(..., sort="a asc"), search(..., sort="a asc"), on="a asc"), by="a", group(sort="b asc", n="5")) {code} This operation is more generic in that it could be an aggregation or a join depending on what comes after the reduce. Fun stuff.
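The merge() plus reduce() idea in the comment above can be sketched in plain Java. This is only an illustration of the semantics under stated assumptions, not Solr code: the `Row` class, the `source` field, and the method names are hypothetical, the inputs are assumed to be already sorted ascending on `a`, and the `group(sort="b asc", n="5")` step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeReduceSketch {
    // Hypothetical stand-in for a tuple: the sort key "a" plus the stream it came from.
    static final class Row {
        final String a;
        final String source;
        Row(String a, String source) { this.a = a; this.source = source; }
        @Override public String toString() { return a + "/" + source; }
    }

    // Merge two inputs that are each already sorted ascending on "a".
    static List<Row> merge(List<Row> left, List<Row> right) {
        List<Row> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).a.compareTo(right.get(j).a) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Group consecutive rows that share the same "a" without collapsing them.
    static List<List<Row>> reduce(List<Row> sortedRows) {
        List<List<Row>> groups = new ArrayList<>();
        String currentKey = null;
        for (Row row : sortedRows) {
            if (!row.a.equals(currentKey)) {
                groups.add(new ArrayList<>());
                currentKey = row.a;
            }
            groups.get(groups.size() - 1).add(row);
        }
        return groups;
    }

    static List<List<Row>> demo() {
        List<Row> left = Arrays.asList(new Row("1", "L"), new Row("3", "L"));
        List<Row> right = Arrays.asList(new Row("1", "R"), new Row("2", "R"));
        return reduce(merge(left, right));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[1/L, 1/R], [2/R], [3/L]]
    }
}
```

Each group still holds the individual rows from both inputs, so a later step could either aggregate the group or pair up its left and right rows as a join.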
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378687#comment-15378687 ] Julian Hyde commented on SOLR-8593: --- The trickiest thing about CoGroup is that it aggregates (i.e. groups together) rows without collapsing them. So you need to be able to represent a nested set of rows. If Solr's evaluator can't handle nested rows then CoGroup will be tricky. If you already have join and aggregate I'd stick with them.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378665#comment-15378665 ] Joel Bernstein commented on SOLR-8593: -- The CoGroup operator is interesting. Solr's Streaming Expressions provide some similar operations (https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions). CoGroup seems similar to a rollup() function wrapped around a join in Streaming Expressions. Pseudo-code: {code} rollup(sum(x), over="y", innerJoin(on="a=b", search(...), search(...))) {code} Our basic strategy is to build up a Streaming Expression that implements Calcite's relational algebra. We've done this already with the Presto parser, but without joins. Looking forward to diving deeper into Calcite.
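The rollup-around-a-join pseudo-code above can be illustrated with a small plain-Java sketch. It assumes a hypothetical field layout (left rows carry `a` and `y`, right rows carry `b` and `x`) and uses a hash join purely for brevity; it is not how Streaming Expressions evaluate a join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinRollupSketch {
    // Inner join on left.a = right.b, then sum(x) grouped by y.
    // leftRows are {a, y}; rightRows are {b, x} (hypothetical layout).
    static Map<String, Integer> rollupSumOverJoin(String[][] leftRows, String[][] rightRows) {
        // Build side: index the x values of the right rows by join key b.
        Map<String, List<Integer>> xByB = new HashMap<>();
        for (String[] r : rightRows) {
            xByB.computeIfAbsent(r[0], k -> new ArrayList<>()).add(Integer.parseInt(r[1]));
        }
        // Probe side: for every matching pair, add x to the running sum for y.
        Map<String, Integer> sumByY = new TreeMap<>();
        for (String[] l : leftRows) {
            List<Integer> matches = xByB.get(l[0]);
            if (matches == null) continue; // inner join: unmatched rows are dropped
            for (int x : matches) {
                sumByY.merge(l[1], x, Integer::sum);
            }
        }
        return sumByY;
    }

    static Map<String, Integer> demo() {
        String[][] left = { {"1", "red"}, {"2", "blue"}, {"3", "red"} };
        String[][] right = { {"1", "10"}, {"1", "5"}, {"2", "4"}, {"3", "7"}, {"4", "100"} };
        return rollupSumOverJoin(left, right);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {blue=4, red=22}
    }
}
```

Unlike CoGroup, the joined rows are collapsed into one sum per `y` as soon as they are produced, so no nested row representation is needed.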
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376266#comment-15376266 ] Julian Hyde commented on SOLR-8593: --- You should probably model your join and aggregate operators as sub-classes of Join and Aggregate that understand the "distribution" trait. If you are doing, say, "group by x" then you will need your input either to be singleton (i.e. only one input stream) or partitioned on x. Calcite will be able to ensure that the input is partitioned appropriately, either because it is stored in partitions, or by applying a shuffle/exchange. There is the regular Exchange operator that changes the distribution (i.e. re-partitions) and there is SortExchange that changes the distribution and also sorts within each partition. SortExchange models what the shuffle does in MapReduce. After you have a plan like {noformat} MyJoin[left.a = right.b] Exchange[a] MyAggregate Exchange Scan[T1] Exchange[b] Scan[T2] {noformat} you can turn it into map-reduce by making the consumer of each Exchange a reduce task, and the input to each Exchange a map task. I asked [~ashutoshc] how he would generate Hive MapReduce plans in Calcite (most Hive plans these days are Tez) and he said you should consider writing a CoGroup operator (like the one in Pig). CoGroup is powerful enough to implement both join and aggregate, so it might save you some effort.
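The point about "group by x" needing input partitioned on x can be made concrete with a toy sketch in plain Java. It does not use Calcite's Exchange operator; the class and method names are hypothetical, and hash partitioning stands in for whatever distribution the planner would choose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExchangeSketch {
    // Toy exchange on x: route each row to a partition chosen by hashing x,
    // then let every partition compute sum(value) group by x on its own rows.
    static List<Map<String, Integer>> exchangeAndAggregate(String[] xs, int[] values, int partitions) {
        List<Map<String, Integer>> perPartition = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            perPartition.add(new TreeMap<>());
        }
        for (int i = 0; i < xs.length; i++) {
            int p = Math.floorMod(xs[i].hashCode(), partitions);
            perPartition.get(p).merge(xs[i], values[i], Integer::sum);
        }
        return perPartition;
    }

    // Because all rows for a given x land in one partition, the partial results
    // can simply be concatenated; no second aggregation pass is required.
    static Map<String, Integer> union(List<Map<String, Integer>> perPartition) {
        Map<String, Integer> all = new TreeMap<>();
        for (Map<String, Integer> part : perPartition) {
            for (Map.Entry<String, Integer> e : part.entrySet()) {
                if (all.put(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException("key seen in two partitions: " + e.getKey());
                }
            }
        }
        return all;
    }

    static Map<String, Integer> demo() {
        String[] xs = {"x1", "x2", "x1", "x3", "x2"};
        int[] values = {1, 2, 3, 4, 5};
        return union(exchangeAndAggregate(xs, values, 3));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {x1=4, x2=7, x3=4}
    }
}
```

If the rows were spread across partitions on some other column, the same key could appear in several partitions and the per-partition sums would have to be combined again, which is the situation the exchange step avoids.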
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372906#comment-15372906 ] Joel Bernstein commented on SOLR-8593: -- Hi Julian! Thanks for the offer to help out. [~risdenk] and I are very interested in using Calcite to power Solr's Parallel SQL engine so we can use Calcite's awesome optimizer. Kevin has been doing most of the work on this but I will be helping out more following the next Solr release. I think our biggest struggle has been understanding how to apply the rules properly to push down distributed joins and aggregations. Solr supports fast MapReduce shuffling, distributed joins and also has mature faceting analytics so we'd like to take advantage of all this power from the SQL interface.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371912#comment-15371912 ] Julian Hyde commented on SOLR-8593: --- Hi everyone! I'm VP of Apache Calcite. I only just noticed this JIRA case. I am excited that you are considering using Calcite. Please let me know if I can help.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301871#comment-15301871 ] Joel Bernstein commented on SOLR-8593: -- Interesting. I'd like to spend some time in the next couple of weeks to see what we can get done in the near term on this ticket. Possibly, the first step is just to release equivalent functionality to the current Presto code, with a Calcite release. This would provide the base to gradually expand the SQL feature set.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296678#comment-15296678 ] Kevin Risden commented on SOLR-8593: [~joel.bernstein] - Elasticsearch with Calcite just came through the Calcite dev list - CALCITE-1253
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276494#comment-15276494 ] Joel Bernstein commented on SOLR-8593: -- Ok, I'll keep digging into the rules to get a better understanding.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276480#comment-15276480 ] Kevin Risden commented on SOLR-8593: Here is a blog post about the concepts and example rules: https://datapsyche.wordpress.com/2014/08/06/optiq-query-push-down-concepts/ I am pretty sure that Hive uses Calcite internally and that the implementation would be in that code base. Drill might also have the same join logic in their code base. {quote} Also do we have access to the SQL catalog and statistics from inside the rules engine. We'll need that information to decide which type of join to do. I guess we can make the catalog globally accessible if the rules API doesn't provide hooks into it. {quote} There is a computeSelfCost method that has access to the planner and the metadata query; its definition is: {code} public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq); {code} That should allow us to change which rules are fired based on different cost parameters. Also, the different rules can have different prerequisite states, such that a filter is only allowed on a project or something similar. This should allow fine-grained control of a join as well, where the join will only fire if both inputs are sorted on the same columns.
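How a per-node cost can steer the choice between alternatives can be shown with a toy sketch in plain Java. It deliberately does not use Calcite's RelOptCost or RelOptPlanner; the `Alternative` class, the two alternatives, and every cost number are hypothetical and only illustrate the idea of picking the cheapest option for a given row count.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CostChoiceSketch {
    // Hypothetical physical alternative with a fixed start-up cost and a per-row cost.
    static final class Alternative {
        final String name;
        final double fixedCost;
        final double costPerRow;
        Alternative(String name, double fixedCost, double costPerRow) {
            this.name = name;
            this.fixedCost = fixedCost;
            this.costPerRow = costPerRow;
        }
        // Toy counterpart of a self-cost function: cost of this node alone.
        double selfCost(double rowCount) {
            return fixedCost + costPerRow * rowCount;
        }
    }

    // Made-up numbers: pushing down is cheap per row but has a start-up overhead.
    static final List<Alternative> ALTERNATIVES = Arrays.asList(
            new Alternative("filter-in-planner", 0.0, 1.0),
            new Alternative("filter-pushed-down", 50.0, 0.1));

    // Pick the alternative with the lowest self cost for the estimated row count.
    static Alternative cheapest(List<Alternative> alternatives, double rowCount) {
        return alternatives.stream()
                .min(Comparator.comparingDouble((Alternative a) -> a.selfCost(rowCount)))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        System.out.println(cheapest(ALTERNATIVES, 10).name);   // prints filter-in-planner
        System.out.println(cheapest(ALTERNATIVES, 1000).name); // prints filter-pushed-down
    }
}
```

The same row-count estimate flips the decision once the per-row saving outweighs the fixed overhead, which is the kind of trade-off a cost-based planner is meant to resolve.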
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276476#comment-15276476 ] Joel Bernstein commented on SOLR-8593: -- I think my main concern is that the join pushdown rules haven't been exercised that much in production.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276472#comment-15276472 ] Joel Bernstein commented on SOLR-8593: -- Are you concerned that there isn't a reference implementation for join push downs? Projects such as Apache Drill must have taken the approach of only embedding the parser/optimizer?
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276452#comment-15276452 ] Kevin Risden commented on SOLR-8593:
{quote} Do you know of an existing adapter that does join push downs that can be used as a reference implementation? {quote}
{quote} And what's the best source of documentation for implementing rules? I haven't found anything that I would describe as comprehensive. {quote}
I don't have one specifically for joins. The best reference for implementing rules is the example adapters for Cassandra, MongoDB, and Splunk in the Calcite source. I wish there was some more comprehensive documentation. It has been a lot of trial and error to see how things all fit together. Basically the pattern seems to be as follows:
1) Add a rule to SolrRules
2) The constructor specifies how it matches (project, filter, etc)
3) Add a class that implements the translation of that rule
This all eventually gets passed down to the SolrTable where the query is built and run. I've been thinking about changing this so that the query is built along the way, but I'm not sure that's possible.
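The three-step pattern described in this comment can be illustrated without Calcite itself. The following is only a toy model in Python: `Scan`, `Filter`, `Project`, `SolrQuery`, `FilterRule`, `ProjectRule`, and `build_query` are hypothetical names invented for this sketch, not Calcite or Solr classes. It assumes, as the comment says, that each rule declares what it matches and that the query is only assembled at the end, at the table.

```python
from dataclasses import dataclass, field


# Toy logical plan nodes (stand-ins for a planner's relational operators).
@dataclass
class Scan:
    collection: str


@dataclass
class Filter:
    child: object
    condition: str  # assumed to be in Solr query syntax already


@dataclass
class Project:
    child: object
    fields: list


@dataclass
class SolrQuery:
    """Accumulates what the (hypothetical) table would send to Solr."""
    collection: str = ""
    q: str = "*:*"
    fl: list = field(default_factory=list)


class FilterRule:
    # Steps 1 and 2: the rule exists and declares which node type it matches.
    matches = Filter

    def apply(self, node, query):
        # Step 3: translate the matched node into part of the query.
        query.q = node.condition


class ProjectRule:
    matches = Project

    def apply(self, node, query):
        query.fl = list(node.fields)


RULES = [FilterRule(), ProjectRule()]  # toy equivalent of a rules registry


def build_query(node):
    """Walk the plan from the top down, letting matching rules fill in the query."""
    query = SolrQuery()
    while not isinstance(node, Scan):
        for rule in RULES:
            if isinstance(node, rule.matches):
                rule.apply(node, query)
        node = node.child
    query.collection = node.collection
    return query


plan = Project(Filter(Scan("collection1"), "a_s:hello"), ["id", "a_s"])
result = build_query(plan)
print(result)
```

A real adapter would let the planner decide when a rule fires; the loop here only mimics the end state in which every matched operator has contributed its piece of the query.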
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276420#comment-15276420 ] Joel Bernstein commented on SOLR-8593: -- [~risdenk], I've been reviewing the work on https://github.com/risdenk/solr-calcite-example. I think I understand the basics of how this works, but there are some mysteries as to how exactly the rules get triggered and where to place rules like join push downs. Do you know of an existing adapter that does join push downs that can be used as a reference implementation? And what's the best source of documentation for implementing rules? I haven't found anything that I would describe as comprehensive.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264057#comment-15264057 ] Joel Bernstein commented on SOLR-8593: -- This is awesome. I like the idea of making this a 6.2 priority. I should have some time over the next couple of days to dig into the implementation.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264031#comment-15264031 ] Kevin Risden commented on SOLR-8593: Currently this is working out really nicely. Need to address the above items but currently can even hook up Avatica server and use the Avatica client to connect. This would resolve most of the issues in SOLR-8659
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262895#comment-15262895 ] Kevin Risden commented on SOLR-8593: Ok made a bunch of progress on the jira/solr-8593 branch:
* cleaned up the tests so that there are only a few remaining items to address (outlined below)
* added support for float/double types
* fixed a CloudSolrClient resource leak
Left to do:
* Add support for facets and map_reduce as parameters
* ensure the pushdown to facets/map_reduce works correctly
* figure out the CloudSolrClient cache (currently not caching and creating new per stream)
* Push down aggregates to Solr
* add tests to ensure the proper plan is being generated by Calcite
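The "creating new per stream" item in this comment is a classic client-cache problem. The sketch below is a generic Python illustration, not Solr code: `ClientCache` and `FakeClient` are hypothetical names, and keying the cache by ZooKeeper host string is an assumption made for the example.

```python
class FakeClient:
    """Stand-in for an expensive-to-create client; counts how many were built."""
    created = 0

    def __init__(self, zk_host):
        FakeClient.created += 1
        self.zk_host = zk_host
        self.closed = False

    def close(self):
        self.closed = True


class ClientCache:
    """Hands out one shared client per host instead of one per stream."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, zk_host):
        # Create the client on first use, then reuse it for later streams.
        if zk_host not in self._clients:
            self._clients[zk_host] = self._factory(zk_host)
        return self._clients[zk_host]

    def close(self):
        # Close every cached client once, which also avoids resource leaks.
        for client in self._clients.values():
            client.close()
        self._clients.clear()


cache = ClientCache(FakeClient)
first = cache.get("zk1:2181")
second = cache.get("zk1:2181")  # same host: reuses the first client
other = cache.get("zk2:2181")   # different host: a new client
cache.close()
```

The open design question in the comment is where such a cache should live and who closes it; the sketch only shows the reuse itself.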
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262361#comment-15262361 ] Kevin Risden commented on SOLR-8593: Pushed initial version to branch jira/solr-8593. Currently there are a lot of tests in SQLHandler that are commented out. Need to go back through and make sure they have the expected results. Also some file formatting related to headers needs to be addressed.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208885#comment-15208885 ] Joel Bernstein commented on SOLR-8593: -- Yeah, this was done originally because Presto didn't support mixed case identifiers, so I solved that problem by making the table name case insensitive. But this also matches up with the SQL spec, which is case insensitive for identifiers.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208874#comment-15208874 ] Kevin Risden commented on SOLR-8593: Another thing I learned from looking closely at the TestSQLHandler tests is that collection name is case insensitive with the current implementation. This seems wrong to me because collections are case sensitive.
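The concern raised here can be made concrete with a small sketch. It assumes, as the comment states, that collection names themselves are case sensitive, so two collections may differ only by case. The function names and the collection list are hypothetical and serve only to show the ambiguity that case-insensitive table lookup introduces.

```python
def resolve_case_insensitive(table_name, collections):
    """Return every collection whose name matches, ignoring case."""
    wanted = table_name.lower()
    return [name for name in collections if name.lower() == wanted]


def resolve_exact(table_name, collections):
    """Return the collection only if the name matches exactly."""
    return [name for name in collections if name == table_name]


# Hypothetical cluster with two collections that differ only by case.
collections = ["Products", "products", "orders"]

ambiguous = resolve_case_insensitive("PRODUCTS", collections)
missing = resolve_exact("PRODUCTS", collections)
exact = resolve_exact("products", collections)
print(ambiguous, missing, exact)
```

With case-insensitive lookup the table name maps to two collections and the handler has to pick one; with exact lookup the name either identifies one collection or none.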
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208777#comment-15208777 ] Kevin Risden commented on SOLR-8593: For reference the class that is querying with a stream is here: https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/sql/SolrTable.java#L82
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208413#comment-15208413 ] Kevin Risden commented on SOLR-8593: One way to access Calcite is through JDBC. The JDBCStream is taking the resultset generated by Calcite and turning it back into a stream. Inside Calcite, the SQL statement is being converted into a Stream after going through some optimization rules.
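The "result set back into a stream" step described in this comment can be sketched in a few lines. This is an analogy, not the JDBCStream implementation: Python's built-in `sqlite3` module stands in for the JDBC connection to Calcite, the table and column names are made up, and `tuple_stream` is a hypothetical helper. The final `EOF` marker mirrors the way Solr tuple streams signal the end of the stream.

```python
import sqlite3


def tuple_stream(cursor):
    """Yield each row as a dict keyed by column name, then an EOF marker."""
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))
    yield {"EOF": True}


# sqlite3 stands in for the JDBC connection; it is not Calcite.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE collection1 (id TEXT, a_i INTEGER)")
connection.executemany(
    "INSERT INTO collection1 VALUES (?, ?)", [("1", 10), ("2", 20)]
)

cursor = connection.execute("SELECT id, a_i FROM collection1 ORDER BY id")
tuples = list(tuple_stream(cursor))
connection.close()
print(tuples)
```

The wrapper knows nothing about how the rows were produced, which is why the same bridge works regardless of what plan the SQL engine chose underneath.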
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208409#comment-15208409 ] Dennis Gove commented on SOLR-8593: --- Ok, I see that. So the JDBCStream is being used as the bridge between standard Solr handlers and Calcite. But how is Calcite turning the SQL statement into a stream pipeline?
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208336#comment-15208336 ] Kevin Risden commented on SOLR-8593: I think we can. I will look into it today and see how far I get with some of the optimization rules.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208335#comment-15208335 ] Kevin Risden commented on SOLR-8593: I think my statement was misleading. I think the issue is the single-quoted identifiers. From the research I did, single quotes are always used to mark string constants. Back ticks, double quotes, and even brackets can be used for quoting identifiers. The quoting options for Calcite seem to be available under quoting here: https://calcite.apache.org/docs/adapter.html#drivers
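The difference between single and double quotes is easy to observe in any SQL engine that follows the standard on this point. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (it is not Calcite or the SQLHandler), with a made-up `docs` table.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE docs (id TEXT, text TEXT)")
connection.execute("INSERT INTO docs VALUES ('doc-1', 'hello')")

# Single quotes: a string constant, returned as-is for every row.
literal = connection.execute("SELECT 'id' FROM docs").fetchone()[0]

# Double quotes: a quoted identifier, resolved to the column's value.
column = connection.execute('SELECT "id" FROM docs').fetchone()[0]

connection.close()
print(literal, column)
```

A test that selects `'id'` and expects the column value is therefore relying on non-standard behavior, which is the point made in this comment.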
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208325#comment-15208325 ] Kevin Risden commented on SOLR-8593: The diff is misleading since SQLHandler is not shown due to having too big of a diff :/ I removed most of SQLHandler and then just wrapped Calcite in a JDBCStream. The SQLHandler class is viewable here: https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/SQLHandler.java
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207685#comment-15207685 ] Joel Bernstein commented on SOLR-8593: -- Currently quoted identifiers do refer to columns. This was originally done because Presto didn't support mixed case columns unless they were quoted. But Presto fixed that problem. So the quoted identifiers as they are now don't really serve a purpose. But I do believe that both Presto and Calcite allow quoted identifiers for columns in order to support identifiers that would otherwise not parse.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207593#comment-15207593 ] Joel Bernstein commented on SOLR-8593: -- I looked through the code and I'm seeing how CloudSolrStream is being used. But it's not clear to me that we'll be able to implement the full range of capabilities through this approach. For example:
1) Can we choose between the FacetStream and a parallel RollupStream based on the costs of the different approaches?
2) Can we do parallel joins using Solr's shuffling capabilities and Solr workers?
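The first question, choosing a strategy by cost, can be illustrated with a deliberately simple model. Everything numeric below is an assumption made for the sketch: the cost formulas and constants are invented, not measured, and `facet_cost`, `rollup_cost`, and `choose_aggregation` are hypothetical names. The only idea taken from the thread is that an optimizer would compare estimated costs and pick the cheaper plan.

```python
def facet_cost(num_docs, num_groups):
    # Assumed: pushed-down faceting is cheap per document but pays per group.
    return num_docs * 0.001 + num_groups * 1.0


def rollup_cost(num_docs, workers):
    # Assumed: a rollup streams every matching document, split across workers.
    return num_docs * 0.05 / workers


def choose_aggregation(num_docs, num_groups, workers):
    """Pick the cheaper strategy under the made-up cost model above."""
    costs = {
        "facet": facet_cost(num_docs, num_groups),
        "parallel_rollup": rollup_cost(num_docs, workers),
    }
    return min(costs, key=costs.get)


low_cardinality = choose_aggregation(1_000_000, 100, workers=4)
high_cardinality = choose_aggregation(1_000_000, 500_000, workers=4)
print(low_cardinality, high_cardinality)
```

Under these assumed numbers a small number of groups favors faceting and a very large number favors the parallel rollup; a real cost model would need statistics from the collection.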
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207416#comment-15207416 ] Dennis Gove commented on SOLR-8593: --- [~risdenk], I think I must be missing something in the diff link but I don't see any use of the JDBCStream. I'm not sure I understand how the use of the JDBCStream will help with this.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207350#comment-15207350 ] Kevin Risden commented on SOLR-8593: [~joel.bernstein] I was able to make some really good progress on the above linked piece. It's coming together; I just need to add the optimizations now. I have been testing with https://github.com/risdenk/solr-calcite-example which is the same Calcite code isolated from Solr. It enables me to easily check the explain plan and data types. I'll probably bang on it some more tomorrow to integrate the push downs for project, filter, and sort. The hooks are all there; I just need to convert from the Cassandra example code to Solr syntax.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206919#comment-15206919 ] Kevin Risden commented on SOLR-8593: Here is a separate approach that uses all of Calcite and the JDBCStream: https://github.com/risdenk/lucene-solr/compare/master...risdenk:calcite-sql-handler It removes all the custom processing from SQLHandler, wraps Calcite in a JDBCStream, and executes the query.
There is something I learned about TestSQLHandler that I'm not sure is correct:
* quoted identifiers - like 'id' and 'text' aren't valid? These shouldn't be referring to columns?
Things to be explored with this approach:
* switch from a standard query in SolrEnumerator to a stream
* fix data types
* optimize cases like WHERE
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206730#comment-15206730 ] Joel Bernstein commented on SOLR-8593:

[~risdenk] and I have been looking into different approaches for this ticket. One of the approaches is to embed the Calcite SQL parser and optimizer inside the SQLHandler. The entry point for this appears to be: https://calcite.apache.org/apidocs/org/apache/calcite/tools/Planner.html

Using this approach we would need to implement two things:

1) A CatalogReader, which the Calcite validator and optimizer will use to do their job. The underlying implementation of this should work for the JDBC driver also, so we kill two big birds with one stone when this is implemented.

2) A custom RelVisitor, which will rewrite the relational algebra tree (RelNode) created by the optimizer. The RelNode tree will need to be mapped to the Streaming API. Since the Streaming API already supports parallel relational algebra this should be fairly straightforward.

This approach would leave the Solr JDBC driver basically as it is, but provide all the hooks needed to finish off the remaining Catalog metadata methods.
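As a rough illustration of the second item in the comment above (walking the optimizer's tree and emitting Streaming API calls), here is a minimal, self-contained sketch. It does not use Calcite at all: `PlanNode`, `PlanVisitor`, `Scan`, `Filter`, `Aggregate` and `StreamingExpressionWriter` are hypothetical stand-ins for RelNode and a custom visitor, and the emitted `search(...)` and `rollup(...)` strings are only modeled on streaming expression syntax, so the exact parameters should be treated as an assumption.

```java
import java.util.List;

/** Hypothetical stand-in for a relational algebra node; this is NOT Calcite's RelNode. */
interface PlanNode {
    <R> R accept(PlanVisitor<R> visitor);
}

/** Visitor over the toy plan tree, with one method per node type. */
interface PlanVisitor<R> {
    R visitScan(Scan scan);
    R visitFilter(Filter filter);
    R visitAggregate(Aggregate aggregate);
}

/** Reads the given fields from a collection, sorted by the given sort spec. */
final class Scan implements PlanNode {
    final String collection;
    final List<String> fields;
    final String sort;

    Scan(String collection, List<String> fields, String sort) {
        this.collection = collection;
        this.fields = fields;
        this.sort = sort;
    }

    public <R> R accept(PlanVisitor<R> visitor) { return visitor.visitScan(this); }
}

/** Keeps only the rows of its input that match a query string. */
final class Filter implements PlanNode {
    final PlanNode input;
    final String query;

    Filter(PlanNode input, String query) {
        this.input = input;
        this.query = query;
    }

    public <R> R accept(PlanVisitor<R> visitor) { return visitor.visitFilter(this); }
}

/** Groups its input by one field and counts the rows per group. */
final class Aggregate implements PlanNode {
    final PlanNode input;
    final String groupField;

    Aggregate(PlanNode input, String groupField) {
        this.input = input;
        this.groupField = groupField;
    }

    public <R> R accept(PlanVisitor<R> visitor) { return visitor.visitAggregate(this); }
}

/** Translates the toy plan tree into a streaming-expression-like string. */
final class StreamingExpressionWriter implements PlanVisitor<String> {

    public String visitScan(Scan scan) {
        // A bare scan matches every document.
        return search(scan, "*:*");
    }

    public String visitFilter(Filter filter) {
        // Push a filter that sits directly on a scan down into the search itself.
        if (filter.input instanceof Scan) {
            return search((Scan) filter.input, filter.query);
        }
        throw new UnsupportedOperationException("only Filter over Scan is handled in this sketch");
    }

    public String visitAggregate(Aggregate aggregate) {
        return "rollup(" + aggregate.input.accept(this)
            + ", over=\"" + aggregate.groupField + "\", count(*))";
    }

    private static String search(Scan scan, String query) {
        return "search(" + scan.collection
            + ", q=\"" + query + "\""
            + ", fl=\"" + String.join(",", scan.fields) + "\""
            + ", sort=\"" + scan.sort + "\")";
    }
}
```

For a plan built as `new Aggregate(new Filter(new Scan("books", Arrays.asList("author"), "author asc"), "genre:scifi"), "author")`, calling `accept(new StreamingExpressionWriter())` returns `rollup(search(books, q="genre:scifi", fl="author", sort="author asc"), over="author", count(*))`. A real integration would build TupleStream objects rather than strings; strings just keep the sketch easy to read.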
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204988#comment-15204988 ] Joel Bernstein commented on SOLR-8593:

It's still worth looking into. But I suspect Calcite has to implement a lowest common denominator type of join to support joining lots of different systems together. Since Solr is not a meta engine we can get really specific and use Solr's unique shuffling capabilities to do faster joins. [~dpgove] can speak to this better than I can, but the Streaming API's distributed joins are really fast.
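To make the "shuffling" idea in the comment above concrete, here is a generic sketch of a partitioned join: both inputs are routed to workers by a hash of the join key, so each worker only has to join its own slice. This is an illustration of the general technique only, not Solr's implementation; `ShuffleJoinSketch` and all of its methods are hypothetical names, and the workers are simulated with a plain loop rather than separate nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration of a hash-partitioned ("shuffled") equi-join. */
final class ShuffleJoinSketch {

    /** Builds a row from alternating field names and values. */
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> row = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            row.put((String) keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    /** Routes each row to a worker by hashing its join key, so equal keys share a worker. */
    static List<List<Map<String, Object>>> shuffle(
            List<Map<String, Object>> rows, String key, int workers) {
        List<List<Map<String, Object>>> partitions = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map<String, Object> row : rows) {
            int worker = Math.floorMod(Objects.hashCode(row.get(key)), workers);
            partitions.get(worker).add(row);
        }
        return partitions;
    }

    /** Plain in-memory hash join that one worker runs on its own partitions. */
    static List<Map<String, Object>> localJoin(
            List<Map<String, Object>> left, List<Map<String, Object>> right, String key) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            index.computeIfAbsent(r.get(key), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> l : left) {
            List<Map<String, Object>> matches =
                index.getOrDefault(l.get(key), Collections.emptyList());
            for (Map<String, Object> r : matches) {
                Map<String, Object> merged = new HashMap<>(l);
                merged.putAll(r);
                joined.add(merged);
            }
        }
        return joined;
    }

    /** Shuffles both sides, joins partition by partition, and concatenates the results. */
    static List<Map<String, Object>> join(
            List<Map<String, Object>> left, List<Map<String, Object>> right,
            String key, int workers) {
        List<List<Map<String, Object>>> leftParts = shuffle(left, key, workers);
        List<List<Map<String, Object>>> rightParts = shuffle(right, key, workers);
        List<Map<String, Object>> joined = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            // In a distributed setting each iteration would run on a different node.
            joined.addAll(localJoin(leftParts.get(w), rightParts.get(w), key));
        }
        return joined;
    }
}
```

Because rows with the same key always land on the same worker, no worker needs to see the other workers' data, which is what lets the join scale out. The order of the joined rows depends on the partitioning and should not be relied on.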
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204951#comment-15204951 ] Kevin Risden commented on SOLR-8593:

{quote} But I get worried that we'll be roped into how Calcite does things. {quote}

This is something that I was worried about as well. I'm still trying to figure out how Calcite would best fit together.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204946#comment-15204946 ] Joel Bernstein commented on SOLR-8593:

If wrapping this logic up inside a Calcite adapter makes things easier then I'm all for it. But I get worried that we'll be roped into how Calcite does things. For example, Solr has shuffling capabilities that will make for some very fast joins. Will we be able to access these features through a Calcite adapter, or will we have to use Calcite's join techniques?

I'll read up on creating a Calcite adapter.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204891#comment-15204891 ] Kevin Risden commented on SOLR-8593:

[~joel.bernstein] - I took a much closer look at Calcite and the SOLR-7560 patch. I think there can be a much simpler approach than implementing the parsing and catalog separately. If a Calcite adaptor is built that enables Calcite to access Solr, this would potentially solve a lot of problems at once.

Calcite has a local JDBC connection to which you can just pass a SQL query and get back a result set. This local JDBC connection could be wrapped in a stream and returned from the /sql handler. This would remove the need for the limit/having/etc streams that currently exist. The logic for pushing down having/limit/etc would have to be moved into how Calcite does the processing.

This should make the SQL handler much simpler since it would pass off much of the work it's doing to Calcite. The downside to this approach is that much of the logic would be Calcite specific and we would need to figure out exactly which hooks to tie into.
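The idea of wrapping a JDBC result set in a stream, as described in the comment above, can be sketched roughly as follows. This is only an illustration under stated assumptions: `ResultSetTupleStream` is a hypothetical class, it represents tuples as plain maps, and its read-until-EOF contract is loosely modeled on how a tuple stream is consumed rather than on Solr's actual TupleStream or JDBCStream classes. Only standard `java.sql` calls are used.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical wrapper that exposes a JDBC ResultSet as a sequence of tuples. */
final class ResultSetTupleStream implements AutoCloseable {

    /** Marker key placed in the final tuple once the result set is exhausted. */
    static final String EOF = "EOF";

    private final ResultSet resultSet;

    ResultSetTupleStream(ResultSet resultSet) {
        this.resultSet = resultSet;
    }

    /**
     * Returns the next row as a map from column label to value,
     * or a map containing only EOF=true when there are no more rows.
     */
    Map<String, Object> read() throws SQLException {
        Map<String, Object> tuple = new LinkedHashMap<>();
        if (!resultSet.next()) {
            tuple.put(EOF, Boolean.TRUE);
            return tuple;
        }
        ResultSetMetaData meta = resultSet.getMetaData();
        // JDBC columns are numbered starting at 1.
        for (int col = 1; col <= meta.getColumnCount(); col++) {
            tuple.put(meta.getColumnLabel(col), resultSet.getObject(col));
        }
        return tuple;
    }

    @Override
    public void close() throws SQLException {
        resultSet.close();
    }
}
```

A caller would obtain the `ResultSet` from whatever JDBC connection executes the SQL, then call `read()` in a loop until the returned map contains the `EOF` key. With this shape the handler itself holds no limit or having logic; whatever the SQL engine returns is passed through row by row.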
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200020#comment-15200020 ] Kevin Risden commented on SOLR-8593:

Speaking of Calcite and JDBC - https://calcite.apache.org/avatica/. It was recently split out as a subproject of Calcite and is basically an implementation of a JDBC client/server. It might not be far enough along currently to integrate but it is a thought.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199661#comment-15199661 ] Joel Bernstein commented on SOLR-8593:

SOLR-7560 has a patch that uses Apache Calcite rather than Presto to parse the SQL queries. This would be a good place to start working from. This patch ignores hooks into Calcite's optimizer, which needs access to a SQL catalog. These hooks will need to be investigated and the SQL Catalog will have to be built out. This will begin to overlap with work being done on the JDBC driver pretty much immediately.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115253#comment-15115253 ] Joel Bernstein commented on SOLR-8593:

Linking to the JDBC driver work. The JDBC driver and the Calcite integration are connected because both are going to require hooks into the SQL Catalog. So work on the JDBC driver will inform work coming soon on the Calcite integration.