[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890784#comment-15890784 ]

Julian Hyde edited comment on SOLR-8593 at 3/1/17 6:49 PM:
---

[~risdenk], regarding the Turkish locale issue: we have to explicitly pass user.timezone from Maven into Surefire (see [pom.xml|https://github.com/apache/calcite/blob/0372d23b847d4d145917dd786d1c9e3570cb8041/pom.xml#L733]), so I suspect we'd have to do the same with the locale. Can you log a Calcite case, please? Even if we can't reproduce it, I'd rather we tracked it.

> Integrate Apache Calcite into the SQLHandler
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
> Issue Type: Improvement
> Components: Parallel SQL
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-8593.patch, SOLR-8593.patch, SOLR-8593.patch
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was
> nicely split off from the larger Presto project and it did everything that
> was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. Here is where
> Apache Calcite comes into play. It has a battle-tested, cost-based optimizer
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans
> will continue to be translated to Streaming API objects (TupleStreams), so
> continued work on the JDBC driver should plug in nicely with the Calcite work.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890784#comment-15890784 ]

Julian Hyde commented on SOLR-8593:
---

[~risdenk], regarding the Turkish locale issue: we have to explicitly pass user.timezone from Maven into Surefire, so I suspect we'd have to do the same with the locale. Can you log a Calcite case, please? Even if we can't reproduce it, I'd rather we tracked it.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886312#comment-15886312 ]

Julian Hyde commented on SOLR-8593:
---

Oh wow. I18n never fails to surprise. Please log a Calcite issue. We should ensure that Calcite runs correctly if {{user.locale=tr}}.
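The thread doesn't say which code path failed under the Turkish locale. In Java code generally, a frequent culprit is locale-sensitive case conversion of keywords and identifiers: in Turkish, `I` lower-cases to dotless `ı` (U+0131). The sketch below is illustrative only (the class and method names are made up, not Calcite code). It shows the effect and the usual `Locale.ROOT` fix. The JVM's default locale is normally derived from the standard `user.language` and `user.country` system properties, so passing e.g. `-Duser.language=tr -Duser.country=TR` to the forked test JVM is one way to try to reproduce such a failure.

```java
import java.util.Locale;

// Illustrative only: not taken from Calcite or Solr.
class TurkishLocaleDemo {
    // Locale-sensitive conversion: the result depends on the locale passed in
    // (or on the JVM default locale, if toLowerCase() is called with no argument).
    static String lowerIn(Locale locale, String s) {
        return s.toLowerCase(locale);
    }

    // Locale-neutral conversion: the same result on every JVM.
    static String lowerNeutral(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String keyword = "LIMIT";
        // In Turkish, 'I' becomes dotless i (U+0131), so this is not "limit".
        System.out.println(lowerIn(turkish, keyword).equals("limit")); // false
        System.out.println(lowerNeutral(keyword).equals("limit"));     // true
    }
}
```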
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851767#comment-15851767 ]

Julian Hyde commented on SOLR-8593:
---

This shouldn't be due to cost differences. The plan without the sort (limit) is incorrect, so it should never be chosen, regardless of cost.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849344#comment-15849344 ]

Julian Hyde commented on SOLR-8593:
---

In the case where there is a LIMIT but no ORDER BY, is a LogicalSort created? (There should be.) Is a SolrSort created, and is its offset field set? (It should be.) If so, how/why does the SolrSort get dropped? (Does the planner find that it is equivalent to something cheaper? It shouldn't.)
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849145#comment-15849145 ]

Julian Hyde edited comment on SOLR-8593 at 2/2/17 12:05 AM:
---

Not sure I understand. The query {{select a from b limit 10}} will have a {{Sort}} whose key has zero fields but which has fetch = 10. The {{Sort}} will be translated to a {{SolrSort}} with similar attributes. The sort is trivial (you don't need to do any work to sort on 0 fields), but you do need to apply the limit. If you see a {{SolrSort}} with empty keys, don't drop it; perhaps convert it into a {{SolrLimit}} if you have such a thing.

You may be wondering why we combine sort and limit into the same operator. But remember that relational data sets are inherently unordered, so we have to do them at the same time. Sort with an empty key has reasonable semantics, just as, I hope you agree, Aggregate with an empty key (e.g. {{select count\(\*\) from emp}}, which is equivalent to {{select count\(\*\) from emp group by ()}}) is a reasonable generalization of Aggregate.
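To make the point concrete, here is a minimal sketch. It assumes a hypothetical `sortLimit` helper that stands in for a `Sort`/`SolrSort` node; it is not Calcite's or Solr's API. The keys may be empty, and offset and fetch are applied in the same operator. A negative fetch meaning "no limit" is also an assumption of the sketch. With an empty key the ordering step does nothing, but dropping the node would also drop the limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a Sort/SolrSort node; not Calcite or Solr API.
class EmptyKeySortDemo {
    // Sort keys (possibly empty) plus offset and fetch, applied as one operator.
    // A negative fetch means "no limit" (a convention assumed for this sketch).
    static <T> List<T> sortLimit(List<T> input, List<Comparator<T>> keys, int offset, int fetch) {
        List<T> rows = new ArrayList<>(input);
        if (!keys.isEmpty()) {
            // The first key is the most significant; later keys break ties.
            Comparator<T> order = keys.get(0);
            for (int i = 1; i < keys.size(); i++) {
                order = order.thenComparing(keys.get(i));
            }
            rows.sort(order);
        }
        // With an empty key the ordering step is skipped, but offset/fetch still apply.
        int from = Math.min(offset, rows.size());
        int to = fetch < 0 ? rows.size() : Math.min(from + fetch, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("d", "a", "c", "b");

        // select x from t limit 2: empty sort key, fetch = 2.
        // Any two rows would be a correct answer; this sketch keeps input order.
        List<Comparator<String>> noKeys = new ArrayList<>();
        System.out.println(sortLimit(table, noKeys, 0, 2)); // [d, a]

        // select x from t order by x limit 2
        List<Comparator<String>> byValue = new ArrayList<>();
        byValue.add(Comparator.naturalOrder());
        System.out.println(sortLimit(table, byValue, 0, 2)); // [a, b]
    }
}
```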
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849145#comment-15849145 ]

Julian Hyde commented on SOLR-8593:
---

Not sure I understand. The query {{select a from b limit 10}} will have a {{Sort}} whose key has zero fields but which has fetch = 10. The {{Sort}} will be translated to a {{SolrSort}} with similar attributes. The sort is trivial (you don't need to do any work to sort on 0 fields), but you do need to apply the limit. If you see a {{SolrSort}} with empty keys, don't drop it; perhaps convert it into a {{SolrLimit}} if you have such a thing.

You may be wondering why we combine sort and limit into the same operator. But remember that relational data sets are inherently unordered, so we have to do them at the same time. Sort with an empty key has reasonable semantics, just as, I hope you agree, Aggregate with an empty key (e.g. {{select count(*) from emp}}, which is equivalent to {{select count(*) from emp group by ()}}) is a reasonable generalization of Aggregate.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843313#comment-15843313 ]

Julian Hyde commented on SOLR-8593:
---

If you have any "linking" issues with protobuf, you might check out HIVE-15708, which arose because Hive used both avatica-core (which shades protobuf) and avatica (which does not).
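HIVE-15708 came from mixing a shading and a non-shading Avatica artifact on one classpath. When chasing that kind of conflict, it can help to ask the JVM where a class was actually loaded from. The following is a minimal sketch that uses only standard `Class`, `ProtectionDomain` and `CodeSource` calls. The `WhichJar` name is hypothetical, and `com.google.protobuf.Message` is just an example class to probe.

```java
import java.security.CodeSource;
import java.util.Optional;

class WhichJar {
    // Returns where a class was loaded from (usually a jar or directory URL),
    // "<bootstrap>" for classes with no code source (e.g. java.lang.String),
    // or empty if the class cannot be found.
    static Optional<String> locate(String className) {
        try {
            // initialize = false: look the class up without running static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.of("<bootstrap>");
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Example probe: which artifact supplies the unshaded protobuf API, if any?
        System.out.println(locate("com.google.protobuf.Message").orElse("not on the classpath"));
    }
}
```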
[jira] [Commented] (SOLR-9893) EasyMock/Mockito no longer works with Java 9 b148+
[ https://issues.apache.org/jira/browse/SOLR-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812465#comment-15812465 ]

Julian Hyde commented on SOLR-9893:
---

[~thetaphi], thanks for replying. I agree with your strategy. I've disabled our offending tests using Assume, and we can still claim that Avatica works on JDK9, albeit with less coverage.

I am concerned that the Mockito/cglib community seems to think that JDK9 support == adding support for new JDK9 features, whereas we just want the same old functionality to run on a JDK9 runtime. (We can't use JDK9 features until we drop support for JDK1.7 and JDK1.8.) I'll weigh in on https://github.com/cglib/cglib/issues/93, and until then I guess we'll have to be patient.

> EasyMock/Mockito no longer works with Java 9 b148+
>
> Key: SOLR-9893
> URL: https://issues.apache.org/jira/browse/SOLR-9893
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Tests
> Affects Versions: 6.x, master (7.0)
> Reporter: Uwe Schindler
> Priority: Blocker
>
> EasyMock does not work anymore with the latest Java 9, because it uses cglib
> behind the scenes, which tries to access a protected method inside the runtime using
> setAccessible. This is no longer allowed by Java 9.
> Actually this is really stupid. Instead of forcefully making the protected
> defineClass method available to the outside, it is much more correct to just
> subclass ClassLoader (like the Lucene expressions module does).
> I tried updating EasyMock/Mockito, but that does not help: approx. 25
> tests fail. The only way is to disable all mocking tests in Java 9. The
> underlying issue in cglib is still not solved; master's code is here:
> https://github.com/cglib/cglib/blob/master/cglib/src/main/java/net/sf/cglib/core/ReflectUtils.java#L44-L62
> As we use an old stone-aged version of Mockito (1.x), a fix is not expected
> to happen, although cglib might fix this!
> What should we do? This stupid issue prevents us from testing Java 9 with
> Solr completely!
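The description recommends subclassing `ClassLoader` rather than calling `setAccessible` on its protected `defineClass` method. Below is a minimal sketch of that pattern. The class and `define` method names are hypothetical, and this is not the Lucene expressions module's actual code. Because `defineClass` is protected, a subclass can call it directly, so no reflective access into the JDK is needed. Producing the bytecode is out of scope here.

```java
// Illustrative sketch of the "subclass ClassLoader" approach; names are hypothetical.
class DefiningClassLoader extends ClassLoader {
    DefiningClassLoader(ClassLoader parent) {
        super(parent);
    }

    // defineClass is protected in ClassLoader, so a subclass may call it directly:
    // no reflection and no setAccessible(true) is needed.
    Class<?> define(String binaryName, byte[] bytecode) {
        return defineClass(binaryName, bytecode, 0, bytecode.length);
    }

    public static void main(String[] args) {
        DefiningClassLoader loader = new DefiningClassLoader(DefiningClassLoader.class.getClassLoader());
        try {
            // Not a valid class file, so the JVM rejects it with ClassFormatError.
            loader.define("Bogus", new byte[] {1, 2, 3});
        } catch (ClassFormatError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```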
[jira] [Commented] (SOLR-9893) EasyMock/Mockito no longer works with Java 9 b148+
[ https://issues.apache.org/jira/browse/SOLR-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807982#comment-15807982 ]

Julian Hyde commented on SOLR-9893:
---

We are running into the same issue in Calcite/Avatica: CALCITE-1567. Do you know if there is a Mockito bug logged for this? Somewhere in https://github.com/cglib/cglib/issues/93 someone suggests that it is fixed in a later version of Mockito. If so, I would like to upgrade to that version of Mockito.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764709#comment-15764709 ]

Julian Hyde commented on SOLR-8593:
---

Yes, early January. I've logged CALCITE-1547 to track the release.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755661#comment-15755661 ]

Julian Hyde commented on SOLR-8593:
---

A list of GROUP BY fields would be fine, but it must be in a sub-class of Aggregate. Everyone else who is using Aggregate wants "Aggregate([x, y])" to be identical to "Aggregate([y, x])".
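Calcite's `Aggregate` keeps its grouping key as an unordered bit set (`getGroupSet()` returns an `ImmutableBitSet`). That is why `[x, y]` and `[y, x]` must compare equal. The following minimal sketch shows the difference. It uses `java.util.BitSet` as a stand-in for `ImmutableBitSet` and a plain `List` for the ordered key that a hypothetical Aggregate sub-class might carry.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class GroupKeyDemo {
    // Unordered key, standing in for Calcite's ImmutableBitSet groupSet.
    static BitSet groupSet(int... fields) {
        BitSet bits = new BitSet();
        for (int field : fields) {
            bits.set(field);
        }
        return bits;
    }

    // Ordered key, as a hypothetical Aggregate sub-class might carry it.
    static List<Integer> groupList(Integer... fields) {
        return Arrays.asList(fields);
    }

    // The ordered list can always be collapsed back to the set the base class expects.
    static BitSet toSet(List<Integer> ordered) {
        BitSet bits = new BitSet();
        for (int field : ordered) {
            bits.set(field);
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(groupSet(0, 1).equals(groupSet(1, 0)));         // true
        System.out.println(groupList(0, 1).equals(groupList(1, 0)));       // false
        System.out.println(toSet(groupList(1, 0)).equals(groupSet(0, 1))); // true
    }
}
```

Keeping the ordered list only in the sub-class leaves the base class's set semantics untouched for every other user of Aggregate.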
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753387#comment-15753387 ]

Julian Hyde edited comment on SOLR-8593 at 12/16/16 4:21 AM:
---

Would it be correct to say that you have a physical operator which is a combination of Aggregate and TopN? This physical operator would have a sorted list of grouping fields and also a parameter N (which affects the cost estimate). Maybe it's a sub-class of Aggregate with some extra fields. It could be created by a planner rule that matches a Sort (with limit) on top of an Aggregate and also looks at the estimated cardinality of the fields in order to sort them.
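Here is a minimal sketch of the combined operator described in the comment. `AggregateTopNSketch` and `fromSortOverAggregate` are hypothetical; this is not Calcite's `RelOptRule` API. The comment also doesn't say in which direction the fields should be sorted, so ordering by ascending estimated cardinality is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical physical operator combining Aggregate and TopN (not Calcite or Solr API).
class AggregateTopNSketch {
    final List<Integer> orderedGroupFields; // grouping fields, in evaluation order
    final int n;                            // top-N limit; would feed the cost estimate

    AggregateTopNSketch(List<Integer> orderedGroupFields, int n) {
        this.orderedGroupFields = Collections.unmodifiableList(orderedGroupFields);
        this.n = n;
    }

    // What a planner rule matching Sort(fetch = n) over Aggregate(groupSet) might do:
    // turn the unordered group set into a list ordered by estimated cardinality.
    // Ascending order is an assumption; fields with no estimate go last, and ties
    // are broken by field index so the result is deterministic.
    static AggregateTopNSketch fromSortOverAggregate(
            BitSet groupSet, Map<Integer, Long> estimatedCardinality, int fetch) {
        List<Integer> fields = new ArrayList<>();
        for (int i = groupSet.nextSetBit(0); i >= 0; i = groupSet.nextSetBit(i + 1)) {
            fields.add(i);
        }
        fields.sort(Comparator
                .comparingLong((Integer field) -> estimatedCardinality.getOrDefault(field, Long.MAX_VALUE))
                .thenComparingInt(field -> field));
        return new AggregateTopNSketch(fields, fetch);
    }

    @Override
    public String toString() {
        return "AggregateTopN(fields=" + orderedGroupFields + ", n=" + n + ")";
    }
}
```

Suppose fields 0, 1 and 2 have estimated cardinalities of 1000, 10 and 50, and the Sort's fetch is 5. This sketch would then produce an operator that groups on fields [1, 2, 0] with n = 5.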
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753387#comment-15753387 ]

Julian Hyde commented on SOLR-8593:
---

Would it be correct to say that you have a physical operator which is a combination of Aggregate and TopN? This physical operator would have a sorted list of grouping fields and also a parameter N (which affects the cost estimate). Maybe it's a sub-class of Aggregate with some extra fields.
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752805#comment-15752805 ]

Julian Hyde edited comment on SOLR-8593 at 12/15/16 11:15 PM:
---

I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview.

Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query {{select b, a, count\(*) from t group by b, a}}, then you will end up with

{code}
Project($1, $0, $2)
  Aggregate({0, 1}, COUNT(*))
    Scan(table=T)
{code}

and as you correctly say, {{0, 1}} represents {{a, b}} because that is the physical order of the columns.

Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we accurately express in relational algebra what we need.
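A toy evaluation of the plan in the comment may help show two things. `Aggregate({0, 1}, ...)` emits `a, b` in physical column order, and the outer `Project($1, $0, $2)` restores the requested `b, a, count(*)` order. This sketch runs over in-memory rows and uses hypothetical helper names. It is not how Calcite or Solr actually execute plans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory evaluation; not how Calcite or Solr execute plans.
class ProjectOverAggregateDemo {
    // Aggregate(groupFields, COUNT(*)): output rows are the group columns, in the
    // order given (here the physical order {0, 1} = a, b), followed by the count.
    static List<Object[]> aggregateCount(List<Object[]> rows, int[] groupFields) {
        Map<List<Object>, Long> counts = new LinkedHashMap<>(); // keeps first-seen group order
        for (Object[] row : rows) {
            List<Object> key = new ArrayList<>();
            for (int field : groupFields) {
                key.add(row[field]);
            }
            counts.merge(key, 1L, Long::sum);
        }
        List<Object[]> out = new ArrayList<>();
        for (Map.Entry<List<Object>, Long> entry : counts.entrySet()) {
            List<Object> outRow = new ArrayList<>(entry.getKey());
            outRow.add(entry.getValue());
            out.add(outRow.toArray());
        }
        return out;
    }

    // Project($i, $j, ...): each output column copies input column $i, $j, ...
    static List<Object[]> project(List<Object[]> rows, int... inputIndexes) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            Object[] outRow = new Object[inputIndexes.length];
            for (int i = 0; i < inputIndexes.length; i++) {
                outRow[i] = row[inputIndexes[i]];
            }
            out.add(outRow);
        }
        return out;
    }

    public static void main(String[] args) {
        // Table T(a, b, c, d)
        List<Object[]> t = Arrays.asList(
                new Object[] {"a1", "b1", 1, 1},
                new Object[] {"a1", "b1", 2, 2},
                new Object[] {"a2", "b1", 3, 3});
        List<Object[]> aggregated = aggregateCount(t, new int[] {0, 1}); // (a, b, count)
        List<Object[]> result = project(aggregated, 1, 0, 2);            // (b, a, count)
        for (Object[] row : result) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [b1, a1, 2]
        // [b1, a2, 1]
    }
}
```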
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752805#comment-15752805 ]

Julian Hyde edited comment on SOLR-8593 at 12/15/16 11:14 PM:
---

I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview.

Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query {{select b, a, count\(*) from t group by b, a}}, then you will end up with

{code}
Project($1, $0, $2)
  Aggregate({0, 1}, COUNT(*))
    Scan(table=T)
{code}

and as you correctly say, {{ \{0, 1\} }} represents {{ \{a, b\} }} because that is the physical order of the columns.

Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we accurately express in relational algebra what we need.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752805#comment-15752805 ] Julian Hyde commented on SOLR-8593: --- I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview. Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query {{select b, a, count(*) from t group by b, a}} then you will end up with
{code}
Project($1, $0, $2)
  Aggregate({0, 1}, COUNT(*))
    Scan(table=T)
{code}
and as you correctly say, {0, 1} represents {a, b} because that is the physical order of the columns. Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we are accurately expressing in relational algebra what we need.
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752805#comment-15752805 ] Julian Hyde edited comment on SOLR-8593 at 12/15/16 11:13 PM: -- I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview. Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query {{select b, a, count\(*) from t group by b, a}} then you will end up with
{code}
Project($1, $0, $2)
  Aggregate({0, 1}, COUNT(*))
    Scan(table=T)
{code}
and as you correctly say, {0, 1} represents {a, b} because that is the physical order of the columns. Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we are accurately expressing in relational algebra what we need.
was (Author: julianhyde): I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview. Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query {{select b, a, count(*) from t group by b, a}} then you will end up with
{code}
Project($1, $0, $2)
  Aggregate({0, 1}, COUNT(*))
    Scan(table=T)
{code}
and as you correctly say, {0, 1} represents {a, b} because that is the physical order of the columns. Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we are accurately expressing in relational algebra what we need.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710183#comment-15710183 ] Julian Hyde commented on SOLR-8593: --- Calcite's operators are logical. A 'Filter' operator might turn into operator instances running on multiple nodes or threads, each processing a partition of the data.
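To illustrate the split between a logical operator and its physical instances, the following Java sketch instantiates one filter condition once per data partition and runs the instances concurrently. The names are hypothetical and threads in a single JVM stand in for nodes; nothing here is Calcite or Solr API. Concatenating the partition outputs gives the same rows as one serial filter over all the data (here in partition order).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration: one logical filter, many physical instances.
class ParallelFilterDemo {
    // Runs one instance of the same condition per partition, each on its own thread,
    // then concatenates the partition outputs in partition order.
    static List<Integer> parallelFilter(List<List<Integer>> partitions, Predicate<Integer> condition) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                futures.add(pool.submit(
                        () -> partition.stream().filter(condition).collect(Collectors.toList())));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<List<Integer>> future : futures) {
                try {
                    result.addAll(future.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
                List.of(List.of(1, 5, 8), List.of(2, 9), List.of(7, 3, 10));
        Predicate<Integer> greaterThanFour = x -> x > 4;
        System.out.println(parallelFilter(partitions, greaterThanFour)); // [5, 8, 9, 7, 10]
    }
}
```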
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702473#comment-15702473 ] Julian Hyde commented on SOLR-8593: --- Calcite is an algebra, not an executor. When it converts a HAVING clause to a SolrFilter, you are more than welcome to run those filters in parallel. I suppose it would mean SolrAggregate producing parallel output streams.
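A minimal sketch of the parallel HAVING idea, under an assumption the comment does not state: that the aggregate's parallel output streams are hash-partitioned on the group key. Every group then lives in exactly one stream, so each stream can run its own aggregate-plus-HAVING filter and the merged result matches a single global pass. All names are hypothetical; this is plain Java, not SolrAggregate or SolrFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: COUNT(*) ... GROUP BY key HAVING COUNT(*) > minCount,
// evaluated independently on hash partitions of the group key.
class PartitionedHavingDemo {
    // Splits rows (reduced here to their group key) into n partitions by hash of the key.
    static List<List<String>> hashPartition(List<String> keys, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(Math.floorMod(k.hashCode(), n)).add(k);
        }
        return parts;
    }

    // One stream's work: aggregate (COUNT(*) per key), then apply the HAVING filter.
    static Map<String, Integer> aggregateThenHaving(List<String> keys, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c <= minCount);
        return counts;
    }

    // Runs aggregate + HAVING on every partition and merges the (disjoint) results.
    static Map<String, Integer> partitioned(List<String> keys, int n, int minCount) {
        Map<String, Integer> merged = new TreeMap<>();
        for (List<String> part : hashPartition(keys, n)) {
            merged.putAll(aggregateThenHaving(part, minCount));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("x", "y", "x", "z", "y", "x", "w");
        System.out.println(aggregateThenHaving(rows, 1)); // {x=3, y=2}
        System.out.println(partitioned(rows, 3, 1));      // {x=3, y=2}
    }
}
```

If the streams were not partitioned on the group key, one group could be split across streams and a per-stream HAVING would see partial counts, so the partitioning is what makes this safe.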
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702457#comment-15702457 ] Julian Hyde commented on SOLR-8593: --- Calcite rewrites {{SELECT DISTINCT ...}} to {{SELECT ... GROUP BY ...}}. So if you just deal with {{GROUP BY}} (i.e. Calcite's Aggregate operator) you should be fine.
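The equivalence behind that rewrite can be sketched in a few lines of Java, with hypothetical names and in-memory rows: SELECT DISTINCT over some columns returns the same rows as GROUP BY those same columns with no aggregate functions, which is why handling the Aggregate operator covers both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: SELECT DISTINCT a, b == SELECT a, b ... GROUP BY a, b.
class DistinctAsGroupByDemo {
    // SELECT DISTINCT: keep one copy of each distinct row.
    static List<List<String>> selectDistinct(List<List<String>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    // GROUP BY all selected columns with no aggregate calls: group the rows by the
    // full row, then emit only the group keys.
    static List<List<String>> groupByAllColumns(List<List<String>> rows) {
        Map<List<String>, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            groups.computeIfAbsent(row, key -> new ArrayList<>()).add(row);
        }
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("a", "1"), List.of("b", "2"), List.of("a", "1"),
                List.of("c", "3"), List.of("b", "2"));
        // Linked collections keep first-seen order only to make the output deterministic;
        // SQL itself guarantees no order without ORDER BY.
        System.out.println(selectDistinct(rows));    // [[a, 1], [b, 2], [c, 3]]
        System.out.println(groupByAllColumns(rows)); // [[a, 1], [b, 2], [c, 3]]
    }
}
```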
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664701#comment-15664701 ] Julian Hyde commented on SOLR-8593: --- CALCITE-1306 covers this. It's not standard SQL but could be enabled via an extension. I disagree that "Solr will run this filter faster than Calcite". With query optimization, both queries will produce identical plans. This issue is not about performance. It is about syntactic sugar (not that there's anything wrong with that).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664595#comment-15664595 ] Julian Hyde commented on SOLR-8593: --- You're making a mistake I see a lot of people making: trying to do complex semantic transformations on the AST (SqlNode). That's an anti-pattern, because SQL's complex rules for name resolution make the AST very brittle. You should do those kinds of transformations on the relational algebra tree (RelNode). In fact, Calcite will convert the query into a {{Scan -> Filter -> Aggregate -> Filter -> Project}} logical plan (the first Filter is the WHERE clause, the second Filter is the HAVING clause), so I don't think you need to do any tricky processing looking for aliases.
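To show why working on the relational tree avoids alias hunting, here is a deliberately simplified Java sketch. The operator classes are hypothetical stand-ins for Calcite's RelNode subclasses, every operator has a single input (joins are out of scope), and conditions are plain strings that refer to fields by ordinal ($0, $1), loosely in the style of Calcite's plan dumps. It classifies each Filter in a Scan -> Filter -> Aggregate -> Filter -> Project plan purely by whether an Aggregate sits beneath it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-input stand-ins for relational operators.
class RelTreeDemo {
    abstract static class Rel {
        final Rel input; // null for a leaf such as Scan

        Rel(Rel input) {
            this.input = input;
        }
    }

    static final class Scan extends Rel {
        final String table;

        Scan(String table) {
            super(null);
            this.table = table;
        }
    }

    static final class Filter extends Rel {
        final String condition; // refers to input fields by ordinal, e.g. "$1 > 3"

        Filter(Rel input, String condition) {
            super(input);
            this.condition = condition;
        }
    }

    static final class Aggregate extends Rel {
        Aggregate(Rel input) {
            super(input);
        }
    }

    static final class Project extends Rel {
        Project(Rel input) {
            super(input);
        }
    }

    // True if an Aggregate appears at or below this node.
    static boolean hasAggregateBelow(Rel node) {
        for (Rel n = node; n != null; n = n.input) {
            if (n instanceof Aggregate) {
                return true;
            }
        }
        return false;
    }

    // Labels each Filter by position: above an Aggregate it plays the HAVING role,
    // otherwise the WHERE role. No alias lookup is involved.
    static List<String> classifyFilters(Rel root) {
        List<String> out = new ArrayList<>();
        for (Rel node = root; node != null; node = node.input) {
            if (node instanceof Filter) {
                String kind = hasAggregateBelow(node.input) ? "HAVING" : "WHERE";
                out.add(kind + ": " + ((Filter) node).condition);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Scan -> Filter (WHERE) -> Aggregate -> Filter (HAVING) -> Project, built bottom-up.
        Rel plan = new Project(
                new Filter(new Aggregate(new Filter(new Scan("emp"), "$0 > 10")), "$1 > 3"));
        System.out.println(classifyFilters(plan)); // [HAVING: $1 > 3, WHERE: $0 > 10]
    }
}
```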
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664576#comment-15664576 ] Julian Hyde edited comment on SOLR-8593 at 11/14/16 6:11 PM: - Regarding the alias for "count(\*)". I guess one approach is to extend Calcite to allow a pluggable alias derivation (it has to be pluggable because you can't please everyone). Another approach is to leave the aliases as they are but generate field names for the JSON result set. Note that if you call SqlNode.getParserPosition() on each item in the select clause it will tell you the start and end point of that expression in the original SQL string, so you can extract the "count(\*)" using that information. I don't think the following should be valid, but under your proposed change it would be:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*)
  FROM emp
  GROUP BY deptno) AS t
WHERE t."count(*)" > 3
{code}
Note that "count(\*)" is not an expression; it is a reference to a "column" produced by the sub-query. In my opinion, using a textual expression is very confusing, and we should not do it. The derived alias of {{count(\*)}} should be something not easily guessable, which will encourage users to use an alias:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*) AS c
  FROM emp
  GROUP BY deptno) AS t
WHERE t.c > 3
{code}
was (Author: julianhyde): Regarding the alias for "count(*)". I guess one approach is to extend Calcite to allow a pluggable alias derivation (it has to be pluggable because you can't please everyone). Another approach is to leave the aliases as they are but generate field names for the JSON result set. Note that if you call SqlNode.getParserPosition() on each item in the select clause it will tell you the start and end point of that expression in the original SQL string, so you can extract the "count(*)" using that information. I don't think the following should be valid, but under your proposed change it would be:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*)
  FROM emp
  GROUP BY deptno) AS t
WHERE t."count(*)" > 3
{code}
Note that "count(\*)" is not an expression; it is a reference to a "column" produced by the sub-query. In my opinion, using a textual expression is very confusing, and we should not do it. The derived alias of {{count(\*)}} should be something not easily guessable, which will encourage users to use an alias:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*) AS c
  FROM emp
  GROUP BY deptno) AS t
WHERE t.c > 3
{code}
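The parser-position approach can be sketched as a helper that cuts the original text of a select item out of the SQL string. This is a hypothetical Java helper, not Calcite API: it takes the start and end line/column numbers you would read from the SqlParserPos returned by SqlNode.getParserPosition(), and it assumes 1-based positions with an inclusive end column, a convention to confirm against Calcite before relying on it.

```java
// Hypothetical helper for deriving JSON field names from the original SQL text.
class ParserPosDemo {
    // Returns the SQL text between (line, column) and (endLine, endColumn).
    // ASSUMPTION: positions are 1-based and the end column is inclusive.
    static String textAt(String sql, int line, int column, int endLine, int endColumn) {
        String[] lines = sql.split("\n", -1);
        StringBuilder sb = new StringBuilder();
        for (int l = line; l <= endLine; l++) {
            String text = lines[l - 1];
            int from = (l == line) ? column - 1 : 0;
            int to = (l == endLine) ? endColumn : text.length();
            sb.append(text, from, to);
            if (l < endLine) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT deptno, count(*)\nFROM emp\nGROUP BY deptno";
        // In this string, count(*) spans line 1, columns 16 to 23.
        System.out.println(textAt(sql, 1, 16, 1, 23)); // count(*)
    }
}
```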
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664576#comment-15664576 ] Julian Hyde commented on SOLR-8593: --- Regarding the alias for "count(*)". I guess one approach is to extend Calcite to allow a pluggable alias derivation (it has to be pluggable because you can't please everyone). Another approach is to leave the aliases as they are but generate field names for the JSON result set. Note that if you call SqlNode.getParserPosition() on each item in the select clause it will tell you the start and end point of that expression in the original SQL string, so you can extract the "count(*)" using that information. I don't think the following should be valid, but under your proposed change it would be:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*)
  FROM emp
  GROUP BY deptno) AS t
WHERE t."count(*)" > 3
{code}
Note that "count(\*)" is not an expression; it is a reference to a "column" produced by the sub-query. In my opinion, using a textual expression is very confusing, and we should not do it. The derived alias of {{count(\*)}} should be something not easily guessable, which will encourage users to use an alias:
{code}
SELECT deptno
FROM (
  SELECT deptno, count(*) AS c
  FROM emp
  GROUP BY deptno) AS t
WHERE t.c > 3
{code}
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662280#comment-15662280 ] Julian Hyde commented on SOLR-8593: --- "count(\*)" is not a good derived column name, because it contains non-alphanumeric characters and is therefore not a valid identifier unless you enclose it in double-quotes. Therefore Calcite generates an alias that is a valid identifier. I believe quite a few other databases do this.
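The behaviour described here can be mimicked with a small Java sketch. The identifier rule is simplified (ASCII letters, digits, '_' and '$' only), the helper names are hypothetical, and working on select-item text rather than a parse tree is a shortcut a real validator would not take. The EXPR$<ordinal> naming imitates the generated aliases Calcite uses for unnamed expressions, e.g. EXPR$1 for a count(*) in the second select position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of deriving valid-identifier aliases.
class AliasDemo {
    // Simplified unquoted identifier: a letter, then letters, digits, '_' or '$'.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_$]*");

    static boolean isValidIdentifier(String name) {
        return IDENTIFIER.matcher(name).matches();
    }

    // A bare identifier keeps its name; any other expression gets EXPR$<0-based ordinal>.
    static List<String> deriveAliases(List<String> selectItems) {
        List<String> aliases = new ArrayList<>();
        for (int i = 0; i < selectItems.size(); i++) {
            String item = selectItems.get(i);
            aliases.add(isValidIdentifier(item) ? item : "EXPR$" + i);
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("count(*)"));                 // false
        System.out.println(deriveAliases(List.of("deptno", "count(*)"))); // [deptno, EXPR$1]
    }
}
```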
[jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658025#comment-15658025 ] Julian Hyde edited comment on SOLR-8593 at 11/11/16 8:01 PM: - By the way, when you're ready, please add Solr to the [powered by Calcite|https://calcite.apache.org/docs/powered_by.html] page; see CALCITE-1112 for details.
was (Author: julianhyde): By the way, when you're ready, please add Solr to the [powered by|https://calcite.apache.org/docs/powered_by.html] page; see CALCITE-1112 for details.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658025#comment-15658025 ] Julian Hyde commented on SOLR-8593: --- By the way, when you're ready, please add Solr to the [powered by|https://calcite.apache.org/docs/powered_by.html] page; see CALCITE-1112 for details.
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616986#comment-15616986 ] Julian Hyde commented on SOLR-8593: --- Ah, I think I see what's going on. You're using avatica-1.9-SNAPSHOT with calcite-1.10. calcite-1.10 requires avatica-1.8, so you should use that. (Or is there a good reason why you need avatica-1.9?) By the way, avatica-1.9 is less than a week from release. calcite-1.11 is maybe a month to six weeks away. The exact compatibility issues you describe are covered in CALCITE-1270 (and see the PR attached to that case).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616963#comment-15616963 ] Julian Hyde commented on SOLR-8593: --- Is there a Calcite issue logged for the AbstractMethodError relating to CalciteConnectionProperty? I see [others are running into the same problem|http://stackoverflow.com/questions/39318653/create-a-streaming-example-with-calcite-using-csv] and I want to document the solution (or fix the bug in Calcite/Avatica if it is a bug).
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378687#comment-15378687 ] Julian Hyde commented on SOLR-8593: --- The trickiest thing about CoGroup is that it aggregates (i.e. groups together) rows without collapsing them. So you need to be able to represent a nested set of rows. If Solr's evaluator can't handle nested rows then CoGroup will be tricky. If you already have join and aggregate I'd stick with them.
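A small Java sketch of Pig-style CoGroup semantics may help when deciding whether it fits Solr's evaluator. Names are hypothetical and the inputs are in-memory lists of "key:value" strings. Each key maps to one nested list (a bag) of rows per input, without collapsing them; join and aggregate are then two different ways of consuming those bags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical CoGroup sketch: rows from both inputs grouped by key into nested bags.
class CoGroupDemo {
    static final class Bags<L, R> {
        final List<L> left = new ArrayList<>();
        final List<R> right = new ArrayList<>();
    }

    static <K extends Comparable<K>, L, R> Map<K, Bags<L, R>> coGroup(
            List<L> left, Function<L, K> leftKey, List<R> right, Function<R, K> rightKey) {
        Map<K, Bags<L, R>> groups = new TreeMap<>(); // sorted by key for deterministic output
        for (L row : left) {
            groups.computeIfAbsent(leftKey.apply(row), k -> new Bags<>()).left.add(row);
        }
        for (R row : right) {
            groups.computeIfAbsent(rightKey.apply(row), k -> new Bags<>()).right.add(row);
        }
        return groups;
    }

    // Inner join: for each key, pair every left row with every right row.
    static <K, L, R> List<String> join(Map<K, Bags<L, R>> groups) {
        List<String> out = new ArrayList<>();
        for (Bags<L, R> bags : groups.values()) {
            for (L l : bags.left) {
                for (R r : bags.right) {
                    out.add(l + "|" + r);
                }
            }
        }
        return out;
    }

    // Aggregate: collapse each key's left bag to COUNT(*); keys with no left rows are skipped.
    static <K, L, R> Map<K, Integer> countLeft(Map<K, Bags<L, R>> groups) {
        Map<K, Integer> out = new LinkedHashMap<>();
        groups.forEach((key, bags) -> {
            if (!bags.left.isEmpty()) {
                out.put(key, bags.left.size());
            }
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> emps = List.of("10:alice", "20:bob", "10:carol");
        List<String> depts = List.of("10:Sales", "30:HR");
        Map<String, Bags<String, String>> groups =
                coGroup(emps, s -> s.split(":")[0], depts, s -> s.split(":")[0]);
        System.out.println(join(groups));      // [10:alice|10:Sales, 10:carol|10:Sales]
        System.out.println(countLeft(groups)); // {10=2, 20=1}
    }
}
```

The Bags objects are the nested rows the comment refers to; an evaluator that can only pass flat tuples would have to flatten them, which is where CoGroup gets tricky.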
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376266#comment-15376266 ] Julian Hyde commented on SOLR-8593: --- You should probably model your join and aggregate operators as sub-classes of Join and Aggregate that understand the "distribution" trait. If you are doing, say, "group by x" then you will need your input either to be singleton (i.e. only one input stream) or partitioned on x. Calcite will be able to ensure that the input is partitioned appropriately, either because it is stored in partitions, or by applying a shuffle/exchange. There is the regular Exchange operator that changes the distribution (i.e. re-partitions) and there is SortExchange that changes the distribution and also sorts within each partition. SortExchange models what the shuffle does in MapReduce. After you have a plan like
{noformat}
MyJoin[left.a = right.b]
  Exchange[a]
    MyAggregate
      Exchange
        Scan[T1]
  Exchange[b]
    Scan[T2]
{noformat}
you can turn it into MapReduce by making the consumer of each Exchange into a reduce task, and the input to each Exchange into a map task. I asked [~ashutoshc] how he would generate Hive MapReduce plans in Calcite (most Hive plans these days are Tez) and he said you should consider writing a CoGroup operator (like the one in Pig). CoGroup is powerful enough to implement both join and aggregate, so it might save you some effort.
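The Exchange / SortExchange description maps onto MapReduce roughly as in this Java sketch. Names are hypothetical, rows are reduced to their grouping key, and it only simulates the data movement: in Calcite, Exchange and SortExchange are plan operators, not executors. The sketch plays the role of a SortExchange on the key (hash-partition, then sort within each partition) and runs one reduce-style consumer per partition. Because the input is partitioned on the key, per-partition results can simply be concatenated; because each partition is sorted, the consumer can count in a single streaming pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of SortExchange[key] followed by a reduce-side GROUP BY.
class SortExchangeDemo {
    // Map side + shuffle: hash-partition rows on the key, then sort each partition.
    static List<List<String>> sortExchange(List<String> rows, int partitions) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (String key : rows) {
            out.get(Math.floorMod(key.hashCode(), partitions)).add(key);
        }
        for (List<String> p : out) {
            p.sort(Comparator.naturalOrder());
        }
        return out;
    }

    // Reduce side: streaming GROUP BY key / COUNT(*); only compares each row with the
    // previous one, which is correct because the partition is sorted.
    static List<String> streamingCount(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(current)) {
                if (current != null) {
                    out.add(current + "=" + count);
                }
                current = key;
                count = 0;
            }
            count++;
        }
        if (current != null) {
            out.add(current + "=" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("b", "a", "c", "a", "b", "a");
        List<String> results = new ArrayList<>();
        for (List<String> partition : sortExchange(rows, 2)) {
            results.addAll(streamingCount(partition)); // one "reduce task" per partition
        }
        results.sort(Comparator.naturalOrder());
        System.out.println(results); // [a=3, b=2, c=1]
    }
}
```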
[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
[ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371912#comment-15371912 ] Julian Hyde commented on SOLR-8593: --- Hi everyone! I'm VP of Apache Calcite. I only just noticed this JIRA case. I am excited that you are considering using Calcite. Please let me know if I can help.