[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702457#comment-15702457
 ] 

Julian Hyde commented on SOLR-8593:
---

Calcite rewrites {{SELECT DISTINCT ...}} to {{SELECT ... GROUP BY ...}}. So if 
you just deal with {{GROUP BY}} (i.e. Calcite's Aggregate operator) you should 
be fine.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-27 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700714#comment-15700714
 ] 

Cao Manh Dat commented on SOLR-8593:


What I mean about "run this filter faster than Calcite" is, Solr have something 
called parallel streaming expressions which will distribute that filter on many 
nodes, but as far as I know Calcite will do the second Filter ( HAVING clause ) 
in single node.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700686#comment-15700686
 ] 

Joel Bernstein commented on SOLR-8593:
--

Hi [~risdenk] and [~caomanhdat]. I've reviewed the latest work on this ticket 
and it's looking really good!

A couple pieces of functionality that appear to be missing are:

1) Handling of SELECT DISTINCT queries. In the current SQLHandler we can do 
MapReduce SELECT DISTINCT queries in parallel on worker nodes. And we can also 
push down the distinct logic to the JSON Facet API.

2) The pushing down of GROUP BY aggregations to the JSON Facet API.

Both of these currently require the aggregationMode parameter to be passed in 
with the query, which I think is fine for the initial Calcite release.

I'd be happy to add these capabilities to this branch. That will also give me 
an opportunity to work with the code and feel comfortable working with Calcite.


 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664701#comment-15664701
 ] 

Julian Hyde commented on SOLR-8593:
---

CALCITE-1306 covers this. It's not standard SQL but could be enabled via an 
extension.

I disagree that "Solr will run this filter faster than Calcite". With query 
optimization, both queries will produce identical plans. This issue is not 
about performance. It is about syntactic sugar (not that there's anything wrong 
with that).

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664613#comment-15664613
 ] 

Cao Manh Dat commented on SOLR-8593:


But we wanna to handle having clause without the function like
{code}
having field_i = 19
{code}
Is there any better ways to do this kind of filter?

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664595#comment-15664595
 ] 

Julian Hyde commented on SOLR-8593:
---

You're making a mistake I see a lot of people making: trying to do complex 
semantic transformations on the AST (SqlNode). That's an anti-pattern, because 
SQL's complex rules for name-resolution make the AST very brittle. You should 
do those kinds of transformations on the relational algebra tree (RelNode).

In fact, Calcite will convert query into a {{Scan -> Filter -> Aggregate -> 
Filter -> Project}} logical plan (the first Filter is the WHERE clause, the 
second Filter is the HAVING clause), so I don't think you need to do any tricky 
processing looking for aliases.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664576#comment-15664576
 ] 

Julian Hyde commented on SOLR-8593:
---

Regarding the alias for "count(*)". I guess one approach is to extend Calcite 
to allow a pluggable alias derivation (it has to be pluggable because you can't 
please everyone). Another approach is to leave the aliases as they are but 
generate field names for the JSON result set. Note that if you call 
SqlNode.getParserPosition() on each item in the select clause it will tell you 
the start and end point of that expression in the original SQL string, so you 
can extract the "count(*)" using that information.

I don't think the the following should be valid, but under your proposed change 
it would be:

{code}
SELECT deptno
FROM (
  SELECT deptno, count(\*)
  FROM emp
  GROUP BY deptno) AS t
WHERE t."count(*)" > 3
{code}

Note that "count(\*)" is not an expression; it is a reference to a "column" 
produced by the sub-query. In my opinion, using a textual expression is very 
confusing, and we should not do it. Derived alias of {{count(\*)}} should be 
something not easily guessable, which will encourage users to use an alias:

{code}
SELECT deptno
FROM (
  SELECT deptno, count(\*) AS c
  FROM emp
  GROUP BY deptno) AS t
WHERE t.c > 3
{code}

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664525#comment-15664525
 ] 

Cao Manh Dat commented on SOLR-8593:


Thanks [~risdenk], 

Yeah, I will definitely do that in the future.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-14 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664433#comment-15664433
 ] 

Kevin Risden commented on SOLR-8593:


[~caomanhdat] - I merged your changes with the jira/solr-8593 branch. The pr 
104 was updated automatically.

In the future, I think you can open a PR against the existing jira/solr-8593 
branch. This would probably be easier than a patch to merge changes.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-13 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662327#comment-15662327
 ] 

Cao Manh Dat commented on SOLR-8593:


Thanks [~julianhyde], that make sense, but will Calcite have an option for that 
in the future? Because along with JDBC API, we also support query through REST 
API.

For example 
{code}
curl --data-urlencode 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to 
ORDER BY count(*) desc LIMIT 10' 
  http://localhost:8983/solr/collection4/sql?aggregationMode=facet
{code}

Below is sample result set:
{code}
{"result-set":{"docs":[
  {"count(*)":9158,"to":"pete.da...@enron.com"},
  {"count(*)":6244,"to":"tana.jo...@enron.com"},
  {"count(*)":5874,"to":"jeff.dasov...@enron.com"},
  {"count(*)":5867,"to":"sara.shackle...@enron.com"},
  {"count(*)":5595,"to":"steven.k...@enron.com"},
  {"count(*)":4904,"to":"vkamin...@aol.com"},
  {"count(*)":4622,"to":"mark.tay...@enron.com"},
  {"count(*)":3819,"to":"kay.m...@enron.com"},
  {"count(*)":3678,"to":"richard.shap...@enron.com"},
  {"count(*)":3653,"to":"kate.sy...@enron.com"},
  {"EOF":"true","RESPONSE_TIME":10}]}
}
{code}
So "EXPR$1" is kinda weird as a field name.

I also want to ask another question, right now we rely on Calcite in memory 
filter to filter expr in having clause ( as {{ having sum(field_i) = 19 }} in 
below query
{code}
select str_s, count(*) as myCount, sum(field_i), min(field_i), max(field_i), 
cast(avg(1.0 * field_i) as float) from collection1 where text='' group by 
str_s having sum(field_i) = 19 order by sum(field_i) asc
{code}

So I created a filter predicate for convert LogicalFilter to SolrFilter like 
this
{code}
private static final boolean isNotFilterByExpr (List rexNodes, 
List fieldNames) {
  // We dont have a way to filter by result of aggregator now
  boolean result = true;
  for (RexNode rexNode : rexNodes) {
  if (rexNode instanceof RexCall) {
  result = result && isNotFilterByExpr(((RexCall) rexNode).getOperands(), 
fieldNames);
  } else if (rexNode instanceof RexInputRef) {
  result = result && !fieldNames.get(((RexInputRef) 
rexNode).getIndex()).startsWith("EXPR$");
  }
  }
  return result;
}

private static final Predicate FILTER_PREDICATE = relNode -> {
  List filterOperands = ((RexCall) ((LogicalFilter) 
relNode).getCondition()).getOperands();
  return isNotFilterByExpr(filterOperands, 
SolrRules.solrFieldNames(relNode.getRowType()));
};
{code}

But the above code is not works when I add alias for sum(field_i) like this
{code}
select str_s, count(*) as myCount, sum(field_i) as mySum, min(field_i), 
max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where 
text='' group by str_s having sum(field_i) = 19 order by sum(field_i) asc
{code}

So do Calcite have a way to check for RexInputRef is an alias or not?


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662280#comment-15662280
 ] 

Julian Hyde commented on SOLR-8593:
---

"count(\*)" is not a good derived column name, because it contains 
non-alphanumeric characters and is therefore not a valid identifier unless you 
enclose it in double-quotes. Therefore Calcite generates an alias that is a 
valid identifier.

I believe quite a few other databases do this.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-12 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661016#comment-15661016
 ] 

Cao Manh Dat commented on SOLR-8593:


Thanks [~risdenk]
[~julianhyde] BTW: How can we get rid of EXPR$1 return in 
{code}
select str_s, count(*) from collection1
{code}
the result set return the name of second column as {{EXPR$1}}, {{not count(*)}} 
as expected

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-12 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659950#comment-15659950
 ] 

Kevin Risden commented on SOLR-8593:


Thanks [~caomanhdat]! I'll integrate the fixes into the jira/solr-8593 branch 
and pull/104.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-11 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658025#comment-15658025
 ] 

Julian Hyde commented on SOLR-8593:
---

By the way, when you're ready, add please Solr to the [powered 
by|https://calcite.apache.org/docs/powered_by.html] page; see CALCITE-1112 for 
details.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652093#comment-15652093
 ] 

Joel Bernstein commented on SOLR-8593:
--

Very excited about this patch. After a day of review I think understand how 
this comes together. Next step is for me to get it running. Initial run of the 
test cases fail, probably due to classpath issue that [~risdenk] mentions. I'll 
work on getting it running as the next step and compare the feature set with 
the current release.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625888#comment-15625888
 ] 

ASF GitHub Bot commented on SOLR-8593:
--

Github user risdenk commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/104#discussion_r85965892
  
--- Diff: lucene/default-nested-ivy-settings.xml ---
@@ -32,6 +32,7 @@
   
 
   
+https://repository.apache.org/content/repositories/snapshots; 
m2compatible="true" />
--- End diff --

Avatica 1.9 released and PR updated. Waiting on Calcite 1.11.


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622553#comment-15622553
 ] 

ASF GitHub Bot commented on SOLR-8593:
--

Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
@joel-bernstein if you push to the `jira/solr-8593` branch it should update 
this PR.


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-31 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622328#comment-15622328
 ] 

Kevin Risden commented on SOLR-8593:


Avatica 1.9 (and Calcite 1.11) is because this needs avatica-core that doesn't 
do shading. The shading was causing problems integrating into Solr (forget the 
exact errors now). I saw that Calcite 1.10 requires Avatica 1.8 so just copied 
the one offending file for now. Since Avatica 1.9 was just released, I'll 
switch from the SNAPSHOT to the release and should be able to use 1.11 SNAPSHOT 
since the compatibility fixes for 1.9 were included for Calcite now.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616986#comment-15616986
 ] 

Julian Hyde commented on SOLR-8593:
---

Ah, I think I see what's going on. You're using avatica-1.9-SNAPSHOT with 
calcite-1.10. calcite-1.10 requires avatica-1.8, so you should use that. (Or is 
there a good reason why you need avatica-1.9?)

By the way, avatica-1.9 is less than a week from release. calcite-1.11 is maybe 
a month to six weeks away. The exact compatibility issues you describe are 
covered in CALCITE-1270 (and see the PR attached to that case).

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616963#comment-15616963
 ] 

Julian Hyde commented on SOLR-8593:
---

Is there a Calcite issue logged for the AbstractMethodError relating to 
CalciteConnectionProperty? I see [others are running into the same 
problem|http://stackoverflow.com/questions/39318653/create-a-streaming-example-with-calcite-using-csv]
 and I want to document the solution (or fix the bug in Calcite/Avatica if it 
is a bug).

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616458#comment-15616458
 ] 

Yonik Seeley commented on SOLR-8593:


bq. (This should include joins)

Awesome! I know many who have been waiting for that!

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616318#comment-15616318
 ] 

Kevin Risden commented on SOLR-8593:


One thing that you may have to do to make the Solr server happy until Calcite 
1.11 gets released is put the solr-core jar in front of calcite-core on the 
classpath. They are alphabetical right now (I fixed by renaming solr-core to 
asolr-core). This is to make sure the CalciteConnectionProperty fixes goes in 
before the original version. Otherwise you get an abstract method error.

select count(distinct) ... might actually work right now with that patch. If it 
doesn't then comment out the rules in the static block in SolrRules. That 
should give the full power of just regular SQL on top of Solr. It wouldn't 
really push anything down but still the SQL works. (This should include joins)

You should be able to add/change the SolrRules to see what gets pushed down. 
The classes from the branch are the exact same as the code here: 
https://github.com/risdenk/solr-calcite-example The code there has some more 
tests that show the explain plan. It was easier for me to iterate on the 
implementation of the rules that way.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616304#comment-15616304
 ] 

Joel Bernstein commented on SOLR-8593:
--

[~risdenk], I'll pull the branch and begin working with it. My plan is to run 
the tests and see how the various Calcite pieces get triggered.

Perhaps for the 6.4 for release we should shoot for the same functionality we 
currently have, just with Calcite swapped in.

If we want tp add one new thing I think the SELECT COUNT(DISTINCT) ... query 
would be a great thing to add.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616284#comment-15616284
 ] 

ASF GitHub Bot commented on SOLR-8593:
--

Github user joel-bernstein commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
Ok, got it. I'll pull the branch and start working with it.


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-09-22 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513607#comment-15513607
 ] 

Kevin Risden commented on SOLR-8593:


Adding some resources that may be helpful:
* http://www.slideshare.net/HadoopSummit/costbased-query-optimization
* 
https://medium.com/@mpathirage/query-planning-with-apache-calcite-part-1-fe957b011c36#.ywd9ouxmv

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-14 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378775#comment-15378775
 ] 

Joel Bernstein commented on SOLR-8593:
--

Ah, I see. I think we can pull off CoGroup with a merge() and reduce().

{code}
reduce(merge(search(..., sort="a asc"),
  search(..., sort="a asc"),
  on="a asc")), 
   by="a", 
   group(sort="b asc", n="5")) 
{code}

This operation is more generic in that it could be an aggregation or a join 
depending what comes after the reduce. 

Fun stuff.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378687#comment-15378687
 ] 

Julian Hyde commented on SOLR-8593:
---

The trickiest thing about CoGroup is that it aggregates (i.e. groups together) 
rows without collapsing them. So you need to be able to represent a nested set 
of rows. If Solr's evaluator can't handle nested rows then CoGroup will be 
tricky.

If you already have join and aggregate I'd stick with them.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-14 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378665#comment-15378665
 ] 

Joel Bernstein commented on SOLR-8593:
--

The CoGroup operator is interesting. Solr's Streaming Expressions provide some 
similar operations 
(https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions). 
CoGroup seems similar to a rollup() function wrapped around a join in Streaming 
Exrpressions. Psuedo-code:
{code}
 rollup(sum(x),
   over="y",
   innerJoin(on="a=b",
   search(...), 
   search(...))) 
{code)


Our basic strategy is to build up a Streaming Expressions that implements 
Calcites relational algebra. We've done this already with the Presto parser, 
but without joins.  

Looking forward to diving deeper into Calcite.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376266#comment-15376266
 ] 

Julian Hyde commented on SOLR-8593:
---

You should probably model your join and aggregate operators as sub-classes of 
Join and Aggregate that understand the "distribution" trait. If you are doing, 
say, "group by x" then you will need your input either to be singleton (i.e. 
only one input stream) or partitioned on x. Calcite will be able to ensure that 
the input is partitioned appropriately, either because it is stored in 
partitions, or by applying a shuffle/exchange. 

There is the regular Exchange operator that changes the distribution (i.e. 
re-partitions) and there is SortExchange that changes the distribution and also 
sorts within each partition. SortExchange models what the shuffle does in 
MapReduce.

After you have a plan like

{noformat}
MyJoin[left.a = right.b]
  Exchange[a]
MyAggregate
  Exchange
Scan[T1]
  Exchange[b]
Scan[T2]
{noformat}

you can turn into map-reduce by making the consumer of each Exchange into a 
reduce task, and the input to each Exchange a map task.

I asked [~ashutoshc] how he would generate Hive MapReduce plans in Calcite 
(most Hive plans these days are Tez) and he said you should consider writing a 
CoGroup operator (like the one in Pig). CoGroup is powerful enough to implement 
both join and aggregate, so it might save you some effort.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372906#comment-15372906
 ] 

Joel Bernstein commented on SOLR-8593:
--

Hi Julian!

Thanks for the offer to help out. [~risdenk] and I are very interested in using 
Calcite to power Solr's Parallel SQL engine so we can use Calcites awesome 
optimizer.

Kevin has been doing most of the work on this but I will be helping out more 
following the next Solr release. 

I think I our biggest struggle has been understanding how to apply the rules 
properly to push-down distributed joins and aggregations. Solr supports fast 
MapReduce shuffling, distributed joins and also has mature faceting analytics 
so we'd like to take advantage of all this power from the SQL interface.



> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-11 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371912#comment-15371912
 ] 

Julian Hyde commented on SOLR-8593:
---

Hi everyone! I'm VP of Apache Calcite. I only just noticed this JIRA case. I am 
excited that you are considering using Calcite. Please let me know if I can 
help.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-26 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301871#comment-15301871
 ] 

Joel Bernstein commented on SOLR-8593:
--

Interesting. I'd like to spend some time in the next couple weeks to see what 
we can done in the near term on this ticket. Possibly, the first step is just 
to release equivalent functionality to the current Presto code, with a Calcite 
release. This would provide the base to gradually expand the SQL feature set.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296678#comment-15296678
 ] 

Kevin Risden commented on SOLR-8593:


[~joel.bernstein] - Elasticsearch with Calcite just came through the Calcite 
dev list - CALCITE-1253

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276494#comment-15276494
 ] 

Joel Bernstein commented on SOLR-8593:
--

Ok, I'll keep digging into the rules to get a better understanding.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276480#comment-15276480
 ] 

Kevin Risden commented on SOLR-8593:


Here is a blog post about the concepts and example rules: 
https://datapsyche.wordpress.com/2014/08/06/optiq-query-push-down-concepts/

I am pretty sure that Hive uses Calcite internally and that the implementation 
would be in that code base. Drill might also have the same join logic in their 
code base. 

{quote}
Also do we have access to the SQL catalog and statistics from inside the rules 
engine. We'll need that information to decide which type of join to do. I guess 
we can make the catalog globally accessible if the rules API doesn't provide 
hooks into it.
{quote}

There is a computeSelfCost method that has access to the definition of:
{code}
public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq);
{code}

That should allow us to change which rules are fired based on different cost 
parameters. 

Also, the different rules can have different prerequisite states such that a 
filter is only allowed on a project or something similar. This should allow 
fine grained control of a join as well where the join will only fire if it is 
based on the sort of the same columns.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276476#comment-15276476
 ] 

Joel Bernstein commented on SOLR-8593:
--

I think my main concern is that the join pushdown rules haven't been exercised 
that much in production.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276472#comment-15276472
 ] 

Joel Bernstein commented on SOLR-8593:
--

Are you concerned that there isn't a reference implementation for join push 
downs?

Projects such as Apache Drill must have taken the approach of only embedding 
the parser/optimizer?

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276452#comment-15276452
 ] 

Kevin Risden commented on SOLR-8593:


{quote}
Do you know of an existing adapter that does join push downs that can be used 
as a reference implementation?
{quote}

{quote}
And what's the best source of documentation for implementing rules? I haven't 
found anything that I would describes as comprehensive.
{quote}

I don't have one specifically for joins. The best reference for implementing 
rules is the example adapters for Cassandra, MongoDB, and Splunk in the Calcite 
source. I wish there was some more comprehensive documentation. It has been a 
lot of trial and error to see how things all fit together.

Basically the pattern seems to be as follows:
1) Add a rule to SolrRules
2) The constructor specifies how it matches (project, filter, etc)
3) Add a class that implements the translation of that rule

This all eventually gets passed down to the SolrTable where the query is built 
and run. I've been thinking about changing since so that the query is built 
along the way but not sure thats possible. 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-05-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276420#comment-15276420
 ] 

Joel Bernstein commented on SOLR-8593:
--

[~risdenk], I've been reviewing the work on 
https://github.com/risdenk/solr-calcite-example.

I think I understand the basics of how this works but there are some mysteries 
as to how exactly the rules get triggered and where to place rules like join 
push downs.

Do you know of an existing adapter that does join push downs that can be used a 
reference implementation?

And what's the best source of documentation for implementing rules? I haven't 
found anything that I would describes as comprehensive.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-04-29 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264057#comment-15264057
 ] 

Joel Bernstein commented on SOLR-8593:
--

This is awesome.

I like the idea of making this a 6.2 priority. I should have some time over the 
next couple of days dig into the implementation.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-04-29 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264031#comment-15264031
 ] 

Kevin Risden commented on SOLR-8593:


Currently this is working out really nicely. Need to address the above items 
but currently can even hook up Avatica server and use the Avatica client to 
connect. This would resolve most of the issues in SOLR-8659

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-04-28 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262895#comment-15262895
 ] 

Kevin Risden commented on SOLR-8593:


Ok made a bunch of progress on the jira/solr-8593 branch:
* cleaned up the tests so that there are only a few remaining items to address 
(outlined below)
* added support for float/double types
* fixed a CloudSolrClient resource leak

Left to do:
* Add support for facets and map_reduce as parameters
* ensure the pushdown to facets/map_reduce works correctly
* figure out the CloudSolrClient cache (currently not caching and creating new 
per stream)
* Push down aggregates to Solr
* add tests to ensure the proper plan is being generated by Calcite

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-04-28 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262361#comment-15262361
 ] 

Kevin Risden commented on SOLR-8593:


Pushed initial version to branch jira/solr-8593. Currently there are a lot of 
tests in SQLHandler that are commented out. Need to go back through and make 
sure they have the expected results. Also some file formatting related to 
headers needs to be addressed.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208885#comment-15208885
 ] 

Joel Bernstein commented on SOLR-8593:
--

Yeah this was done originally because Presto didn't support mixed case 
indentifiers, so I solved that problem by making the table name case 
insensitive. But this also matches up with the SQL spec which is case 
insensitive for indentifiers.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208874#comment-15208874
 ] 

Kevin Risden commented on SOLR-8593:


Another thing I learned from looking closely at the TestSQLHandler tests is 
that collection name is case insensitive with the current implementation. This 
seems wrong to me because collections are case sensitive.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208777#comment-15208777
 ] 

Kevin Risden commented on SOLR-8593:


For reference the class that is querying with a stream is here: 
https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/sql/SolrTable.java#L82

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208413#comment-15208413
 ] 

Kevin Risden commented on SOLR-8593:


One way to access Calcite is through JDBC. The JDBCStream is taking the 
resultset generated by Calcite and turning it back into a stream.

Inside Calcite, the SQL statement is being converted into a Stream after going 
through some optimization rules.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208409#comment-15208409
 ] 

Dennis Gove commented on SOLR-8593:
---

Ok, I see that. So the JDBCStream is being used as the bridge between standard 
solr handlers and calcite. But how how calcite turning the SQL statement into a 
stream pipeline?

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208336#comment-15208336
 ] 

Kevin Risden commented on SOLR-8593:


I think we can. I will look into it today and see how far I get with some of 
the optimization rules.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208335#comment-15208335
 ] 

Kevin Risden commented on SOLR-8593:


I think my statement was misleading. I think the issue is the single quote 
quoted identifiers. Single quotes from the research I did are always used to 
identify string constants. Back ticks, double quotes, and even brackets can be 
utilized for quoting identifiers. The quoting options for Calcite seem to be 
available under quoting here: 
https://calcite.apache.org/docs/adapter.html#drivers

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208325#comment-15208325
 ] 

Kevin Risden commented on SOLR-8593:


The diff is misleading since SQLHandler is not shown due to having too big of a 
diff :/ I removed most of SQLHandler and then just wrapped Calcite in a 
JDBCStream. The SQLHandler class is viewable here: 
https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/SQLHandler.java

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207685#comment-15207685
 ] 

Joel Bernstein commented on SOLR-8593:
--

Currently quoted identifiers do refer to columns. This was originally done 
because Presto didn't support mixed case columns unless they were quoted. But 
Presto fixed that problem. So the quoted identifiers as they are now don't 
really serve a purpose. But I do believe that both Presto and Calcite allow 
columns for quoted identifiers to support non parseable identifiers. 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207593#comment-15207593
 ] 

Joel Bernstein commented on SOLR-8593:
--

I looked through the code and I'm seeing how CloudSolrStream is being used. But 
it's not clear to me we'll be able to implement that full range of capabilities 
through this approach.

For example:

1) Can we choose between the FacetStream and a parallel RollupStream based on 
the costs of the different approaches?

2) Can we do parallel joins using Solr's shuffling capabilities and Solr 
workers?


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207416#comment-15207416
 ] 

Dennis Gove commented on SOLR-8593:
---

[~risdenk], I think I must be missing something in the diff link but I don't 
see any use of the JDBCStream. I'm not sure I understand how the use of the 
JDBCStream will help with this.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207350#comment-15207350
 ] 

Kevin Risden commented on SOLR-8593:


[~joel.bernstein] I was able to make some really good progress on the above 
linked piece. Its coming together just need to add the optimizations now.

I have been testing with https://github.com/risdenk/solr-calcite-example which 
is the same Calcite code isolated from Solr. It enables me to easily check the 
explain plan and data types. I'll probably bang on it some more tomorrow to 
integrate the push downs for project, filter, and sort. The hooks are all there 
just need to convert from the Cassandra example code to Solr syntax.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206919#comment-15206919
 ] 

Kevin Risden commented on SOLR-8593:


Here is a separate approach that uses all of Calcite and the JDBCStream: 
https://github.com/risdenk/lucene-solr/compare/master...risdenk:calcite-sql-handler

It removes all the custom processing from SQLHandler, wraps Calcite in a 
JDBCStream, and executes the query.

There is something I learned about TestSQLHandler that I'm not sure is correct:
* quoted identifiers - like 'id' and 'text' aren't valid? These shouldn't be 
referring to columns?

Things to be explored with this approach:
* switch from a standard query in SolrEnumerator to a stream
* fix data types
* optimize cases like where

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-22 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206730#comment-15206730
 ] 

Joel Bernstein commented on SOLR-8593:
--

[~risdenk] and I have been looking into different approaches for this ticket. 

One of the approaches is to embed the Calcite SQL parser and optimizer inside 
the SQLHandler. The entry point for this appears to be:

https://calcite.apache.org/apidocs/org/apache/calcite/tools/Planner.html

Using this approach we would need to implement two things:

1) A CatalogReader, which the calcite validator and optimizer will use to do 
it's job. The underlying implementation of this should work for the JDBC driver 
also, so we kill two big birds with one stone when this implemented.

2) A custom RelVisitor, which will rewrite the relational algebra tree 
(RelNode), created by the optimizer. The RelNode tree will need to be mapped to 
the Streaming API. Since the Streaming API already supports parallel relational 
algebra this should be fairly straight forward.

This approach would leave the Solr JDBC driver basically as it is, but provide 
all the hooks needed to finish off the remaining Catalog metadata methods.




> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204988#comment-15204988
 ] 

Joel Bernstein commented on SOLR-8593:
--

It's still worth looking into. But I suspect Calcite has to implement a lowest 
common denominator type of join to support joining lot's of different systems 
together.

Since Solr is not a meta engine we can get really specific and use Solr's 
unique shuffling capabilities to do faster joins.

[~dpgove], can speak to this better then I can, but the Streaming API's 
distributed joins are really fast. 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-21 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204951#comment-15204951
 ] 

Kevin Risden commented on SOLR-8593:


{quote}
But I get worried that we'll be roped into how Calcite does things.
{quote}

This is something that I was worried about as well. I'm still trying to figure 
out how Calcite would best fit together.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204946#comment-15204946
 ] 

Joel Bernstein commented on SOLR-8593:
--


If wrapping this logic up inside a Calcite adapter makes things easier then I'm 
all for it. But I get worried that we'll be roped into how Calcite does things. 
For example Solr has shuffling capabilities that will make for some some very 
fast Joins. Will we be able to access these features through a calcite adapter 
or will we have to use calcites join techniques.

I'll read up on creating a Calcite adapter.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-21 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204891#comment-15204891
 ] 

Kevin Risden commented on SOLR-8593:


[~joel.bernstein] - I took a much closer look at Calcite and the SOLR-7560 
patch. I think there can be a much simpler approach than implementing the 
parsing and catalog separately. If a Calcite adaptor is built that enables 
Calcite to access Solr this would solve a lot of problems at once potentially. 
Calcite has a local JDBC connection which you can just pass in a SQL query and 
get back a result set. This local JDBC connection could be wrapped in a stream 
and returned from the /sql handler. This would remove the need for the 
limit/having/etc streams that currently exist. The logic for pushing down 
having/limit/etc would have to be moved into how Calcite does the processing. 
This should make the SQL handler much simpler since it would pass off much of 
the work its doing to Calcite. The downside to this approach is that much of 
the logic would be Calcite specific and need to figure out exactly which hooks 
to tie into.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-19 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200020#comment-15200020
 ] 

Kevin Risden commented on SOLR-8593:


Speaking of Calcite and JDBC - https://calcite.apache.org/avatica/. It was 
recently split out as a subproject of Calcite and is an implementation of jdbc 
client/server basically. It might not be far enough along currently to 
integrate but it is a thought. 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-03-19 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199661#comment-15199661
 ] 

Joel Bernstein commented on SOLR-8593:
--

SOLR-7560 has a patch that uses Apache Calcite rather then Presto to parse the 
SQL queries. This would be a good place to start working from. This patch 
ignores hooks into Calcite's optimizer which needs access to a SQL catalog. 
These hooks will need to be investigated and the SQL Catalog will have to be 
built out. This will begin to overlap with work being done on the JDBC driver 
pretty much immediately.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
> Fix For: master
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-01-25 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115253#comment-15115253
 ] 

Joel Bernstein commented on SOLR-8593:
--

Linking to the JDBC driver work. The JDBC driver and the Calcite integration 
are connected because both are going to require hooks into the SQL Catalog. So 
work on the JDBC driver will inform work coming soon on the Calcite 
integration. 

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



<    1   2