[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2018-07-20 Thread Luis Fernando Kauer (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551478#comment-16551478
 ] 

Luis Fernando Kauer commented on CALCITE-1906:
--

This might be related to another issue I reported in 
https://issues.apache.org/jira/browse/CALCITE-2056?focusedCommentId=16261240&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16261240

The problem is in converting the plan to SQL when there are both a JdbcSort 
and a JdbcProject: it ends up creating a subquery.

> JdbcSortRule has a bug and it is never chosen
> -
>
> Key: CALCITE-1906
> URL: https://issues.apache.org/jira/browse/CALCITE-1906
> Project: Calcite
>  Issue Type: Bug
>  Components: jdbc-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Major
>
> JdbcSortRule tries to push sort and limit operations to the database.
> Currently offset and limit operations are explicitly not pushed to the 
> database (prevented by the rule) but even sort operations end up not being 
> pushed.
> Checking how other adapters deal with this, like Mongo and Cassandra 
> adapters, I realized that the convert function from JdbcSortRule is different 
> from the others.
> Jdbc-adapter:
> {code}
>  if (sort.offset != null || sort.fetch != null) {
> // Cannot implement "OFFSET n FETCH n" currently.
> return null;
>   }
>   final RelTraitSet traitSet = sort.getTraitSet().replace(out);
>   return new JdbcSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet), sort.getCollation());
> {code}
> mongodb-adapter:
> {code}
>   final RelTraitSet traitSet =
>   sort.getTraitSet().replace(out)
>   .replace(sort.getCollation());
>   return new MongoSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
>   sort.getCollation(), sort.offset, sort.fetch);
> {code}
> Fixing JdbcSortRule so that it is just like the others, and removing the 
> code that prevented the rule from matching when limit or offset are used, 
> seems to solve the problem: JdbcSortRule is now applied, and both sort and 
> limit are pushed to the database.





[jira] [Created] (CALCITE-2081) NPE in join of 2 subqueries using window functions

2017-12-08 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2081:


 Summary: NPE in join of 2 subqueries using window functions
 Key: CALCITE-2081
 URL: https://issues.apache.org/jira/browse/CALCITE-2081
 Project: Calcite
  Issue Type: Bug
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


NPE when joining 2 subqueries that use window functions.

Testing with a new test case in JdbcTest:
{code}
  /** Tests for a join of subqueries using Window Functions */
  @Test public void testJoinWithWinAgg() {
final String sql = "select a.*, b.r from\n"
+ "(select \"deptno\", first_value(\"empid\") over \n"
+ "(partition by \"deptno\" order by \"commission\") as r\n"
+ "from \"hr\".\"emps\") a\n"
+ "left join\n"
+ "(select \"deptno\", last_value(\"empid\") over \n"
+ "(partition by \"deptno\" order by \"commission\") as r\n"
+ "from \"hr\".\"emps\") b\n"
+ "on a.\"deptno\" = b.\"deptno\"";
CalciteAssert.hr()
.query(sql)
.runs();
  }
{code}

Debugging this, I found that the NPE occurs at Expressions.declare (line 
2937) in initializer.getType(), because initializer is null.  The statement is
{code}
int prevStart;
{code}
This statement does not have an initializer, but the "declare" method tries 
to get the initializer's type, causing the NPE.
This happens when joining 2 subqueries that use window functions, because they 
end up using the same variables.  In BlockBuilder.append (line 124) it checks 
whether the variable already exists, and the problem starts there.
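
A minimal sketch of the kind of null guard that would avoid this NPE (the 
class and method names below are illustrative stand-ins, not Calcite's actual 
linq4j code):
{code}
import java.lang.reflect.Type;

/** Illustrative null guard: use the initializer's type only when an
 * initializer is present; otherwise fall back to the declared type. */
class DeclareGuardSketch {
  /** Hypothetical stand-in for an expression that knows its type. */
  interface TypedExpr {
    Type getType();
  }

  static Type typeOf(Type declaredType, TypedExpr initializer) {
    // "int prevStart;" has no initializer, so initializer may be null.
    return initializer != null ? initializer.getType() : declaredType;
  }

  public static void main(String[] args) {
    System.out.println(typeOf(int.class, null)); // prints "int", no NPE
  }
}
{code}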






[jira] [Commented] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases

2017-11-21 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261240#comment-16261240
 ] 

Luis Fernando Kauer commented on CALCITE-2056:
--

I'm working on the following branch to solve this:
https://github.com/lfkpoa/calcite/tree/CALCITE-2056

Julian, can you take a look? Would you prefer that I create a pull request, 
even if it is not ready yet?

I've made the following changes:
1. Created FetchOffsetType, an enum with the different fetch/offset 
implementations used by most databases (see the sketch after this list).
2. Changed SqlDialect to include FetchOffsetType.  There are many different 
ways to instantiate a SqlDialect: some dialects use a shared INSTANCE, while 
others are created on demand from an empty context, so I had to make sure the 
correct FetchOffsetType was set in each case.
3. Changed SqlSelectOperator to call FetchOffsetType.unparse instead of 
calling Writer.fetchOffset.  Since some databases put the fetch keywords (TOP, 
FIRST) right after the SELECT keyword while others put them at the end of the 
query, SqlSelectOperator calls FetchOffsetType.unparse twice, passing along 
the current FrameTypeEnum (SELECT or ORDER_BY).  I was not sure what to do 
when a FetchOffsetType does not implement offset or fetch but is called with 
one.
4. Changed SqlPrettyWriter.fetchOffset to use FetchOffsetType.  I think it was 
a hack to allow different kinds of fetch/offset; with FetchOffsetType I don't 
think it is necessary anymore.
5. Created new test cases in SqlToRelConverterTest.
6. Created new test cases in JdbcAdapterTest.  I'm not sure how to test the 
generated SQL in the JDBC adapter tests against databases other than Hsqldb, 
to check that fetch/offset is being pushed correctly.
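
To make item 1 concrete, here is a rough sketch of the idea (the member names 
and the unparse signature are just an illustration, not necessarily what is 
in the branch):
{code}
/** Sketch of a FetchOffsetType-style enum: each member renders the
 * fetch/offset clause in one dialect family's syntax. */
enum FetchOffsetTypeSketch {
  LIMIT_OFFSET {      // e.g. MySQL, PostgreSQL, HSQLDB, H2
    @Override String unparse(Integer fetch, Integer offset) {
      StringBuilder sb = new StringBuilder();
      if (fetch != null) {
        sb.append(" LIMIT ").append(fetch);
      }
      if (offset != null) {
        sb.append(" OFFSET ").append(offset);
      }
      return sb.toString();
    }
  },
  FETCH_OFFSET {      // SQL:2008, e.g. Oracle 12c, DB2 12, Derby
    @Override String unparse(Integer fetch, Integer offset) {
      StringBuilder sb = new StringBuilder();
      if (offset != null) {
        sb.append(" OFFSET ").append(offset).append(" ROWS");
      }
      if (fetch != null) {
        sb.append(" FETCH NEXT ").append(fetch).append(" ROWS ONLY");
      }
      return sb.toString();
    }
  };

  abstract String unparse(Integer fetch, Integer offset);
}
{code}
For example, LIMIT_OFFSET.unparse(10, 20) yields " LIMIT 10 OFFSET 20", while 
FETCH_OFFSET yields " OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY".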

Problem not solved:
Some tests I created in JdbcAdapterTest fail because the generated SQL uses a 
subquery when there is no need for one.  This has to do with the order of 
JdbcProject and JdbcSort in the plan.  When JdbcProject is above JdbcSort it 
generates a subquery, because SELECT has a lower ordinal than ORDER_BY in 
SqlImplementor.Clause (see the sketch below).  I don't know how to solve this. 
I tried putting SELECT at the end, which solved this problem, but then I had 
problems when JdbcSort is above JdbcProject.
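
To illustrate the ordinal problem, a simplified model of the clause check 
(the Clause values mirror SqlImplementor.Clause, but the canMerge helper is 
my simplification, not Calcite's code):
{code}
import java.util.EnumSet;

/** Simplified model of why JdbcProject above JdbcSort forces a
 * subquery: a clause can only be merged into the current SELECT if it
 * comes after every clause already present. */
class ClauseOrderSketch {
  enum Clause { FROM, WHERE, GROUP_BY, HAVING, SELECT, SET_OP, ORDER_BY, FETCH }

  static boolean canMerge(EnumSet<Clause> present, Clause next) {
    return present.stream().allMatch(c -> c.ordinal() < next.ordinal());
  }

  public static void main(String[] args) {
    // JdbcSort was emitted first, so ORDER_BY is already present;
    // adding SELECT (lower ordinal) fails, and a subquery is created.
    System.out.println(canMerge(EnumSet.of(Clause.ORDER_BY), Clause.SELECT)); // false
    System.out.println(canMerge(EnumSet.of(Clause.SELECT), Clause.ORDER_BY)); // true
  }
}
{code}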

> Limit/Offset in generated Sql does not work for many databases
> --
>
> Key: CALCITE-2056
> URL: https://issues.apache.org/jira/browse/CALCITE-2056
> Project: Calcite
>  Issue Type: Bug
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
> OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
> SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
> "FETCH NEXT x ROWS" if it supportsOffsetFetch.
> However, I noticed that many databases support it in different ways and some 
> do not support it at all.
> For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT 
> x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
> Some databases accept TOP x and others use something different still, but 
> most of them also accept the LIMIT syntax.
> I suggest we make LIMIT the default, because it is the most widely accepted, 
> and allow each SqlDialect to define its own way of using fetch and offset.
> If we check how Hibernate sets up its dialects to deal with limit/offset for 
> each database, we see that most use the limit syntax and some use special 
> syntax, but very few are configured to use the SQL:2008 syntax.
> This hasn't been a problem until now because sort/limit/offset was not being 
> pushed to the database.  But now that I'm trying to fix that, the generated 
> SQL does not work in Hsqldb.





[jira] [Commented] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases

2017-11-16 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255259#comment-16255259
 ] 

Luis Fernando Kauer commented on CALCITE-2056:
--

I searched for how each database implements this, and what I found is that 
very few implement the SQL standard OFFSET/FETCH, and only very recently, 
while most of them implement LIMIT/OFFSET.  Some implement LIMIT (or FETCH) 
but not OFFSET.  A summary of the variations:

SELECT * FROM T LIMIT 10 OFFSET 20
Netezza, MySQL, Sybase, PostgreSQL, SQLite, HSQLDB, H2, Vertica, Polyhedra, 
CouchBase, DB2

SELECT * FROM T LIMIT 10, 20
MySQL, DB2

SELECT * FROM T LIMIT 10
Netezza, MySQL, Sybase, PostgreSQL, SQLite, HSQLDB, H2, Vertica, Polyhedra, 
CouchBase, DB2, HIVE

SELECT FIRST 10 * FROM T
Ingres, Informix, Firebird

SELECT TOP 10 * FROM T
MsSQLServer, HSQLDB, MsAccess, Teradata

SELECT FIRST 10 SKIP 20 * FROM T
Firebird

SELECT * FROM T FETCH FIRST 10 ROWS ONLY
DB2, Oracle

SELECT * FROM T OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
DB2(>=12), ORACLE(>12c), PostgreSQL(>8.4), HSQLDB(>2.2*), DERBY(>10.7)

The HSQLDB version 2.3.1 used in Calcite does not accept this exact syntax, but 
it accepts SELECT * FROM T OFFSET 20 ROWS FETCH 10 ROWS ONLY.

I think we should make Calcite work with any database and fall back to 
EnumerableLimit when limit/offset can't be pushed to the database.  Some 
databases accept only a limit, but we could push the limit (with an 
adjustment) and fall back to Enumerable for the offset, when needed.
We could have an enum of limit/offset variations, like NONE, LIMIT, 
LIMIT_OFFSET, FIRST, TOP, FETCH, FETCH_OFFSET, and each SqlDialect could 
return which one it implements.
What do you think?
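
A tiny sketch of the limit-adjustment idea, assuming a dialect that accepts 
LIMIT but not OFFSET:
{code}
/** Push LIMIT (offset + fetch) to the database and let
 * EnumerableLimit skip the offset rows locally. */
class LimitAdjustSketch {
  public static void main(String[] args) {
    int offset = 20;
    int fetch = 10;
    int pushedLimit = offset + fetch;  // the database sees "LIMIT 30"
    System.out.println("Push: LIMIT " + pushedLimit
        + "; Enumerable then skips " + offset + " rows and keeps " + fetch);
  }
}
{code}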


> Limit/Offset in generated Sql does not work for many databases
> --
>
> Key: CALCITE-2056
> URL: https://issues.apache.org/jira/browse/CALCITE-2056
> Project: Calcite
>  Issue Type: Bug
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
> OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
> SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
> "FETCH NEXT x ROWS" if it supportsOffsetFetch.
> However, I noticed that many databases support it in different ways and some 
> do not support it at all.
> For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT 
> x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
> Some databases accept TOP x and others use something different still, but 
> most of them also accept the LIMIT syntax.
> I suggest we make LIMIT the default, because it is the most widely accepted, 
> and allow each SqlDialect to define its own way of using fetch and offset.
> If we check how Hibernate sets up its dialects to deal with limit/offset for 
> each database, we see that most use the limit syntax and some use special 
> syntax, but very few are configured to use the SQL:2008 syntax.
> This hasn't been a problem until now because sort/limit/offset was not being 
> pushed to the database.  But now that I'm trying to fix that, the generated 
> SQL does not work in Hsqldb.





[jira] [Updated] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases

2017-11-15 Thread Luis Fernando Kauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luis Fernando Kauer updated CALCITE-2056:
-
Description: 
SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
"FETCH NEXT x ROWS" if it supportsOffsetFetch.
However, I noticed that many databases support it in different ways and some 
do not support it at all.
For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT x 
ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
Some databases accept TOP x and others use something different still, but most 
of them also accept the LIMIT syntax.
I suggest we make LIMIT the default, because it is the most widely accepted, 
and allow each SqlDialect to define its own way of using fetch and offset.
If we check how Hibernate sets up its dialects to deal with limit/offset for 
each database, we see that most use the limit syntax and some use special 
syntax, but very few are configured to use the SQL:2008 syntax.
This hasn't been a problem until now because sort/limit/offset was not being 
pushed to the database.  But now that I'm trying to fix that, the generated 
SQL does not work in Hsqldb.

  was:
SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
"FETCH NEXT x ROWS" if it supportsOffsetFetch.
However, I noticed that many databases support it in different ways and some 
do not support it at all.
For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT x 
ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
Some databases accept TOP x and others use something different still (like 
PostgreSQL).
I suggest we make LIMIT the default, because it is the most widely accepted, 
and allow each SqlDialect to define its own way of using fetch and offset.
If we check how Hibernate sets up its dialects to deal with limit/offset for 
each database, we see that most use the limit syntax and some use special 
syntax, but very few are configured to use the SQL:2008 syntax.
This hasn't been a problem until now because sort/limit/offset was not being 
pushed to the database.  But now that I'm trying to fix that, the generated 
SQL does not work in Hsqldb.


> Limit/Offset in generated Sql does not work for many databases
> --
>
> Key: CALCITE-2056
> URL: https://issues.apache.org/jira/browse/CALCITE-2056
> Project: Calcite
>  Issue Type: Bug
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
> OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
> SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
> "FETCH NEXT x ROWS" if it supportsOffsetFetch.
> However, I noticed that many databases support it in different ways and some 
> do not support it at all.
> For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT 
> x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
> Some databases accept TOP x and others use something different still, but 
> most of them also accept the LIMIT syntax.
> I suggest we make LIMIT the default, because it is the most widely accepted, 
> and allow each SqlDialect to define its own way of using fetch and offset.
> If we check how Hibernate sets up its dialects to deal with limit/offset for 
> each database, we see that most use the limit syntax and some use special 
> syntax, but very few are configured to use the SQL:2008 syntax.
> This hasn't been a problem until now because sort/limit/offset was not being 
> pushed to the database.  But now that I'm trying to fix that, the generated 
> SQL does not work in Hsqldb.





[jira] [Created] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases

2017-11-15 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2056:


 Summary: Limit/Offset in generated Sql does not work for many 
databases
 Key: CALCITE-2056
 URL: https://issues.apache.org/jira/browse/CALCITE-2056
 Project: Calcite
  Issue Type: Bug
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


SqlDialect.supportsOffsetFetch() defines whether the dialect supports 
OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert 
"FETCH NEXT x ROWS" if it supportsOffsetFetch.
However, I noticed that many databases support it in different ways and some 
do not support it at all.
For example, Hsqldb, which is used in test cases, does not accept FETCH NEXT x 
ROWS, but it accepts LIMIT and FETCH FIRST x ROWS.
Some databases accept TOP x and others use something different still (like 
PostgreSQL).
I suggest we make LIMIT the default, because it is the most widely accepted, 
and allow each SqlDialect to define its own way of using fetch and offset.
If we check how Hibernate sets up its dialects to deal with limit/offset for 
each database, we see that most use the limit syntax and some use special 
syntax, but very few are configured to use the SQL:2008 syntax.
This hasn't been a problem until now because sort/limit/offset was not being 
pushed to the database.  But now that I'm trying to fix that, the generated 
SQL does not work in Hsqldb.





[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-14 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251195#comment-16251195
 ] 

Luis Fernando Kauer commented on CALCITE-2044:
--

Thanks, Julian. 
Looks great. 
I changed typeFactory.builder() because it is deprecated and the javadoc 
recommends using Builder, not because I don't like it.  I thought I was 
updating to the recommended way of doing it.
Should I continue using typeFactory.builder() for now?


> Tweak cost of BindableTableScan to make sure Project is pushed through 
> Aggregate
> 
>
> Key: CALCITE-2044
> URL: https://issues.apache.org/jira/browse/CALCITE-2044
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> Similar to [CALCITE-1876].
> Projects are not pushed to BindableTableScan when using 
> ProjectableFilterableTable with aggregate functions.
> The reason is that the cost of BindableTableScan does not take projects (and 
> filters) into account, so the planner chooses a plan with the Project node 
> removed by ProjectRemoveRule.
> Tweaking the cost to use the number of projected columns solved the problem.
> Any suggestion on a cost formula that takes both projects and filters into 
> account?





[jira] [Commented] (CALCITE-2048) Create a better documentation for the Planner design

2017-11-13 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250024#comment-16250024
 ] 

Luis Fernando Kauer commented on CALCITE-2048:
--

I agree with Enrico. 
The documentation should explain the many concepts used in the planner, such 
as TraitDef, TraitSet, Convention, Program, Enumerable, Bindable, Rules, 
transformation/conversion of nodes in rules, implementation of nodes, 
statistics, RelMetadataQuery, and costs.  There are so many concepts.
I think examples, or just snippets showing and explaining how they are used, 
would help a lot.
We could also have a module with a full planner example with:
- a configurable schema/catalog via JSON, so there is no need to use any 
adapter or create the schema somewhere else;
- configurable cost and metadata, also via JSON, so it can be used to test 
different setups;
- output showing the best plan for each query, which rules were applied and 
the optimized SQL.
This would make it easy to test new rules, cost changes and different queries 
using different setups.
Maybe even integrate with Cosette (CALCITE-1977) to validate whether the 
optimized SQL is really equivalent to the query.



> Create a better documentation for the Planner design
> 
>
> Key: CALCITE-2048
> URL: https://issues.apache.org/jira/browse/CALCITE-2048
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Edmon Begoli
>Assignee: Edmon Begoli
>Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Per request of the development community, and the assessment that we need a 
> tutorial, documentation and example code to work directly with the planner: 
> it is a lot harder to work with the planner than with an adapter. 





[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-09 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246198#comment-16246198
 ] 

Luis Fernando Kauer commented on CALCITE-2044:
--

https://github.com/apache/calcite/pull/565


> Tweak cost of BindableTableScan to make sure Project is pushed through 
> Aggregate
> 
>
> Key: CALCITE-2044
> URL: https://issues.apache.org/jira/browse/CALCITE-2044
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> Similar to [CALCITE-1876].
> Projects are not pushed to BindableTableScan when using 
> ProjectableFilterableTable with aggregate functions.
> The reason is that the cost of BindableTableScan does not take projects (and 
> filters) into account, so the planner chooses a plan with the Project node 
> removed by ProjectRemoveRule.
> Tweaking the cost to use the number of projected columns solved the problem.
> Any suggestion on a cost formula that takes both projects and filters into 
> account?





[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-09 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245706#comment-16245706
 ] 

Luis Fernando Kauer commented on CALCITE-2044:
--

Alexey, I'd like to understand your real need.
You see, the computed cost does not have to be exact, and the overall cost 
depends on the costs of all the other nodes in the plan.
It is used to choose between different plans, but if you don't adjust the 
costs of all the other nodes of the plan (joins, filters and projects that 
were not pushed, aggregates), then you might not get the best plan anyway.
Here we are talking only about the cost of BindableTableScan and how filters 
and projects change it.
Even though you may have expensive columns, I assume that if the query 
requires expensive columns you'll have to return them anyway, so usually 
pushing the used projects is desirable.
I think it is the same with filters.
Julian, I would like to know some counterexamples to take into account, if 
you may.


> Tweak cost of BindableTableScan to make sure Project is pushed through 
> Aggregate
> 
>
> Key: CALCITE-2044
> URL: https://issues.apache.org/jira/browse/CALCITE-2044
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> Similar to [CALCITE-1876].
> Projects are not pushed to BindableTableScan when using 
> ProjectableFilterableTable with aggregate functions.
> The reason is that the cost of BindableTableScan does not take projects (and 
> filters) into account, so the planner chooses a plan with the Project node 
> removed by ProjectRemoveRule.
> Tweaking the cost to use the number of projected columns solved the problem.
> Any suggestion on a cost formula that takes both projects and filters into 
> account?





[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-08 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243983#comment-16243983
 ] 

Luis Fernando Kauer commented on CALCITE-2044:
--

ProjectableFilterableTable is meant for simple uses.
I opened this Jira to solve the common problem of projects not being pushed 
to the table scan under an aggregate.
I propose a simple solution, without the need to implement any other 
interface: just reduce the cost when projects and filters are applied (a 
rough sketch follows).
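
Something along these lines (a sketch under my assumptions, not the actual 
patch; the class, method and parameter names are illustrative):
{code}
/** Sketch: make BindableTableScan cheaper as pushed projects/filters
 * shrink the work, so the planner keeps the plan that pushed them. */
class ScanCostSketch {
  static double costFactor(int projectedColumns, int totalColumns,
      boolean hasPushedFilter) {
    // Fewer projected columns => proportionally lower cost.
    double projectFactor = (double) projectedColumns / totalColumns;
    // Any pushed filter halves the cost (an arbitrary discount).
    double filterFactor = hasPushedFilter ? 0.5 : 1.0;
    return projectFactor * filterFactor;
  }

  public static void main(String[] args) {
    // Scanning 2 of 10 columns with a pushed filter: (2/10) * 0.5 = 0.1
    System.out.println(costFactor(2, 10, true));
  }
}
{code}
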
You are proposing a new feature that would allow more control of the cost, but 
Calcite already has many pluggable ways to control costs and provide 
statistics, and you should try them first. See Statistics, RelMetadataQuery, 
BuiltInMetadata ...

You can also use TranslatableTable and have full control.  I know, the csv 
example does not include filter push down.  Maybe we could improve the example 
to include it to show how it can be done.




> Tweak cost of BindableTableScan to make sure Project is pushed through 
> Aggregate
> 
>
> Key: CALCITE-2044
> URL: https://issues.apache.org/jira/browse/CALCITE-2044
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> Similar to [CALCITE-1876].
> Projects are not pushed to BindableTableScan when using 
> ProjectableFilterableTable with aggregate functions.
> The reason is that the cost of BindableTableScan does not take projects (and 
> filters) into account, so the planner chooses a plan with the Project node 
> removed by ProjectRemoveRule.
> Tweaking the cost to use the number of projected columns solved the problem.
> Any suggestion on a cost formula that takes both projects and filters into 
> account?





[jira] [Created] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-08 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2044:


 Summary: Tweak cost of BindableTableScan to make sure Project is 
pushed through Aggregate
 Key: CALCITE-2044
 URL: https://issues.apache.org/jira/browse/CALCITE-2044
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde
Priority: Minor


Similar to [CALCITE-1876].
Projects are not pushed to BindableTableScan when using 
ProjectableFilterableTable with aggregate functions.
The reason is that the cost of BindableTableScan does not take projects (and 
filters) into account, so the planner chooses a plan with the Project node 
removed by ProjectRemoveRule.
Tweaking the cost to use the number of projected columns solved the problem.
Any suggestion on a cost formula that takes both projects and filters into 
account?





[jira] [Commented] (CALCITE-1876) Create a rule to push the projections used in aggregate functions

2017-11-07 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242744#comment-16242744
 ] 

Luis Fernando Kauer commented on CALCITE-1876:
--

Pull Request:
https://github.com/apache/calcite/pull/562

I realized that the correct plan was being generated, but it was not being 
selected.
I solved the problem by computing the cost of CsvTableScan from the number of 
fields actually projected.
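
The change is essentially a computeSelfCost override on CsvTableScan; a 
sketch of the idea (the exact constants may differ from the merged commit; 
fields and table are CsvTableScan's existing members):
{code}
@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
    RelMetadataQuery mq) {
  // Discount the scan cost by the fraction of fields actually read
  // (the "+ 2D" keeps the factor meaningful for very narrow tables),
  // so the plan that pushed the projection into the scan wins.
  return super.computeSelfCost(planner, mq)
      .multiplyBy(((double) fields.length + 2D)
          / ((double) table.getRowType().getFieldCount() + 2D));
}
{code}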

> Create a rule to push the projections used in aggregate functions
> -
>
> Key: CALCITE-1876
> URL: https://issues.apache.org/jira/browse/CALCITE-1876
> Project: Calcite
>  Issue Type: Improvement
>  Components: csv-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> From Julian Hyde:
> "Calcite should realize that Aggregate has an implied Project (because it 
> only uses a few columns) and push that projection into the CsvTableScan, but 
> it doesn’t."
> A query scans only the used projection when no aggregation is used:
> sql->explain plan for select name from emps;
> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> But it scans all the projections when an aggregation is used:
> sql->explain plan for select max(name) from emps;
> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
> CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])





[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2017-08-04 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114966#comment-16114966
 ] 

Luis Fernando Kauer commented on CALCITE-1906:
--

Debugging this problem some more, I found that what makes the sort be pushed 
is replacing the collation trait with RelCollations.EMPTY:
{code}
convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY))
{code}
The problem is that SortRemoveRule matches and is applied when there is an 
active RelCollationTraitDef, removing the Sort and setting the collation on 
the sort's input.  Maybe JdbcTableScan was supposed to use the collation 
information, but it doesn't.
When the code above is used to remove the collations, SortRemoveRule is not 
applied, and that's why the sort gets pushed to the database.
EnumerableLimitRule, which extracts the fetch and offset from the Sort, is 
also not applied when the Sort has no collation field, so the code above 
prevents that rule from being applied too.
If we can't be sure that the table is sorted, the best thing to do is to 
remove the collations when converting to JdbcSort.


> JdbcSortRule has a bug and it is never chosen
> -
>
> Key: CALCITE-1906
> URL: https://issues.apache.org/jira/browse/CALCITE-1906
> Project: Calcite
>  Issue Type: Bug
>  Components: jdbc-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> JdbcSortRule tries to push sort and limit operations to the database.
> Currently offset and limit operations are explicitly not pushed to the 
> database (prevented by the rule) but even sort operations end up not being 
> pushed.
> Checking how other adapters deal with this, like Mongo and Cassandra 
> adapters, I realized that the convert function from JdbcSortRule is different 
> from the others.
> Jdbc-adapter:
> {code}
>  if (sort.offset != null || sort.fetch != null) {
> // Cannot implement "OFFSET n FETCH n" currently.
> return null;
>   }
>   final RelTraitSet traitSet = sort.getTraitSet().replace(out);
>   return new JdbcSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet), sort.getCollation());
> {code}
> mongodb-adapter:
> {code}
>   final RelTraitSet traitSet =
>   sort.getTraitSet().replace(out)
>   .replace(sort.getCollation());
>   return new MongoSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
>   sort.getCollation(), sort.offset, sort.fetch);
> {code}
> Fixing JdbcSortRule so that it is just like the others, and removing the 
> code that prevented the rule from matching when limit or offset are used, 
> seems to solve the problem: JdbcSortRule is now applied, and both sort and 
> limit are pushed to the database.





[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2017-07-27 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103885#comment-16103885
 ] 

Luis Fernando Kauer commented on CALCITE-1906:
--

In my tests JdbcSort is also created, but it is never chosen, not even in 
simple queries like the tests above.
With the "new code" above, JdbcSort is created and chosen, and the execution 
generates SQL that pushes sort and limit to the database.
So I think it fixes this bug.
I ran JdbcTest.testSelfJoinDifferentColumns, and with this fix JdbcSort is 
chosen:
{noformat}
2017-07-27 17:35:47,222 [main] DEBUG - Plan after physical tweaks: 
JdbcToEnumerableConverter: rowcount = 3.0, cumulative cost = 
{3060.45003 rows, 2923.618334746402 cpu, 0.0 io}, id = 177
  JdbcSort(sort0=[$1], dir0=[ASC], fetch=[3]): rowcount = 3.0, cumulative cost 
= {3060.15 rows, 2923.318334746402 cpu, 0.0 io}, id = 175
JdbcProject(full_name=[$1], last_name=[$3]): rowcount = 1500.0, cumulative 
cost = {3060.0 rows, 2922.0 cpu, 0.0 io}, id = 173
  JdbcJoin(condition=[=($2, $0)], joinType=[inner]): rowcount = 1500.0, 
cumulative cost = {1860.0 rows, 522.0 cpu, 0.0 io}, id = 171
JdbcProject(last_name=[$3]): rowcount = 100.0, cumulative cost = {180.0 
rows, 181.0 cpu, 0.0 io}, id = 166
  JdbcTableScan(table=[[foodmart, employee]]): rowcount = 100.0, 
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
JdbcProject(full_name=[$1], first_name=[$2], last_name=[$3]): rowcount 
= 100.0, cumulative cost = {180.0 rows, 341.0 cpu, 0.0 io}, id = 169
  JdbcTableScan(table=[[foodmart, employee]]): rowcount = 100.0, 
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0

{noformat}

However, I noticed that SortProjectTransposeRule is being matched; it pushes 
the Sort past the Project, generating a plan like JdbcProject / JdbcSort / 
JdbcTableScan, and the generated SQL used a subquery over the whole table.  
The constructor of this rule is deprecated and has the comment "to be removed 
before 2.0", so I wonder whether it is really necessary.  If I disable this 
rule, the generated SQL is right.
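
For reference, disabling the rule programmatically is a one-liner against the 
planner (assuming you have the RelOptPlanner in hand):
{code}
// Remove SortProjectTransposeRule so Sort is not pushed past Project:
planner.removeRule(SortProjectTransposeRule.INSTANCE);
{code}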

> JdbcSortRule has a bug and it is never chosen
> -
>
> Key: CALCITE-1906
> URL: https://issues.apache.org/jira/browse/CALCITE-1906
> Project: Calcite
>  Issue Type: Bug
>  Components: jdbc-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> JdbcSortRule tries to push sort and limit operations to the database.
> Currently offset and limit operations are explicitly not pushed to the 
> database (prevented by the rule) but even sort operations end up not being 
> pushed.
> Checking how other adapters deal with this, like Mongo and Cassandra 
> adapters, I realized that the convert function from JdbcSortRule is different 
> from the others.
> Jdbc-adapter:
> {code}
>  if (sort.offset != null || sort.fetch != null) {
> // Cannot implement "OFFSET n FETCH n" currently.
> return null;
>   }
>   final RelTraitSet traitSet = sort.getTraitSet().replace(out);
>   return new JdbcSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet), sort.getCollation());
> {code}
> mongodb-adapter:
> {code}
>   final RelTraitSet traitSet =
>   sort.getTraitSet().replace(out)
>   .replace(sort.getCollation());
>   return new MongoSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
>   sort.getCollation(), sort.offset, sort.fetch);
> {code}
> Fixing JdbcSortRule so that it is just like the others, and removing the 
> code that prevented the rule from matching when limit or offset are used, 
> seems to solve the problem: JdbcSortRule is now applied, and both sort and 
> limit are pushed to the database.





[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2017-07-27 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103620#comment-16103620
 ] 

Luis Fernando Kauer commented on CALCITE-1906:
--

Well, I tried many different simple queries and it was never chosen.
It seems that the convert function is wrong.
Code in Calcite already generates LIMIT and OFFSET for the different dialects 
when generating the SQL.
The problem is that JdbcSort is never included in the final plan.
A simple test to verify using sort:
{code}
  @Test public void testSortPush() {
final String sql = "SELECT empno, ename\n"
+ "FROM Scott.emp ORDER BY ename";

CalciteAssert.model(JdbcTest.SCOTT_MODEL).query(sql).explainContains("JdbcSort");
  }
{code}
Another test to verify pushing Limit:
{code}
  @Test public void testLimitPush() {
final String sql = "SELECT empno, ename\n"
+ "FROM Scott.emp LIMIT 1";

CalciteAssert.model(JdbcTest.SCOTT_MODEL).query(sql).explainContains("JdbcSort");
  }
{code}
The new code is:
{code}
  /**
   * Rule to convert a {@link org.apache.calcite.rel.core.Sort} to an
   * {@link org.apache.calcite.adapter.jdbc.JdbcRules.JdbcSort}.
   */
  private static class JdbcSortRule extends JdbcConverterRule {
private JdbcSortRule(JdbcConvention out) {
  super(Sort.class, Convention.NONE, out, "JdbcSortRule");
}

public RelNode convert(RelNode rel) {
  final Sort sort = (Sort) rel;
  final RelTraitSet traitSet = sort.getTraitSet().replace(out)
  .replace(sort.getCollation());
  return new JdbcSort(rel.getCluster(), traitSet,
  convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)), 
  sort.getCollation(), sort.offset, sort.fetch);
}
  }

  /** Sort operator implemented in JDBC convention. */
  public static class JdbcSort
  extends Sort
  implements JdbcRel {
public JdbcSort(
RelOptCluster cluster,
RelTraitSet traitSet,
RelNode input,
RelCollation collation,
RexNode offset,
RexNode fetch) {
  super(cluster, traitSet, input, collation, offset, fetch);
  assert getConvention() instanceof JdbcConvention;
  assert getConvention() == input.getConvention();
}

@Override public JdbcSort copy(RelTraitSet traitSet, RelNode newInput,
RelCollation newCollation, RexNode offset, RexNode fetch) {
  return new JdbcSort(getCluster(), traitSet, newInput, newCollation, 
offset, fetch);
}

public JdbcImplementor.Result implement(JdbcImplementor implementor) {
  return implementor.implement(this);
}

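    // Discount the cost so the planner prefers JdbcSort over the
    // Enumerable alternative (the 0.05 multiplier is arbitrary).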
    @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
        RelMetadataQuery mq) {
      return super.computeSelfCost(planner, mq).multiplyBy(0.05);
    }
  }
{code}



> JdbcSortRule has a bug and it is never chosen
> -
>
> Key: CALCITE-1906
> URL: https://issues.apache.org/jira/browse/CALCITE-1906
> Project: Calcite
>  Issue Type: Bug
>  Components: jdbc-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>
> JdbcSortRule tries to push sort and limit operations to the database.
> Currently offset and limit operations are explicitly not pushed to the 
> database (prevented by the rule) but even sort operations end up not being 
> pushed.
> Checking how other adapters deal with this, like Mongo and Cassandra 
> adapters, I realized that the convert function from JdbcSortRule is different 
> from the others.
> Jdbc-adapter:
> {code}
>  if (sort.offset != null || sort.fetch != null) {
> // Cannot implement "OFFSET n FETCH n" currently.
> return null;
>   }
>   final RelTraitSet traitSet = sort.getTraitSet().replace(out);
>   return new JdbcSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet), sort.getCollation());
> {code}
> mongodb-adapter:
> {code}
>   final RelTraitSet traitSet =
>   sort.getTraitSet().replace(out)
>   .replace(sort.getCollation());
>   return new MongoSort(rel.getCluster(), traitSet,
>   convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
>   sort.getCollation(), sort.offset, sort.fetch);
> {code}
> Fixing JdbcSortRule so that it is just like the others, and removing the 
> code that prevented the rule from matching when limit or offset are used, 
> seems to solve the problem: JdbcSortRule is now applied, and both sort and 
> limit are pushed to the database.





[jira] [Created] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2017-07-27 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-1906:


 Summary: JdbcSortRule has a bug and it is never chosen
 Key: CALCITE-1906
 URL: https://issues.apache.org/jira/browse/CALCITE-1906
 Project: Calcite
  Issue Type: Bug
  Components: jdbc-adapter
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


JdbcSortRule tries to push sort and limit operations to the database.
Currently offset and limit operations are explicitly not pushed to the database 
(prevented by the rule) but even sort operations end up not being pushed.
Checking how other adapters deal with this, like Mongo and Cassandra adapters, 
I realized that the convert function from JdbcSortRule is different from the 
others.
Jdbc-adapter:
{code}
  if (sort.offset != null || sort.fetch != null) {
    // Cannot implement "OFFSET n FETCH n" currently.
    return null;
  }
  final RelTraitSet traitSet = sort.getTraitSet().replace(out);
  return new JdbcSort(rel.getCluster(), traitSet,
      convert(sort.getInput(), traitSet), sort.getCollation());
{code}
mongodb-adapter:
{code}
  final RelTraitSet traitSet =
      sort.getTraitSet().replace(out)
          .replace(sort.getCollation());
  return new MongoSort(rel.getCluster(), traitSet,
      convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
      sort.getCollation(), sort.offset, sort.fetch);
{code}
Fixing JdbcSortRule so that it is just like the others, and removing the code 
that prevented the rule from matching when limit or offset are used, seems to 
solve the problem: JdbcSortRule is now applied, and both sort and limit are 
pushed to the database.





[jira] [Commented] (CALCITE-1876) Create a rule to push the projections used in aggregate functions

2017-07-13 Thread Luis Fernando Kauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085646#comment-16085646
 ] 

Luis Fernando Kauer commented on CALCITE-1876:
--

The CSV adapter uses TranslatableTable, and Calcite's built-in rules don't 
simplify this kind of aggregate query.
However, I noticed that you suggested using ProjectableFilterableTable 
instead of TranslatableTable.
After implementing ProjectableFilterableTable in the CSV adapter, this 
aggregate query was simplified as expected by Calcite's built-in rules, with 
no need to create a new rule.
So I wonder whether a new rule should be created for the CSV adapter, or 
whether it should just implement ProjectableFilterableTable and take 
advantage of all the rules already implemented.


> Create a rule to push the projections used in aggregate functions
> -
>
> Key: CALCITE-1876
> URL: https://issues.apache.org/jira/browse/CALCITE-1876
> Project: Calcite
>  Issue Type: Improvement
>  Components: csv-adapter
>Reporter: Luis Fernando Kauer
>Assignee: Julian Hyde
>Priority: Minor
>
> From Julian Hyde:
> "Calcite should realize that Aggregate has an implied Project (because it 
> only uses a few columns) and push that projection into the CsvTableScan, but 
> it doesn’t."
> A query scans only the used projection when no aggregation is used:
> sql->explain plan for select name from emps;
> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> But it scans all the projections when an aggregation is used:
> sql->explain plan for select max(name) from emps;
> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
> CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CALCITE-1876) Create a rule to push the projections used in aggregate functions

2017-07-07 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-1876:


 Summary: Create a rule to push the projections used in aggregate 
functions
 Key: CALCITE-1876
 URL: https://issues.apache.org/jira/browse/CALCITE-1876
 Project: Calcite
  Issue Type: Improvement
  Components: csv-adapter
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde
Priority: Minor


From Julian Hyde:
"Calcite should realize that Aggregate has an implied Project (because it only 
uses a few columns) and push that projection into the CsvTableScan, but it 
doesn’t."

A query scans only the used projection when no aggregation is used:
sql->explain plan for select name from emps;
CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])

But it scans all the projections when an aggregation is used:
sql->explain plan for select max(name) from emps;
EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])




