[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen
[ https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551478#comment-16551478 ] Luis Fernando Kauer commented on CALCITE-1906:

This might be related to another issue I reported in https://issues.apache.org/jira/browse/CALCITE-2056?focusedCommentId=16261240=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16261240

The problem is in converting the plan to SQL when there are both a JdbcSort and a JdbcProject: it ends up creating a subquery.

> JdbcSortRule has a bug and it is never chosen
> ---------------------------------------------
>
>                 Key: CALCITE-1906
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1906
>             Project: Calcite
>          Issue Type: Bug
>          Components: jdbc-adapter
>            Reporter: Luis Fernando Kauer
>            Assignee: Julian Hyde
>            Priority: Major
>
> JdbcSortRule tries to push sort and limit operations to the database.
> Currently offset and fetch are explicitly not pushed to the database
> (prevented by the rule), but even plain sorts end up not being pushed.
> Checking how other adapters deal with this, like the Mongo and Cassandra
> adapters, I realized that the convert function of JdbcSortRule differs
> from the others.
> Jdbc-adapter:
> {code}
> if (sort.offset != null || sort.fetch != null) {
>   // Cannot implement "OFFSET n FETCH n" currently.
>   return null;
> }
> final RelTraitSet traitSet = sort.getTraitSet().replace(out);
> return new JdbcSort(rel.getCluster(), traitSet,
>     convert(sort.getInput(), traitSet), sort.getCollation());
> {code}
> mongodb-adapter:
> {code}
> final RelTraitSet traitSet =
>     sort.getTraitSet().replace(out)
>         .replace(sort.getCollation());
> return new MongoSort(rel.getCluster(), traitSet,
>     convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
>     sort.getCollation(), sort.offset, sort.fetch);
> {code}
> Fixing JdbcSortRule so that it matches the others, and removing the code
> that prevents the rule from matching when limit or offset are used, seems
> to solve the problem: JdbcSortRule is now applied and both sort and limit
> are pushed to the database.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2081) NPE in join of 2 subqueries using window functions
Luis Fernando Kauer created CALCITE-2081:
--------------------------------------------

             Summary: NPE in join of 2 subqueries using window functions
                 Key: CALCITE-2081
                 URL: https://issues.apache.org/jira/browse/CALCITE-2081
             Project: Calcite
          Issue Type: Bug
            Reporter: Luis Fernando Kauer
            Assignee: Julian Hyde

NPE when joining 2 subqueries that use window functions. Testing with a new test case in JdbcTest:
{code}
/** Tests for a join of subqueries using window functions. */
@Test public void testJoinWithWinAgg() {
  final String sql = "select a.*, b.r from\n"
      + "(select \"deptno\", first_value(\"empid\") over \n"
      + "(partition by \"deptno\" order by \"commission\") as r\n"
      + "from \"hr\".\"emps\") a\n"
      + "left join\n"
      + "(select \"deptno\", last_value(\"empid\") over \n"
      + "(partition by \"deptno\" order by \"commission\") as r\n"
      + "from \"hr\".\"emps\") b\n"
      + "on a.\"deptno\" = b.\"deptno\"";
  CalciteAssert.hr()
      .query(sql)
      .runs();
}
{code}
Debugging this, I found that the NPE occurs at Expressions.declare (line 2937), in initializer.getType(), because initializer is null. The statement is
{code}
int prevStart;
{code}
This statement does not have an initializer, but the "declare" method tries to get the initializer's type, causing the NPE. This happens when joining 2 subqueries that use window functions because they end up using the same variables. In BlockBuilder.append (line 124) it checks whether the variable already exists, and the problem starts there.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
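The failure mode can be illustrated outside Calcite. The sketch below is a hypothetical miniature of `Expressions.declare`, with stand-in classes (`TypedExpr`, `Declaration` are inventions for illustration, not linq4j's real types); it shows why a declaration without an initializer needs a null guard before `getType()` is called:

```java
// Hypothetical stand-ins for linq4j's expression classes, for illustration only.
class TypedExpr {
  final Class<?> type;
  TypedExpr(Class<?> type) { this.type = type; }
  Class<?> getType() { return type; }
}

class Declaration {
  final String name;
  final TypedExpr initializer; // null for a bare "int prevStart;"
  Declaration(String name, TypedExpr initializer) {
    this.name = name;
    this.initializer = initializer;
  }
}

class DeclareSketch {
  // Buggy shape: dereferences the initializer unconditionally, as reported.
  static Class<?> declareBuggy(Declaration d) {
    return d.initializer.getType(); // NPE when there is no initializer
  }

  // Guarded shape: fall back to the declared type when the initializer is absent.
  static Class<?> declareGuarded(Declaration d, Class<?> declaredType) {
    return d.initializer != null ? d.initializer.getType() : declaredType;
  }

  public static void main(String[] args) {
    Declaration noInit = new Declaration("prevStart", null);
    try {
      declareBuggy(noInit);
    } catch (NullPointerException e) {
      System.out.println("NPE, as in the report");
    }
    System.out.println(declareGuarded(noInit, int.class)); // prints "int"
  }
}
```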
[jira] [Commented] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases
[ https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261240#comment-16261240 ] Luis Fernando Kauer commented on CALCITE-2056:

I'm working on the following branch to solve this: https://github.com/lfkpoa/calcite/tree/CALCITE-2056

Julian, can you take a look? Would you prefer that I create a pull request, even if it is not ready yet?

I've made the following changes:
1. Created FetchOffsetType, an enum of the different fetch/offset implementations used by most databases.
2. Changed SqlDialect to include a FetchOffsetType. There are many different ways to instantiate a SqlDialect: some dialects use an INSTANCE per database, others are created on demand from an empty context. I had to make sure the correct FetchOffsetType was set in each case.
3. Changed SqlSelectOperator to call FetchOffsetType.unparse instead of Writer.fetchOffset. Since some databases put the fetch keyword (TOP, FIRST) right after the SELECT keyword and others put it at the end of the query, SqlSelectOperator calls FetchOffsetType.unparse twice, passing along the current FrameTypeEnum (SELECT or ORDER_BY). I was not sure what to do when the FetchOffsetType does not implement offset or fetch but is called with one.
4. Changed SqlPrettyWriter.fetchOffset to use FetchOffsetType. I think fetchOffset was a hack to allow different kinds of fetch/offset; with FetchOffsetType I don't think it is necessary any more.
5. Created new test cases in SqlToRelConverterTest.
6. Created new test cases in JdbcAdapterTest. I'm not sure how to test the generated SQL in the JDBC adapter tests against databases other than Hsqldb, to check whether fetch/offset is being pushed correctly.

Problem not solved: some tests I've created in JdbcAdapterTest fail because the generated SQL uses a subquery when there is no need for one. This has to do with the order of JdbcProject and JdbcSort in the plan. When JdbcProject is above JdbcSort, a subquery is generated, because SELECT has a lower ordinal than ORDER_BY in SqlImplementor.Clause. I don't know how to solve this. I tried putting SELECT at the end, which solved this problem, but then I had problems when JdbcSort is above JdbcProject.

> Limit/Offset in generated Sql does not work for many databases
> --------------------------------------------------------------
>
>                 Key: CALCITE-2056
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2056
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Luis Fernando Kauer
>            Assignee: Julian Hyde
>
> SqlDialect.supportsOffsetFetch() defines whether the dialect supports the
> OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true.
> SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to
> insert "FETCH NEXT x ROWS" if supportsOffsetFetch is true.
> However, many databases support this in different ways and some do not
> support it at all. For example, Hsqldb, which is used in the test cases,
> does not accept FETCH NEXT x ROWS, but it accepts LIMIT and FETCH FIRST x
> ROWS. Some databases accept TOP x and others use something else entirely,
> but most of them also accept the LIMIT syntax.
> I suggest we make LIMIT the default, because it is the most widely
> accepted, and allow each SqlDialect to define its own way of handling
> fetch and offset. If we check how Hibernate configures its dialects to
> handle limit/offset, we see that most use the LIMIT syntax, some use
> special syntax, and very few are configured to use the SQL:2008 syntax.
> This hasn't been a problem until now because sort/limit/offset was not
> being pushed to the database. But now that I'm trying to fix that, the
> generated SQL does not work in Hsqldb.
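For illustration, here is a minimal sketch of what a per-dialect fetch/offset renderer in the spirit of the FetchOffsetType enum described above could look like. The variant names, method signature, and selection of dialects per variant are assumptions based on the description, not the actual code on the branch:

```java
/**
 * Illustrative sketch of a per-dialect fetch/offset renderer. Names and
 * signatures are assumptions, not the actual FetchOffsetType from the branch.
 */
public class FetchOffsetSketch {
  enum FetchOffsetType {
    NONE {
      @Override String unparse(Integer offset, Integer fetch) {
        return ""; // dialect cannot push fetch/offset; fall back to Enumerable
      }
    },
    LIMIT_OFFSET {
      // e.g. MySQL, PostgreSQL, HSQLDB, SQLite
      @Override String unparse(Integer offset, Integer fetch) {
        StringBuilder sb = new StringBuilder();
        if (fetch != null) sb.append(" LIMIT ").append(fetch);
        if (offset != null) sb.append(" OFFSET ").append(offset);
        return sb.toString();
      }
    },
    FETCH_OFFSET {
      // SQL:2008 style: e.g. DB2 >= 12, Oracle 12c+, Derby
      @Override String unparse(Integer offset, Integer fetch) {
        StringBuilder sb = new StringBuilder();
        if (offset != null) sb.append(" OFFSET ").append(offset).append(" ROWS");
        if (fetch != null) sb.append(" FETCH NEXT ").append(fetch).append(" ROWS ONLY");
        return sb.toString();
      }
    };

    /** Renders the fetch/offset suffix of a query; null means "not specified". */
    abstract String unparse(Integer offset, Integer fetch);
  }

  public static void main(String[] args) {
    // prints "SELECT * FROM T LIMIT 10 OFFSET 20"
    System.out.println("SELECT * FROM T" + FetchOffsetType.LIMIT_OFFSET.unparse(20, 10));
    // prints "SELECT * FROM T OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY"
    System.out.println("SELECT * FROM T" + FetchOffsetType.FETCH_OFFSET.unparse(20, 10));
  }
}
```

A dialect that renders its keywords right after SELECT (TOP, FIRST) would need the unparse hook invoked at the SELECT frame as well as at the end of the query, which is presumably why the description above calls unparse twice with the current FrameTypeEnum.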
[jira] [Commented] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases
[ https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255259#comment-16255259 ] Luis Fernando Kauer commented on CALCITE-2056:

I searched for how each database implements this, and what I found is that very few implement the SQL-standard OFFSET/FETCH, and only very recently, while most of them implement LIMIT/OFFSET. Some implement LIMIT (or FETCH) but not OFFSET:
{noformat}
SELECT * FROM T LIMIT 10 OFFSET 20
    Netezza, MySQL, Sybase, PostgreSQL, SQLite, HSQLDB, H2, Vertica, Polyhedra, CouchBase, DB2
SELECT * FROM T LIMIT 10, 20
    MySQL, DB2
SELECT * FROM T LIMIT 10
    Netezza, MySQL, Sybase, PostgreSQL, SQLite, HSQLDB, H2, Vertica, Polyhedra, CouchBase, DB2, Hive
SELECT FIRST 10 * FROM T
    Ingres, Informix, Firebird
SELECT TOP 10 * FROM T
    MS SQL Server, HSQLDB, MS Access, Teradata
SELECT FIRST 10 SKIP 20 * FROM T
    Firebird
SELECT * FROM T FETCH FIRST 10 ROWS ONLY
    DB2, Oracle
SELECT * FROM T OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
    DB2 (>= 12), Oracle (> 12c), PostgreSQL (> 8.4), HSQLDB (> 2.2*), Derby (> 10.7)
{noformat}
The HSQLDB version 2.3.1 used in Calcite does not accept this exact syntax, but it accepts SELECT * FROM T OFFSET 20 ROWS FETCH 10 ROWS ONLY.

I think we should make Calcite work with any database and fall back to EnumerableLimit when limit/offset can't be pushed to the database. Some databases accept only a limit, but we could push the limit (with an adjustment) and fall back to Enumerable for the offset, when needed.

We could have an enum of limit/offset variations, like NONE, LIMIT, LIMIT_OFFSET, FIRST, TOP, FETCH, FETCH_OFFSET, and each SqlDialect could return which one it implements. What do you think?
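The "push the limit with an adjustment, fall back to Enumerable for the offset" idea above can be sketched as follows. The names are illustrative, not Calcite API; the point is only the arithmetic: a dialect that supports LIMIT but not OFFSET is asked for offset + fetch rows, and the first offset rows are dropped locally:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Sketch of the fallback: push an adjusted limit (offset + fetch) to a
 * LIMIT-only dialect, then skip the offset rows on the Calcite side.
 */
public class LimitAdjustSketch {
  /** Rows to request from the database: enough to cover offset plus fetch. */
  static int pushedLimit(int offset, int fetch) {
    return offset + fetch;
  }

  /** Applied locally, playing the role of the EnumerableLimit fallback. */
  static List<Integer> skipOffset(List<Integer> rowsFromDb, int offset) {
    return rowsFromDb.stream().skip(offset).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    int offset = 20, fetch = 10;
    // The database executes e.g. "SELECT * FROM T LIMIT 30" ...
    List<Integer> fromDb = IntStream.range(0, pushedLimit(offset, fetch))
        .boxed().collect(Collectors.toList());
    // ... and Calcite drops the first 20 rows locally.
    List<Integer> page = skipOffset(fromDb, offset);
    System.out.println(page.size()); // 10
    System.out.println(page.get(0)); // 20
  }
}
```

This still transfers offset extra rows over the wire, but it keeps the scan and sort on the database side, which is usually the larger win.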
[jira] [Updated] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases
[ https://issues.apache.org/jira/browse/CALCITE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Fernando Kauer updated CALCITE-2056:
-----------------------------------------

    Description:
SqlDialect.supportsOffsetFetch() defines whether the dialect supports the OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true. SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert "FETCH NEXT x ROWS" if supportsOffsetFetch is true.
However, many databases support this in different ways and some do not support it at all. For example, Hsqldb, which is used in the test cases, does not accept FETCH NEXT x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS. Some databases accept TOP x and others use something else entirely, but most of them also accept the LIMIT syntax.
I suggest we make LIMIT the default, because it is the most widely accepted, and allow each SqlDialect to define its own way of handling fetch and offset. If we check how Hibernate configures its dialects to handle limit/offset, we see that most use the LIMIT syntax, some use special syntax, and very few are configured to use the SQL:2008 syntax.
This hasn't been a problem until now because sort/limit/offset was not being pushed to the database. But now that I'm trying to fix that, the generated SQL does not work in Hsqldb.

    was:
SqlDialect.supportsOffsetFetch() defines whether the dialect supports the OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true. SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert "FETCH NEXT x ROWS" if supportsOffsetFetch is true.
However, many databases support this in different ways and some do not support it at all. For example, Hsqldb, which is used in the test cases, does not accept FETCH NEXT x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS. Some databases accept TOP x and others use something even different (like PostgreSQL).
I suggest we make LIMIT the default, because it is the most widely accepted, and allow each SqlDialect to define its own way of handling fetch and offset. If we check how Hibernate configures its dialects to handle limit/offset, we see that most use the LIMIT syntax, some use special syntax, and very few are configured to use the SQL:2008 syntax.
This hasn't been a problem until now because sort/limit/offset was not being pushed to the database. But now that I'm trying to fix that, the generated SQL does not work in Hsqldb.
[jira] [Created] (CALCITE-2056) Limit/Offset in generated Sql does not work for many databases
Luis Fernando Kauer created CALCITE-2056:
--------------------------------------------

             Summary: Limit/Offset in generated Sql does not work for many databases
                 Key: CALCITE-2056
                 URL: https://issues.apache.org/jira/browse/CALCITE-2056
             Project: Calcite
          Issue Type: Bug
            Reporter: Luis Fernando Kauer
            Assignee: Julian Hyde

SqlDialect.supportsOffsetFetch() defines whether the dialect supports the OFFSET/FETCH clauses introduced by SQL:2008, and by default returns true. SqlPrettyWriter.fetchOffset(SqlNode, SqlNode) uses that information to insert "FETCH NEXT x ROWS" if supportsOffsetFetch is true.

However, many databases support this in different ways and some do not support it at all. For example, Hsqldb, which is used in the test cases, does not accept FETCH NEXT x ROWS, but it accepts LIMIT and FETCH FIRST x ROWS. Some databases accept TOP x and others use something even different (like PostgreSQL).

I suggest we make LIMIT the default, because it is the most widely accepted, and allow each SqlDialect to define its own way of handling fetch and offset. If we check how Hibernate configures its dialects to handle limit/offset, we see that most use the LIMIT syntax, some use special syntax, and very few are configured to use the SQL:2008 syntax.

This hasn't been a problem until now because sort/limit/offset was not being pushed to the database. But now that I'm trying to fix that, the generated SQL does not work in Hsqldb.
[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
[ https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251195#comment-16251195 ] Luis Fernando Kauer commented on CALCITE-2044:

Thanks, Julian. Looks great.

I changed typeFactory.builder() because it is deprecated and the javadoc recommends using Builder, not because I don't like it. I thought I was updating to the recommended way of doing it. Should I continue using typeFactory.builder() for now?

> Tweak cost of BindableTableScan to make sure Project is pushed through
> Aggregate
> ----------------------------------------------------------------------
>
>                 Key: CALCITE-2044
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2044
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Luis Fernando Kauer
>            Assignee: Julian Hyde
>            Priority: Minor
>
> Similar to [CALCITE-1876].
> Projects are not pushed to BindableTableScan when using
> ProjectableFilterableTable with aggregate functions.
> The reason is that the cost of BindableTableScan does not take projects
> (and filters) into account, so the planner chooses a plan in which the
> Project node has been removed by ProjectRemoveRule.
> Tweaking the cost to use the number of projected columns solved the
> problem. Any suggestion on a cost formula that takes both projects and
> filters into account?
[jira] [Commented] (CALCITE-2048) Create a better documentation for the Planner design
[ https://issues.apache.org/jira/browse/CALCITE-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250024#comment-16250024 ] Luis Fernando Kauer commented on CALCITE-2048:

I agree with Enrico. The documentation should explain the many concepts used in the planner, such as TraitDef, TraitSet, Convention, Program, Enumerable, Bindable, rules, transformation/conversion of nodes in rules, implementation of nodes, statistics, RelMetadataQuery, and costs. There are so many concepts; I think examples, or just snippets showing and explaining how they are used, would help a lot.

We could also have a module with a full planner example with:
- a configurable schema/catalog via JSON, so there is no need to use any adapter or create the schema somewhere else;
- configurable cost and metadata, also via JSON, so it can be used to test different setups;
- output showing the best plan for each query, which rules were applied, and the optimized SQL.

This would make it easy to test new rules, cost changes, and different queries using different setups. Maybe even integrate with Cosette (CALCITE-1977) to validate whether the optimized SQL is really equivalent to the query.

> Create a better documentation for the Planner design
> ----------------------------------------------------
>
>                 Key: CALCITE-2048
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2048
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Edmon Begoli
>            Assignee: Edmon Begoli
>            Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Per request of the development community, and the assessment that we need
> a tutorial, documentation, and example code to work directly with the
> planner, because it is a lot harder to work with the planner than with an
> adapter.
[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
[ https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246198#comment-16246198 ] Luis Fernando Kauer commented on CALCITE-2044:

https://github.com/apache/calcite/pull/565
[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
[ https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245706#comment-16245706 ] Luis Fernando Kauer commented on CALCITE-2044:

Alexey, I'd like to understand your actual need. The computed cost does not have to be exact, and the overall cost depends on the costs of all the other nodes in the plan. It is used to choose between different plans, but if you don't also adjust the costs of all the other nodes (joins, filters and projects that were not pushed, aggregates), you might not get the best plan anyway. Here we are talking only about the cost of BindableTableScan and how pushed filters and projects change it.

Even if some columns are expensive, I assume that if the query requires those columns you will have to return them anyway, so pushing the used projections is usually desirable. I think the same holds for filters.

Julian, I would like to know some counterexamples to take into account, if you have any.
[jira] [Commented] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
[ https://issues.apache.org/jira/browse/CALCITE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243983#comment-16243983 ] Luis Fernando Kauer commented on CALCITE-2044:

ProjectableFilterableTable is meant for simple use cases. I opened this Jira to solve the common problem of projects not being pushed to the table scan under an aggregate. I propose a simple solution, without the need to implement any other interface: just reduce the cost when projects and filters are applied.

You are proposing a new feature that would allow more control over the cost, but Calcite already has many pluggable ways to control costs and provide statistics, and you should try them first. See Statistics, RelMetadataQuery, BuiltInMetadata, etc. You can also use TranslatableTable and have full control.

I know the CSV example does not include filter push-down. Maybe we could improve the example to show how it can be done.
[jira] [Created] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
Luis Fernando Kauer created CALCITE-2044:
--------------------------------------------

             Summary: Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate
                 Key: CALCITE-2044
                 URL: https://issues.apache.org/jira/browse/CALCITE-2044
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Luis Fernando Kauer
            Assignee: Julian Hyde
            Priority: Minor

Similar to [CALCITE-1876].

Projects are not pushed to BindableTableScan when using ProjectableFilterableTable with aggregate functions. The reason is that the cost of BindableTableScan does not take projects (and filters) into account, so the planner chooses a plan in which the Project node has been removed by ProjectRemoveRule.

Tweaking the cost to use the number of projected columns solved the problem. Any suggestion on a cost formula that takes both projects and filters into account?
[jira] [Commented] (CALCITE-1876) Create a rule to push the projections used in aggregate functions
[ https://issues.apache.org/jira/browse/CALCITE-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242744#comment-16242744 ] Luis Fernando Kauer commented on CALCITE-1876:

Pull request: https://github.com/apache/calcite/pull/562

I realized that the correct plan was being generated but was not being selected. Solved the problem by computing the cost of CsvTableScan using the number of projected columns.

> Create a rule to push the projections used in aggregate functions
> -----------------------------------------------------------------
>
>                 Key: CALCITE-1876
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1876
>             Project: Calcite
>          Issue Type: Improvement
>          Components: csv-adapter
>            Reporter: Luis Fernando Kauer
>            Assignee: Julian Hyde
>            Priority: Minor
>
> From Julian Hyde:
> "Calcite should realize that Aggregate has an implied Project (because it
> only uses a few columns) and push that projection into the CsvTableScan,
> but it doesn't."
> A query scans only the used projection when no aggregation is used:
> {noformat}
> sql> explain plan for select name from emps;
> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> {noformat}
> But it scans all the columns when an aggregation is used:
> {noformat}
> sql> explain plan for select max(name) from emps;
> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
>   CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
> {noformat}
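The kind of cost tweak described here can be sketched as scaling the scan's CPU cost by the fraction of columns actually projected. The "+ 2" padding below is an illustrative assumption (it keeps the cost strictly positive and a zero-column projection non-free); the exact constants in the pull request may differ:

```java
/**
 * Sketch of a scan cost discounted by the number of projected fields, so the
 * planner prefers plans that push the projection into the scan. The formula
 * and constants are illustrative, not necessarily the committed ones.
 */
public class ScanCostSketch {
  static double discountedCpu(double baseCpu, int projectedFields, int totalFields) {
    // Fewer projected fields => proportionally cheaper scan.
    return baseCpu * (projectedFields + 2D) / (totalFields + 2D);
  }

  public static void main(String[] args) {
    double full = discountedCpu(100.0, 10, 10);   // 100.0: all 10 columns scanned
    double narrow = discountedCpu(100.0, 1, 10);  // 25.0: one projected column
    // With this discount, the plan that removes unused columns now wins.
    System.out.println(narrow < full); // true
  }
}
```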
[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen
[ https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114966#comment-16114966 ] Luis Fernando Kauer commented on CALCITE-1906:

Debugging this problem some more, I found that what makes the sort get pushed is replacing the collation trait on the input with RelCollations.EMPTY:
{code}
convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY))
{code}
The problem is that SortRemoveRule matches and is applied when there is an active RelCollationTraitDef, removing the Sort and setting the collation on the sort's input. Maybe JdbcTableScan was supposed to use the collation information, but it doesn't. When the code above is used to remove the collations, SortRemoveRule is not applied, and that's why the sort gets pushed to the database.

EnumerableLimitRule, which extracts the fetch and offset from the Sort, is likewise not applied when the Sort has no collation field, so the code above prevents that rule from being applied too.

If we can't ensure that the table is sorted, the best thing to do is to remove the collations when converting to JdbcSort.
[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen
[ https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103885#comment-16103885 ] Luis Fernando Kauer commented on CALCITE-1906: -- In my tests JdbcSort is also created but it is never chosen, not even in simple queries like the tests above. With the "new code" above JdbcSort is created, chosen and the execution generates a SQL pushing sort and limit to the database. So I think it fixes this bug. I run the JdbcTest.testSelfJoinDifferentColumns and with this fix the JdbcSort is chosen: {noformat} 2017-07-27 17:35:47,222 [main] DEBUG - Plan after physical tweaks: JdbcToEnumerableConverter: rowcount = 3.0, cumulative cost = {3060.45003 rows, 2923.618334746402 cpu, 0.0 io}, id = 177 JdbcSort(sort0=[$1], dir0=[ASC], fetch=[3]): rowcount = 3.0, cumulative cost = {3060.15 rows, 2923.318334746402 cpu, 0.0 io}, id = 175 JdbcProject(full_name=[$1], last_name=[$3]): rowcount = 1500.0, cumulative cost = {3060.0 rows, 2922.0 cpu, 0.0 io}, id = 173 JdbcJoin(condition=[=($2, $0)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {1860.0 rows, 522.0 cpu, 0.0 io}, id = 171 JdbcProject(last_name=[$3]): rowcount = 100.0, cumulative cost = {180.0 rows, 181.0 cpu, 0.0 io}, id = 166 JdbcTableScan(table=[[foodmart, employee]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0 JdbcProject(full_name=[$1], first_name=[$2], last_name=[$3]): rowcount = 100.0, cumulative cost = {180.0 rows, 341.0 cpu, 0.0 io}, id = 169 JdbcTableScan(table=[[foodmart, employee]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0 {noformat} However, I noticed that the SortProjectTransposeRule is being matched and it pushes the Sort past the Project generating a plan like JdbcProject / JdbcSort / JdbcTableScan, and the generated SQL used a subquery of the whole table. 
The constructor of this rule is deprecated and has the comment "to be removed before 2.0", so I wonder if it is really necessary. If I disable this rule, the generated SQL is right.
[jira] [Commented] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen
[ https://issues.apache.org/jira/browse/CALCITE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103620#comment-16103620 ]

Luis Fernando Kauer commented on CALCITE-1906:
--

Well, I tried many different simple queries and it was never chosen. It seems that the convert function is wrong. The code in Calcite already generates LIMIT and OFFSET for different dialects when generating the SQL. The problem is that JdbcSort is never included in the final plan.

A simple test to verify pushing Sort:

{code}
  @Test public void testSortPush() {
    final String sql = "SELECT empno, ename\n"
        + "FROM Scott.emp ORDER BY ename";
    CalciteAssert.model(JdbcTest.SCOTT_MODEL).query(sql)
        .explainContains("JdbcSort");
  }
{code}

Another test to verify pushing Limit:

{code}
  @Test public void testLimitPush() {
    final String sql = "SELECT empno, ename\n"
        + "FROM Scott.emp LIMIT 1";
    CalciteAssert.model(JdbcTest.SCOTT_MODEL).query(sql)
        .explainContains("JdbcSort");
  }
{code}

The new code is:

{code}
  /**
   * Rule to convert a {@link org.apache.calcite.rel.core.Sort} to an
   * {@link org.apache.calcite.adapter.jdbc.JdbcRules.JdbcSort}.
   */
  private static class JdbcSortRule extends JdbcConverterRule {
    private JdbcSortRule(JdbcConvention out) {
      super(Sort.class, Convention.NONE, out, "JdbcSortRule");
    }

    public RelNode convert(RelNode rel) {
      final Sort sort = (Sort) rel;
      final RelTraitSet traitSet = sort.getTraitSet().replace(out)
          .replace(sort.getCollation());
      return new JdbcSort(rel.getCluster(), traitSet,
          convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
          sort.getCollation(), sort.offset, sort.fetch);
    }
  }

  /** Sort operator implemented in JDBC convention. */
  public static class JdbcSort extends Sort implements JdbcRel {
    public JdbcSort(
        RelOptCluster cluster,
        RelTraitSet traitSet,
        RelNode input,
        RelCollation collation,
        RexNode offset,
        RexNode fetch) {
      super(cluster, traitSet, input, collation, offset, fetch);
      assert getConvention() instanceof JdbcConvention;
      assert getConvention() == input.getConvention();
    }

    @Override public JdbcSort copy(RelTraitSet traitSet, RelNode newInput,
        RelCollation newCollation, RexNode offset, RexNode fetch) {
      return new JdbcSort(getCluster(), traitSet, newInput, newCollation,
          offset, fetch);
    }

    public JdbcImplementor.Result implement(JdbcImplementor implementor) {
      return implementor.implement(this);
    }

    @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
        RelMetadataQuery mq) {
      return super.computeSelfCost(planner, mq).multiplyBy(0.05);
    }
  }
{code}
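As an aside, the sort-plus-limit semantics that the fixed rule pushes down (SQL's ORDER BY with OFFSET/FETCH) can be sketched in plain Java. This is an illustration only, not Calcite or adapter code; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SortFetchDemo {
  // Mimics "ORDER BY col OFFSET n ROWS FETCH NEXT m ROWS ONLY":
  // sort first, then skip `offset` rows, then keep at most `fetch` rows.
  static List<String> sortOffsetFetch(List<String> rows, int offset, int fetch) {
    return rows.stream()
        .sorted()          // the Sort's collation (natural ascending order here)
        .skip(offset)      // the Sort's offset
        .limit(fetch)      // the Sort's fetch
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("MILLER", "ADAMS", "JONES", "FORD");
    // Sorted: ADAMS, FORD, JONES, MILLER; skip 1, take 2.
    System.out.println(sortOffsetFetch(names, 1, 2)); // prints [FORD, JONES]
  }
}
```

The point of the fix is that all three pieces (collation, offset, fetch) travel together into JdbcSort, instead of the rule bailing out whenever offset or fetch is present.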
[jira] [Created] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen
Luis Fernando Kauer created CALCITE-1906:
--

Summary: JdbcSortRule has a bug and it is never chosen
Key: CALCITE-1906
URL: https://issues.apache.org/jira/browse/CALCITE-1906
Project: Calcite
Issue Type: Bug
Components: jdbc-adapter
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde

JdbcSortRule tries to push sort and limit operations to the database. Currently offset and limit operations are explicitly not pushed to the database (prevented by the rule), but even sort operations end up not being pushed.

Checking how other adapters deal with this, like the Mongo and Cassandra adapters, I realized that the convert function of JdbcSortRule is different from the others.

Jdbc-adapter:

{code}
if (sort.offset != null || sort.fetch != null) {
  // Cannot implement "OFFSET n FETCH n" currently.
  return null;
}
final RelTraitSet traitSet = sort.getTraitSet().replace(out);
return new JdbcSort(rel.getCluster(), traitSet,
    convert(sort.getInput(), traitSet), sort.getCollation());
{code}

mongodb-adapter:

{code}
final RelTraitSet traitSet =
    sort.getTraitSet().replace(out)
        .replace(sort.getCollation());
return new MongoSort(rel.getCluster(), traitSet,
    convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
    sort.getCollation(), sort.offset, sort.fetch);
{code}

Fixing JdbcSortRule so that it is just like those others, and removing the code that prevented the rule from matching when limit or offset are used, seems to solve the problem: JdbcSortRule is now applied, and both sort and limit are pushed to the database.
[jira] [Commented] (CALCITE-1876) Create a rule to push the projections used in aggregate functions
[ https://issues.apache.org/jira/browse/CALCITE-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085646#comment-16085646 ]

Luis Fernando Kauer commented on CALCITE-1876:
--

The CSV adapter uses TranslatableTable, and Calcite's built-in rules don't simplify this kind of aggregate query. However, I noticed that you mentioned using ProjectableFilterableTable instead of TranslatableTable. After implementing ProjectableFilterableTable in the CSV adapter, this aggregate query was simplified as expected by Calcite's built-in rules, with no need to create a new rule. So I wonder whether a new rule should be created for the CSV adapter, or whether the CSV adapter should just implement ProjectableFilterableTable and take advantage of all the rules already implemented.
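For illustration, the essence of the projection pushdown that ProjectableFilterableTable enables — the scan receives the ordinals of the columns the query actually uses and emits only those — can be sketched outside Calcite. The class and helper below are hypothetical, not the adapter's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionPushdown {
  // Mimics the `projects` argument of ProjectableFilterableTable.scan:
  // `projects` holds the 0-based ordinals of the required columns, so the
  // scan materializes only those fields of each row.
  static List<Object[]> scan(List<Object[]> rows, int[] projects) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] row : rows) {
      Object[] projected = new Object[projects.length];
      for (int i = 0; i < projects.length; i++) {
        projected[i] = row[projects[i]];
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    // Toy EMPS table: (empno, name, deptno).
    List<Object[]> emps = Arrays.asList(
        new Object[]{100, "Fred", 10},
        new Object[]{110, "Eric", 20});
    // "select name from emps" only needs column ordinal 1.
    for (Object[] r : scan(emps, new int[]{1})) {
      System.out.println(Arrays.toString(r)); // prints [Fred] then [Eric]
    }
  }
}
```

With this shape, the planner can hand the Aggregate's implied projection straight to the scan, which is why the built-in rules suffice once the table is projectable.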
[jira] [Created] (CALCITE-1876) Create a rule to push the projections used in aggregate functions
Luis Fernando Kauer created CALCITE-1876:
--

Summary: Create a rule to push the projections used in aggregate functions
Key: CALCITE-1876
URL: https://issues.apache.org/jira/browse/CALCITE-1876
Project: Calcite
Issue Type: Improvement
Components: csv-adapter
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde
Priority: Minor

From Julian Hyde:
"Calcite should realize that Aggregate has an implied Project (because it only uses a few columns) and push that projection into the CsvTableScan, but it doesn't."

A query scans only the used projection when no aggregation is used:

{noformat}
sql-> explain plan for select name from emps;
CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
{noformat}

But it scans all the projections when an aggregation is used:

{noformat}
sql-> explain plan for select max(name) from emps;
EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
{noformat}