[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL
[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929674#comment-15929674 ]

ASF GitHub Bot commented on FLINK-6037:
---------------------------------------

GitHub user beyond1920 opened a pull request:

    https://github.com/apache/flink/pull/3559

    [flink-6037] [Table API & SQL] hotfix: metadata provider didn't work in SQL

    This PR aims to fix the bug reported in https://issues.apache.org/jira/browse/FLINK-6037. After switching to the right MetadataProvider, the org.apache.flink.table.ExpressionReductionTest.testReduceCalcExpressionForBatchSQL test no longer passes because the optimized plan changes (that problem is tracked in https://issues.apache.org/jira/browse/FLINK-6067 and will be fixed in a separate PR). In this PR I simply changed the test SQL to make it pass.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alibaba/flink hotfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3559

commit bd83041507b3f4fdea737b538fb39af4a249e6d2
Author: jingzhang
Date:   2017-03-17T09:47:35Z

    fix the bug: metadata provider didn't work in SQL

> the estimateRowCount method of DataSetCalc didn't work in SQL
> -------------------------------------------------------------
>
>                 Key: FLINK-6037
>                 URL: https://issues.apache.org/jira/browse/FLINK-6037
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: jingzhang
>            Assignee: jingzhang
>             Fix For: 1.2.0
>
>
> The estimateRowCount method of DataSetCalc didn't work in the following situation.
> If I run the following code,
> {code}
> Table table = tableEnv.sql(
>     "select a, avg(a), sum(b), count(c) from t1 where a = 1 group by a");
> {code}
> the cost of every node in the optimized node tree is:
> {code}
> DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
>   DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
>     DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
> {code}
> We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc didn't work.
> The problem is similar to https://issues.apache.org/jira/browse/FLINK-5394, which is already solved.
> I found that although we set the metadata provider to {{FlinkDefaultRelMetadataProvider}} in {{FlinkRelBuilder}}, after running {code}planner.rel(...){code} to translate the SqlNode to a RelNode, the metadata provider is overridden from {{FlinkDefaultRelMetadataProvider}} back to {{DefaultRelMetadataProvider}} because of the following code:
> {code}
> val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
> val config = SqlToRelConverter.configBuilder()
>   .withTrimUnusedFields(false).withConvertTableAccess(false).build()
> val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
>   new ViewExpanderImpl, validator, createCatalogReader, cluster,
>   convertletTable, config)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
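The expected behavior boils down to scaling the input row count by the filter's selectivity. Below is a minimal, self-contained sketch of that calculation, assuming Calcite's commonly cited 0.15 default selectivity for an equality predicate (as in RelMdUtil.guessSelectivity); it illustrates the idea only and is not Flink's actual DataSetCalc.estimateRowCount implementation.

```java
// Minimal sketch of selectivity-based row-count estimation. The 0.15
// selectivity for an equality predicate mirrors Calcite's default
// heuristic; this is an illustration, not Flink's real implementation.
public class RowCountSketch {
    static final double EQUALITY_SELECTIVITY = 0.15;

    /** Estimated output rows of a Calc node with an equality filter. */
    static double estimateCalcRows(double inputRows) {
        // Never estimate below one row, so downstream costs stay sane.
        return Math.max(inputRows * EQUALITY_SELECTIVITY, 1.0);
    }

    public static void main(String[] args) {
        // With a working estimate, DataSetAggregate would see roughly 150
        // input rows instead of 1000 for the equality filter in the plan above.
        System.out.println(estimateCalcRows(1000.0));
    }
}
```

Under this heuristic, the DataSetCalc would report roughly 150 rows, and the aggregate's cumulative cost would shrink accordingly.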
[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL
[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924346#comment-15924346 ]

Fabian Hueske commented on FLINK-6037:
--------------------------------------

Ah, I see. Thanks for the clarification [~jinyu.zj]!
[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL
[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924332#comment-15924332 ]

jingzhang commented on FLINK-6037:
----------------------------------

[~fhueske], this issue is different from https://issues.apache.org/jira/browse/FLINK-5394; it only happens in SQL. I agree that there is no difference between the Table API and SQL at the optimization layer, since both are represented the same way. However, when {{SqlToRelConverter}} converts the SqlNode to a RelNode, the metadata provider is overridden from {{FlinkDefaultRelMetadataProvider}} back to {{DefaultRelMetadataProvider}} because of the following code:
{code}
val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
val config = SqlToRelConverter.configBuilder()
  .withTrimUnusedFields(false).withConvertTableAccess(false).build()
val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
  new ViewExpanderImpl, validator, createCatalogReader, cluster,
  convertletTable, config)
{code}
So in the optimization phase, the Table API uses {{FlinkDefaultRelMetadataProvider}}, but SQL uses {{DefaultRelMetadataProvider}}.
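The failure mode described in the comment above (cluster creation silently installing the default provider, discarding the custom one) can be modeled with a tiny self-contained sketch. The class and method names below are illustrative stand-ins, not Calcite's actual API; the real fix would re-register FlinkDefaultRelMetadataProvider on the cluster right after it is created.

```java
// Simplified model of the override problem: constructing a cluster
// resets the metadata provider to the default, so a custom provider
// must be re-registered afterwards. Names are illustrative stand-ins
// for Calcite's RelOptCluster and the two metadata providers.
public class MetadataProviderSketch {
    enum Provider { DEFAULT, FLINK } // Default vs. Flink metadata provider

    static class Cluster {
        // Mirrors cluster creation, which installs the default provider.
        private Provider provider = Provider.DEFAULT;
        Provider getMetadataProvider() { return provider; }
        void setMetadataProvider(Provider p) { provider = p; }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();             // fresh cluster: default provider
        System.out.println(cluster.getMetadataProvider()); // DEFAULT
        cluster.setMetadataProvider(Provider.FLINK); // the fix: re-register after create
        System.out.println(cluster.getMetadataProvider()); // FLINK
    }
}
```

The design point is simply that the re-registration has to happen after every cluster creation; setting the provider once on the builder is not enough, because conversion creates a new cluster with the default provider.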
[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL
[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923777#comment-15923777 ]

Fabian Hueske commented on FLINK-6037:
--------------------------------------

Hi [~jinyu.zj], can you add a description and explain how this issue is different from FLINK-5394? At the optimization layer, there should not be a difference between the Table API and SQL since both are represented the same way. Thanks, Fabian