[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL

2017-03-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929674#comment-15929674
 ] 

ASF GitHub Bot commented on FLINK-6037:
---

GitHub user beyond1920 opened a pull request:

https://github.com/apache/flink/pull/3559

[flink-6037] [Table API & SQL]hotfix: metadata provider didn't work in SQL

This PR fixes the bug reported in 
https://issues.apache.org/jira/browse/FLINK-6037.
With the correct MetadataProvider in place, the test 
org.apache.flink.table.ExpressionReductionTest.testReduceCalcExpressionForBatchSQL
 no longer passes because the optimized plan changes (that problem is 
tracked in https://issues.apache.org/jira/browse/FLINK-6067 and will be 
fixed in a separate PR). For now I changed the SQL in the test so that it 
passes in this PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink hotfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3559


commit bd83041507b3f4fdea737b538fb39af4a249e6d2
Author: jingzhang 
Date:   2017-03-17T09:47:35Z

fix the bug: metadata provider didn't work in SQL




> the estimateRowCount method of DataSetCalc didn't work in SQL
> -
>
> Key: FLINK-6037
> URL: https://issues.apache.org/jira/browse/FLINK-6037
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: jingzhang
> Assignee: jingzhang
> Fix For: 1.2.0
>
>
> The estimateRowCount method of DataSetCalc does not take effect in the 
> following situation. 
> If I run the following code,
> {code}
> Table table = tableEnv.sql(
>     "select a, avg(a), sum(b), count(c) from t1 where a = 1 group by a");
> {code}
> the cost of each node in the optimized plan is:
> {code}
> DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
>   DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
>     DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
> {code}
> We expect the input rowcount of DataSetAggregate to be less than 1000; 
> however, the actual input rowcount is still 1000 because the 
> estimateRowCount method of DataSetCalc did not take effect. 
> The problem is similar to 
> https://issues.apache.org/jira/browse/FLINK-5394, which is already resolved.
> Although we set the metadata provider to 
> {{FlinkDefaultRelMetadataProvider}} in {{FlinkRelBuilder}}, after running 
> {code}planner.rel(...){code} to translate the SqlNode to a RelNode, the 
> metadata provider is overridden from {{FlinkDefaultRelMetadataProvider}} 
> back to {{DefaultRelMetadataProvider}} because of the following code:
> {code}
>   val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
>   val config = SqlToRelConverter.configBuilder()
>     .withTrimUnusedFields(false)
>     .withConvertTableAccess(false)
>     .build()
>   val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
>     new ViewExpanderImpl, validator, createCatalogReader, cluster,
>     convertletTable, config)
> {code}
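
The expectation stated in the report (fewer than 1000 rows after the filter) can be illustrated with a small piece of arithmetic. This is only a sketch: the class and method names are hypothetical, they are not Flink or Calcite APIs, and the selectivity of 0.15 is an illustrative assumption that does not come from the issue.

```java
// Toy arithmetic only: the names below are hypothetical and are not Flink or Calcite APIs.
final class RowCountSketch {

    // Estimated output rows of a filtering node: input rows scaled by the
    // fraction of rows expected to pass the predicate (the selectivity).
    static double estimateRowCount(double inputRows, double selectivity) {
        if (selectivity < 0.0 || selectivity > 1.0) {
            throw new IllegalArgumentException("selectivity must be in [0, 1]");
        }
        return inputRows * selectivity;
    }

    public static void main(String[] args) {
        double scanRows = 1000.0;          // rowcount of DataSetScan in the plan above
        double assumedSelectivity = 0.15;  // illustrative value for "a = 1", not from the issue

        // With a working estimate, the calc node reports fewer rows than the scan.
        System.out.println("filtered estimate:   " + estimateRowCount(scanRows, assumedSelectivity));
        // The reported plan behaves as if the filter kept every row.
        System.out.println("unfiltered estimate: " + estimateRowCount(scanRows, 1.0));
    }
}
```

Under these assumptions any selectivity below 1.0 gives DataSetAggregate an input estimate below 1000, which is the behaviour the report expects and the plan above does not show.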



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL

2017-03-14 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924346#comment-15924346
 ] 

Fabian Hueske commented on FLINK-6037:
--

Ah, I see. Thanks for the clarification [~jinyu.zj]! 



[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL

2017-03-14 Thread jingzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924332#comment-15924332
 ] 

jingzhang commented on FLINK-6037:
--

[~fhueske], this issue is different from 
https://issues.apache.org/jira/browse/FLINK-5394: it only happens in SQL. 
I agree that there is no difference between the Table API and SQL at the 
optimization layer, since both are represented the same way. However, when 
{{SqlToRelConverter}} converts a SqlNode to a RelNode, the metadata provider 
is overridden from {{FlinkDefaultRelMetadataProvider}} back to 
{{DefaultRelMetadataProvider}} because of the following code:
{code}
  val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
  val config = SqlToRelConverter.configBuilder()
    .withTrimUnusedFields(false)
    .withConvertTableAccess(false)
    .build()
  val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
    new ViewExpanderImpl, validator, createCatalogReader, cluster,
    convertletTable, config)
{code}
So in the optimization phase, the Table API uses 
{{FlinkDefaultRelMetadataProvider}}, but SQL uses 
{{DefaultRelMetadataProvider}}.
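
The mechanism described here can be sketched with a toy model. Everything in it is an assumption made for illustration: `Cluster`, `DEFAULT_PROVIDER` and `CUSTOM_PROVIDER` are hypothetical stand-ins and not the Calcite or Flink classes named above. The default provider is modelled as leaving the row count unchanged, which mirrors the symptom in the report rather than Calcite's real logic, and the 0.15 factor is an arbitrary selectivity.

```java
import java.util.function.DoubleUnaryOperator;

// Toy model only: all names here are hypothetical stand-ins, not Calcite or Flink classes.
final class ProviderOverrideSketch {

    // A "metadata provider" reduced to one question:
    // given the input rows of a filtering node, how many rows does it emit?
    static final DoubleUnaryOperator DEFAULT_PROVIDER = rows -> rows;        // filter ignored (models the symptom)
    static final DoubleUnaryOperator CUSTOM_PROVIDER = rows -> rows * 0.15;  // assumed selectivity

    // Every newly created cluster starts out with the default provider.
    static final class Cluster {
        private DoubleUnaryOperator provider = DEFAULT_PROVIDER;

        void setProvider(DoubleUnaryOperator provider) {
            this.provider = provider;
        }

        double filterRowCount(double inputRows) {
            return provider.applyAsDouble(inputRows);
        }
    }

    // Table API path in this model: the custom provider is set once and kept.
    static Cluster tableApiCluster() {
        Cluster cluster = new Cluster();
        cluster.setProvider(CUSTOM_PROVIDER);
        return cluster;
    }

    // SQL path in this model: conversion creates a fresh cluster, so the custom
    // provider is lost unless it is set again on the new cluster.
    static Cluster sqlCluster(boolean reapplyProvider) {
        Cluster fresh = new Cluster();
        if (reapplyProvider) {
            fresh.setProvider(CUSTOM_PROVIDER);
        }
        return fresh;
    }

    public static void main(String[] args) {
        System.out.println("Table API:           " + tableApiCluster().filterRowCount(1000.0));
        System.out.println("SQL, provider lost:  " + sqlCluster(false).filterRowCount(1000.0));
        System.out.println("SQL, provider reset: " + sqlCluster(true).filterRowCount(1000.0));
    }
}
```

In this model the SQL path only matches the Table API path when the provider is applied again to the freshly created cluster. Whether the actual fix takes that form is not stated in this thread.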



[jira] [Commented] (FLINK-6037) the estimateRowCount method of DataSetCalc didn't work in SQL

2017-03-14 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923777#comment-15923777
 ] 

Fabian Hueske commented on FLINK-6037:
--

Hi [~jinyu.zj], can you add a description and explain how this issue 
differs from FLINK-5394? 
At the optimization layer, there should be no difference between the Table API 
and SQL since both are represented the same way.

Thanks, Fabian
