[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-25 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338943502
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/planner/plan/batch/sql/LimitTest.xml
 ##
 @@ -144,6 +186,27 @@ LogicalSort(fetch=[0])
 
   
+
+  
+  
+
+  
+
+
+  
+
+
+  

[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-25 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338941218
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/planner/plan/batch/sql/LimitTest.xml
 ##
 @@ -51,6 +72,27 @@ Calc(select=[a, c])
+- Exchange(distribution=[single])
   +- Limit(offset=[0], fetch=[20], global=[false])
      +- TableSourceScan(table=[[default_catalog, default_database, MyTable, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
+]]>
+
+  
+  
+
+  
+
+
+  
+
+
+  


[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-25 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338940838
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.scala
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.plan.stats.TableStats
+import org.apache.flink.table.planner.plan.nodes.logical.{FlinkLogicalSort, FlinkLogicalTableSourceScan}
+import org.apache.flink.table.planner.plan.schema.{FlinkRelOptTable, TableSourceTable}
+import org.apache.flink.table.planner.plan.stats.FlinkStatistic
+import org.apache.flink.table.sources.LimitableTableSource
+
+import org.apache.calcite.plan.RelOptRule.{none, operand}
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.rel.core.{Sort, TableScan}
+import org.apache.calcite.rex.RexLiteral
+import org.apache.calcite.tools.RelBuilder
+
+/**
+  * Planner rule that tries to push a limit into a [[LimitableTableSource]].
+  * The original limit will still be retained.
+  */
+class PushLimitIntoTableSourceScanRule extends RelOptRule(
+  operand(classOf[FlinkLogicalSort],
+    operand(classOf[FlinkLogicalTableSourceScan], none)),
+  "PushLimitIntoTableSourceScanRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val fetch = sort.fetch
+    val offset = sort.offset
+    // Only push down a limit whose offset equals zero: it is difficult for a
+    // source-based push-down to handle a non-zero offset, and a non-zero
+    // offset usually appears together with a sort.
+    val onlyLimit = sort.getCollation.getFieldCollations.isEmpty &&
+      (offset == null || RexLiteral.intValue(offset) == 0) &&
+      fetch != null
+
+    var supportPushDown = false
+    if (onlyLimit) {
+      supportPushDown = call.rel(1).asInstanceOf[TableScan]
+          .getTable.unwrap(classOf[TableSourceTable[_]]) match {
+        case table: TableSourceTable[_] =>
+          table.tableSource match {
+            case source: LimitableTableSource[_] => !source.isLimitPushedDown
+            case _ => false
+          }
+        case _ => false
+      }
+    }
+    supportPushDown
+  }
+
+  override def onMatch(call: RelOptRuleCall): Unit = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val scan = call.rel(1).asInstanceOf[FlinkLogicalTableSourceScan]
+    val relOptTable = scan.getTable.asInstanceOf[FlinkRelOptTable]
+    val limit = RexLiteral.intValue(sort.fetch)
+    val relBuilder = call.builder()
+    val newRelOptTable = applyLimit(limit, relOptTable, relBuilder)
+    val newScan = scan.copy(scan.getTraitSet, newRelOptTable)
+
+    val newTableSource = newRelOptTable.unwrap(classOf[TableSourceTable[_]]).tableSource
+    val oldTableSource = relOptTable.unwrap(classOf[TableSourceTable[_]]).tableSource
+
+    if (newTableSource.asInstanceOf[LimitableTableSource[_]].isLimitPushedDown
+        && newTableSource.explainSource().equals(oldTableSource.explainSource)) {
+      throw new TableException("Failed to push limit into table source! "
+          + "table source with pushdown capability must override and change "
+          + "explainSource() API to explain the pushdown applied!")
+    }
+    call.transformTo(newScan)
 
 Review comment:
  Yes, we should retain the limit, and the concern mentioned above can be ignored.
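
  To make "retain the limit" concrete, here is a minimal sketch (my illustration, not the PR's code; it reuses `sort`, `newScan`, and `call` from the `onMatch` quoted above and Calcite's generic `RelNode#copy`):

```scala
import java.util.Collections

import org.apache.calcite.rel.RelNode

// Keep the Sort (the limit) on top of the scan that now carries the
// pushed-down limit: the source may return at most `fetch` rows, and the
// retained Sort still enforces the exact row count globally.
val retainedSort = sort.copy(
  sort.getTraitSet,
  Collections.singletonList[RelNode](newScan))
call.transformTo(retainedSort)
```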




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-24 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338496600
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.scala
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.flink.table.plan.stats.TableStats
+import org.apache.flink.table.planner.plan.nodes.logical.{FlinkLogicalSort, FlinkLogicalTableSourceScan}
+import org.apache.flink.table.planner.plan.schema.{FlinkRelOptTable, TableSourceTable}
+import org.apache.flink.table.planner.plan.stats.FlinkStatistic
+import org.apache.flink.table.sources.LimitableTableSource
+
+import org.apache.calcite.plan.RelOptRule.{none, operand}
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.rel.core.{Sort, TableScan}
+import org.apache.calcite.rex.RexLiteral
+import org.apache.calcite.tools.RelBuilder
+
+/**
+  * Planner rule that tries to push a limit into a [[LimitableTableSource]].
+  * The original limit will still be retained.
+  */
+class PushLimitIntoTableSourceScanRule extends RelOptRule(
+  operand(classOf[FlinkLogicalSort],
+    operand(classOf[FlinkLogicalTableSourceScan], none)),
+  "PushLimitIntoTableSourceScanRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val fetch = sort.fetch
+    val offset = sort.offset
+    // Only push down a limit whose offset equals zero: it is difficult for a
+    // source-based push-down to handle a non-zero offset, and a non-zero
+    // offset usually appears together with a sort.
+    val onlyLimit = sort.getCollation.getFieldCollations.isEmpty &&
+      (offset == null || RexLiteral.intValue(offset) == 0) &&
+      fetch != null
+
+    var supportPushDown = false
+    if (onlyLimit) {
+      supportPushDown = call.rel(1).asInstanceOf[TableScan]
+          .getTable.unwrap(classOf[TableSourceTable[_]]) match {
+        case table: TableSourceTable[_] =>
+          table.tableSource match {
+            case source: LimitableTableSource[_] => !source.isLimitPushedDown
 
 Review comment:
  No. The current implementation violates the open/closed principle.




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-24 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338497928
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.scala
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.plan.stats.TableStats
+import org.apache.flink.table.planner.plan.nodes.logical.{FlinkLogicalSort, FlinkLogicalTableSourceScan}
+import org.apache.flink.table.planner.plan.schema.{FlinkRelOptTable, TableSourceTable}
+import org.apache.flink.table.planner.plan.stats.FlinkStatistic
+import org.apache.flink.table.sources.LimitableTableSource
+
+import org.apache.calcite.plan.RelOptRule.{none, operand}
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.rel.core.{Sort, TableScan}
+import org.apache.calcite.rex.RexLiteral
+import org.apache.calcite.tools.RelBuilder
+
+/**
+  * Planner rule that tries to push a limit into a [[LimitableTableSource]].
+  * The original limit will still be retained.
+  */
+class PushLimitIntoTableSourceScanRule extends RelOptRule(
+  operand(classOf[FlinkLogicalSort],
+    operand(classOf[FlinkLogicalTableSourceScan], none)),
+  "PushLimitIntoTableSourceScanRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val fetch = sort.fetch
+    val offset = sort.offset
+    // Only push down a limit whose offset equals zero: it is difficult for a
+    // source-based push-down to handle a non-zero offset, and a non-zero
+    // offset usually appears together with a sort.
+    val onlyLimit = sort.getCollation.getFieldCollations.isEmpty &&
+      (offset == null || RexLiteral.intValue(offset) == 0) &&
+      fetch != null
+
+    var supportPushDown = false
+    if (onlyLimit) {
+      supportPushDown = call.rel(1).asInstanceOf[TableScan]
+          .getTable.unwrap(classOf[TableSourceTable[_]]) match {
+        case table: TableSourceTable[_] =>
+          table.tableSource match {
+            case source: LimitableTableSource[_] => !source.isLimitPushedDown
+            case _ => false
+          }
+        case _ => false
+      }
+    }
+    supportPushDown
+  }
+
+  override def onMatch(call: RelOptRuleCall): Unit = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val scan = call.rel(1).asInstanceOf[FlinkLogicalTableSourceScan]
+    val relOptTable = scan.getTable.asInstanceOf[FlinkRelOptTable]
+    val limit = RexLiteral.intValue(sort.fetch)
+    val relBuilder = call.builder()
+    val newRelOptTable = applyLimit(limit, relOptTable, relBuilder)
+    val newScan = scan.copy(scan.getTraitSet, newRelOptTable)
+
+    val newTableSource = newRelOptTable.unwrap(classOf[TableSourceTable[_]]).tableSource
+    val oldTableSource = relOptTable.unwrap(classOf[TableSourceTable[_]]).tableSource
+
+    if (newTableSource.asInstanceOf[LimitableTableSource[_]].isLimitPushedDown
+        && newTableSource.explainSource().equals(oldTableSource.explainSource)) {
+      throw new TableException("Failed to push limit into table source! "
+          + "table source with pushdown capability must override and change "
+          + "explainSource() API to explain the pushdown applied!")
+    }
+    call.transformTo(newScan)
 
 Review comment:
  The limit node is removed from the new plan, while the Javadoc says that the `original limit will still be retained`.




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-24 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338494003
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/utils/TestLimitableTableSource.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.utils
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.java.io.CollectionInputFormat
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.datastream.DataStream
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableSchema
+import org.apache.flink.table.sources._
+import org.apache.flink.types.Row
+
+import scala.collection.JavaConverters._
+
+/**
+  * A table source which supports pushing a limit down into the source.
+  */
+class TestLimitableTableSource(
+    data: Seq[Row],
+    rowType: RowTypeInfo,
+    var limit: Long = -1,
+    var limitablePushedDown: Boolean = false)
+  extends StreamTableSource[Row]
+  with LimitableTableSource[Row] {
+
+  override def isBounded = true
+
+  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
+    if (limit == 0 && limit >= 0) {
 
 Review comment:
  What about a query like `select * from MyTable limit 0`? The code that follows returns an empty collection when the limit is 0, so I think the check condition should be `limit < 0`.




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338360645
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.scala
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.flink.table.plan.stats.TableStats
+import org.apache.flink.table.planner.plan.nodes.logical.{FlinkLogicalSort, FlinkLogicalTableSourceScan}
+import org.apache.flink.table.planner.plan.schema.{FlinkRelOptTable, TableSourceTable}
+import org.apache.flink.table.planner.plan.stats.FlinkStatistic
+import org.apache.flink.table.sources.LimitableTableSource
+
+import org.apache.calcite.plan.RelOptRule.{none, operand}
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.rel.core.{Sort, TableScan}
+import org.apache.calcite.rex.RexLiteral
+import org.apache.calcite.tools.RelBuilder
+
+/**
+  * Planner rule that tries to push a limit into a [[LimitableTableSource]].
+  * The original limit will still be retained.
+  */
+class PushLimitIntoTableSourceScanRule extends RelOptRule(
+  operand(classOf[FlinkLogicalSort],
+    operand(classOf[FlinkLogicalTableSourceScan], none)),
+  "PushLimitIntoTableSourceScanRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val fetch = sort.fetch
+    val offset = sort.offset
+    // Only push down a limit whose offset equals zero: it is difficult for a
+    // source-based push-down to handle a non-zero offset, and a non-zero
+    // offset usually appears together with a sort.
+    val onlyLimit = sort.getCollation.getFieldCollations.isEmpty &&
+      (offset == null || RexLiteral.intValue(offset) == 0) &&
+      fetch != null
+
+    var supportPushDown = false
+    if (onlyLimit) {
+      supportPushDown = call.rel(1).asInstanceOf[TableScan]
+          .getTable.unwrap(classOf[TableSourceTable[_]]) match {
+        case table: TableSourceTable[_] =>
+          table.tableSource match {
+            case source: LimitableTableSource[_] => !source.isLimitPushedDown
+            case _ => false
+          }
+        case _ => false
+      }
+    }
+    supportPushDown
+  }
+
+  override def onMatch(call: RelOptRuleCall): Unit = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val scan = call.rel(1).asInstanceOf[FlinkLogicalTableSourceScan]
+    val relOptTable = scan.getTable.asInstanceOf[FlinkRelOptTable]
+    val limit = RexLiteral.intValue(sort.fetch)
+    val relBuilder = call.builder()
+    val newRelOptTable = applyLimit(limit, relOptTable, relBuilder)
+    val newScan = scan.copy(scan.getTraitSet, newRelOptTable)
 
 Review comment:
  We should check whether the digest of the new scan has changed, just like [FLINK-12399](https://github.com/apache/flink/pull/8468).
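
  For illustration, a hedged sketch of such a check (reusing `scan` and `newScan` from the quoted `onMatch`, and assuming Calcite's `RelNode#getDigest`):

```scala
// If the push-down did not change the scan's digest, the planner could match
// this rule on the new scan again and loop forever, so fail fast instead.
if (newScan.getDigest == scan.getDigest) {
  throw new TableException(
    "Failed to push limit into table source! A table source with push-down " +
      "capability must override explainSource() so that the digest changes.")
}
```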




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338359693
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.scala
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.flink.table.plan.stats.TableStats
+import org.apache.flink.table.planner.plan.nodes.logical.{FlinkLogicalSort, FlinkLogicalTableSourceScan}
+import org.apache.flink.table.planner.plan.schema.{FlinkRelOptTable, TableSourceTable}
+import org.apache.flink.table.planner.plan.stats.FlinkStatistic
+import org.apache.flink.table.sources.LimitableTableSource
+
+import org.apache.calcite.plan.RelOptRule.{none, operand}
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.rel.core.{Sort, TableScan}
+import org.apache.calcite.rex.RexLiteral
+import org.apache.calcite.tools.RelBuilder
+
+/**
+  * Planner rule that tries to push a limit into a [[LimitableTableSource]].
+  * The original limit will still be retained.
+  */
+class PushLimitIntoTableSourceScanRule extends RelOptRule(
+  operand(classOf[FlinkLogicalSort],
+    operand(classOf[FlinkLogicalTableSourceScan], none)),
+  "PushLimitIntoTableSourceScanRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+    val sort = call.rel(0).asInstanceOf[Sort]
+    val fetch = sort.fetch
+    val offset = sort.offset
+    // Only push down a limit whose offset equals zero: it is difficult for a
+    // source-based push-down to handle a non-zero offset, and a non-zero
+    // offset usually appears together with a sort.
+    val onlyLimit = sort.getCollation.getFieldCollations.isEmpty &&
+      (offset == null || RexLiteral.intValue(offset) == 0) &&
+      fetch != null
+
+    var supportPushDown = false
+    if (onlyLimit) {
+      supportPushDown = call.rel(1).asInstanceOf[TableScan]
+          .getTable.unwrap(classOf[TableSourceTable[_]]) match {
+        case table: TableSourceTable[_] =>
+          table.tableSource match {
+            case source: LimitableTableSource[_] => !source.isLimitPushedDown
 
 Review comment:
  We can't push the limit down if the table source is a `FilterableTableSource` and `isFilterPushedDown` is true. Take `ParquetTableSource` (which is a `FilterableTableSource`): its `predicate` cannot filter individual records, only row-groups, so some `dirty` records may still be returned from the source, and a limit applied inside such a source would count those instead of the `clean` records.
  Similarly, we should change the `matches` of `PushFilterIntoTableSourceScanRule` so that it does not push a filter down if the table source is a `LimitableTableSource` and `isLimitPushedDown` is true.
  The logic of `PushLimitIntoTableSourceScanRule` and `PushFilterIntoTableSourceScanRule` is therefore coupled; it would be better to find a cleaner way.
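
  A hedged sketch of the mutual-exclusion check this implies (my illustration, using the interface names quoted above):

```scala
import org.apache.flink.table.sources.{FilterableTableSource, LimitableTableSource, TableSource}

// Reject the limit push-down when a filter was already pushed down, since the
// source may still emit "dirty" records; otherwise only require that the
// limit itself has not been pushed yet.
def canPushLimit(source: TableSource[_]): Boolean = source match {
  case f: FilterableTableSource[_] if f.isFilterPushedDown => false
  case l: LimitableTableSource[_] => !l.isLimitPushedDown
  case _ => false
}
```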




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338361636
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/utils/TestLimitableTableSource.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.utils
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.java.io.CollectionInputFormat
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.datastream.DataStream
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableSchema
+import org.apache.flink.table.sources._
+import org.apache.flink.types.Row
+
+import scala.collection.JavaConverters._
+
+/**
+  * A table source which supports pushing a limit down into the source.
+  */
+class TestLimitableTableSource(
+    data: Seq[Row],
+    rowType: RowTypeInfo,
+    var limit: Long = -1,
+    var limitablePushedDown: Boolean = false)
+  extends StreamTableSource[Row]
+  with LimitableTableSource[Row] {
+
+  override def isBounded = true
+
+  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
+    if (limit == 0 && limit >= 0) {
 
 Review comment:
   `limit == 0 && limit >= 0` ??




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338362044
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/planner/plan/batch/sql/LimitTest.xml
 ##
 @@ -19,30 +19,56 @@ limitations under the License.
   
 
   
+
 
 Review comment:
  Remove this blank line?




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338355400
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/LimitableTableSource.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import org.apache.flink.annotation.Experimental;
+
+/**
+ * Adds support for limit push-down to a {@link TableSource}.
+ * A {@link TableSource} extending this interface is able to limit the number of records.
+ */
+@Experimental
 
 Review comment:
  Why is this interface experimental?
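
  For context, the shape of the interface under discussion (a sketch in Scala for brevity; the real file is Java, and the method names are the ones referenced by the rule code quoted elsewhere in this thread):

```scala
import org.apache.flink.annotation.Experimental
import org.apache.flink.table.sources.TableSource

@Experimental
trait LimitableTableSource[T] {

  /** Returns true if the limit has already been pushed into this source. */
  def isLimitPushedDown: Boolean

  /** Returns a copy of this source with the given limit applied. */
  def applyLimit(limit: Long): TableSource[T]
}
```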




[GitHub] [flink] godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] Introduce LimitableTableSource for optimizing limit

2019-10-23 Thread GitBox
godfreyhe commented on a change in pull request #9876: [FLINK-14134][table] 
Introduce LimitableTableSource for optimizing limit
URL: https://github.com/apache/flink/pull/9876#discussion_r338362335
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/plan/batch/sql/LimitTest.scala
 ##
 @@ -90,4 +96,28 @@ class LimitTest extends TableTestBase {
    util.verifyPlan("SELECT a, c FROM MyTable OFFSET 10 ROWS")
   }
 
+  @Test
+  def testFetchWithLimitSource(): Unit = {
 
 Review comment:
  Please add a test with `order by`.
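
  For example, something along these lines (a sketch; the test and table names are assumed): an `ORDER BY` alongside the limit must keep the push-down from firing, because the rule only matches when the collation is empty:

```scala
@Test
def testLimitWithOrderBy(): Unit = {
  // The non-empty collation should prevent PushLimitIntoTableSourceScanRule
  // from pushing the fetch into the source.
  util.verifyPlan("SELECT a, c FROM MyTable ORDER BY c LIMIT 10")
}
```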

