[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-03 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353028551
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/schema/TableSourceTable.scala
 ##
 @@ -50,7 +51,8 @@ class TableSourceTable[T](
 statistic: FlinkStatistic,
 val tableSource: TableSource[T],
 val isStreamingMode: Boolean,
-val catalogTable: CatalogTable)
+val catalogTable: CatalogTable,
+val tableIdentifier: Option[ObjectIdentifier])
 
 Review comment:
   Please make the comment on the `tableIdentifier` field clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353017040
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/TableStats.java
 ##
 @@ -69,4 +70,18 @@ public TableStats copy() {
         return copy;
     }
 
+    public TableStats merge(TableStats other) {
+        Map<String, ColumnStats> colStats = new HashMap<>();
+        for (Map.Entry<String, ColumnStats> entry : this.colStats.entrySet()) {
+            String col = entry.getKey();
+            ColumnStats stats = entry.getValue();
+            ColumnStats otherStats = other.colStats.get(col);
+            if (otherStats != null) {
+                colStats.put(col, stats.merge(otherStats));
+            }
+        }
+        return new TableStats(
+            Stream.of(this.rowCount, other.rowCount).anyMatch(c -> c == UNKNOWN.rowCount) ?
 
 Review comment:
   I think the logic would be more robust as `this.rowCount >= 0 && 
other.rowCount >= 0 ? ...` rather than `c == UNKNOWN.rowCount`. This is a public 
interface, so a user may also pass other negative numbers.
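
A minimal sketch of the suggested check, not the code from the pull request. The class `RowCountMergeSketch`, the method `mergeRowCount`, and the constant `UNKNOWN_ROW_COUNT` are hypothetical names; the value -1 for "unknown" is taken from the review comments in this thread.

```java
final class RowCountMergeSketch {
    // Assumption: -1 is the sentinel for an unknown row count.
    static final long UNKNOWN_ROW_COUNT = -1L;

    // Treat every negative input as unknown, not only the sentinel itself,
    // because callers of a public API may pass any negative number.
    static long mergeRowCount(long left, long right) {
        if (left < 0 || right < 0) {
            return UNKNOWN_ROW_COUNT;
        }
        return left + right;
    }

    public static void main(String[] args) {
        System.out.println(mergeRowCount(10L, 20L)); // 30
        System.out.println(mergeRowCount(10L, -1L)); // -1
        System.out.println(mergeRowCount(-5L, 20L)); // -1
    }
}
```

With an equality check against the sentinel only, the third call would return 15, silently mixing an invalid count into the merged result.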
   
   
   




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353011470
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/ColumnStats.java
 ##
 @@ -217,6 +218,60 @@ public ColumnStats copy() {
}
}
 
+   @SuppressWarnings("unchecked")
+   public ColumnStats merge(ColumnStats other) {
+
 
 Review comment:
   nit: remove this blank line




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353027712
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/catalog/CatalogStatisticsTest.java
 ##
 @@ -111,10 +119,62 @@ public void testGetStatsFromCatalogForCatalogTableImpl() 
throws Exception {
false);
 
alterTableStatistics(catalog);
-
assertStatistics(tEnv);
}
 
+   @Test
+   public void testGetPartitionStatsFromCatalog() throws Exception {
 
 Review comment:
   See https://issues.apache.org/jira/browse/FLINK-14663; @zjuwangg is fixing it.




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352476764
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/catalog/CatalogStatisticsTest.java
 ##
 @@ -111,10 +119,62 @@ public void testGetStatsFromCatalogForCatalogTableImpl() 
throws Exception {
false);
 
alterTableStatistics(catalog);
-
assertStatistics(tEnv);
}
 
+   @Test
+   public void testGetPartitionStatsFromCatalog() throws Exception {
 
 Review comment:
   Please add some tests for UNKNOWN stats.




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352464395
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/ColumnStats.java
 ##
 @@ -217,6 +219,55 @@ public ColumnStats copy() {
}
}
 
+    @SuppressWarnings("unchecked")
+    public ColumnStats merge(ColumnStats other) {
+        Long ndv = Stream.of(this.ndv, other.ndv).filter(Objects::nonNull).reduce(Long::sum).orElse(null);
 
 Review comment:
   I think the strategy should be: if either `ndv` is null, the merged 
`ndv` should also be null. For example, if many `ColumnStats` are merged 
and only a few of them have a non-null `ndv`, the current strategy 
produces a value that cannot be trusted.
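
A small sketch contrasting the two strategies, under stated assumptions. `NdvMergeSketch` and its method names are hypothetical; `lenientMerge` copies the stream expression from the diff above, and `strictMerge` is one way to express the null-propagating rule proposed in this comment. Summing the values follows the diff and is not a claim about how `ndv` should be combined in general.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

final class NdvMergeSketch {
    // Strategy from the diff: nulls are skipped, so a partial sum is returned.
    static Long lenientMerge(Long left, Long right) {
        return Stream.of(left, right).filter(Objects::nonNull).reduce(Long::sum).orElse(null);
    }

    // Strategy proposed in the review: a single null makes the result null.
    static Long strictMerge(Long left, Long right) {
        if (left == null || right == null) {
            return null;
        }
        return left + right;
    }

    // Folds strictMerge over several values; an empty list yields null.
    static Long strictMergeAll(List<Long> ndvs) {
        Long result = null;
        boolean first = true;
        for (Long ndv : ndvs) {
            if (first) {
                result = ndv;
                first = false;
            } else {
                result = strictMerge(result, ndv);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lenientMerge(5L, null));                        // 5
        System.out.println(strictMerge(5L, null));                         // null
        System.out.println(strictMergeAll(Arrays.asList(5L, null, 7L)));   // null
        System.out.println(strictMergeAll(Arrays.asList(5L, 6L, 7L)));     // 18
    }
}
```

The lenient variant reports 5 for a pair in which one side is unknown, which is the untrusted value the comment warns about.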




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352472551
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala
 ##
 @@ -133,6 +147,21 @@ class PushPartitionIntoTableSourceScanRule extends 
RelOptRule(
 }
   }
 
+  private def extractPartitionStats(
 
 Review comment:
   Would `getPartitionStats` be a better name?




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352465217
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/TableStats.java
 ##
 @@ -69,4 +69,17 @@ public TableStats copy() {
         return copy;
     }
 
+    public TableStats merge(TableStats other) {
+        Map<String, ColumnStats> colStats = new HashMap<>();
+        HashMap<String, ColumnStats> otherColStats = new HashMap<>(other.colStats);
+        for (Map.Entry<String, ColumnStats> entry : this.colStats.entrySet()) {
+            String col = entry.getKey();
+            ColumnStats stats = entry.getValue();
+            ColumnStats otherStats = otherColStats.remove(col);
+            stats = otherStats == null ? stats : stats.merge(otherStats);
+            colStats.put(col, stats);
+        }
+        colStats.putAll(otherColStats);
+        return new TableStats(this.rowCount + other.rowCount, colStats);
 
 Review comment:
   If either `rowCount` is -1 (the unknown value), the merged `rowCount` should 
also be unknown.




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352468269
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala
 ##
 @@ -143,7 +172,7 @@ class PushPartitionIntoTableSourceScanRule extends 
RelOptRule(
   private def adjustPartitionPredicate(
   inputFieldNames: Array[String],
   partitionFieldNames: Array[String],
-  partitionPredicate: RexNode): RexNode = {
+  partitionPredicate: RexNode) = {
 
 Review comment:
   Please revert this change and keep the explicit `RexNode` return type.




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352473315
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala
 ##
 @@ -112,14 +124,16 @@ class PushPartitionIntoTableSourceScanRule extends 
RelOptRule(
 }
 
 val statistic = tableSourceTable.getStatistic
-val newStatistic = if (remainingPartitions.size() == allPartitions.size()) 
{
-  // Keep all Statistics if no predicates can be pushed down
-  statistic
-} else if (statistic == FlinkStatistic.UNKNOWN) {
-  statistic
-} else {
-  // Remove tableStats after predicates pushed down
-  FlinkStatistic.builder().statistic(statistic).tableStats(null).build()
+val newStatistic = {
+  val tableStats = catalogOption match {
 
 Review comment:
   If no partitions are pruned and the original `TableStats` is not `UNKNOWN`, 
we could reuse the original `TableStats` and avoid fetching statistics for 
all partitions.
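
The shortcut described above could be sketched as follows. This is an illustration only, written in Java rather than the rule's Scala, and it does not use the planner's types: `PartitionStatsChoiceSketch`, `chooseStats`, and all parameter names are hypothetical, and the `Supplier` stands in for the potentially expensive per-partition catalog lookups.

```java
import java.util.function.Supplier;

final class PartitionStatsChoiceSketch {
    // Returns the original stats when nothing was pruned and they are known;
    // otherwise falls back to fetching and merging per-partition stats.
    static <S> S chooseStats(
            int remainingPartitions,
            int allPartitions,
            S originalStats,
            S unknownStats,
            Supplier<S> fetchMergedPartitionStats) {
        boolean nothingPruned = remainingPartitions == allPartitions;
        boolean originalKnown = originalStats != null && !originalStats.equals(unknownStats);
        if (nothingPruned && originalKnown) {
            return originalStats; // the supplier is never invoked on this path
        }
        return fetchMergedPartitionStats.get();
    }

    public static void main(String[] args) {
        // Nothing pruned and stats known: the original value is reused.
        System.out.println(chooseStats(3, 3, "table-stats", "UNKNOWN", () -> "merged")); // table-stats
        // One partition pruned: per-partition stats are fetched and merged.
        System.out.println(chooseStats(2, 3, "table-stats", "UNKNOWN", () -> "merged")); // merged
    }
}
```

Passing the fetch as a `Supplier` keeps it lazy, so the catalog is only queried when the cheap path does not apply.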




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352475344
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/schema/TableSourceTable.scala
 ##
 @@ -50,7 +51,8 @@ class TableSourceTable[T](
 statistic: FlinkStatistic,
 val tableSource: TableSource[T],
 val isStreamingMode: Boolean,
-val catalogTable: CatalogTable)
+val catalogTable: CatalogTable,
+val tableIdentifier: Option[ObjectIdentifier])
 
 Review comment:
   The `names` field already contains all the information in `tableIdentifier`.




[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner

2019-12-02 Thread GitBox
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] 
Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352470086
 
 

 ##
 File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala
 ##
 @@ -133,6 +147,21 @@ class PushPartitionIntoTableSourceScanRule extends 
RelOptRule(
 }
   }
 
+  private def extractPartitionStats(
+  catalog: Catalog,
+  objectIdentifier: ObjectIdentifier,
+  partSpec: util.Map[String, String]) = {
 
 Review comment:
   Please add an explicit return type.

