[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353028551

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/schema/TableSourceTable.scala ##
@@ -50,7 +51,8 @@ class TableSourceTable[T](
     statistic: FlinkStatistic,
     val tableSource: TableSource[T],
     val isStreamingMode: Boolean,
-    val catalogTable: CatalogTable)
+    val catalogTable: CatalogTable,
+    val tableIdentifier: Option[ObjectIdentifier])

Review comment:
  It would be better to make the comment on the `tableIdentifier` field clearer.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353017040

## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/TableStats.java ##
@@ -69,4 +70,18 @@ public TableStats copy() {
     return copy;
   }
+  public TableStats merge(TableStats other) {
+    Map<String, ColumnStats> colStats = new HashMap<>();
+    for (Map.Entry<String, ColumnStats> entry : this.colStats.entrySet()) {
+      String col = entry.getKey();
+      ColumnStats stats = entry.getValue();
+      ColumnStats otherStats = other.colStats.get(col);
+      if (otherStats != null) {
+        colStats.put(col, stats.merge(otherStats));
+      }
+    }
+    return new TableStats(
+      Stream.of(this.rowCount, other.rowCount).anyMatch(c -> c == UNKNOWN.rowCount) ?

Review comment:
  I think the logic would be more robust as `this.rowCount >= 0 && other.rowCount >= 0 ? ...` rather than `c == UNKNOWN.rowCount`. This is a public interface, so users may also pass other negative numbers.
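The check suggested in the review comment above can be sketched in isolation. This is a minimal illustration, not the code from the pull request: the class `RowCountMergeSketch`, the method `mergeRowCount`, and the constant `UNKNOWN_ROW_COUNT` are hypothetical names, and the value -1 for "unknown" is taken from the earlier comment on the same file.

```java
// Hypothetical helper for illustration only; not Flink's TableStats implementation.
final class RowCountMergeSketch {

    // Assumption: -1 marks an unknown row count, as stated in the review thread.
    static final long UNKNOWN_ROW_COUNT = -1L;

    /**
     * Sums two row counts. Any negative input is treated as unknown,
     * not only the exact UNKNOWN sentinel, so the result is never a
     * misleading sum such as 100 + (-5) = 95.
     */
    static long mergeRowCount(long left, long right) {
        if (left < 0 || right < 0) {
            return UNKNOWN_ROW_COUNT;
        }
        return left + right;
    }

    public static void main(String[] args) {
        System.out.println(mergeRowCount(100L, 250L)); // both known
        System.out.println(mergeRowCount(100L, -1L));  // one side is the sentinel
        System.out.println(mergeRowCount(100L, -5L));  // one side is another negative value
    }
}
```

Comparing against the sentinel only would let the third call return 95, which looks like a valid count. Testing the sign covers both cases with one condition.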
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353011470

## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/ColumnStats.java ##
@@ -217,6 +218,60 @@ public ColumnStats copy() {
     }
   }
+  @SuppressWarnings("unchecked")
+  public ColumnStats merge(ColumnStats other) {
+

Review comment:
  nit: remove this blank line
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r353027712

## File path: flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/catalog/CatalogStatisticsTest.java ##
@@ -111,10 +119,62 @@ public void testGetStatsFromCatalogForCatalogTableImpl() throws Exception {
       false);
     alterTableStatistics(catalog);
-    assertStatistics(tEnv);
   }
+  @Test
+  public void testGetPartitionStatsFromCatalog() throws Exception {

Review comment:
  See https://issues.apache.org/jira/browse/FLINK-14663; @zjuwangg is fixing it.
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352476764

## File path: flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/catalog/CatalogStatisticsTest.java ##
@@ -111,10 +119,62 @@ public void testGetStatsFromCatalogForCatalogTableImpl() throws Exception {
       false);
     alterTableStatistics(catalog);
-    assertStatistics(tEnv);
   }
+  @Test
+  public void testGetPartitionStatsFromCatalog() throws Exception {

Review comment:
  Please add some tests for UNKNOWN stats.
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352464395

## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/ColumnStats.java ##
@@ -217,6 +219,55 @@ public ColumnStats copy() {
     }
   }
+  @SuppressWarnings("unchecked")
+  public ColumnStats merge(ColumnStats other) {
+    Long ndv = Stream.of(this.ndv, other.ndv).filter(Objects::nonNull).reduce(Long::sum).orElse(null);

Review comment:
  I think the strategy should be: if either `ndv` is null, the merged `ndv` should also be null. For example, when many `ColumnStats` are merged and only a few of them have non-null `ndv` values, the current strategy produces an untrustworthy value.
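The null-propagating strategy described in the review comment above could look roughly like the following. It is a sketch under stated assumptions, not the code that was merged: `NdvMergeSketch`, `mergeNdv`, and `mergeAll` are hypothetical names, and the sketch keeps the summing behaviour of the diff (`Long::sum`) for the case where both values are present.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Flink's ColumnStats implementation.
final class NdvMergeSketch {

    /** Returns null as soon as either side is missing; otherwise sums the two values. */
    static Long mergeNdv(Long left, Long right) {
        if (left == null || right == null) {
            return null;
        }
        return left + right;
    }

    /** Folds a list of per-partition values; a single null makes the whole result null. */
    static Long mergeAll(List<Long> ndvs) {
        if (ndvs.isEmpty()) {
            return null;
        }
        Long merged = ndvs.get(0);
        for (int i = 1; i < ndvs.size(); i++) {
            merged = mergeNdv(merged, ndvs.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        // All partitions report a value, so a sum is produced.
        System.out.println(mergeAll(Arrays.asList(3L, 4L, 5L)));
        // One partition has no value, so the merged result is null
        // instead of a partial sum over the remaining partitions.
        System.out.println(mergeAll(Arrays.asList(3L, null, 5L)));
    }
}
```

With the filter-then-sum approach from the diff, the second call would return 8, which silently ignores the partition without statistics.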
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352472551

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala ##
@@ -133,6 +147,21 @@ class PushPartitionIntoTableSourceScanRule extends RelOptRule(
     }
   }
+  private def extractPartitionStats(

Review comment:
  Would `getPartitionStats` be a better name?
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352465217

## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/plan/stats/TableStats.java ##
@@ -69,4 +69,17 @@ public TableStats copy() {
     return copy;
   }
+  public TableStats merge(TableStats other) {
+    Map<String, ColumnStats> colStats = new HashMap<>();
+    HashMap<String, ColumnStats> otherColStats = new HashMap<>(other.colStats);
+    for (Map.Entry<String, ColumnStats> entry : this.colStats.entrySet()) {
+      String col = entry.getKey();
+      ColumnStats stats = entry.getValue();
+      ColumnStats otherStats = otherColStats.remove(col);
+      stats = otherStats == null ? stats : stats.merge(otherStats);
+      colStats.put(col, stats);
+    }
+    colStats.putAll(otherColStats);
+    return new TableStats(this.rowCount + other.rowCount, colStats);

Review comment:
  If `rowCount` is -1 (the unknown value), the merged `rowCount` should also be unknown.
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352468269

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala ##
@@ -143,7 +172,7 @@ class PushPartitionIntoTableSourceScanRule extends RelOptRule(
   private def adjustPartitionPredicate(
       inputFieldNames: Array[String],
       partitionFieldNames: Array[String],
-      partitionPredicate: RexNode): RexNode = {
+      partitionPredicate: RexNode) = {

Review comment:
  revert this
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352473315

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala ##
@@ -112,14 +124,16 @@ class PushPartitionIntoTableSourceScanRule extends RelOptRule(
     }
     val statistic = tableSourceTable.getStatistic
-    val newStatistic = if (remainingPartitions.size() == allPartitions.size()) {
-      // Keep all Statistics if no predicates can be pushed down
-      statistic
-    } else if (statistic == FlinkStatistic.UNKNOWN) {
-      statistic
-    } else {
-      // Remove tableStats after predicates pushed down
-      FlinkStatistic.builder().statistic(statistic).tableStats(null).build()
+    val newStatistic = {
+      val tableStats = catalogOption match {

Review comment:
  If no partitions are pruned and the original `TableStats` is not `UNKNOWN`, we could use the original `TableStats` and avoid fetching all partitions' statistics.
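The short-circuit proposed in the review comment above can be shown with a small, generic sketch. It is written in Java for consistency with the stats classes under review, although the rule itself is Scala, and every name in it (`PartitionStatsChoiceSketch`, `chooseStats`, the `Supplier` argument) is hypothetical. The supplier stands in for whatever code fetches and merges per-partition statistics from the catalog.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; the real rule operates on FlinkStatistic.
final class PartitionStatsChoiceSketch {

    /**
     * Keeps the original stats when nothing was pruned and they are known;
     * otherwise asks the supplier, which is the only place the (potentially
     * expensive) per-partition lookup happens.
     */
    static <S> S chooseStats(
            int remainingPartitions,
            int allPartitions,
            S originalStats,
            S unknownStats,
            Supplier<S> mergedPartitionStats) {
        boolean nothingPruned = remainingPartitions == allPartitions;
        if (nothingPruned && !originalStats.equals(unknownStats)) {
            return originalStats;
        }
        return mergedPartitionStats.get();
    }

    public static void main(String[] args) {
        Supplier<String> fetch = () -> "merged-from-partitions";
        // Nothing pruned and stats are known: the supplier is never called.
        System.out.println(chooseStats(3, 3, "table-stats", "UNKNOWN", fetch));
        // One partition pruned: per-partition stats are fetched and merged.
        System.out.println(chooseStats(2, 3, "table-stats", "UNKNOWN", fetch));
    }
}
```

Passing the lookup as a supplier makes the saving explicit: the catalog is only queried on the branch that needs per-partition data.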
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352475344

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/schema/TableSourceTable.scala ##
@@ -50,7 +51,8 @@ class TableSourceTable[T](
     statistic: FlinkStatistic,
     val tableSource: TableSource[T],
     val isStreamingMode: Boolean,
-    val catalogTable: CatalogTable)
+    val catalogTable: CatalogTable,
+    val tableIdentifier: Option[ObjectIdentifier])

Review comment:
  The `names` field already contains the full info of `tableIdentifier`.
[GitHub] [flink] godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
godfreyhe commented on a change in pull request #10315: [FLINK-14552][table] Enable partition statistics in blink planner
URL: https://github.com/apache/flink/pull/10315#discussion_r352470086

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.scala ##
@@ -133,6 +147,21 @@ class PushPartitionIntoTableSourceScanRule extends RelOptRule(
     }
   }
+  private def extractPartitionStats(
+      catalog: Catalog,
+      objectIdentifier: ObjectIdentifier,
+      partSpec: util.Map[String, String]) = {

Review comment:
  Please add a return type.