[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog URL: https://github.com/apache/flink/pull/8636#discussion_r299760649 ## File path: flink-table/flink-table-common/src/test/java/org/apache/flink/table/catalog/CatalogTest.java ## @@ -1069,6 +1070,84 @@ public void testListPartitionPartialSpec() throws Exception { assertEquals(1, catalog.listPartitions(path1, createAnotherPartitionSpecSubset()).size()); } + + // -- table and column stats -- + + @Test + public void testGetTableStats_TableNotExistException() throws Exception{ + catalog.createDatabase(db1, createDb(), false); + exception.expect(org.apache.flink.table.catalog.exceptions.TableNotExistException.class); + catalog.getTableStatistics(path1); + } + + @Test + public void testGetPartitionStats() throws Exception{ + catalog.createDatabase(db1, createDb(), false); + catalog.createTable(path1, createPartitionedTable(), false); + catalog.createPartition(path1, createPartitionSpec(), createPartition(), false); + CatalogTableStatistics tableStatistics = catalog.getPartitionStatistics(path1, createPartitionSpec()); + assertEquals(0, tableStatistics.getFileCount()); Review comment: it will cause an unwanted reverse dependency for HiveStatsUtil is a class of flink-connector-hive. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog URL: https://github.com/apache/flink/pull/8636#discussion_r298069919 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java ## @@ -1060,21 +1061,73 @@ private static Function instantiateHiveFunction(ObjectPath functionPath, HiveCat ); } + private boolean isTablePartitioned(Table hiveTable) { + return hiveTable.getPartitionKeysSize() != 0; + } + // -- stats -- @Override public void alterTableStatistics(ObjectPath tablePath, CatalogTableStatistics tableStatistics, boolean ignoreIfNotExists) throws TableNotExistException, CatalogException { - + try { + Table hiveTable = getHiveTable(tablePath); + // Set table stats + if (needToUpdateStatistics(tableStatistics, hiveTable.getParameters())) { + updateStatisticsParameters(tableStatistics, hiveTable.getParameters()); + client.alter_table(tablePath.getDatabaseName(), tablePath.getObjectName(), hiveTable); + } + } catch (TableNotExistException e) { + if (!ignoreIfNotExists) { + throw e; + } + } catch (TException e) { + throw new CatalogException(String.format("Failed to alter table stats of table %s", tablePath.getFullName()), e); + } } @Override public void alterTableColumnStatistics(ObjectPath tablePath, CatalogColumnStatistics columnStatistics, boolean ignoreIfNotExists) throws TableNotExistException, CatalogException { } + private static boolean needToUpdateStatistics(CatalogTableStatistics statistics, Map oldParameters) { + String oldRowCount = oldParameters.getOrDefault(StatsSetupConst.ROW_COUNT, "0"); Review comment: > CatalogTableStatistics Now we alter hive table stats just consider specific four stats const and don't use `CatalogTableStatistics::properties`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog URL: https://github.com/apache/flink/pull/8636#discussion_r297103593 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java ## @@ -1060,21 +1061,73 @@ private static Function instantiateHiveFunction(ObjectPath functionPath, HiveCat ); } + private boolean isTablePartitioned(Table hiveTable) { + return hiveTable.getPartitionKeysSize() != 0; + } + // -- stats -- @Override public void alterTableStatistics(ObjectPath tablePath, CatalogTableStatistics tableStatistics, boolean ignoreIfNotExists) throws TableNotExistException, CatalogException { - + try { + Table hiveTable = getHiveTable(tablePath); + // Set table stats + if (needToUpdateStatistics(tableStatistics, hiveTable.getParameters())) { + updateStatisticsParameters(tableStatistics, hiveTable.getParameters()); + client.alter_table(tablePath.getDatabaseName(), tablePath.getObjectName(), hiveTable); Review comment: No, i haven't tested this with Hive-1.2.1, since you have added a shim for it, I'll add it also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog URL: https://github.com/apache/flink/pull/8636#discussion_r296989387 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java ## @@ -1083,8 +1136,14 @@ public void alterPartitionColumnStatistics(ObjectPath tablePath, CatalogPartitio } @Override - public CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException { - throw new UnsupportedOperationException(); + public CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, + CatalogException, TableNotPartitionedException { + Table hiveTable = getHiveTable(tablePath); + if (!isTablePartitioned(hiveTable)) { + return createCatalogTableStatistics(hiveTable.getParameters()); + } else { + throw new TableNotPartitionedException(getName(), tablePath); Review comment: `hive> ANALYZE TABLE pt_products COMPUTE STATISTICS ; FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed` I test it in the hive shell and it returns an error message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services