[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog

2019-07-02 Thread GitBox
zjuwangg commented on a change in pull request #8636: 
[FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
URL: https://github.com/apache/flink/pull/8636#discussion_r299760649
 
 

 ##
 File path: 
flink-table/flink-table-common/src/test/java/org/apache/flink/table/catalog/CatalogTest.java
 ##
 @@ -1069,6 +1070,84 @@ public void testListPartitionPartialSpec() throws 
Exception {
assertEquals(1, catalog.listPartitions(path1, 
createAnotherPartitionSpecSubset()).size());
}
 
+
+   // -- table and column stats --
+
+   @Test
+   public void testGetTableStats_TableNotExistException() throws Exception{
+   catalog.createDatabase(db1, createDb(), false);
+   
exception.expect(org.apache.flink.table.catalog.exceptions.TableNotExistException.class);
+   catalog.getTableStatistics(path1);
+   }
+
+   @Test
+   public void testGetPartitionStats() throws Exception{
+   catalog.createDatabase(db1, createDb(), false);
+   catalog.createTable(path1, createPartitionedTable(), false);
+   catalog.createPartition(path1, createPartitionSpec(), 
createPartition(), false);
+   CatalogTableStatistics tableStatistics = 
catalog.getPartitionStatistics(path1, createPartitionSpec());
+   assertEquals(0, tableStatistics.getFileCount());
 
 Review comment:
   it will cause an unwanted reverse dependency for HiveStatsUtil is a class of 
flink-connector-hive.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog

2019-06-27 Thread GitBox
zjuwangg commented on a change in pull request #8636: 
[FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
URL: https://github.com/apache/flink/pull/8636#discussion_r298069919
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
 ##
 @@ -1060,21 +1061,73 @@ private static Function 
instantiateHiveFunction(ObjectPath functionPath, HiveCat
);
}
 
+   private boolean isTablePartitioned(Table hiveTable) {
+   return hiveTable.getPartitionKeysSize() != 0;
+   }
+
// -- stats --
 
@Override
public void alterTableStatistics(ObjectPath tablePath, 
CatalogTableStatistics tableStatistics, boolean ignoreIfNotExists) throws 
TableNotExistException, CatalogException {
-
+   try {
+   Table hiveTable = getHiveTable(tablePath);
+   // Set table stats
+   if (needToUpdateStatistics(tableStatistics, 
hiveTable.getParameters())) {
+   updateStatisticsParameters(tableStatistics, 
hiveTable.getParameters());
+   client.alter_table(tablePath.getDatabaseName(), 
tablePath.getObjectName(), hiveTable);
+   }
+   } catch (TableNotExistException e) {
+   if (!ignoreIfNotExists) {
+   throw e;
+   }
+   } catch (TException e) {
+   throw new CatalogException(String.format("Failed to 
alter table stats of table %s", tablePath.getFullName()), e);
+   }
}
 
@Override
public void alterTableColumnStatistics(ObjectPath tablePath, 
CatalogColumnStatistics columnStatistics, boolean ignoreIfNotExists) throws 
TableNotExistException, CatalogException {
 
}
 
+   private static boolean needToUpdateStatistics(CatalogTableStatistics 
statistics, Map oldParameters) {
+   String oldRowCount = 
oldParameters.getOrDefault(StatsSetupConst.ROW_COUNT, "0");
 
 Review comment:
   > CatalogTableStatistics
   
   Now we alter hive table stats just consider specific four stats const and 
don't use `CatalogTableStatistics::properties`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog

2019-06-25 Thread GitBox
zjuwangg commented on a change in pull request #8636: 
[FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
URL: https://github.com/apache/flink/pull/8636#discussion_r297103593
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
 ##
 @@ -1060,21 +1061,73 @@ private static Function 
instantiateHiveFunction(ObjectPath functionPath, HiveCat
);
}
 
+   private boolean isTablePartitioned(Table hiveTable) {
+   return hiveTable.getPartitionKeysSize() != 0;
+   }
+
// -- stats --
 
@Override
public void alterTableStatistics(ObjectPath tablePath, 
CatalogTableStatistics tableStatistics, boolean ignoreIfNotExists) throws 
TableNotExistException, CatalogException {
-
+   try {
+   Table hiveTable = getHiveTable(tablePath);
+   // Set table stats
+   if (needToUpdateStatistics(tableStatistics, 
hiveTable.getParameters())) {
+   updateStatisticsParameters(tableStatistics, 
hiveTable.getParameters());
+   client.alter_table(tablePath.getDatabaseName(), 
tablePath.getObjectName(), hiveTable);
 
 Review comment:
   No, i haven't tested this with Hive-1.2.1, since you have added a shim for 
it, I'll add it also.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] zjuwangg commented on a change in pull request #8636: [FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog

2019-06-24 Thread GitBox
zjuwangg commented on a change in pull request #8636: 
[FLINK-12237][hive]Support Hive table stats related operations in HiveCatalog
URL: https://github.com/apache/flink/pull/8636#discussion_r296989387
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
 ##
 @@ -1083,8 +1136,14 @@ public void alterPartitionColumnStatistics(ObjectPath 
tablePath, CatalogPartitio
}
 
@Override
-   public CatalogTableStatistics getTableStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException {
-   throw new UnsupportedOperationException();
+   public CatalogTableStatistics getTableStatistics(ObjectPath tablePath) 
throws TableNotExistException,
+   CatalogException, TableNotPartitionedException {
+   Table hiveTable = getHiveTable(tablePath);
+   if (!isTablePartitioned(hiveTable)) {
+   return 
createCatalogTableStatistics(hiveTable.getParameters());
+   } else {
+   throw new TableNotPartitionedException(getName(), 
tablePath);
 
 Review comment:
   `hive> ANALYZE TABLE pt_products COMPUTE STATISTICS ;
   
   FAILED: SemanticException [Error 10115]: Table is partitioned and partition 
specification is needed`
   
   I test it in the hive shell and it returns an error message.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services