[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-05-01 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r280134188
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/stats/CatalogColumnStatistics.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.catalog.stats;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Column statistics a table or partition.
 
 Review comment:
   nit: "Column statistics `of` a table or partition."?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-05-01 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r280134823
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/stats/CatalogTableStatistics.java
 ##
 @@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.catalog.stats;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Statistics for a non-partitioned table or a partition of a partitioned 
table.
+ */
+public class CatalogTableStatistics {
+   public static final CatalogTableStatistics UNKNOWN = new 
CatalogTableStatistics(0, 0, 0, 0);
+
+   /**
+* The number of rows in the table or partition.
+*/
+   private final long rowCount;
+
+   /**
+* The number of files on disk.
+*/
+   private final int fileCount;
+
+   /**
+* The total size in bytes.
+*/
+   private final long totalSize;
+
+   /**
+* The raw data size (size when loaded in memory) in bytes.
+*/
+   private final long rawDataSize;
+
+   private Map properties;
+
+   public CatalogTableStatistics(long rowCount, int fileCount, long 
totalSize, long rawDataSize) {
+   this(rowCount, fileCount, totalSize, rawDataSize, new 
HashMap<>());
+   }
+
+   public CatalogTableStatistics(long rowCount, int fileCount, long 
totalSize, long rawDataSize,
+   Map properties) {
+   this.rowCount = rowCount;
+   this.fileCount = fileCount;
+   this.totalSize = totalSize;
+   this.rawDataSize = rawDataSize;
+   this.properties = properties;
+   }
+
+   /**
+* The number of rows.
+*/
+   public long getRowCount() {
+   return this.rowCount;
+   }
+
+   public int getFileCount() {
+   return this.fileCount;
+   }
+
+   public long getTotalSize() {
+   return this.totalSize;
+   }
+
+   public long getRawDataSize() {
+   return this.rawDataSize;
+   }
+
+   public Map getProperties() {
+   return this.properties;
+   }
+
+   /**
+* Create a deep copy of "this" instance.
+* @return a deep copy
 
 Review comment:
   nit: please add an empty line between comment and annotation


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-05-01 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r280133871
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/stats/CatalogColumnStatisticsDataLong.java
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.catalog.stats;
+
+import java.util.Map;
+
+/**
+ * Column statistics value of long type.
+ */
+public class CatalogColumnStatisticsDataLong extends 
CatalogColumnStatisticsDataBase {
+   /**
+* mim value.
+*/
+   private final long min;
+
+   /**
+* max value.
+*/
+   private final long max;
+
+   /**
+* number of distinct values.
+*/
+   private final long ndv;
+
+   public CatalogColumnStatisticsDataLong(long min, long max, long ndv, 
long nullCount) {
+   super(nullCount);
+   this.min = min;
+   this.max = max;
+   this.ndv = ndv;
+   }
+
+   public CatalogColumnStatisticsDataLong(long min, long max, long ndv, 
long nullCount, Map properties) {
+   super(nullCount, properties);
+   this.min = min;
+   this.max = max;
+   this.ndv = ndv;
+   }
+
+   public long getMin() {
+   return min;
+   }
+
+   public long getMax() {
+   return max;
+   }
+
+   public long getNdv() {
+   return ndv;
+   }
+
+   public CatalogColumnStatisticsDataLong copy() {
+   return new CatalogColumnStatisticsDataLong(min, max, ndv, 
getNullCount(), getProperties());
 
 Review comment:
   create a copy of the properties?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-05-01 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r280133941
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/stats/CatalogColumnStatisticsDataBoolean.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.catalog.stats;
+
+import java.util.Map;
+
+/**
+ * Column statistics value of boolean type.
+ */
+public class CatalogColumnStatisticsDataBoolean extends 
CatalogColumnStatisticsDataBase {
+   /**
+* number of "true" values.
+*/
+   private final long trueCount;
+
+   /**
+* number of "false" values.
+*/
+   private final long falseCount;
+
+   public CatalogColumnStatisticsDataBoolean(long trueCount, long 
falseCount, long nullCount) {
+   super(nullCount);
+   this.trueCount = trueCount;
+   this.falseCount = falseCount;
+   }
+
+   public CatalogColumnStatisticsDataBoolean(long trueCount, long 
falseCount, long nullCount, Map properties) {
+   super(nullCount, properties);
+   this.trueCount = trueCount;
+   this.falseCount = falseCount;
+   }
+
+   public Long getTrueCount() {
+   return trueCount;
+   }
+
+   public Long getFalseCount() {
+   return falseCount;
+   }
+
+   public CatalogColumnStatisticsDataBoolean copy() {
+   return new CatalogColumnStatisticsDataBoolean(trueCount, 
falseCount, getNullCount(), getProperties());
 
 Review comment:
   create a copy of the properties?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-30 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279839941
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java
 ##
 @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() 
throws Exception {
catalog.dropDatabase(db1, false);
}
 
+   // -- statistics --
 
 Review comment:
   > Well, I think the added test is written differently as others, which gives 
such an expression. However, it covers all the added implementation methods, as 
you can see. What's missing, though, is some negative test cases, which I will 
create a followup JIRA for this.
   
   I think we need to have a consistent style for unit tests in this class, 
otherwise it's hard for other developers to figure out the two styles and 
extend/modify the testing code


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279592781
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a table.
+*
+* @param tablePath path of the table
+* @return the column statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException;
 
 Review comment:
   ok, no problem. Please also take the APIs in ReadableWritableCatalog into 
consideration


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279573210
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a table.
+*
+* @param tablePath path of the table
+* @return the column statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException;
+
+   /**
+* Get the statistics of a partition.
+*
+* @param tablePath path of the table
+* @param partitionSpec partition spec of the partition
+* @return the statistics of the given partition
+*
+* @throws PartitionNotExistException if the partition is not 
partitioned
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, 
CatalogPartitionSpec partitionSpec)
+   throws PartitionNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a partition.
+*
+* @param tablePath path of the table
+* @param partitionSpec partition spec of the partition
+* @return the column statistics of the given partition
+*
+* @throws PartitionNotExistException if the partition is not 
partitioned
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getPartitionColumnStatistics(ObjectPath 
tablePath, CatalogPartitionSpec partitionSpec)
 
 Review comment:
   ditto. What does this API try to accomplish? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279573127
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a table.
+*
+* @param tablePath path of the table
+* @return the column statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException;
+
+   /**
+* Get the statistics of a partition.
+*
+* @param tablePath path of the table
+* @param partitionSpec partition spec of the partition
+* @return the statistics of the given partition
+*
+* @throws PartitionNotExistException if the partition is not 
partitioned
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, 
CatalogPartitionSpec partitionSpec)
 
 Review comment:
   what happens if this API is called on a non-partitioned table?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279572332
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a table.
+*
+* @param tablePath path of the table
+* @return the column statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException;
+
+   /**
+* Get the statistics of a partition.
+*
+* @param tablePath path of the table
+* @param partitionSpec partition spec of the partition
+* @return the statistics of the given partition
+*
+* @throws PartitionNotExistException if the partition is not 
partitioned
 
 Review comment:
   the wordage "if the partition is not partitioned" needs some rework


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279571858
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
 
 Review comment:
   What happens if this API is called on a partitioned table?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279572488
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws 
TableNotExistException, CatalogException;
+
+   /**
+* Get the column statistics of a table.
+*
+* @param tablePath path of the table
+* @return the column statistics of the given table
+*
+* @throws TableNotExistException if the table does not exist in the 
catalog
+* @throws CatalogException in case of any runtime exception
+*/
+   CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) 
throws TableNotExistException, CatalogException;
 
 Review comment:
   What does this API try to accomplish?  If this is for all columns of the 
table, should it return a list of `CatalogColumnStatistics`? If it's for a 
single column, should it take a column name?
   
   Besides, what happens if this API is called on a partitioned table?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279570916
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java
 ##
 @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean 
ignoreIfNotExists) throws Ta
tables.remove(tablePath);
 
 
 Review comment:
   Aren't line 238/239 only drops partition stats and partition column stats?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279568940
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java
 ##
 @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String 
newTableName, boolean ignor
if (partitions.containsKey(tablePath)) {
partitions.put(newPath, 
partitions.remove(tablePath));
}
+
 
 Review comment:
   Line # 263-271 only relocate table stats and table column stats, did I miss 
anything?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279569853
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table, null if its statistics 
don't exist or are unknown.
 
 Review comment:
   Yeah, it doesn't make sense to throw `stats not exist exception`. Just using 
the table API and an example to demonstrate the behavior difference.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279500369
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java
 ##
 @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() 
throws Exception {
catalog.dropDatabase(db1, false);
}
 
+   // -- statistics --
+
+   @Test
+   public void testStatistics() throws Exception {
+   // Table related
+   catalog.createDatabase(db1, createDb(), false);
+   GenericCatalogTable table = createTable();
+   catalog.createTable(path1, table, false);
+   assertTrue(catalog.getTableStatistics(path1) == null);
 
 Review comment:
   nit: to follow the format in this file, can we have an empty line between 
assertions and code to be tested?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279523176
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java
 ##
 @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, 
CatalogPartitionSpec partiti
 */
boolean functionExists(ObjectPath functionPath) throws CatalogException;
 
+   // -- Statistics --
+
+   /**
+* Get the statistics of a table.
+*
+* @param tablePath path of the table
+* @return the statistics of the given table, null if its statistics 
don't exist or are unknown.
 
 Review comment:
   Stats should probably always exist. It's how Hive handles it. Also, the 
behavior of "returns null when the stats doesn't exist" doesn't seem to match 
behaviors of other metadata APIs and would confuse people, for example 
`getTable()` throws `TableNotExistException` when the request table doesn't 
exist. It's just the default, "unknown" stats will be with some default value, 
for example, rowCount, fileCount, totalSize will all be 0.
   
   This will also better explain the signatures of these APIs.
   
   What do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279506420
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java
 ##
 @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() 
throws Exception {
catalog.dropDatabase(db1, false);
}
 
+   // -- statistics --
 
 Review comment:
   compared to the amount of unit tests for other APIs, the amount of UT for 
stats seems to be too few.
   
   I'm ok with either adding more tests in this PR, or creating an immediate 
followup PR dedicated to adding UT for stats related APIs.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279500776
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java
 ##
 @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() 
throws Exception {
catalog.dropDatabase(db1, false);
}
 
+   // -- statistics --
+
+   @Test
+   public void testStatistics() throws Exception {
+   // Table related
+   catalog.createDatabase(db1, createDb(), false);
+   GenericCatalogTable table = createTable();
+   catalog.createTable(path1, table, false);
+   assertTrue(catalog.getTableStatistics(path1) == null);
+   assertTrue(catalog.getTableColumnStatistics(path1) == null);
+   CatalogTableStatistics tableStatistics = new 
CatalogTableStatistics(5, 2, 100, 575);
+   catalog.alterTableStatistics(path1, tableStatistics, false);
+   CatalogTestUtil.checkEquals(tableStatistics, 
catalog.getTableStatistics(path1));
+   CatalogColumnStatistics columnStatistics = new 
CatalogColumnStatistics(20L, 2L, 40.5, 17, 45, 15);
+   catalog.alterTableColumnStatistics(path1, columnStatistics, 
false);
+   CatalogTestUtil.checkEquals(columnStatistics, 
catalog.getTableColumnStatistics(path1));
+
+   // Partition related
+   catalog.createDatabase(db2, createDb(), false);
+   GenericCatalogTable table2 = createPartitionedTable();
+   catalog.createTable(path2, table2, false);
+   CatalogPartitionSpec partitionSpec = createPartitionSpec();
+   catalog.createPartition(path2, partitionSpec, 
createPartition(), false);
+   assertTrue(catalog.getPartitionStatistics(path2, partitionSpec) 
== null);
+   assertTrue(catalog.getPartitionColumnStatistics(path2, 
partitionSpec) == null);
+   catalog.alterPartitionStatistics(path2, partitionSpec, 
tableStatistics, false);
+   CatalogTestUtil.checkEquals(tableStatistics, 
catalog.getPartitionStatistics(path2, partitionSpec));
+   catalog.alterPartitionColumnStatistics(path2, partitionSpec, 
columnStatistics, false);
+   CatalogTestUtil.checkEquals(columnStatistics, 
catalog.getPartitionColumnStatistics(path2, partitionSpec));
+
+   // Clean up
 
 Review comment:
   nit: we can remove these cleanups as they are already in close()


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279504357
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogColumnStatistics.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.catalog;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Column statistics a non-partitioned table or a partition of a partitioned 
table.
+ */
+public class CatalogColumnStatistics {
+
+   /**
+* number of distinct values.
+*/
+   private final Long ndv;
+
+   /**
+* number of nulls.
+*/
+   private final Long nullCount;
+
+   /**
+* average length of column values.
+*/
+   private final Double avgLen;
+
+   /**
+* max length of column values.
+*/
+   private final Integer maxLen;
+
+   /**
+* max value of column values.
+*/
+   private final Number max;
 
 Review comment:
   do we need to make min/max type `Object`? For string and date columns, the 
min/max would be String and Date,
   
   it's always been a bit tricky on stats definition. Hive handles stats for 
different typed columns in a different way by having individual classes for 
different typed stats, e.g. DecimalColumnStatsData, StringColumnStatsData, and 
DateColumnStatsData.
   
   I'm ok to leave this as-is for now, but want to bring up the potential 
issues and the need for a solution soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279499373
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java
 ##
 @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String 
newTableName, boolean ignor
if (partitions.containsKey(tablePath)) {
partitions.put(newPath, 
partitions.remove(tablePath));
}
+
 
 Review comment:
   do we need to relocate partition stats and partition column stats too?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs

2019-04-29 Thread GitBox
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add 
stats related catalog APIs
URL: https://github.com/apache/flink/pull/8314#discussion_r279498588
 
 

 ##
 File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java
 ##
 @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean 
ignoreIfNotExists) throws Ta
tables.remove(tablePath);
 
 
 Review comment:
   do we need to also remove table stats and column stats of the table?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services