[GitHub] [flink] bowenli86 commented on issue #10380: [FLINK-14662]Distinguish unknown CatalogTableStatistics and zero

2019-12-02 Thread GitBox
URL: https://github.com/apache/flink/pull/10380#issuecomment-561036955
 
 
   > My guess would be this is not the case with unknown table stats, but 
indeed have table stats that shows the row count is 0.
   > @zjuwangg please double check this.
   
   That comes down to how we define `unknown` table stats. From my perspective, 
unknown table stats are the initial default values, which remain unchanged 
until an `analyze` command is run. E.g., if we just append rows or partitions 
to a table in Hive, `totalSize=0, numRows=0, rawDataSize=0` would remain 
unchanged.
   
   And have we checked how Hive behaves when a table has `totalSize=-1, 
numRows=-1, rawDataSize=-1`?
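   
   One way to make the distinction concrete is sketched below in Java, assuming 
the convention this PR discusses: a missing or negative value means "unknown", 
while `0` is a real count. `HiveStatsSketch`, `BasicStats`, `readStat` and 
`UNKNOWN` are hypothetical names, not Flink's `CatalogTableStatistics` API; only 
the parameter keys `numRows`, `numFiles`, `totalSize` and `rawDataSize` come from 
Hive's table parameters.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   /**
    * Hypothetical sketch (not Flink's or Hive's actual code): convert Hive table
    * parameters into basic statistics while keeping "unknown" distinct from zero.
    */
   public final class HiveStatsSketch {

       /** Sentinel meaning "this statistic is not known" (assumed convention). */
       public static final long UNKNOWN = -1L;

       /** Immutable holder for the four basic table statistics. */
       public static final class BasicStats {
           public final long rowCount;
           public final long fileCount;
           public final long totalSize;
           public final long rawDataSize;

           public BasicStats(long rowCount, long fileCount, long totalSize, long rawDataSize) {
               this.rowCount = rowCount;
               this.fileCount = fileCount;
               this.totalSize = totalSize;
               this.rawDataSize = rawDataSize;
           }

           public boolean isRowCountKnown() {
               return rowCount != UNKNOWN;
           }

           @Override
           public String toString() {
               return "BasicStats{rowCount=" + rowCount + ", fileCount=" + fileCount
                       + ", totalSize=" + totalSize + ", rawDataSize=" + rawDataSize + "}";
           }
       }

       /** Reads one statistic; a missing, non-numeric, or negative value becomes UNKNOWN. */
       public static long readStat(Map<String, String> params, String key) {
           String raw = params.get(key);
           if (raw == null) {
               return UNKNOWN;
           }
           try {
               long value = Long.parseLong(raw.trim());
               return value < 0 ? UNKNOWN : value;
           } catch (NumberFormatException e) {
               return UNKNOWN;
           }
       }

       /** Maps Hive's numRows/numFiles/totalSize/rawDataSize parameters to BasicStats. */
       public static BasicStats fromHiveParameters(Map<String, String> params) {
           return new BasicStats(
                   readStat(params, "numRows"),
                   readStat(params, "numFiles"),
                   readStat(params, "totalSize"),
                   readStat(params, "rawDataSize"));
       }

       public static void main(String[] args) {
           // Parameters that explicitly record zero: read as a known 0.
           Map<String, String> zeroStats = new HashMap<>();
           zeroStats.put("numRows", "0");
           zeroStats.put("numFiles", "0");
           zeroStats.put("totalSize", "0");
           zeroStats.put("rawDataSize", "0");

           // Parameters carrying -1 placeholders: read as unknown.
           Map<String, String> placeholderStats = new HashMap<>();
           placeholderStats.put("numRows", "-1");
           placeholderStats.put("numFiles", "-1");
           placeholderStats.put("totalSize", "-1");
           placeholderStats.put("rawDataSize", "-1");

           System.out.println(fromHiveParameters(zeroStats));
           System.out.println(fromHiveParameters(placeholderStats).isRowCountKnown());
       }
   }
   ```
   
   Note that under the definition above, a never-analyzed table that carries 
Hive's default `0` values would be read by this mapping as genuinely empty. That 
is exactly the ambiguity being debated here, so the sketch only shows where the 
line would be drawn, not which convention is right.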


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #10380: [FLINK-14662]Distinguish unknown CatalogTableStatistics and zero

2019-12-02 Thread GitBox
URL: https://github.com/apache/flink/pull/10380#issuecomment-560996150
 
 
   @zjuwangg are you sure? Below is an example table I created. Note that, right 
after `create table`, its parameters already read `parameters:{totalSize=0, 
numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, 
numFiles=0, transient_lastDdlTime=1575348162}`:
   ```
   hive> create table x (name String);
   OK
   Time taken: 1.736 seconds
   hive> describe extended x;
   OK
   name string
   
   Detailed Table Information   Table(tableName:x, dbName:default, 
owner:bowen.li, createTime:1575348162, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null)], 
location:hdfs://localhost:9000/user/hive/warehouse/x, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, 
COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, 
transient_lastDdlTime=1575348162}, viewOriginalText:null, 
viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)
   Time taken: 0.088 seconds, Fetched: 3 row(s)
   ```
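   
   The parameters above can also be read programmatically. The Java sketch below 
assumes a hypothetical policy in which a catalog trusts `numRows` only when 
Hive's `COLUMN_STATS_ACCURATE` parameter marks `BASIC_STATS` as `true`, and 
otherwise reports the row count as unknown (-1). `BasicStatsAccuracySketch`, 
`basicStatsAccurate` and `trustedRowCount` are illustrative names, and the JSON 
flag is checked with a plain substring match rather than a real JSON parser, 
which is a simplification.
   
   ```java
   import java.util.Map;

   /**
    * Hypothetical sketch: decide whether Hive's numRows can be trusted by checking
    * the BASIC_STATS flag inside the COLUMN_STATS_ACCURATE table parameter.
    */
   public final class BasicStatsAccuracySketch {

       /** Sentinel for an unknown row count (assumed, mirroring the -1 raised in this thread). */
       public static final long UNKNOWN = -1L;

       /**
        * True if COLUMN_STATS_ACCURATE marks basic stats as accurate.
        * Simplification: substring match instead of parsing the JSON value.
        */
       public static boolean basicStatsAccurate(Map<String, String> params) {
           String flag = params.get("COLUMN_STATS_ACCURATE");
           return flag != null && flag.replace(" ", "").contains("\"BASIC_STATS\":\"true\"");
       }

       /** Returns numRows when Hive flags basic stats as accurate, otherwise UNKNOWN. */
       public static long trustedRowCount(Map<String, String> params) {
           String numRows = params.get("numRows");
           if (numRows == null || !basicStatsAccurate(params)) {
               return UNKNOWN;
           }
           try {
               long value = Long.parseLong(numRows.trim());
               return value < 0 ? UNKNOWN : value;
           } catch (NumberFormatException e) {
               return UNKNOWN;
           }
       }

       public static void main(String[] args) {
           // Parameters copied from the `describe extended x` output of the new table.
           Map<String, String> params = Map.of(
                   "totalSize", "0",
                   "numRows", "0",
                   "rawDataSize", "0",
                   "COLUMN_STATS_ACCURATE", "{\"BASIC_STATS\":\"true\"}",
                   "numFiles", "0",
                   "transient_lastDdlTime", "1575348162");

           System.out.println("basic stats accurate: " + basicStatsAccurate(params));
           System.out.println("trusted row count: " + trustedRowCount(params));
       }
   }
   ```
   
   Fed the parameters of table `x`, this prints `basic stats accurate: true` and 
`trusted row count: 0`: for a freshly created, empty table, Hive itself flags 
the zero as accurate rather than unknown.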
   
   @KurtYoung 

