Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-04-07 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
---

(Updated April 8, 2015, 12:40 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Address test failures.


Repository: hive-git


Description
---

The discrepancy arises because the NDV calculation for a partitioned table
assumes that the NDV range is contained within each partition, so it is
calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns such as the ticket number, whose values
naturally increase with the partitioning date column ss_sold_date_sk.
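
To make the underestimate concrete, here is a minimal sketch of the two
estimates. It is not the code in this patch; the partition names, column
values, and function names are hypothetical and chosen only for illustration.
It compares the "max of per-partition NDV" estimate (NDV = number of distinct
values) with the true table-level NDV for a column whose values grow with the
partition key and for a column whose values repeat in every partition.

```python
# Hypothetical data: each key is a partition of ss_sold_date_sk, each value
# is the list of column values stored in that partition.
ticket_numbers = {
    "ss_sold_date_sk=1": [1, 2, 3],
    "ss_sold_date_sk=2": [4, 5, 6],
    "ss_sold_date_sk=3": [7, 8, 9],
}
store_ids = {
    "ss_sold_date_sk=1": [10, 20, 30],
    "ss_sold_date_sk=2": [10, 20, 30],
    "ss_sold_date_sk=3": [10, 20, 30],
}


def max_partition_ndv(partitions):
    """Estimate NDV as the largest per-partition distinct count,
    mirroring max(NUM_DISTINCTS) over the per-partition statistics."""
    return max(len(set(values)) for values in partitions.values())


def true_ndv(partitions):
    """Exact NDV over the whole table: distinct values across all partitions."""
    return len(set().union(*partitions.values()))


for name, column in (("ticket_number", ticket_numbers), ("store_id", store_ids)):
    print(name, "max-based:", max_partition_ndv(column), "true:", true_ndv(column))
```

For the repeating column the two numbers agree, while for the increasing
column the max-based estimate stays at the per-partition count no matter how
many partitions are added, which is the kind of gap the description refers to.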


Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b8280e
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ba27f10
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 75005aa
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-04-06 Thread pengcheng xiong

(Updated April 6, 2015, 9:21 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

The discrepancy arises because the NDV calculation for a partitioned table
assumes that the NDV range is contained within each partition, so it is
calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns such as the ticket number, whose values
naturally increase with the partitioning date column ss_sold_date_sk.


Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc16c38
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java d404789
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6956e3b
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/


Testing
---


Thanks,

pengcheng xiong



Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread pengcheng xiong

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

The discrepancy arises because the NDV calculation for a partitioned table
assumes that the NDV range is contained within each partition, so it is
calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns such as the ticket number, whose values
naturally increase with the partitioning date column ss_sold_date_sk.


Diffs
---

  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 574141c
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 00c9b53
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 8ae9a90
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 0f6b15d
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 1fdeb90
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/


Testing
---


Thanks,

pengcheng xiong