Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
-----------------------------------------------------------

(Updated April 8, 2015, 12:40 a.m.)

Review request for hive and Ashutosh Chauhan.

Changes
-------
Address test failures.

Repository: hive-git

Description
-------
The discrepancy arises because the NDV (number of distinct values) calculation for a partitioned table assumes that the NDV range is contained within each partition, and computes it as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for columns like ticket number, which naturally increase with the date partition column ss_sold_date_sk.

Diffs (updated)
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b8280e
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ba27f10
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 75005aa
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/

Testing
-------

Thanks,
pengcheng xiong
Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
-----------------------------------------------------------

(Updated April 6, 2015, 9:21 p.m.)

Review request for hive and Ashutosh Chauhan.

Repository: hive-git

Description
-------
The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and computes it as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for columns like ticket number, which naturally increase with the date partition column ss_sold_date_sk.

Diffs (updated)
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc16c38
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java d404789
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6956e3b
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/

Testing
-------

Thanks,
pengcheng xiong
Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.

Repository: hive-git

Description
-------
The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and computes it as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for columns like ticket number, which naturally increase with the date partition column ss_sold_date_sk.

Diffs
-----
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 574141c
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 00c9b53
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 8ae9a90
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 0f6b15d
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 1fdeb90
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/31178/diff/

Testing
-------

Thanks,
pengcheng xiong
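To make the problem in the description concrete, here is a minimal Java sketch. It is not the code from this patch. It compares the max(NUM_DISTINCTS) aggregation with a hypothetical density-based estimate: the average gap between distinct values in each partition is applied to the table-wide [low, high] range, and the result is clamped between the largest per-partition NDV and the sum of all per-partition NDVs. The sketch assumes a numeric column with low/high values recorded per partition. The PartStats record and every method name are illustrative assumptions, not metastore APIs.

```java
import java.util.List;

public class NdvAggregationSketch {

    // Hypothetical per-partition stats for one numeric column (illustrative, not a Hive type).
    record PartStats(long low, long high, long ndv) {}

    // Aggregation described in the review: table NDV = max(NUM_DISTINCTS) over partitions.
    static long maxNdv(List<PartStats> parts) {
        long max = 0;
        for (PartStats p : parts) {
            max = Math.max(max, p.ndv());
        }
        return max;
    }

    // Upper bound: assumes no value appears in more than one partition.
    static long sumNdv(List<PartStats> parts) {
        long sum = 0;
        for (PartStats p : parts) {
            sum += p.ndv();
        }
        return sum;
    }

    // Assumed density-based estimate: table range divided by the average gap between
    // distinct values, clamped to [maxNdv, sumNdv].
    static long densityNdv(List<PartStats> parts) {
        double gapSum = 0;
        int counted = 0;
        long low = Long.MAX_VALUE;
        long high = Long.MIN_VALUE;
        for (PartStats p : parts) {
            low = Math.min(low, p.low());
            high = Math.max(high, p.high());
            if (p.ndv() > 0) {
                gapSum += (double) (p.high() - p.low()) / p.ndv();
                counted++;
            }
        }
        long lower = maxNdv(parts);
        long upper = sumNdv(parts);
        if (counted == 0 || gapSum == 0) {
            return lower; // no usable density information, fall back to max
        }
        double avgGap = gapSum / counted;
        long estimate = (long) ((high - low) / avgGap);
        return Math.max(lower, Math.min(upper, estimate));
    }

    public static void main(String[] args) {
        // Ticket numbers that grow with the partition date: ranges do not overlap.
        List<PartStats> disjoint = List.of(
                new PartStats(0, 999, 1000),
                new PartStats(1000, 1999, 1000),
                new PartStats(2000, 2999, 1000));
        System.out.println(maxNdv(disjoint));     // prints 1000
        System.out.println(densityNdv(disjoint)); // prints 3000
    }
}
```

With disjoint, increasing ranges (the ticket-number case), the max-based estimate stays at 1000 while the density estimate reaches 3000. When every partition covers the same range, both return 1000, so the clamp keeps the sketch from overestimating columns whose values repeat across partitions.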