bharath v created IMPALA-7560:
---------------------------------

Summary: Better selectivity estimate for != (not equals) binary predicate
Key: IMPALA-7560
URL: https://issues.apache.org/jira/browse/IMPALA-7560
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 2.12.0, Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.13.0
Reporter: bharath v

Currently we use the default selectivity estimate for any binary predicate with an op other than EQ / NOT_DISTINCT.

{noformat}
// Determine selectivity
// TODO: Compute selectivity for nested predicates.
// TODO: Improve estimation using histograms.
Reference<SlotRef> slotRefRef = new Reference<SlotRef>();
if ((op_ == Operator.EQ || op_ == Operator.NOT_DISTINCT)
    && isSingleColumnPredicate(slotRefRef, null)) {
  long distinctValues = slotRefRef.getRef().getNumDistinctValues();
  if (distinctValues > 0) {
    selectivity_ = 1.0 / distinctValues;
    selectivity_ = Math.max(0, Math.min(1, selectivity_));
  }
}
{noformat}

This can give estimates that are far too low. For example (compare #Rows with Est. #Rows):

{noformat}
[localhost:21000] tpch> select * from nation where n_regionkey != 1;
[localhost:21000] tpch> summary;
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail      |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
| 00:SCAN HDFS | 1      | 3.32ms   | 3.32ms   | 20    | 3          | 143.00 KB | 16.00 MB      | tpch.nation |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
[localhost:21000] tpch>
{noformat}

Ideally we would invert the equality selectivity here, giving 4/5 (= 1 - 1/5), which yields a much better estimate: the TPC-H nation table has 25 rows, and 25 * 4/5 = 20 matches the actual row count.
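
To illustrate the proposal, here is a minimal standalone sketch of the inverted estimate. It is not the Impala code and not a proposed patch: the class `SelectivitySketch`, the `Op` enum, the `UNKNOWN` sentinel, and the `estimate` method are hypothetical names made up for this example. It assumes values are uniformly distributed across the distinct values and ignores NULLs.

```java
// Hypothetical sketch of the idea in this issue; not Impala's BinaryPredicate.
final class SelectivitySketch {
  // Hypothetical stand-in for the operators mentioned in the issue.
  enum Op { EQ, NOT_DISTINCT, NE, LT }

  // Sentinel meaning "no estimate"; a caller would fall back to its default.
  static final double UNKNOWN = -1.0;

  /**
   * Estimates the selectivity of "col op constant" from the column's number
   * of distinct values (NDV). Assumes a uniform distribution and no NULLs.
   */
  static double estimate(Op op, long distinctValues) {
    if (distinctValues <= 0) return UNKNOWN;  // no usable column stats
    // Same computation as the EQ / NOT_DISTINCT branch quoted above.
    double eq = Math.max(0.0, Math.min(1.0, 1.0 / distinctValues));
    switch (op) {
      case EQ:
      case NOT_DISTINCT:
        return eq;
      case NE:
        return 1.0 - eq;  // the inversion suggested in this issue
      default:
        return UNKNOWN;   // other operators are out of scope here
    }
  }

  public static void main(String[] args) {
    // nation has 25 rows (TPC-H); the issue implies NDV(n_regionkey) = 5.
    long estRows = Math.round(25 * estimate(Op.NE, 5));
    System.out.println(estRows);  // prints 20
  }
}
```

Note that with a single distinct value this sketch returns a selectivity of 0 for `!=`, so a real change would probably need a lower bound, and would also have to decide how NULLs are accounted for.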