[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r333889584 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -1946,6 +2022,8 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, pred = jop.getConf().getResidualFilterExprs().get(0); } // evaluate filter expression and update statistics + final boolean uniformWithinRange = HiveConf.getBoolVar( Review comment: unused varibale (remove before committing) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r333889646 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -2039,6 +2117,8 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, pred = jop.getConf().getResidualFilterExprs().get(0); } // evaluate filter expression and update statistics + final boolean uniformWithinRange = HiveConf.getBoolVar( Review comment: unused varibale (remove before committing) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332382671 ## File path: ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out ## @@ -759,50 +759,56 @@ STAGE PLANS: Statistics: Num rows: 500 Data size: 2000 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (key > 2) (type: boolean) -Statistics: Num rows: 166 Data size: 664 Basic stats: COMPLETE Column stats: COMPLETE +Statistics: Num rows: 498 Data size: 1992 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: int) outputColumnNames: _col0 - Statistics: Num rows: 166 Data size: 664 Basic stats: COMPLETE Column stats: COMPLETE - Map Join Operator -condition map: - Inner Join 0 to 1 -keys: - 0 _col0 (type: int) - 1 _col0 (type: int) -outputColumnNames: _col0, _col1 -input vertices: - 1 Map 2 -Statistics: Num rows: 272 Data size: 2176 Basic stats: COMPLETE Column stats: COMPLETE -File Output Operator - compressed: false - Statistics: Num rows: 272 Data size: 2176 Basic stats: COMPLETE Column stats: COMPLETE - table: - input format: org.apache.hadoop.mapred.SequenceFileInputFormat - output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat - serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + Statistics: Num rows: 498 Data size: 1992 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator +key expressions: _col0 (type: int) +sort order: + +Map-reduce partition columns: _col0 (type: int) +Statistics: Num rows: 498 Data size: 1992 Basic stats: COMPLETE Column stats: COMPLETE Review comment: disable enhancement; seems to be interfering with the test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332352613 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -276,9 +279,11 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, ExprNodeDesc pred = fop.getConf().getPredicate(); // evaluate filter expression and update statistics +final boolean uniformWithinRange = HiveConf.getBoolVar( +aspCtx.getConf(), HiveConf.ConfVars.HIVE_STATS_RANGE_SELECTIVITY_UNIFORM_DISTRIBUTION); aspCtx.clearAffectedColumns(); long newNumRows = evaluateExpression(parentStats, pred, aspCtx, -neededCols, fop, parentStats.getNumRows()); +neededCols, fop, parentStats.getNumRows(), uniformWithinRange); Review comment: I don't think we should pass a boolean here; as aspCtx is there This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332365302 ## File path: ql/src/test/results/clientpositive/llap/retry_failure_stat_changes.q.out ## @@ -139,25 +139,25 @@ Stage-0 PARTITION_ONLY_SHUFFLE [RS_12] Group By Operator [GBY_11] (rows=1/1 width=8) Output:["_col0"],aggregations:["sum(_col0)"] - Select Operator [SEL_9] (rows=1/3 width=8) + Select Operator [SEL_9] (rows=4/3 width=8) Output:["_col0"] -Merge Join Operator [MERGEJOIN_30] (rows=1/3 width=8) +Merge Join Operator [MERGEJOIN_30] (rows=4/3 width=8) Conds:RS_6._col0=RS_7._col0(Inner),Output:["_col0","_col1"] <-Map 1 [SIMPLE_EDGE] llap SHUFFLE [RS_6] PartitionCols:_col0 -Select Operator [SEL_2] (rows=1/5 width=4) +Select Operator [SEL_2] (rows=7/5 width=4) Output:["_col0"] - Filter Operator [FIL_18] (rows=1/5 width=4) + Filter Operator [FIL_18] (rows=7/5 width=4) Review comment: there was a "by-design" underestimation in this testcase...now we have good estimate :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332368069 ## File path: ql/src/test/results/clientpositive/llap/semijoin_reddedup.q.out ## @@ -258,9 +258,12 @@ STAGE PLANS: Tez A masked pattern was here Edges: +Map 1 <- Reducer 10 (BROADCAST_EDGE) +Map 11 <- Reducer 10 (BROADCAST_EDGE) +Reducer 10 <- Reducer 9 (CUSTOM_SIMPLE_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 9 (SIMPLE_EDGE) -Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) +Reducer 4 <- Map 11 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Review comment: I'm not 100% sure what's the goal of this test; but I think we should probably disable the uniform distribution for it to retain its original goal This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332357556 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -856,8 +856,15 @@ public static ColStatistics getColStatistics(ColumnStatisticsObj cso, String tab } else if (colTypeLowerCase.equals(serdeConstants.BINARY_TYPE_NAME)) { cs.setAvgColLen(csd.getBinaryStats().getAvgColLen()); cs.setNumNulls(csd.getBinaryStats().getNumNulls()); -} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) || -colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) { +} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) { + cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp()); + cs.setNumNulls(csd.getTimestampStats().getNumNulls()); + Long lowVal = (csd.getTimestampStats().getLowValue() != null) ? csd.getTimestampStats().getLowValue() + .getSecondsSinceEpoch() : null; + Long highVal = (csd.getTimestampStats().getHighValue() != null) ? csd.getTimestampStats().getHighValue() + .getSecondsSinceEpoch() : null; + cs.setRange(lowVal, highVal); Review comment: I don't feel this fortunatewe do know the low/high value but we faltten it to some number...instead of this we would need properly typed ranges - for decimal we already throw away all our knowledge if it runs beyond long limits the current changes follow the existing traditions - if we decide to change that ; it should be done in a separate ticket This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332358389 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -935,8 +942,11 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){ cs.setNumTrues(Math.max(1, numRows/2)); cs.setNumFalses(Math.max(1, numRows/2)); cs.setAvgColLen(JavaDataModel.get().primitive1()); -} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) || -colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) { +} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) { + cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp()); + // epoch, seconds since epoch + cs.setRange(0, 2177452799L); Review comment: not 100% sure about this; but I think the minvalue should not be 0; or we may optimize away pre 1970 filter conditions and why it's exactly 217745799 :) I might be missing some context... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332373527 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/TimestampColumnStatsAggregator.java ## @@ -0,0 +1,358 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hadoop.hive.metastore.columnstats.aggr; + +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimator; +import org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory; +import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData; +import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj; +import org.apache.hadoop.hive.metastore.api.Timestamp; +import org.apache.hadoop.hive.metastore.api.TimestampColumnStatsData; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.columnstats.cache.TimestampColumnStatsDataInspector; +import org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.ColStatsObjWithSourceInfo; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static org.apache.hadoop.hive.metastore.columnstats.ColumnsStatsUtils.timestampInspectorFromStats; + +public class TimestampColumnStatsAggregator extends ColumnStatsAggregator implements +IExtrapolatePartStatus { + + private static final Logger LOG = LoggerFactory.getLogger(TimestampColumnStatsAggregator.class); + + @Override + public ColumnStatisticsObj aggregate(List colStatsWithSourceInfo, +List partNames, boolean areAllPartsFound) throws MetaException { +ColumnStatisticsObj statsObj = null; +String colType = null; +String colName = null; +// check if all the ColumnStatisticsObjs contain stats and all the ndv are +// bitvectors +boolean doAllPartitionContainStats = partNames.size() == colStatsWithSourceInfo.size(); +NumDistinctValueEstimator ndvEstimator = null; +for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) { + ColumnStatisticsObj cso = csp.getColStatsObj(); + if (statsObj == null) { +colName = cso.getColName(); +colType = cso.getColType(); +statsObj = ColumnStatsAggregatorFactory.newColumnStaticsObj(colName, colType, +cso.getStatsData().getSetField()); +LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, doAllPartitionContainStats); + } + TimestampColumnStatsDataInspector timestampColumnStats = timestampInspectorFromStats(cso); + + if (timestampColumnStats.getNdvEstimator() == null) { +ndvEstimator = null; +break; + } else { +// check if all of the bit vectors can merge +NumDistinctValueEstimator estimator = timestampColumnStats.getNdvEstimator(); +if (ndvEstimator == null) { + ndvEstimator = estimator; +} else { + if (ndvEstimator.canMerge(estimator)) { +continue; + } else { +ndvEstimator = null; +break; + } +} + } +} +if (ndvEstimator != null) { + ndvEstimator = NumDistinctValueEstimatorFactory + .getEmptyNumDistinctValueEstimator(ndvEstimator); +} +LOG.debug("all of the bit vectors can merge for " + colName + " is " + (ndvEstimator != null)); +ColumnStatisticsData columnStatisticsData = new ColumnStatisticsData(); +if (doAllPartitionContainStats || colStatsWithSourceInfo.size() < 2) { + TimestampColumnStatsDataInspector aggregateData = null; + long lowerBound = 0; + long higherBound = 0; + double densityAvgSum = 0.0; + for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) { +ColumnStatisticsObj cso = csp.getColStatsObj(); +TimestampColumnStatsDataInspector newData = timestampInspectorFromStats(cso); +higherBound += newData.getNumDVs(); +densityAvgSum += (diff(newData.getHighValue(), newData.getLowValue())) +/
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332377711 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -562,14 +562,27 @@ struct DateColumnStatsData { 5: optional binary bitVectors } +struct Timestamp { +1: required i64 secondsSinceEpoch Review comment: I'm afraid that there will be a downside that we are throwing away precision - and because of that we may get into some troubles later: If we do truncate to seconds; we may not be able to extend the timestamp logic to the stats optimizer - as we are not working with the real values. consider the following ```sql select '2019-11-11 11:11:11.400' < '2019-11-11 11:11:11.300' ``` if we round to seconds; consider that the left side comes from a table - a the columns maxvalue; the stats optimizer could deduce a "true" for the above Would it complicate things much to use a non-rounded timestamp - and retain miliseconds/microsendond as well ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332352109 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2537,6 +2537,11 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "When estimating output rows for a join involving multiple columns, the default behavior assumes" + "the columns are independent. Setting this flag to true will cause the estimator to assume" + "the columns are correlated."), + HIVE_STATS_RANGE_SELECTIVITY_UNIFORM_DISTRIBUTION("hive.stats.filter.range.uniform", true, +"When estimating output rows from a condition, if a range predicate is applied over a column and the" + +"minimum and maximum values for that column are available, assume uniform distribution of values" + +"accross that range and scales number of rows proportionally. If this is set to false, default" + +"selectivity value is used."), Review comment: small thing: could you add spaces at the end; because right now it contains words like "theminimum" and "valuesacross" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332355776 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -856,8 +856,15 @@ public static ColStatistics getColStatistics(ColumnStatisticsObj cso, String tab } else if (colTypeLowerCase.equals(serdeConstants.BINARY_TYPE_NAME)) { cs.setAvgColLen(csd.getBinaryStats().getAvgColLen()); cs.setNumNulls(csd.getBinaryStats().getNumNulls()); -} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) || -colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) { +} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) { Review comment: I start to feel like having these 2 changes separetly from eachother might have been good This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332353993 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -316,7 +321,7 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, protected long evaluateExpression(Statistics stats, ExprNodeDesc pred, AnnotateStatsProcCtx aspCtx, List neededCols, -Operator op, long currNumRows) throws SemanticException { +Operator op, long currNumRows, boolean uniformWithinRange) throws SemanticException { Review comment: this boolean is not used in this function ; but aspCtx can be used to obtain it - I would suggest to either: * leave the function signature as is and get the boolean when its needed from conf * this class is constructed and used to optimize a single query - so we may pass the conf on consturction and create a finalized field from this info This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332383859 ## File path: ql/src/test/results/clientpositive/llap/subquery_select.q.out ## @@ -3918,14 +3918,14 @@ STAGE PLANS: Statistics: Num rows: 26 Data size: 208 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: p_partkey BETWEEN 1 AND 2 (type: boolean) -Statistics: Num rows: 8 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: p_size (type: int) outputColumnNames: p_size - Statistics: Num rows: 8 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: max(p_size) -minReductionHashAggr: 0.875 +minReductionHashAggr: 0.0 Review comment: the computed range is most likely empty... these changes suggest to me that something is not entirely rightis this expected? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] kgyrtkirk commented on a change in pull request #787: HIVE-22239
kgyrtkirk commented on a change in pull request #787: HIVE-22239 URL: https://github.com/apache/hive/pull/787#discussion_r332355126 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -967,13 +979,23 @@ private long evaluateComparator(Statistics stats, AnnotateStatsProcCtx aspCtx, E if (minValue > value) { return 0; } + if (uniformWithinRange) { +// Assuming uniform distribution, we can use the range to calculate +// new estimate for the number of rows +return Math.round(((double) (value - minValue) / (maxValue - minValue)) * numRows); Review comment: I think we will probably hit a divide by zero here when max=min; I don't see any preceeding conditionals covering for that (however there can be...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org