[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-11 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r334016828
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -944,7 +948,7 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
 } else if (colTypeLowerCase.equals(serdeConstants.DATE_TYPE_NAME)) {
   cs.setAvgColLen(JavaDataModel.get().lengthOfDate());
   // epoch, days since epoch
-  cs.setRange(0, 25201);
+  cs.setRange(DATE_RANGE_LOWER_LIMIT, DATE_RANGE_UPPER_LIMIT);
 
 Review comment:
   Yeah, this is a heuristic... No matter what you do, you will always get it 
wrong in some cases. I guess the idea is to target the most common case. The 
solution to overestimation/underestimation is to compute column stats, as you 
mentioned; we do not want to let users tune this as well.
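
To make the day-based range concrete, here is a minimal sketch that decodes the old upper bound and computes a narrower window. The class name and the 1999-2024 window are illustrative assumptions; the actual values behind `DATE_RANGE_LOWER_LIMIT` and `DATE_RANGE_UPPER_LIMIT` are not shown in the quoted hunk. The old bound `25201` decodes to 2038-12-31.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of the patch.
final class DateRangeSketch {
  // Illustrative window only; the constants used by the patch are not quoted here.
  static final long LOWER = LocalDate.of(1999, 1, 1).toEpochDay();
  static final long UPPER = LocalDate.of(2024, 12, 31).toEpochDay();

  // Converts a "days since epoch" value back to a calendar date.
  static LocalDate fromEpochDay(long days) {
    return LocalDate.ofEpochDay(days);
  }

  public static void main(String[] args) {
    System.out.println("old upper bound 25201 = " + fromEpochDay(25201));
    System.out.println("illustrative window   = [" + LOWER + ", " + UPPER + "]");
  }
}
```

A narrower window changes only the default range used when no column statistics exist; computed column stats would still take precedence.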


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r333070056
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
 ##
 @@ -967,13 +979,23 @@ private long evaluateComparator(Statistics stats, AnnotateStatsProcCtx aspCtx, E
   if (minValue > value) {
     return 0;
   }
+  if (uniformWithinRange) {
+    // Assuming uniform distribution, we can use the range to calculate
+    // new estimate for the number of rows
+    return Math.round(((double) (value - minValue) / (maxValue - minValue)) * numRows);
 
 Review comment:
   Good catch. I fixed that in latest patch.
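
For reference, the uniform-distribution estimate in the hunk above can be sketched in isolation. This is a simplified illustration, not the code in the patch: the class and method names are hypothetical, and the guard for `minValue == maxValue` is an assumption of this sketch (the thread does not say what the fix was).

```java
// Hypothetical, stand-alone version of a "col < value" row estimate.
final class UniformRangeEstimate {
  // Estimates how many of numRows satisfy "col < value", assuming values are
  // spread uniformly over [minValue, maxValue].
  static long rowsBelow(long value, long minValue, long maxValue, long numRows) {
    if (minValue > value) {
      return 0;                 // predicate lies entirely below the range
    }
    if (maxValue < value) {
      return numRows;           // predicate covers the whole range
    }
    if (maxValue == minValue) {
      return 0;                 // assumed guard: avoids 0/0 for a single-value range
    }
    double fraction = (double) (value - minValue) / (maxValue - minValue);
    return Math.round(fraction * numRows);
  }

  public static void main(String[] args) {
    System.out.println(rowsBelow(50, 0, 100, 1000));
  }
}
```

Without the single-value guard, the double division yields NaN and `Math.round` silently returns 0, which hides the degenerate case rather than handling it.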





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r333032465
 
 

 ##
 File path: ql/src/test/results/clientpositive/llap/subquery_select.q.out
 ##
 @@ -3918,14 +3918,14 @@ STAGE PLANS:
   Statistics: Num rows: 26 Data size: 208 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: p_partkey BETWEEN 1 AND 2 (type: boolean)
-Statistics: Num rows: 8 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: p_size (type: int)
   outputColumnNames: p_size
-  Statistics: Num rows: 8 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
+  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
   Group By Operator
 aggregations: max(p_size)
-minReductionHashAggr: 0.875
+minReductionHashAggr: 0.0
 
 Review comment:
   The range of the `p_partkey` column is `15103`-`195606` across 26 rows, so 
the predicate `BETWEEN 1 AND 2` falls entirely below the minimum value. Hence, 
the estimate of `1` seems right.
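
The arithmetic behind this can be sketched as an overlap computation. The class and method names are hypothetical, and the floor of one row is an assumption made here to match the plan output, not something confirmed by the thread.

```java
// Hypothetical sketch of a range-overlap estimate for "col BETWEEN lo AND hi".
final class BetweenEstimate {
  static long rowsBetween(long lo, long hi, long minValue, long maxValue, long numRows) {
    long from = Math.max(lo, minValue);
    long to = Math.min(hi, maxValue);
    double fraction;
    if (from > to) {
      fraction = 0.0;           // predicate interval does not intersect the column range
    } else if (maxValue == minValue) {
      fraction = 1.0;           // single-value column that the predicate covers
    } else {
      fraction = (double) (to - from) / (maxValue - minValue);
    }
    long estimate = Math.round(fraction * numRows);
    // Assumed floor: never report fewer than one row for a non-empty input.
    return numRows > 0 ? Math.max(1, estimate) : 0;
  }

  public static void main(String[] args) {
    System.out.println(rowsBetween(1, 2, 15103, 195606, 26));
  }
}
```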





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r333003637
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -856,8 +856,15 @@ public static ColStatistics getColStatistics(ColumnStatisticsObj cso, String tab
 } else if (colTypeLowerCase.equals(serdeConstants.BINARY_TYPE_NAME)) {
   cs.setAvgColLen(csd.getBinaryStats().getAvgColLen());
   cs.setNumNulls(csd.getBinaryStats().getNumNulls());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
 
 Review comment:
   I think it is a good idea and we are not in a hurry... Let's do the right 
thing.
   I have created https://issues.apache.org/jira/browse/HIVE-22311.





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332994124
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/TimestampColumnStatsAggregator.java
 ##
 @@ -0,0 +1,358 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.columnstats.aggr;
+
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimator;
+import org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Timestamp;
+import org.apache.hadoop.hive.metastore.api.TimestampColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.columnstats.cache.TimestampColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.ColStatsObjWithSourceInfo;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.hive.metastore.columnstats.ColumnsStatsUtils.timestampInspectorFromStats;
+
+public class TimestampColumnStatsAggregator extends ColumnStatsAggregator implements
+    IExtrapolatePartStatus {
+
+  private static final Logger LOG = LoggerFactory.getLogger(TimestampColumnStatsAggregator.class);
+
+  @Override
+  public ColumnStatisticsObj aggregate(List<ColStatsObjWithSourceInfo> colStatsWithSourceInfo,
+      List<String> partNames, boolean areAllPartsFound) throws MetaException {
+    ColumnStatisticsObj statsObj = null;
+    String colType = null;
+    String colName = null;
+    // check if all the ColumnStatisticsObjs contain stats and all the ndv are
+    // bitvectors
+    boolean doAllPartitionContainStats = partNames.size() == colStatsWithSourceInfo.size();
+    NumDistinctValueEstimator ndvEstimator = null;
+    for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) {
+      ColumnStatisticsObj cso = csp.getColStatsObj();
+      if (statsObj == null) {
+        colName = cso.getColName();
+        colType = cso.getColType();
+        statsObj = ColumnStatsAggregatorFactory.newColumnStaticsObj(colName, colType,
+            cso.getStatsData().getSetField());
+        LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, doAllPartitionContainStats);
+      }
+      TimestampColumnStatsDataInspector timestampColumnStats = timestampInspectorFromStats(cso);
+
+      if (timestampColumnStats.getNdvEstimator() == null) {
+        ndvEstimator = null;
+        break;
+      } else {
+        // check if all of the bit vectors can merge
+        NumDistinctValueEstimator estimator = timestampColumnStats.getNdvEstimator();
+        if (ndvEstimator == null) {
+          ndvEstimator = estimator;
+        } else {
+          if (ndvEstimator.canMerge(estimator)) {
+            continue;
+          } else {
+            ndvEstimator = null;
+            break;
+          }
+        }
+      }
+    }
+    if (ndvEstimator != null) {
+      ndvEstimator = NumDistinctValueEstimatorFactory
+          .getEmptyNumDistinctValueEstimator(ndvEstimator);
+    }
+    LOG.debug("all of the bit vectors can merge for " + colName + " is " + (ndvEstimator != null));
+    ColumnStatisticsData columnStatisticsData = new ColumnStatisticsData();
+    if (doAllPartitionContainStats || colStatsWithSourceInfo.size() < 2) {
+      TimestampColumnStatsDataInspector aggregateData = null;
+      long lowerBound = 0;
+      long higherBound = 0;
+      double densityAvgSum = 0.0;
+      for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) {
+        ColumnStatisticsObj cso = csp.getColStatsObj();
+        TimestampColumnStatsDataInspector newData = timestampInspectorFromStats(cso);
+        higherBound += newData.getNumDVs();
+        densityAvgSum += (diff(newData.getHighValue(), newData.getLowValue()))
+/ 

[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332991505
 
 

 ##
 File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
 ##
 @@ -562,14 +562,27 @@ struct DateColumnStatsData {
 5: optional binary bitVectors
 }
 
+struct Timestamp {
+1: required i64 secondsSinceEpoch
 
 Review comment:
   I do not think it is too complicated, but it will imply changes in the 
metastore tables that store these values too. I did not want to change the 
internal representation of column stats for the timestamp type in this patch; 
that is why I introduced the type but kept the internal representation based 
on seconds. I created https://issues.apache.org/jira/browse/HIVE-22309.
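
As a small illustration of what a seconds-based representation implies, the sketch below converts a timestamp to seconds since epoch and back. The class and method names are hypothetical, and treating the value as UTC is an assumption of this sketch; it only shows that sub-second precision is dropped, which is the granularity trade-off under discussion.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper; mirrors the idea of a single i64 "secondsSinceEpoch" field.
final class SecondsSinceEpochSketch {
  // Truncates to whole seconds; the nanosecond part is not stored.
  static long toSecondsSinceEpoch(LocalDateTime ts) {
    return ts.toEpochSecond(ZoneOffset.UTC);
  }

  static LocalDateTime fromSecondsSinceEpoch(long seconds) {
    return LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    LocalDateTime original = LocalDateTime.of(2019, 10, 9, 12, 30, 15, 123_000_000);
    long stored = toSecondsSinceEpoch(original);
    System.out.println(original + " -> " + stored + " -> " + fromSecondsSinceEpoch(stored));
  }
}
```

Two timestamps that differ only in their fractional second map to the same stored value, so low/high bounds kept this way are accurate only to the second.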





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-09 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332979645
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -856,8 +856,15 @@ public static ColStatistics getColStatistics(ColumnStatisticsObj cso, String tab
 } else if (colTypeLowerCase.equals(serdeConstants.BINARY_TYPE_NAME)) {
   cs.setAvgColLen(csd.getBinaryStats().getAvgColLen());
   cs.setNumNulls(csd.getBinaryStats().getNumNulls());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
+  cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp());
+  cs.setNumNulls(csd.getTimestampStats().getNumNulls());
+  Long lowVal = (csd.getTimestampStats().getLowValue() != null) ? csd.getTimestampStats().getLowValue()
+  .getSecondsSinceEpoch() : null;
+  Long highVal = (csd.getTimestampStats().getHighValue() != null) ? csd.getTimestampStats().getHighValue()
+  .getSecondsSinceEpoch() : null;
+  cs.setRange(lowVal, highVal);
 
 Review comment:
   Yeah, I did not want to change the information that we store for the 
timestamp type; note that this patch only changes the way we read the data. I 
agree we could use finer granularity for different types.





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-08 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332770578
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -935,8 +942,11 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
 cs.setNumTrues(Math.max(1, numRows/2));
 cs.setNumFalses(Math.max(1, numRows/2));
 cs.setAvgColLen(JavaDataModel.get().primitive1());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
+  cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp());
+  // epoch, seconds since epoch
+  cs.setRange(0, 2177452799L);
 
 Review comment:
   I replied to the same comment in my response to @miklosgergely above; 
please see it. I used a new heuristic, since I do not think the existing one 
was very good...





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-08 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332736102
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -935,8 +942,11 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
 cs.setNumTrues(Math.max(1, numRows/2));
 cs.setNumFalses(Math.max(1, numRows/2));
 cs.setAvgColLen(JavaDataModel.get().primitive1());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
+  cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp());
+  // epoch, seconds since epoch
+  cs.setRange(0, 2177452799L);
 
 Review comment:
   I do not think this is critical, but I explored a bit and this seems to be a 
poor choice for a heuristic, since in most cases it will lead to 
underestimation of the data size (since most users will not have dates starting 
from 1970).
   I will take as lower limit `01-01-1999` and as upper limit `12-31-2024` 
(mentioned by Gopal). Let me know if you see any cons with this approach.
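
A rough way to see the effect of the default range is to compare the fraction of rows that a uniform assumption assigns to a recent cutoff under both windows. This is a sketch with hypothetical names; the uniform model and the 2015-01-01 cutoff are assumptions chosen for illustration. The old upper bound `2177452799L` is the last second of 2038-12-31 (UTC).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical comparison of two default timestamp ranges.
final class TimestampRangeHeuristic {
  // First second of the given day, in seconds since epoch (UTC).
  static long startOfDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
  }

  // Last second of the given day, in seconds since epoch (UTC).
  static long endOfDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).plusDays(1).atStartOfDay()
        .toEpochSecond(ZoneOffset.UTC) - 1;
  }

  // Fraction of a uniformly distributed column in [low, high] that is >= cutoff.
  static double fractionAtOrAbove(long cutoff, long low, long high) {
    if (cutoff <= low) {
      return 1.0;
    }
    if (cutoff > high) {
      return 0.0;
    }
    return (double) (high - cutoff) / (high - low);
  }

  public static void main(String[] args) {
    long cutoff = startOfDay(2015, 1, 1);
    double wide = fractionAtOrAbove(cutoff, 0L, 2177452799L);
    double narrow = fractionAtOrAbove(cutoff, startOfDay(1999, 1, 1), endOfDay(2024, 12, 31));
    System.out.println("wide range:   " + wide);
    System.out.println("narrow range: " + narrow);
  }
}
```

With the narrower window, a filter on recent timestamps keeps a larger share of the rows, which is the direction of correction the comment is aiming for.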





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-08 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332747726
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
 ##
 @@ -316,7 +321,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
 
 protected long evaluateExpression(Statistics stats, ExprNodeDesc pred,
 AnnotateStatsProcCtx aspCtx, List<String> neededCols,
-Operator<?> op, long currNumRows) throws SemanticException {
+Operator<?> op, long currNumRows, boolean uniformWithinRange) throws SemanticException {
 
 Review comment:
   I have used the `AnnotateStatsProcCtx` to hold that value, thanks.





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-08 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332736102
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -935,8 +942,11 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
 cs.setNumTrues(Math.max(1, numRows/2));
 cs.setNumFalses(Math.max(1, numRows/2));
 cs.setAvgColLen(JavaDataModel.get().primitive1());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
+  cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp());
+  // epoch, seconds since epoch
+  cs.setRange(0, 2177452799L);
 
 Review comment:
   I do not think this is critical, but I explored a bit and this seems to be a 
poor choice for a heuristic, since in most cases it will lead to 
underestimation of the data size (since most users will not have dates starting 
from 1970).
   I will take as lower limit `01-01-2015` (epoch for ORC) and as upper limit 
`12-31-2024`. Let me know if you see any cons with this approach.





[GitHub] [hive] jcamachor commented on a change in pull request #787: HIVE-22239

2019-10-08 Thread GitBox
jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r332736102
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##
 @@ -935,8 +942,11 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
 cs.setNumTrues(Math.max(1, numRows/2));
 cs.setNumFalses(Math.max(1, numRows/2));
 cs.setAvgColLen(JavaDataModel.get().primitive1());
-} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME) ||
-colTypeLowerCase.equals(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME)) {
+} else if (colTypeLowerCase.equals(serdeConstants.TIMESTAMP_TYPE_NAME)) {
+  cs.setAvgColLen(JavaDataModel.get().lengthOfTimestamp());
+  // epoch, seconds since epoch
+  cs.setRange(0, 2177452799L);
 
 Review comment:
   I do not think this is critical, but I explored a bit and this seems to be a 
poor choice for a heuristic, since in most cases it will lead to 
underestimation of the data size (since most users will not have dates starting 
from 1970).
   I will take as lower limit `01-01-2015` (epoch for ORC) and as upper limit 
`12-31-2025`. Let me know if you see any cons with this approach.

