[jira] [Commented] (SPARK-23799) [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

2018-04-24 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451609#comment-16451609 ]

Apache Spark commented on SPARK-23799:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/21147

> [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of 
> an empty table with analyzed statistics
> 
>
> Key: SPARK-23799
> URL: https://issues.apache.org/jira/browse/SPARK-23799
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Michael Shtelma
>Assignee: Michael Shtelma
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark 2.2.1 and 2.3.0 can produce a NumberFormatException (see below) during 
> the analysis of queries that use previously analyzed Hive tables. 
> The NumberFormatException occurs because in [FilterEstimation.scala on lines 
> 50 and 
> 52|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala?utf8=%E2%9C%93#L50-L52]
>  the method calculateFilterSelectivity returns NaN, which is caused by 
> division by zero. This leads to a NumberFormatException during the conversion 
> from Double to BigDecimal. 
> The NaN is produced by a division by zero in the evaluateInSet method. 
> Exception:
> java.lang.NumberFormatException
> at java.math.BigDecimal.<init>(BigDecimal.java:494)
> at java.math.BigDecimal.<init>(BigDecimal.java:824)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:52)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:55)
> at scala.math.BigDecimal$.double2bigDecimal(BigDecimal.scala:343)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.FilterEstimation.estimate(FilterEstimation.scala:52)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:43)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
> at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
> at scala.collection.mutable.WrappedArray.forall(WrappedArray.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$.rowCountsExist(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ProjectEstimation$.estimate(ProjectEstimation.scala:27)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:63)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:37)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> 
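
A minimal Scala sketch of the failure mode described above, assuming only standard Scala and not taken from the Spark sources: dividing 0.0 by 0.0 yields NaN, and converting that NaN to BigDecimal throws the same java.lang.NumberFormatException seen in the stack trace. The object and value names are illustrative.

object NaNToBigDecimalSketch {
  def main(args: Array[String]): Unit = {
    // Division by zero on Doubles does not throw; it yields NaN,
    // which is what evaluateInSet is reported to produce for an empty table.
    val selectivity: Double = 0.0 / 0.0

    // scala.math.BigDecimal.apply(Double) goes through BigDecimal.decimal,
    // which effectively calls new java.math.BigDecimal("NaN") and therefore
    // throws java.lang.NumberFormatException, matching the reported trace.
    val percent = BigDecimal(selectivity)
    println(percent) // never reached
  }
}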

[jira] [Commented] (SPARK-23799) [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

2018-04-12 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435439#comment-16435439 ]

Apache Spark commented on SPARK-23799:
--

User 'mshtelma' has created a pull request for this issue:
https://github.com/apache/spark/pull/21052

> [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of 
> an empty table with analyzed statistics
> 
>
> Key: SPARK-23799
> URL: https://issues.apache.org/jira/browse/SPARK-23799
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Michael Shtelma
>Priority: Major
>
> Spark 2.2.1 and 2.3.0 can produce a NumberFormatException (see below) during 
> the analysis of queries that use previously analyzed Hive tables. 
> The NumberFormatException occurs because in [FilterEstimation.scala on lines 
> 50 and 
> 52|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala?utf8=%E2%9C%93#L50-L52]
>  the method calculateFilterSelectivity returns NaN, which is caused by 
> division by zero. This leads to a NumberFormatException during the conversion 
> from Double to BigDecimal. 
> The NaN is produced by a division by zero in the evaluateInSet method. 
> Exception:
> java.lang.NumberFormatException
> at java.math.BigDecimal.<init>(BigDecimal.java:494)
> at java.math.BigDecimal.<init>(BigDecimal.java:824)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:52)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:55)
> at scala.math.BigDecimal$.double2bigDecimal(BigDecimal.scala:343)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.FilterEstimation.estimate(FilterEstimation.scala:52)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:43)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
> at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
> at scala.collection.mutable.WrappedArray.forall(WrappedArray.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$.rowCountsExist(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ProjectEstimation$.estimate(ProjectEstimation.scala:27)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:63)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:37)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$2.apply(CostBasedJoinReorder.scala:64)
> 
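
A hedged sketch of the kind of guard evaluateInSet needs, not the actual patch from the linked pull requests: when the analyzed statistics of an empty table report zero distinct values, return a selectivity of 0.0 instead of dividing and producing NaN. The helper name and signature are made up for illustration.

object InSetSelectivityGuard {
  // Illustrative helper, not Spark source: guard the distinct-value ratio
  // computed during IN-set estimation so that an empty table (zero distinct
  // values) cannot produce 0.0 / 0.0 = NaN.
  def inSetSelectivity(matchedNdv: BigInt, totalNdv: BigInt): Double =
    if (totalNdv <= 0) 0.0                        // empty table: nothing can match the IN list
    else matchedNdv.toDouble / totalNdv.toDouble  // usual case: fraction of matching distinct values

  def main(args: Array[String]): Unit = {
    println(inSetSelectivity(BigInt(0), BigInt(0)))  // 0.0 instead of NaN
    println(inSetSelectivity(BigInt(3), BigInt(10))) // 0.3
  }
}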

[jira] [Commented] (SPARK-23799) [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

2018-03-27 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415652#comment-16415652 ]

Apache Spark commented on SPARK-23799:
--

User 'mshtelma' has created a pull request for this issue:
https://github.com/apache/spark/pull/20913

> [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of 
> an empty table with analyzed statistics
> 
>
> Key: SPARK-23799
> URL: https://issues.apache.org/jira/browse/SPARK-23799
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Michael Shtelma
>Priority: Major
>
> FilterEstimation.evaluateInSet can perform a division by zero in the case of an 
> empty table with analyzed statistics, which leads to the following exception: 
>  
> java.lang.NumberFormatException
> at java.math.BigDecimal.<init>(BigDecimal.java:494)
> at java.math.BigDecimal.<init>(BigDecimal.java:824)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:52)
> at scala.math.BigDecimal$.decimal(BigDecimal.scala:55)
> at scala.math.BigDecimal$.double2bigDecimal(BigDecimal.scala:343)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.FilterEstimation.estimate(FilterEstimation.scala:52)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:43)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
> at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
> at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
> at scala.collection.mutable.WrappedArray.forall(WrappedArray.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$.rowCountsExist(EstimationUtils.scala:32)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ProjectEstimation$.estimate(ProjectEstimation.scala:27)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:63)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:37)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
> at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
> at 
> org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$2.apply(CostBasedJoinReorder.scala:64)
> at 
> org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$2.apply(CostBasedJoinReorder.scala:64)
> at 
> scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83)
> at scala.collection.immutable.List.forall(List.scala:84)
> at 
> org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.org$apache$spark$sql$catalyst$optimizer$CostBasedJoinReorder$$reorder(CostBasedJoinReorder.scala:64)
> at 
>
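
A hedged reproduction sketch based on the report: an empty Hive table whose column statistics have been analyzed, queried with an IN filter plus a join (the reported trace goes through CostBasedJoinReorder) while CBO is enabled. Table, column, and application names are illustrative and not taken from the report.

import org.apache.spark.sql.SparkSession

object Spark23799ReproSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-23799 repro sketch")
      .config("spark.sql.cbo.enabled", "true")              // enable cost-based optimization
      .config("spark.sql.cbo.joinReorder.enabled", "true")  // exercise CostBasedJoinReorder
      .enableHiveSupport()
      .getOrCreate()

    // An empty Hive table with analyzed column statistics: row count and ndv are 0.
    spark.sql("CREATE TABLE IF NOT EXISTS empty_tbl (id INT, name STRING) STORED AS PARQUET")
    spark.sql("ANALYZE TABLE empty_tbl COMPUTE STATISTICS FOR COLUMNS id, name")

    // Estimating the IN filter on the empty table was reported to divide by
    // zero, yield NaN, and fail with java.lang.NumberFormatException.
    spark.sql(
      """SELECT t1.id
        |FROM empty_tbl t1 JOIN empty_tbl t2 ON t1.id = t2.id
        |WHERE t1.id IN (1, 2, 3)""".stripMargin).explain(true)

    spark.stop()
  }
}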