srowen commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358819767
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##########
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   As an aside, allowing very large values (above Int.MaxValue, roughly 2.1
   billion) doesn't matter here. The relative error is the inverse of the
   accuracy param, so anything that large makes the resulting error basically 0
   anyway. Just do whatever is simpler IMHO, including disallowing Long if that
   helps.
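
   To illustrate the point, here is a minimal sketch of the "simpler" option,
   assuming the relative error is 1.0 / accuracy. `AccuracySketch`,
   `relativeError` and `validateAccuracy` are hypothetical names for this
   example, not code from this PR or from Spark:

```scala
// Hypothetical sketch, not Spark's actual implementation.
object AccuracySketch {
  // Assumption: the relative error of the approximation is 1.0 / accuracy.
  def relativeError(accuracy: Long): Double = 1.0 / accuracy

  // One simple option: read the value as a Long so nothing overflows,
  // then reject anything outside (0, Int.MaxValue] with a clear message.
  def validateAccuracy(value: Long): Either[String, Int] =
    if (value > 0 && value <= Int.MaxValue) Right(value.toInt)
    else Left(s"accuracy must be in (0, ${Int.MaxValue}], got $value")
}
```

   At accuracy = Int.MaxValue the relative error is already below 1e-9, so
   capping at Int costs nothing in practice.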

