Re: [PR] [SPARK-50561][SQL] Improve type coercion and boundary checking for UNIFORM SQL function [spark]

via GitHub Fri, 20 Dec 2024 14:30:10 -0800


dtenedor commented on code in PR #49237:
URL: https://github.com/apache/spark/pull/49237#discussion_r1894421035



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala:
##########
@@ -229,16 +232,20 @@ case class Uniform(min: Expression, max: Expression, seedExpression: Expression,
         if Seq(first, second).forall(integer) => IntegerType
       case (_, ShortType) | (ShortType, _)
         if Seq(first, second).forall(integer) => ShortType
+      case (_, ByteType) | (ByteType, _)
+        if Seq(first, second).forall(integer) => ByteType
       case (_, DoubleType) | (DoubleType, _) => DoubleType
       case (_, FloatType) | (FloatType, _) => FloatType
+      case (_, d: DecimalType) => d
+      case (d: DecimalType, _) => d
       case _ =>
         throw SparkException.internalError(
           s"Unexpected argument data types: ${min.dataType}, ${max.dataType}")
     }
   }
 
   private def integer(t: DataType): Boolean = t match {
-    case _: ShortType | _: IntegerType | _: LongType => true
+    case _: ByteType | _: ShortType | _: IntegerType | _: LongType => true

Review Comment:
   I ended up updating this to use the `ExpectsInputTypes` trait, which simplifies this code.
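
   For context, a minimal sketch of the idea behind that change, not the PR's actual code: rather than enumerating pairs of types in a pattern match, each argument position declares the types it accepts and one generic check validates them. The `DType` hierarchy, `accepted`, and `check` below are hypothetical stand-ins so the example runs without Spark; in Spark the `ExpectsInputTypes` trait plays this role through its `inputTypes` member. The accepted set is an assumption based on the types that appear in the diff above.

```scala
// Hypothetical, Spark-free model of positional input-type checking.
object InputTypeSketch {
  // Stand-ins for Spark's data types (assumed names, not Spark classes).
  sealed trait DType
  case object ByteT extends DType
  case object ShortT extends DType
  case object IntT extends DType
  case object LongT extends DType
  case object FloatT extends DType
  case object DoubleT extends DType
  final case class DecimalT(precision: Int, scale: Int) extends DType
  case object StringT extends DType

  // Assumption: min and max accept any numeric type.
  val isNumeric: DType => Boolean = {
    case ByteT | ShortT | IntT | LongT | FloatT | DoubleT => true
    case _: DecimalT => true
    case _ => false
  }

  // Assumption: the seed accepts only integer or long values.
  val isSeed: DType => Boolean = t => t == IntT || t == LongT

  // One acceptance predicate per argument position: min, max, seed.
  val accepted: Seq[DType => Boolean] = Seq(isNumeric, isNumeric, isSeed)

  // Returns Right(()) when every argument is acceptable, otherwise a
  // message naming the first offending position (1-based).
  def check(args: Seq[DType]): Either[String, Unit] =
    if (args.length != accepted.length) {
      Left(s"expected ${accepted.length} arguments, got ${args.length}")
    } else {
      args.zip(accepted).zipWithIndex.collectFirst {
        case ((t, ok), i) if !ok(t) =>
          s"argument ${i + 1} has unsupported type $t"
      }.toLeft(())
    }
}
```

   With this shape, supporting a new type such as `ByteType` means widening one predicate rather than adding another `case` to the match.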



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala:
##########
@@ -229,16 +232,20 @@ case class Uniform(min: Expression, max: Expression, seedExpression: Expression,
         if Seq(first, second).forall(integer) => IntegerType
       case (_, ShortType) | (ShortType, _)

Review Comment:
   Good idea, this is done.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala:
##########
@@ -229,16 +232,20 @@ case class Uniform(min: Expression, max: Expression, seedExpression: Expression,
         if Seq(first, second).forall(integer) => IntegerType
       case (_, ShortType) | (ShortType, _)
         if Seq(first, second).forall(integer) => ShortType
+      case (_, ByteType) | (ByteType, _)
+        if Seq(first, second).forall(integer) => ByteType
       case (_, DoubleType) | (DoubleType, _) => DoubleType
       case (_, FloatType) | (FloatType, _) => FloatType
+      case (_, d: DecimalType) => d

Review Comment:
   Good idea, this is done.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala:
##########
@@ -229,16 +232,20 @@ case class Uniform(min: Expression, max: Expression, seedExpression: Expression,
         if Seq(first, second).forall(integer) => IntegerType
       case (_, ShortType) | (ShortType, _)
         if Seq(first, second).forall(integer) => ShortType
+      case (_, ByteType) | (ByteType, _)
+        if Seq(first, second).forall(integer) => ByteType
       case (_, DoubleType) | (DoubleType, _) => DoubleType
       case (_, FloatType) | (FloatType, _) => FloatType
+      case (_, d: DecimalType) => d
+      case (d: DecimalType, _) => d

Review Comment:
   Good idea, this is done.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala:
##########
@@ -49,6 +49,9 @@ trait RDG extends Expression with ExpressionWithRandomSeed {
   @transient protected lazy val seed: Long = seedExpression match {
     case e if e.dataType == IntegerType => e.eval().asInstanceOf[Int]
     case e if e.dataType == LongType => e.eval().asInstanceOf[Long]
+    case e if e.dataType == FloatType => e.eval().asInstanceOf[Float].toLong
+    case e if e.dataType == DoubleType => e.eval().asInstanceOf[Double].toLong
+    case e if e.dataType.isInstanceOf[DecimalType] => e.eval().asInstanceOf[Decimal].toLong

Review Comment:
   I checked and the existing `RAND` and `RANDN` functions only accept 
`IntegerType` or `LongType` for the random seed (but positive, zero, and 
negative values are allowed). I updated this PR to be consistent with that.
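
   As an illustration of that rule, here is a minimal sketch that does not depend on Spark. `SeedSketch` and `seedFrom` are hypothetical names, and the input stands in for an already-evaluated seed literal: only `Int` and `Long` values are accepted, with no restriction on sign, and anything else is rejected instead of being truncated with `.toLong` as in the lines removed from the diff above.

```scala
// Hypothetical, Spark-free model of the seed rule described above.
object SeedSketch {
  // Accepts Int or Long seeds of any sign; rejects every other value.
  def seedFrom(value: Any): Either[String, Long] = value match {
    case null    => Left("seed must not be null")
    case i: Int  => Right(i.toLong)
    case l: Long => Right(l)
    case other   =>
      Left(s"seed must be an integer or long, got ${other.getClass.getSimpleName}")
  }
}
```

   Under these assumptions, `SeedSketch.seedFrom(-7)` yields `Right(-7L)`, while a fractional seed such as `1.5` yields a `Left` with an error message.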


