JoshRosen commented on a change in pull request #25010: [SPARK-28201][SQL]
Revisit MakeDecimal behavior on overflow
URL: https://github.com/apache/spark/pull/25010#discussion_r298804728
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala
##########
@@ -46,19 +47,35 @@ case class UnscaledValue(child: Expression) extends
UnaryExpression {
*/
case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends
UnaryExpression {
+ private val nullOnOverflow = SQLConf.get.decimalOperationsNullOnOverflow
+ private lazy val doEval = if (nullOnOverflow) {
+ input: Long => new Decimal().setOrNull(input, precision, scale)
+ } else {
+ input: Long => new Decimal().set(input, precision, scale)
+ }
+
override def dataType: DataType = DecimalType(precision, scale)
- override def nullable: Boolean = true
+ override def nullable: Boolean = child.nullable || nullOnOverflow
override def toString: String = s"MakeDecimal($child,$precision,$scale)"
- protected override def nullSafeEval(input: Any): Any =
- Decimal(input.asInstanceOf[Long], precision, scale)
+ protected override def nullSafeEval(input: Any): Any =
doEval(input.asInstanceOf[Long])
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
nullSafeCodeGen(ctx, ev, eval => {
+ val setMethod = if (nullOnOverflow) {
+ "setOrNull"
+ } else {
+ "set"
+ }
+ val setNull = if (nullable) {
Review comment:
Is it safe to skip the assignment to `${ev.isNull}` in the `nullable =
false` branch?
I'm concerned about what happens when the underlying variable that
`${ev.isNull} = ` updates has already been initialized (possibly to a default
value). I _guess_ that's okay, because when `nullable == false` the
`isNull` variable already holds the correct value (`false`).
I was just curious whether there's existing precedent for this in codegen.
Should we limit scope and keep the `isNull` assignment as we had it before
(deferring this additional optimization)? Or should we keep this additional
optimization and cite precedent? (I'm not a codegen expert; I'm mostly asking
out of my own curiosity, plus general conservatism about reducing the number of
branches / cases to review.)
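
To make the question concrete, here is a minimal sketch of the two shapes the generated Java could take. It is not Spark's actual codegen: `MakeDecimalCodegenSketch`, `genCode`, and the variable names `value_0`, `isNull_0`, and `input_0` are hypothetical stand-ins for `ev.value`, `ev.isNull`, and `eval`, and the template is only assumed to resemble what `doGenCode` emits.

```scala
// Hypothetical stand-in for the codegen template under review; not Spark's API.
object MakeDecimalCodegenSketch {
  // Builds the Java snippet that would be spliced into the generated class.
  def genCode(
      value: String,
      isNull: String,
      input: String,
      precision: Int,
      scale: Int,
      nullOnOverflow: Boolean,
      nullable: Boolean): String = {
    val setMethod = if (nullOnOverflow) "setOrNull" else "set"
    // Emit the isNull assignment only when the expression can produce null.
    // When it is skipped, correctness relies on isNull already being false.
    val setNull = if (nullable) s"$isNull = $value == null;" else ""
    s"""$value = (new Decimal()).$setMethod($input, $precision, $scale);
       |$setNull""".stripMargin.trim
  }

  def main(args: Array[String]): Unit = {
    // Nullable case: the generated code re-derives isNull from the result.
    println(genCode("value_0", "isNull_0", "input_0", 10, 2,
      nullOnOverflow = true, nullable = true))
    // Non-nullable case: no isNull assignment is emitted at all.
    println(genCode("value_0", "isNull_0", "input_0", 10, 2,
      nullOnOverflow = false, nullable = false))
  }
}
```

In the second case the snippet never touches `isNull_0`, so the result is only correct if that variable was initialized to `false` before this code runs, which is the assumption I'd like confirmed.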