skambha commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow URL: https://github.com/apache/spark/pull/27627#issuecomment-595411413 In the repro (that has the join and then the agg), we also see the null being written out via the UnsafeRowWriter that is mentioned in my earlier comment with codegen details here.https://github.com/apache/spark/pull/27627#issuecomment-591185451 Codegen and the plan details from https://github.com/apache/spark/pull/27627#issuecomment-591185451 here: https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt ``` scala> df2.queryExecution.debug.codegen Found 2 WholeStageCodegen subtrees. == Subtree 1 / 2 (maxMethodCodeSize:173; maxConstantPoolSize:145(0.22% used); numInnerClasses:0) == *(2) HashAggregate(keys=[], functions=[sum(decNum#14)], output=[sum(decNum)#23]) +- Exchange SinglePartition, true, [id=#77] +- *(1) HashAggregate(keys=[], functions=[partial_sum(decNum#14)], output=[sum#29]) +- *(1) Project [decNum#14] +- *(1) BroadcastHashJoin [intNum#8], [intNum#15], Inner, BuildLeft :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))), [id=#65] : +- LocalTableScan [intNum#8] +- *(1) LocalTableScan [decNum#14, intNum#15] ``` The relevant code lines of interest are the following: - Addition of two decimal values ( expression coming from sum) that results in value that cannot be contained. - Writing a big overflow decimal using UnsafeWriter —> which will write null silently. Codegen id1: /* 108 */ agg_value_3 = agg_value_4.$plus(agg_expr_0_0); /* 113 */ agg_value_2 = agg_value_3; /* 126 */ agg_mutableStateArray_0[0] = agg_value_2; /* 147 */ bhj_mutableStateArray_0[3].write(0, agg_mutableStateArray_0[0], 38, 18); https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt#L336 https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt#L354 https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt#L375 Codegen id2: /* 080 */ agg_value_5 = agg_value_6.$plus(agg_expr_0_0); /* 091 */ agg_value_4 = agg_mutableStateArray_0[0]; /* 098 */ agg_mutableStateArray_0[0] = agg_value_4; /* 128 */ agg_mutableStateArray_1[0].write(0, agg_value_1, 38, 18); https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt#L164 https://github.com/skambha/notes/blob/master/spark28067_ansifalse_wholestagetrue.txt#L212 - For both partial and final stage of aggregate, we can see the same issue that can happen. @cloud-fan, given the above pieces of code, why would the problem not exist for the partial aggregate (updateExpression) case?
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
