bersprockets opened a new pull request, #38923:
URL: https://github.com/apache/spark/pull/38923

   ### What changes were proposed in this pull request?
   
   Change `InterpretedMutableProjection` to use `setDecimal` rather than 
`setNullAt` to set null values for high-precision decimals in unsafe rows.
   
   ### Why are the changes needed?
   
   The following query returns the wrong answer (the expected result is 
77.77 and 245.00):
   
   ```
   set spark.sql.codegen.wholeStage=false;
   set spark.sql.codegen.factoryMode=NO_CODEGEN;
   
   select max(col1), max(col2) from values
   (cast(null  as decimal(27,2)), cast(null   as decimal(27,2))),
   (cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
   as data(col1, col2);
   
   +---------+---------+
   |max(col1)|max(col2)|
   +---------+---------+
   |null     |239.88   |
   +---------+---------+
   ```
   This is because `InterpretedMutableProjection` inappropriately uses 
`InternalRow#setNullAt` on unsafe rows to set null for decimal types with 
precision > `Decimal.MAX_LONG_DIGITS`.
   
   When `setNullAt` is used, the pointer to the decimal's storage area in the 
variable length region gets zeroed out. When 
`InterpretedMutableProjection` later calls `setDecimal` on that field, 
`UnsafeRow#setDecimal` picks up the zero pointer and stores the decimal data on 
top of the null-tracking bit set. Subsequent updates to the null-tracking bit 
set (e.g., calls to `setNotNullAt`) then corrupt the decimal data further 
(turning 245.00 into 239.88, for example). Overwriting the null-tracking bit 
set can also make non-null fields appear null (turning 77.77 into null, for 
example).
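
The mechanism above can be sketched with a small toy model. This is not Spark's code: `ToyUnsafeRow`, its fixed two-field layout, and the `aggregate` helper are hypothetical names made up for illustration, and the little-endian byte order is an assumption (it matches typical x86 hardware). The sketch only mimics the behavior described above: one null-tracking word at offset 0, one 8-byte slot per field holding `(offset << 32) | length`, and a 16-byte storage area per decimal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy model of a two-field row: [null bits: 8 bytes][slot 0][slot 1][16 bytes][16 bytes].
class ToyUnsafeRow {
    static final int NUM_FIELDS = 2;
    static final int SLOT_BASE = 8;                          // slots follow the null-tracking word
    static final int VAR_BASE = SLOT_BASE + 8 * NUM_FIELDS;  // start of the decimal storage areas
    final ByteBuffer buf =
        ByteBuffer.allocate(VAR_BASE + 16 * NUM_FIELDS).order(ByteOrder.LITTLE_ENDIAN);

    ToyUnsafeRow() {
        for (int i = 0; i < NUM_FIELDS; i++) {
            buf.putLong(slot(i), ((long) (VAR_BASE + 16 * i)) << 32); // reserve storage
            setDecimal(i, null);                                      // start out as null
        }
    }

    int slot(int i) { return SLOT_BASE + 8 * i; }
    boolean isNullAt(int i) { return (buf.getLong(0) & (1L << i)) != 0; }
    void setNotNullAt(int i) { buf.putLong(0, buf.getLong(0) & ~(1L << i)); }

    // Generic null setter: sets the null bit and zeroes the slot, losing the stored offset.
    void setNullAt(int i) {
        buf.putLong(0, buf.getLong(0) | (1L << i));
        buf.putLong(slot(i), 0L);
    }

    // Decimal-aware setter: trusts the offset in the slot and keeps it when writing null.
    void setDecimal(int i, BigDecimal value) {
        int cursor = (int) (buf.getLong(slot(i)) >>> 32);
        buf.putLong(cursor, 0L);       // clear the 16-byte storage area;
        buf.putLong(cursor + 8, 0L);   // with cursor == 0 this wipes the null bits and slot 0
        if (value == null) {
            buf.putLong(0, buf.getLong(0) | (1L << i));
            buf.putLong(slot(i), ((long) cursor) << 32);
        } else {
            byte[] bytes = value.unscaledValue().toByteArray();
            for (int k = 0; k < bytes.length; k++) buf.put(cursor + k, bytes[k]);
            setNotNullAt(i);           // with cursor == 0 this flips a bit of the decimal data
            buf.putLong(slot(i), (((long) cursor) << 32) | bytes.length);
        }
    }

    BigDecimal getDecimal(int i, int scale) {
        if (isNullAt(i)) return null;
        long word = buf.getLong(slot(i));
        int offset = (int) (word >>> 32);
        byte[] bytes = new byte[(int) word];
        for (int k = 0; k < bytes.length; k++) bytes[k] = buf.get(offset + k);
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // Replays the two-row max() aggregation from the query: nulls first, then the values.
    static ToyUnsafeRow aggregate(boolean useSetDecimalForNull) {
        ToyUnsafeRow row = new ToyUnsafeRow();
        for (int i = 0; i < NUM_FIELDS; i++) {
            if (useSetDecimalForNull) row.setDecimal(i, null); else row.setNullAt(i);
        }
        row.setDecimal(0, new BigDecimal("77.77"));
        row.setDecimal(1, new BigDecimal("245.00"));
        return row;
    }

    public static void main(String[] args) {
        ToyUnsafeRow buggy = aggregate(false);
        System.out.println(buggy.getDecimal(0, 2) + " " + buggy.getDecimal(1, 2)); // null 239.88
        ToyUnsafeRow fixed = aggregate(true);
        System.out.println(fixed.getDecimal(0, 2) + " " + fixed.getDecimal(1, 2)); // 77.77 245.00
    }
}
```

In this model, 245.00 is stored as the two bytes `0x5F 0xB4` at offset 0; clearing null bit 1 turns `0x5F` into `0x5D`, and `0x5DB4` is 23988, i.e. 239.88. The low bit of `0x5D` is set, so field 0 reads as null.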
   
   This bug can manifest for end-users after codegen fallback (say, if an 
expression's generated code fails to compile).
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

