GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22602
[SPARK-25582][SQL] Zero-out all bytes when writing decimal
## What changes were proposed in this pull request?
In #20850, when writing non-null decimals, instead of zeroing all 16
allocated bytes, we zero out only the padding bytes. Since we always allocate
16 bytes, if a decimal needs fewer than 9 bytes, bytes 8 through 15 are left
un-zeroed and may contain stale data from a previous row.
I see 2 solutions here:
- we can zero out all the bytes in advance, as was done before #20850
(the safer solution, IMHO);
- we can allocate only the bytes actually needed (possibly a bit more
memory-efficient, but I have not investigated the feasibility of this
option).
Hence I propose the first solution here in order to fix the correctness
issue. We can switch to the second later if we decide it is more efficient.
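To illustrate the failure mode, here is a minimal sketch (not the actual `UnsafeRowWriter` code; the `DecimalSlotDemo` class, its `write` helper, and the `zeroAll` flag are hypothetical): each decimal gets a fixed 16-byte slot in a buffer that is reused across rows, and only zeroing the whole slot before every write guarantees no stale bytes survive.

```java
import java.util.Arrays;

// Simplified model of the bug: a reused buffer with a fixed 16-byte decimal
// slot. With zeroAll = false, only the bytes actually written are touched,
// so bytes 8..15 can keep stale data from a previous, longer decimal.
public class DecimalSlotDemo {
    static final int SLOT = 16;

    // Hypothetical writer: copies the decimal's unscaled bytes into the slot,
    // optionally clearing the whole slot first (the proposed fix).
    static void write(byte[] buf, byte[] unscaled, boolean zeroAll) {
        if (zeroAll) {
            Arrays.fill(buf, 0, SLOT, (byte) 0); // zero out all 16 bytes
        }
        System.arraycopy(unscaled, 0, buf, 0, unscaled.length);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[SLOT];
        byte[] longDec = new byte[16];           // a decimal using all 16 bytes
        Arrays.fill(longDec, (byte) 0x7f);
        byte[] shortDec = {1, 2, 3, 4};          // a decimal needing only 4 bytes

        write(buf, longDec, true);               // row 1 fills the slot
        write(buf, shortDec, false);             // row 2: no zeroing beyond write
        boolean stale = false;
        for (int i = 8; i < SLOT; i++) stale |= buf[i] != 0;
        System.out.println("stale bytes after partial write: " + stale);

        write(buf, shortDec, true);              // with the fix: slot cleared first
        boolean clean = true;
        for (int i = 8; i < SLOT; i++) clean &= buf[i] == 0;
        System.out.println("slot clean with zero-out: " + clean);
    }
}
```

Running this prints `true` for both checks: without the up-front zeroing, the second row's trailing bytes still hold the first row's data, which is exactly the correctness issue described above.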
## How was this patch tested?
Running the test attached in the JIRA. I have not yet been able to write a
good UT to reproduce the issue; any suggestion is more than welcome. I'll
try to find one in the next few days anyway.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-25582
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22602.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22602
----
commit 851d7235cb2d60af95da3cde6420fea6e63c52d3
Author: Marco Gaido <marcogaido91@...>
Date: 2018-10-01T16:22:34Z
[SPARK-25582][SQL] Zero-out all bytes when writing decimal
----