GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22602

    [SPARK-25582][SQL] Zero-out all bytes when writing decimal

    ## What changes were proposed in this pull request?
    
    In #20850, when writing non-null decimals, instead of zeroing all 16 
allocated bytes, we zero out only the padding bytes. Since we always allocate 
16 bytes, if the number of bytes needed for a decimal is lower than 9, the 
bytes between 8 and 16 are not zeroed.
    
    I see two solutions here:
     - zero out all the bytes in advance, as was done before #20850 (the 
safer solution IMHO);
     - allocate only the needed bytes (may be slightly more memory-efficient, 
but I have not investigated the feasibility of this option).
    
    Hence I propose the first solution here in order to fix the correctness 
issue. We can switch to the second later if we find it is more efficient.
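    
    To illustrate the difference between zeroing only the needed bytes and 
zeroing the whole slot, here is a minimal standalone sketch (this is NOT 
Spark's actual UnsafeRowWriter code; the class, method names, and the plain 
byte-array buffer are hypothetical stand-ins for the real off-heap writer):
    
    ```java
    import java.util.Arrays;
    
    // Hypothetical sketch of the issue, not Spark's UnsafeRowWriter.
    public class DecimalWriteSketch {
        static final int SLOT = 16; // a decimal always gets a 16-byte slot
    
        // Buggy variant: copies only the decimal's bytes; the rest of the
        // 16-byte slot keeps whatever stale data a previous row left there.
        static void writeWithoutZeroing(byte[] buf, int offset, byte[] bytes) {
            System.arraycopy(bytes, 0, buf, offset, bytes.length);
        }
    
        // Fixed variant (solution 1): zero the whole slot first, then copy.
        static void writeWithZeroing(byte[] buf, int offset, byte[] bytes) {
            Arrays.fill(buf, offset, offset + SLOT, (byte) 0);
            System.arraycopy(bytes, 0, buf, offset, bytes.length);
        }
    
        public static void main(String[] args) {
            byte[] buf = new byte[SLOT];
            Arrays.fill(buf, (byte) -1);  // simulate stale bytes from a previous row
            writeWithoutZeroing(buf, 0, new byte[] {1, 2, 3});
            System.out.println(Arrays.toString(buf)); // bytes 3..15 still hold -1
    
            byte[] buf2 = new byte[SLOT];
            Arrays.fill(buf2, (byte) -1);
            writeWithZeroing(buf2, 0, new byte[] {1, 2, 3});
            System.out.println(Arrays.toString(buf2)); // bytes 3..15 are now 0
        }
    }
    ```
    
    In the buggy variant, a 3-byte decimal leaves 13 stale bytes in the slot; 
with buffer reuse across rows those bytes can leak into the next value read.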
    
    ## How was this patch tested?
    
    Running the test attached in the JIRA. I have not yet been able to write a 
good UT to reproduce the issue; any suggestions are more than welcome. I'll 
try to find one in the next few days anyway.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25582

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22602
    
----
commit 851d7235cb2d60af95da3cde6420fea6e63c52d3
Author: Marco Gaido <marcogaido91@...>
Date:   2018-10-01T16:22:34Z

    [SPARK-25582][SQL] Zero-out all bytes when writing decimal

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
