[ https://issues.apache.org/jira/browse/ARROW-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved ARROW-5618.
-----------------------------------
    Resolution: Duplicate

> [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers 
> integer overflow in some cases
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5618
>                 URL: https://issues.apache.org/jira/browse/ARROW-5618
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: TP Boudreau
>            Assignee: TP Boudreau
>            Priority: Minor
>              Labels: parquet, pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When storing Arrow timestamps in Parquet files using the Int96 storage 
> format, certain combinations of array lengths and validity bitmasks cause an 
> integer overflow error on read.  It's not immediately clear whether the 
> Arrow/Parquet writer is storing zeroes when it should be storing positive 
> values or the reader is attempting to calculate a nanoseconds value 
> inappropriately from zeroed inputs (perhaps missing the null bit flag).  It is 
> also not immediately clear why only columns of certain lengths seem to be affected.
> Probably the quickest way to reproduce this undefined behavior is to alter 
> the existing unit test UseDeprecatedInt96 (in file 
> .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling 
> its column lengths (repeating the same values), followed by 'make unittest' 
> using clang-7 with sanitizers enabled.  (Here's a patch applicable to current 
> master that changes the test as described: [1]; I used the following cmake 
> command to build my environment: [2].)  You should get a log something like 
> [3].  If requested, I'll see if I can put together a stand-alone minimal test 
> case that induces the behavior.
> The quick hack at [4] will prevent integer overflows, but this is only 
> included to confirm the proximate cause of the bug: the Julian days field of 
> the Int96 appears to be zero, when a strictly positive number is expected.
> I've assigned the issue to myself and I'll start looking into the root cause 
> of this.
> [1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e
> [2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9
> [3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d
> [4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f
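For context, here is a minimal C++ sketch of the conversion involved. It assumes the commonly documented INT96 timestamp layout (nanoseconds since midnight in the first 8 bytes, Julian day number in the last 4) and a little-endian host. `Int96ToUnixNanos` and `DecodeInt96Column` are hypothetical names, not the Arrow or Parquet API, and the sketch does not reproduce the patch at [4]. It shows why a zero Julian day overflows: (0 - 2440588) days * 86,400,000,000,000 ns/day is about -2.1e20, far below the int64 minimum of about -9.2e18. It also shows the "missing null bit" hypothesis: a reader that consults the validity bitmap never converts zero-filled null slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <vector>

// Assumed INT96 layout: value[0..1] = nanoseconds of day (int64, little-endian),
// value[2] = Julian day number (uint32).
struct Int96 {
  uint32_t value[3];
};

constexpr int64_t kJulianToUnixEpochDays = 2440588;  // Julian day of 1970-01-01
constexpr int64_t kNanosecondsPerDay = 86400LL * 1000 * 1000 * 1000;

// Hypothetical checked conversion: returns std::nullopt instead of
// triggering signed overflow (undefined behavior) for out-of-range days.
std::optional<int64_t> Int96ToUnixNanos(const Int96& v) {
  int64_t nanos_of_day;
  std::memcpy(&nanos_of_day, &v.value[0], sizeof(nanos_of_day));  // little-endian host assumed
  const int64_t days = static_cast<int64_t>(v.value[2]) - kJulianToUnixEpochDays;

  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  // Guard the multiplication: a Julian day of 0 gives days = -2440588,
  // whose product with kNanosecondsPerDay does not fit in int64.
  if (days > kMax / kNanosecondsPerDay || days < kMin / kNanosecondsPerDay) {
    return std::nullopt;
  }
  const int64_t day_nanos = days * kNanosecondsPerDay;
  // Guard the addition of the time-of-day component.
  if ((nanos_of_day > 0 && day_nanos > kMax - nanos_of_day) ||
      (nanos_of_day < 0 && day_nanos < kMin - nanos_of_day)) {
    return std::nullopt;
  }
  return day_nanos + nanos_of_day;
}

// Hypothetical decoder: values are laid out densely next to an Arrow-style
// validity bitmap (bit i set => slot i valid, least-significant bit first).
// Null slots, which may be zero-filled, are never converted.
std::vector<std::optional<int64_t>> DecodeInt96Column(
    const std::vector<Int96>& values, const std::vector<uint8_t>& validity) {
  std::vector<std::optional<int64_t>> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    const bool is_valid = (validity[i / 8] >> (i % 8)) & 1;
    if (is_valid) {
      out.push_back(Int96ToUnixNanos(values[i]));
    } else {
      out.push_back(std::nullopt);  // skip the zeroed payload entirely
    }
  }
  return out;
}
```

The checked variant trades a silent wraparound for an explicit "invalid" result. Whether a real fix should reject, clamp, or skip such values depends on the root cause (writer vs. reader), which the issue leaves open.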



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
