[ https://issues.apache.org/jira/browse/ARROW-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-5618.
-----------------------------------
    Resolution: Duplicate

> [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers integer overflow in some cases
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5618
>                 URL: https://issues.apache.org/jira/browse/ARROW-5618
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: TP Boudreau
>            Assignee: TP Boudreau
>            Priority: Minor
>              Labels: parquet, pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When storing Arrow timestamps in Parquet files using the Int96 storage
> format, certain combinations of array lengths and validity bitmasks cause an
> integer overflow error on read. It's not immediately clear whether the
> Arrow/Parquet writer is storing zeroes when it should be storing positive
> values or the reader is attempting to calculate a nanoseconds value
> inappropriately from zeroed inputs (perhaps missing the null bit flag). Also
> not immediately clear why only certain length columns seem to be affected.
>
> Probably the quickest way to reproduce this undefined behavior is to alter
> the existing unit test UseDeprecatedInt96 (in file
> .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling
> its column lengths (repeating the same values), followed by 'make unittest'
> using clang-7 with sanitizers enabled. (Here's a patch applicable to current
> master that changes the test as described: [1]; I used the following cmake
> command to build my environment: [2].) You should get a log something like
> [3]. If requested, I'll see if I can put together a stand-alone minimal test
> case that induces the behavior.
>
> The quick-hack at [4] will prevent integer overflows, but this is only
> included to confirm the proximate cause of the bug: the Julian days field of
> the Int96 appears to be zero, when a strictly positive number is expected.
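To illustrate why a zero Julian day field overflows, here is a minimal sketch of a range-checked Int96-to-nanoseconds conversion. It is not Arrow's reader code and not the quick-hack at [4]. It assumes the usual Int96 timestamp layout (nanoseconds within the day plus a Julian day number) and uses 2440588, the Julian day number of the Unix epoch. The struct Int96Timestamp and the function Int96ToUnixNanosChecked are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-in for a decoded Parquet Int96 timestamp
// (assumed layout: nanoseconds within the day, then the Julian day number).
struct Int96Timestamp {
  uint64_t nanos_of_day;
  uint32_t julian_day;
};

// Julian day number of 1970-01-01 (the Unix epoch).
constexpr int64_t kJulianToUnixEpochDays = 2440588;
constexpr int64_t kNanosPerDay = 86400LL * 1000 * 1000 * 1000;

// Returns nanoseconds since the Unix epoch, or std::nullopt when the result
// does not fit in int64_t. Every multiplication and addition is range-checked
// first, so no signed overflow (undefined behavior) can occur.
std::optional<int64_t> Int96ToUnixNanosChecked(const Int96Timestamp& ts) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  const int64_t days =
      static_cast<int64_t>(ts.julian_day) - kJulianToUnixEpochDays;
  // Reject day offsets whose product with kNanosPerDay would leave int64 range.
  if (days > kMax / kNanosPerDay || days < kMin / kNanosPerDay) {
    return std::nullopt;
  }
  const int64_t day_nanos = days * kNanosPerDay;

  // The nanoseconds field is unsigned; make sure it fits before converting.
  if (ts.nanos_of_day > static_cast<uint64_t>(kMax)) {
    return std::nullopt;
  }
  const int64_t nanos = static_cast<int64_t>(ts.nanos_of_day);
  // nanos is non-negative, so only the upper bound can be exceeded.
  if (day_nanos > kMax - nanos) {
    return std::nullopt;
  }
  return day_nanos + nanos;
}
```

With julian_day == 0 the day offset is -2440588, and -2440588 * 86,400,000,000,000 is roughly -2.1e20, well outside the int64 range of about +/-9.22e18. An unchecked multiplication would therefore overflow, while this sketch returns std::nullopt. Whether a reader should reject such values or skip them as nulls is a separate question that this example does not answer.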
>
> I've assigned the issue to myself and I'll start looking into the root cause
> of this.
>
> [1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e
> [2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9
> [3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d
> [4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f



--
This message was sent by Atlassian Jira
(v8.3.4#803005)