[jira] [Resolved] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kalinkin resolved PARQUET-1438. -- Resolution: Fixed Fix Version/s: 1.12.0 This got resolved after fixing the issue in arrow-cpp > [C++] corrupted files produced on 32-bit architecture (i686) > > > Key: PARQUET-1438 > URL: https://issues.apache.org/jira/browse/PARQUET-1438 > Project: Parquet > Issue Type: Bug >Reporter: Dmitry Kalinkin >Priority: Major > Fix For: 1.12.0 > > Attachments: 32.parquet, 64.parquet, arrow_0.10.0_i686_test_fail.log, > arrow_0.11.0_i686_test_fail.log, parquet_1.5.0_i686_test_success.log > > > I'm using C++ API to convert some data to parquet files. I've noticed a > regression when upgrading from arrow-cpp 0.10.0 + parquet-cpp 1.5.0 to > arrow-cpp 0.11.0. The issue is that I can write parquet files without an > error, but when I try to read those using pyarrow I get a segfault: > {noformat} > #0 0x7fffd17c7f0f in int > arrow::util::RleDecoder::GetBatchWithDictSpaced(float const*, float*, > int, int, unsigned char const*, long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #1 0x7fffd17c8025 in > parquet::DictionaryDecoder > >::DecodeSpaced(float*, int, int, unsigned char const*, long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #2 0x7fffd17bcf0f in > parquet::internal::TypedRecordReader > >::ReadRecordData(long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #3 0x7fffd17bfbea in > parquet::internal::TypedRecordReader > >::ReadRecords(long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #4 0x7fffd179d2f7 in parquet::arrow::PrimitiveImpl::NextBatch(long, > std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #5 0x7fffd1797162 in 
parquet::arrow::ColumnReader::NextBatch(long, > std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #6 0x7fffd179a6e5 in > parquet::arrow::FileReader::Impl::ReadSchemaField(int, std::vector std::allocator > const&, std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #7 0x7fffd179aaad in > parquet::arrow::FileReader::Impl::ReadTable(std::vector std::allocator > const&, > std::shared_ptr*)::{lambda(int)#1}::operator()(int) const () > from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > {noformat} > I have not been able to dig to the bottom of the issue, but it seems like the > problem reproduces only when I run 32-bit binaries. After I learned that, I > found that 32-bit and 64-bit code produces very different parquet > files for the same data. The sizes of the structures are clearly different if > I look at their hexdumps. I'm attaching those example files. Reading > "32.parquet" (produced using i686 binaries) will cause a segfault on macOS > and Linux, "64.parquet" will read just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
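Since the corruption here shows up as structurally different files, a quick framing check can tell whether a suspect file is even well-formed at the trailer level before a reader attempts to parse it. The following is an illustrative stdlib-only sketch (not part of the Arrow/Parquet test suites; the function name is mine); it relies only on the documented Parquet trailer layout: the file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the final magic hold the little-endian length of the thrift-serialized footer.

```python
import struct

def check_parquet_framing(data: bytes):
    """Return (ok, footer_len) for a Parquet file's raw bytes.

    ok is True when the PAR1 magic is present at both ends and the
    declared footer length actually fits inside the file.
    """
    # Smallest possible framing: magic(4) + footer_len(4) + magic(4)
    if len(data) < 12 or data[:4] != b"PAR1" or data[-4:] != b"PAR1":
        return False, None
    # 4-byte little-endian footer length sits just before the final magic
    footer_len = struct.unpack("<I", data[-8:-4])[0]
    # leading magic(4) + footer + footer_len(4) + trailing magic(4) must fit
    ok = footer_len + 12 <= len(data)
    return ok, footer_len
```

For example, `check_parquet_framing(open("32.parquet", "rb").read())` would report whether the attached file is at least framed consistently, independent of whether pyarrow can decode its contents.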
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644052#comment-16644052 ] Dmitry Kalinkin commented on PARQUET-1438: -- Opened ARROW-3477
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643970#comment-16643970 ] Dmitry Kalinkin commented on PARQUET-1438: -- Perhaps this is an arrow issue then?
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643963#comment-16643963 ] Dmitry Kalinkin commented on PARQUET-1438: -- Running the test suite was a great suggestion! I tested arrow-cpp 0.10.0, parquet-cpp 1.5.0, and arrow-cpp 0.11.0 and found that all tests pass on x86_64. On i686, *1* test fails on arrow-cpp 0.10.0, there are *0* failures for parquet-cpp 1.5.0 (against arrow-cpp 0.10.0), and arrow-cpp 0.11.0 has *11* failing tests. I'm attaching the log files to the ticket.
[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kalinkin updated PARQUET-1438: - Attachment: arrow_0.10.0_i686_test_fail.log
[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kalinkin updated PARQUET-1438: - Attachment: parquet_1.5.0_i686_test_success.log
[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kalinkin updated PARQUET-1438: - Attachment: arrow_0.11.0_i686_test_fail.log
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643608#comment-16643608 ] Dmitry Kalinkin commented on PARQUET-1438: -- Thank you for providing the diff. I looked, and it doesn't seem very drastic to me either. I don't think there is a conflicting-libraries problem: I do all of my builds in a sandbox, and writing the files does succeed, with the resulting files being grossly different for 0.11.0 on 32 bits. Unfortunately, 3545186d6, 3545186d6~, and 9b4cd9c03 all reproduce the bug.
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643462#comment-16643462 ] Dmitry Kalinkin commented on PARQUET-1438: -- Yes. The setup with arrow-cpp 0.10.0 and parquet-cpp 1.5.0 uses the tarball from https://github.com/apache/parquet-cpp/archive/apache-parquet-cpp-1.5.0.tar.gz
[jira] [Commented] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643415#comment-16643415 ] Dmitry Kalinkin commented on PARQUET-1438: -- I now checked files that were produced with previous version of the parquet-cpp 1.5.0 on 32 bit and they mostly match what I get on 64 bit arrow-cpp 0.11.0. I also tried to do a bisect on arrow-cpp repository, but could not find any good commit. They all either have a bug or don't build. I guess I could try to bisect paquet-cpp repository against arrow-cpp 0.10.0. I was hoping someone with the knowledge of the format could take a look at files and see which part of the structure blows up. It seems like it is the schema that blows up. That means I need to look at thrift related stuff? > [C++] corrupted files produced on 32-bit architecture (i686) > > > Key: PARQUET-1438 > URL: https://issues.apache.org/jira/browse/PARQUET-1438 > Project: Parquet > Issue Type: Bug >Reporter: Dmitry Kalinkin >Priority: Major > Attachments: 32.parquet, 64.parquet > > > I'm using C++ API to convert some data to parquet files. I've noticed a > regression when upgrading from arrow-cpp 0.10.0 + parquet-cpp 1.5.0 to > arrow-cpp 0.11.0. 
The issue is that I can write parquet files without an > error, but when I try to read those using pyarrow I get a segfault: > {noformat} > #0 0x7fffd17c7f0f in int > arrow::util::RleDecoder::GetBatchWithDictSpaced(float const*, float*, > int, int, unsigned char const*, long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #1 0x7fffd17c8025 in > parquet::DictionaryDecoder > >::DecodeSpaced(float*, int, int, unsigned char const*, long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #2 0x7fffd17bcf0f in > parquet::internal::TypedRecordReader > >::ReadRecordData(long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #3 0x7fffd17bfbea in > parquet::internal::TypedRecordReader > >::ReadRecords(long) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #4 0x7fffd179d2f7 in parquet::arrow::PrimitiveImpl::NextBatch(long, > std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #5 0x7fffd1797162 in parquet::arrow::ColumnReader::NextBatch(long, > std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #6 0x7fffd179a6e5 in > parquet::arrow::FileReader::Impl::ReadSchemaField(int, std::vector std::allocator > const&, std::shared_ptr*) () >from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > #7 0x7fffd179aaad in > parquet::arrow::FileReader::Impl::ReadTable(std::vector std::allocator > const&, > std::shared_ptr*)::{lambda(int)#1}::operator()(int) const () > from > /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11 > {noformat} > I have not been able to dig to the bottom of the issue, but it seems like the > problem reproduces only when I run 32 bit binaries. 
After I learned that, I > found that the 32-bit and 64-bit builds produce very different parquet > files for the same data. The sizes of the structures are clearly different if > I look at their hexdumps. I'm attaching example files: reading > "32.parquet" (produced using i686 binaries) will cause a segfault on macOS > and Linux, while "64.parquet" reads just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
[ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643415#comment-16643415 ] Dmitry Kalinkin edited comment on PARQUET-1438 at 10/9/18 1:20 PM: --- I have now checked files that were produced with the previous version, parquet-cpp 1.5.0, on 32-bit, and they mostly match what I get from 64-bit arrow-cpp 0.11.0. I also tried to bisect the arrow-cpp repository, but could not find any good commit: they all either have the bug or don't build. I guess I could try to bisect the parquet-cpp repository against arrow-cpp 0.10.0. I was hoping someone with knowledge of the format could take a look at the files and see which part of the structure blows up. It seems like it is the schema that blows up. Does that mean I need to look at Thrift-related stuff?
[jira] [Created] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)
Dmitry Kalinkin created PARQUET-1438: Summary: [C++] corrupted files produced on 32-bit architecture (i686) Key: PARQUET-1438 URL: https://issues.apache.org/jira/browse/PARQUET-1438 Project: Parquet Issue Type: Bug Reporter: Dmitry Kalinkin Attachments: 32.parquet, 64.parquet