[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)

2018-10-09 Thread Dmitry Kalinkin (JIRA)


 [ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Kalinkin updated PARQUET-1438:
-------------------------------------
Attachment: arrow_0.10.0_i686_test_fail.log

> [C++] corrupted files produced on 32-bit architecture (i686)
> ------------------------------------------------------------
>
>                 Key: PARQUET-1438
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1438
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Dmitry Kalinkin
>            Priority: Major
>         Attachments: 32.parquet, 64.parquet, arrow_0.10.0_i686_test_fail.log, 
> arrow_0.11.0_i686_test_fail.log, parquet_1.5.0_i686_test_success.log
>
>
> I'm using the C++ API to convert some data to parquet files. I've noticed a 
> regression when upgrading from arrow-cpp 0.10.0 + parquet-cpp 1.5.0 to 
> arrow-cpp 0.11.0. The issue is that I can write parquet files without an 
> error, but when I try to read them using pyarrow I get a segfault:
> {noformat}
> #0  0x7fffd17c7f0f in int arrow::util::RleDecoder::GetBatchWithDictSpaced<float>(float const*, float*, int, int, unsigned char const*, long) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #1  0x7fffd17c8025 in parquet::DictionaryDecoder<parquet::DataType<(parquet::Type::type)4> >::DecodeSpaced(float*, int, int, unsigned char const*, long) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #2  0x7fffd17bcf0f in parquet::internal::TypedRecordReader<parquet::DataType<(parquet::Type::type)4> >::ReadRecordData(long) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #3  0x7fffd17bfbea in parquet::internal::TypedRecordReader<parquet::DataType<(parquet::Type::type)4> >::ReadRecords(long) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #4  0x7fffd179d2f7 in parquet::arrow::PrimitiveImpl::NextBatch(long, std::shared_ptr<arrow::Array>*) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #5  0x7fffd1797162 in parquet::arrow::ColumnReader::NextBatch(long, std::shared_ptr<arrow::Array>*) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #6  0x7fffd179a6e5 in parquet::arrow::FileReader::Impl::ReadSchemaField(int, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Array>*) ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> #7  0x7fffd179aaad in parquet::arrow::FileReader::Impl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*)::{lambda(int)#1}::operator()(int) const ()
>    from /nix/store/k6sy2ncjnkn5wnb2dq9m5f0qh446kjhg-arrow-cpp-0.11.0/lib/libparquet.so.11
> {noformat}
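> All of these frames live in libparquet.so, which is what pyarrow wraps, so 
> the same crash path can also be driven directly from C++. Below is a minimal 
> reader sketch, not my exact code, using the 0.11.0-era out-parameter APIs; 
> frame #7 above corresponds to the ReadTable call:
> {code:cpp}
> #include <memory>
>
> #include <arrow/io/file.h>
> #include <arrow/memory_pool.h>
> #include <arrow/table.h>
> #include <parquet/arrow/reader.h>
> #include <parquet/exception.h>
>
> int main() {
>   // Open the file that was written by the i686 build.
>   std::shared_ptr<arrow::io::ReadableFile> infile;
>   PARQUET_THROW_NOT_OK(arrow::io::ReadableFile::Open("32.parquet", &infile));
>
>   std::unique_ptr<parquet::arrow::FileReader> reader;
>   PARQUET_THROW_NOT_OK(
>       parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
>
>   // ReadTable is frame #7 in the backtrace; the segfault fires in here.
>   std::shared_ptr<arrow::Table> table;
>   PARQUET_THROW_NOT_OK(reader->ReadTable(&table));
>   return 0;
> }
> {code}
>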
> I have not been able to get to the bottom of the issue, but it seems the 
> problem reproduces only when I run 32-bit binaries. After I learned that, I 
> found that 32-bit and 64-bit builds produce very different parquet files 
> for the same data. The sizes of the structures are clearly different if I 
> look at their hexdumps. I'm attaching those example files. Reading 
> "32.parquet" (produced using i686 binaries) will cause a segfault on macOS 
> and Linux; "64.parquet" reads just fine.
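>
> For the write side, here is a sketch of the shape of the writer, again not 
> my exact code: the single FLOAT column, the values, the output name and the 
> chunk size are all placeholders chosen for illustration (the backtrace above 
> is decoding float data). Compiled for i686, this kind of program produced a 
> file that segfaults readers; the same source compiled for x86_64 produced 
> one that reads back fine.
> {code:cpp}
> #include <memory>
>
> #include <arrow/api.h>
> #include <arrow/io/file.h>
> #include <parquet/arrow/writer.h>
> #include <parquet/exception.h>
>
> int main() {
>   // Build a single float32 column (illustrative schema).
>   arrow::FloatBuilder builder;
>   for (int i = 0; i < 10000; ++i) {
>     PARQUET_THROW_NOT_OK(builder.Append(static_cast<float>(i) / 3.0f));
>   }
>   std::shared_ptr<arrow::Array> values;
>   PARQUET_THROW_NOT_OK(builder.Finish(&values));
>
>   auto schema = arrow::schema({arrow::field("x", arrow::float32())});
>   auto table = arrow::Table::Make(schema, {values});
>
>   // Writing succeeds without error on both architectures; only the
>   // resulting bytes differ.
>   std::shared_ptr<arrow::io::FileOutputStream> outfile;
>   PARQUET_THROW_NOT_OK(
>       arrow::io::FileOutputStream::Open("out.parquet", &outfile));
>   PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(
>       *table, arrow::default_memory_pool(), outfile, /*chunk_size=*/1024));
>   return 0;
> }
> {code}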



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)

2018-10-09 Thread Dmitry Kalinkin (JIRA)


 [ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Kalinkin updated PARQUET-1438:
-------------------------------------
Attachment: parquet_1.5.0_i686_test_success.log



[jira] [Updated] (PARQUET-1438) [C++] corrupted files produced on 32-bit architecture (i686)

2018-10-09 Thread Dmitry Kalinkin (JIRA)


 [ https://issues.apache.org/jira/browse/PARQUET-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Kalinkin updated PARQUET-1438:
-------------------------------------
Attachment: arrow_0.11.0_i686_test_fail.log
