[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726840#comment-16726840 ]

Hatem Helal commented on PARQUET-1481:
--

Great, thanks for that [~wesmckinn]!

> [C++] SEGV when reading corrupt parquet file
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Attachments: corrupt.parquet
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> >>> import pyarrow.parquet as pq
> >>> pq.read_table('corrupt.parquet')
> fish: 'python' terminated by signal SIGSEGV (Address boundary error)
>
> Stack report from macOS:
>
> 0 libsystem_kernel.dylib 0x7fff51164cee __psynch_cvwait + 10
> 1 libsystem_pthread.dylib 0x7fff512a1662 _pthread_cond_wait + 732
> 2 libc++.1.dylib 0x7fff4f04acb0 std::__1::condition_variable::wait(std::__1::unique_lock&) + 18
> 3 libc++.1.dylib 0x7fff4f04b728 std::__1::__assoc_sub_state::__sub_wait(std::__1::unique_lock&) + 46
> 4 libparquet.11.dylib 0x000115512d00 std::__1::__assoc_state::move() + 48
> 5 libparquet.11.dylib 0x0001154faa15 parquet::arrow::FileReader::Impl::ReadTable(std::__1::vector std::__1::allocator > const&, std::__1::shared_ptr*) + 1093
> 6 libparquet.11.dylib 0x0001154fb6fe parquet::arrow::FileReader::Impl::ReadTable(std::__1::shared_ptr*) + 350
> 7 libparquet.11.dylib 0x0001154fce47 parquet::arrow::FileReader::ReadTable(std::__1::shared_ptr*) + 23
> 8 _parquet.so 0x00011598d97b __pyx_pw_7pyarrow_8_parquet_13ParquetReader_9read_all(_object*, _object*, _object*) + 1035

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726806#comment-16726806 ]

Wes McKinney commented on PARQUET-1481:
---

The Thrift metadata is corrupt, but the reader does not validate it. I'm submitting a patch.
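As a rough illustration of the cheapest layer of metadata checking, the sketch below validates only the framing around the footer. It assumes the plaintext layout from the Parquet format specification: the file starts and ends with the 4-byte magic `PAR1`, and the trailing magic is preceded by a 4-byte little-endian length of the Thrift-encoded `FileMetaData`. The names `check_parquet_trailer` and `CorruptFileError` are hypothetical, not parquet-cpp or pyarrow API, and encrypted-footer files are out of scope.

```python
import struct

# Magic bytes at the start and end of a plaintext-footer Parquet file.
MAGIC = b"PAR1"


class CorruptFileError(ValueError):
    """Hypothetical error type used only in this sketch."""


def check_parquet_trailer(data: bytes) -> int:
    """Return the footer metadata length after basic structural checks.

    Assumed layout: MAGIC | column chunks | FileMetaData | int32 LE length | MAGIC
    """
    # Smallest possible frame: leading magic + length field + trailing magic.
    if len(data) < 12:
        raise CorruptFileError(f"file too small to be Parquet: {len(data)} bytes")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise CorruptFileError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the metadata length.
    (meta_len,) = struct.unpack("<i", data[-8:-4])
    # The metadata must fit between the leading magic and the length field.
    if meta_len <= 0 or meta_len > len(data) - 12:
        raise CorruptFileError(
            f"invalid footer length {meta_len} for a {len(data)}-byte file"
        )
    return meta_len
```

The stack traces in this issue show the reader got as far as `RecordReader::Make`, so here the framing evidently passed and the damage sat inside the decoded metadata; checks like these complement, rather than replace, validating the decoded fields themselves.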
[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726795#comment-16726795 ]

Hatem Helal commented on PARQUET-1481:
--

Sure. A colleague used a text editor to make a random change to a file that was originally written with parquet-cpp. I'm looking at making this throw an exception or return a non-OK status instead of crashing. Does that sound reasonable?
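A random hand edit to a valid file is essentially single-byte mutation fuzzing. A minimal, reproducible way to generate such inputs, assuming one changed byte per case is enough, might look like the sketch below; `corrupt_one_byte` is a hypothetical helper, not part of any Parquet library.

```python
import random
from typing import Tuple


def corrupt_one_byte(data: bytes, seed: int) -> Tuple[bytes, int]:
    """Return a copy of data with exactly one byte changed, plus its offset.

    A reproducible stand-in for a random edit made in a text editor.
    """
    if not data:
        raise ValueError("cannot corrupt an empty buffer")
    # Seeded so that a failing case can be replayed exactly.
    rng = random.Random(seed)
    offset = rng.randrange(len(data))
    # XOR with a nonzero value guarantees the byte actually changes.
    out = bytearray(data)
    out[offset] ^= rng.randrange(1, 256)
    return bytes(out), offset
```

Because a bad input may crash the reader outright, as it did here, a harness built on this would typically run each `pq.read_table` attempt in a separate process: a child terminated by a signal indicates a bug, while an ordinary Python exception counts as correct handling.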
[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726776#comment-16726776 ]

Uwe L. Korn commented on PARQUET-1481:
--

Can you describe how you generated this Parquet file?
[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726757#comment-16726757 ]

Hatem Helal commented on PARQUET-1481:
--

Managed to reproduce this with a simple test against the latest Apache Arrow. Slightly nicer stack trace:

F1220 13:29:51.966117 2315707200 record_reader.cc:854] Check failed: false
*** Check failure stack trace: ***
    @ 0x1083c217a google::LogMessage::Fail()
    @ 0x1083c01de google::LogMessage::SendToLog()
    @ 0x1083c0e1f google::LogMessage::Flush()
    @ 0x1083c0c59 google::LogMessage::~LogMessage()
    @ 0x1083c0f15 google::LogMessage::~LogMessage()
    @ 0x10825d45c arrow::util::ArrowLog::~ArrowLog()
    @ 0x10825d4a5 arrow::util::ArrowLog::~ArrowLog()
    @ 0x107d5d936 parquet::internal::RecordReader::Make()
    @ 0x107cf8abd parquet::arrow::PrimitiveImpl::PrimitiveImpl()
    @ 0x107c69acd parquet::arrow::PrimitiveImpl::PrimitiveImpl()
    @ 0x107c68ba8 parquet::arrow::FileReader::Impl::GetColumn()
    @ 0x107c6b790 parquet::arrow::FileReader::Impl::GetReaderForNode()
    @ 0x107c6cb3d parquet::arrow::FileReader::Impl::ReadSchemaField()
    @ 0x107c79d60 parquet::arrow::FileReader::Impl::ReadTable()::$_1::operator()()
    @ 0x107c764ef parquet::arrow::FileReader::Impl::ReadTable()
    @ 0x107c7a9f5 parquet::arrow::FileReader::Impl::ReadTable()
    @ 0x107c7f5f7 parquet::arrow::FileReader::ReadTable()
    @ 0x107c6176c main
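The debug trace fails a `Check failed: false` inside `parquet::internal::RecordReader::Make`. Assuming, as that message suggests, that the check sits in an unhandled branch of a dispatch on the column's physical type (which would also be consistent with a release build, where such debug checks are typically compiled out, crashing instead), the proposed fix amounts to validating the value read from the metadata and reporting an error. The sketch below shows that pattern in Python. The type codes follow the `Type` enum in `parquet.thrift`; `make_record_reader` and `ParquetMetadataError` are hypothetical names, not the actual parquet-cpp code.

```python
from enum import IntEnum


class PhysicalType(IntEnum):
    # Physical type codes as enumerated in parquet.thrift.
    BOOLEAN = 0
    INT32 = 1
    INT64 = 2
    INT96 = 3
    FLOAT = 4
    DOUBLE = 5
    BYTE_ARRAY = 6
    FIXED_LEN_BYTE_ARRAY = 7


class ParquetMetadataError(ValueError):
    """Hypothetical error raised when decoded metadata fails validation."""


def make_record_reader(type_code: int, column_name: str) -> str:
    """Hypothetical stand-in for a record-reader factory.

    Validates the type code before dispatching, so corrupt metadata
    surfaces as an exception instead of a failed check or a null reader.
    """
    try:
        ptype = PhysicalType(type_code)
    except ValueError:
        raise ParquetMetadataError(
            f"column {column_name!r}: invalid physical type code {type_code}"
        ) from None
    # A real reader would construct a typed reader here; the sketch just
    # returns the name of the type it would dispatch on.
    return f"{ptype.name} reader"
```

In C++ the same idea would return a non-OK status (or throw) from the default branch rather than relying on a debug-only check.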