[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726840#comment-16726840
 ] 

Hatem Helal commented on PARQUET-1481:
--

Great, thanks for that [~wesmckinn]!

> [C++] SEGV when reading corrupt parquet file
> 
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Attachments: corrupt.parquet
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> >>> import pyarrow.parquet as pq
> >>> pq.read_table('corrupt.parquet')
> fish: 'python' terminated by signal SIGSEGV (Address boundary error)
>  
> Stack report from macOS:
>  
> 0 libsystem_kernel.dylib 0x7fff51164cee __psynch_cvwait + 10
> 1 libsystem_pthread.dylib 0x7fff512a1662 _pthread_cond_wait + 732
> 2 libc++.1.dylib 0x7fff4f04acb0 std::__1::condition_variable::wait(std::__1::unique_lock&) + 18
> 3 libc++.1.dylib 0x7fff4f04b728 std::__1::__assoc_sub_state::__sub_wait(std::__1::unique_lock&) + 46
> 4 libparquet.11.dylib 0x000115512d00 std::__1::__assoc_state::move() + 48
> 5 libparquet.11.dylib 0x0001154faa15 parquet::arrow::FileReader::Impl::ReadTable(std::__1::vector std::__1::allocator > const&, std::__1::shared_ptr*) + 1093
> 6 libparquet.11.dylib 0x0001154fb6fe parquet::arrow::FileReader::Impl::ReadTable(std::__1::shared_ptr*) + 350
> 7 libparquet.11.dylib 0x0001154fce47 parquet::arrow::FileReader::ReadTable(std::__1::shared_ptr*) + 23
> 8 _parquet.so 0x00011598d97b __pyx_pw_7pyarrow_8_parquet_13ParquetReader_9read_all(_object*, _object*, _object*) + 1035



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726806#comment-16726806
 ] 

Wes McKinney commented on PARQUET-1481:
---

The Thrift metadata is corrupt, but the reader does not check it. I'm submitting a patch.
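
To illustrate the general idea of validating file structure before trusting it, here is a minimal sketch in Python. It is not the patch mentioned above and does not inspect the Thrift metadata itself; it only checks the outer framing of an unencrypted Parquet file (the leading and trailing PAR1 magic bytes and the 4-byte little-endian footer length stored just before the trailing magic). The function name check_parquet_footer is hypothetical.

```python
import struct

MAGIC = b"PAR1"  # magic bytes of an unencrypted Parquet file


def check_parquet_footer(data: bytes) -> int:
    """Return the declared footer-metadata length, or raise ValueError.

    Hypothetical helper: it checks only the outer file framing and says
    nothing about whether the Thrift-encoded metadata is well formed.
    """
    # Smallest framing: leading magic + 4-byte length + trailing magic.
    if len(data) < 12:
        raise ValueError("file too small to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The footer length is a 4-byte little-endian integer that sits
    # immediately before the trailing magic bytes.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len == 0 or footer_len > len(data) - 12:
        raise ValueError(f"implausible footer length: {footer_len}")
    return footer_len
```

A check like this rejects some damaged files early, but a file can pass it and still carry corrupt metadata, so the reader also has to validate what it decodes.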

> [C++] SEGV when reading corrupt parquet file
> 
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Wes McKinney
> Priority: Major
> Attachments: corrupt.parquet


[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726795#comment-16726795
 ] 

Hatem Helal commented on PARQUET-1481:
--

Sure. A colleague used a text editor to make a random change to a file that 
was originally written using parquet-cpp.  I'm looking at making this throw an 
exception or return a not-ok status code instead of crashing.  Does that sound 
reasonable?
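
For anyone who wants a repeatable version of the "random change" described above, here is a minimal sketch in Python. It is an assumption about how one might automate the corruption, not how corrupt.parquet was actually produced (that was done by hand in a text editor). The function name corrupt_bytes and the seed parameter are hypothetical.

```python
import random


def corrupt_bytes(data: bytes, seed: int = 0) -> bytes:
    """Return a copy of `data` with exactly one byte changed.

    Hypothetical helper: the offset and the new value are drawn from a
    seeded generator so the same corruption can be reproduced later.
    """
    if not data:
        raise ValueError("cannot corrupt empty input")
    rng = random.Random(seed)
    offset = rng.randrange(len(data))
    mutated = bytearray(data)
    # XOR with a non-zero value guarantees the byte actually changes.
    mutated[offset] ^= rng.randrange(1, 256)
    return bytes(mutated)
```

Writing the result to disk and passing it to pq.read_table gives a simple regression check: the desired behaviour is a Python exception, never a crash of the interpreter.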

> [C++] SEGV when reading corrupt parquet file
> 
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Hatem Helal
> Priority: Major
> Attachments: corrupt.parquet


[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726776#comment-16726776
 ] 

Uwe L. Korn commented on PARQUET-1481:
--

Can you describe how you generated this Parquet file?

> [C++] SEGV when reading corrupt parquet file
> 
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Hatem Helal
> Priority: Major
> Attachments: corrupt.parquet


[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726757#comment-16726757
 ] 

Hatem Helal commented on PARQUET-1481:
--

I managed to reproduce this with a simple test against the latest Apache Arrow.  
The stack trace is slightly nicer:

 

F1220 13:29:51.966117 2315707200 record_reader.cc:854] Check failed: false
*** Check failure stack trace: ***
    @ 0x1083c217a google::LogMessage::Fail()
    @ 0x1083c01de google::LogMessage::SendToLog()
    @ 0x1083c0e1f google::LogMessage::Flush()
    @ 0x1083c0c59 google::LogMessage::~LogMessage()
    @ 0x1083c0f15 google::LogMessage::~LogMessage()
    @ 0x10825d45c arrow::util::ArrowLog::~ArrowLog()
    @ 0x10825d4a5 arrow::util::ArrowLog::~ArrowLog()
    @ 0x107d5d936 parquet::internal::RecordReader::Make()
    @ 0x107cf8abd parquet::arrow::PrimitiveImpl::PrimitiveImpl()
    @ 0x107c69acd parquet::arrow::PrimitiveImpl::PrimitiveImpl()
    @ 0x107c68ba8 parquet::arrow::FileReader::Impl::GetColumn()
    @ 0x107c6b790 parquet::arrow::FileReader::Impl::GetReaderForNode()
    @ 0x107c6cb3d parquet::arrow::FileReader::Impl::ReadSchemaField()
    @ 0x107c79d60 parquet::arrow::FileReader::Impl::ReadTable()::$_1::operator()()
    @ 0x107c764ef parquet::arrow::FileReader::Impl::ReadTable()
    @ 0x107c7a9f5 parquet::arrow::FileReader::Impl::ReadTable()
    @ 0x107c7f5f7 parquet::arrow::FileReader::ReadTable()
    @ 0x107c6176c main
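
Because a failed check or a SIGSEGV takes the whole process down, a reproduction driven from Python is easier to work with when the read happens in a child process. The following is a minimal sketch of that approach, assuming a POSIX system where a negative return code means the child was killed by a signal. The names run_isolated and describe_exit are hypothetical.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run `code` in a fresh Python interpreter and return its exit status.

    On POSIX, subprocess reports death by signal N as return code -N,
    so a crash in the child cannot take the calling process down.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


def describe_exit(returncode: int) -> str:
    """Turn a return code into a short label such as 'SIGSEGV'."""
    if returncode < 0:
        return signal.Signals(-returncode).name
    return f"exit code {returncode}"
```

As a usage example, the child could run the two lines from the original report (import pyarrow.parquet as pq, then pq.read_table('corrupt.parquet')). Before a fix the label would name the fatal signal; with proper error handling the child should instead exit with a non-zero code from an uncaught Python exception.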

> [C++] SEGV when reading corrupt parquet file
> 
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Hatem Helal
> Priority: Major
> Attachments: corrupt.parquet