[jira] [Commented] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-05-02 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461855#comment-16461855
 ] 

Licht Takeuchi commented on ARROW-2459:
---------------------------------------

This appears to be fixed on the latest master branch:
[https://gist.github.com/Licht-T/1f0b5e79084123879b07a9388d7ab138]

> pyarrow: Segfault with pyarrow.deserialize_pandas
> -------------------------------------------------
>
>                 Key: ARROW-2459
>                 URL: https://issues.apache.org/jira/browse/ARROW-2459
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: OS X, Linux
>            Reporter: Travis Brady
>            Priority: Major
>
> Following up from [https://github.com/apache/arrow/issues/1884], where I 
> found that calling deserialize_pandas in the app.py script in the repo 
> linked below causes the app.py process to segfault.
> I initially observed this on OS X, but have since confirmed that the behavior 
> exists on Linux as well.
> Repo containing example: [https://github.com/travisbrady/sanic-arrow] 
> And more generally: what is the right way to get a Java-based HTTP 
> microservice to talk to a Python-based HTTP microservice using Arrow as the 
> serialization format? I'm exchanging DataFrame-type objects (pandas.DataFrame 
> objects on the Python side) between the two services for real-time 
> scoring with a few xgboost models implemented in Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-04-18 Thread Travis Brady (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443243#comment-16443243
 ] 

Travis Brady commented on ARROW-2459:
-------------------------------------

[~joshuastorck] In general, I'd say PyArrow should never segfault. Raising a 
`ValueError` or similar would be fine, but a hard crash of the interpreter is 
not acceptable in production.
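
Until the library itself rejects bad input, a caller that cannot tolerate a crash could run the risky call in a child process, so that a segfault surfaces in the parent as an ordinary Python exception. The sketch below is only an illustration and is not part of pyarrow: `run_isolated` is a hypothetical helper, it assumes POSIX semantics (a negative return code means the child was killed by a signal), and it adds process start-up cost to every call. The crash is simulated with `os.kill` rather than with a real pyarrow call.

```python
import subprocess
import sys


def run_isolated(code, payload=b"", timeout=30):
    """Run Python source `code` in a child interpreter (hypothetical helper).

    `payload` is fed to the child on stdin and the child's stdout is returned.
    Raises ValueError if the child is killed by a signal such as SIGSEGV
    or exits with a non-zero status.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        raise ValueError(f"child killed by signal {-proc.returncode}")
    if proc.returncode != 0:
        raise ValueError(f"child failed with exit code {proc.returncode}")
    return proc.stdout


# Simulate a native crash in the child; the parent process survives.
crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
try:
    run_isolated(crash)
except ValueError as exc:
    print("caught:", exc)
```

In a real service, the child code would read the request bytes from stdin, deserialize them, and write a result to stdout. Whether the extra process is worth it depends on the latency budget.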



[jira] [Commented] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-04-17 Thread Joshua Storck (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441371#comment-16441371
 ] 

Joshua Storck commented on ARROW-2459:
--------------------------------------

The calls in app.py and test_request.py are not symmetric: what one side 
writes does not match what the other side reads. Here's an example that works:

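{code:python}
import io

import pandas as pd
import pyarrow as pa

df = pd.DataFrame([dict(a=99, b=100.0), dict(a=5, b=77.77)])
print(df.to_string())

# Serialize the DataFrame to an Arrow buffer and copy it into a byte stream,
# standing in for the bytes sent over HTTP.
serialized_df = pa.serialize_pandas(df)
bb = io.BytesIO(serialized_df)

# Wrap the received bytes in an Arrow buffer and deserialize them with the
# matching call.
buf = pa.py_buffer(bb.getvalue())
df = pa.deserialize_pandas(buf)
print(df.to_string())
{code}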

If that works, can I close this?



[jira] [Commented] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-04-14 Thread EmericP (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438389#comment-16438389
 ] 

EmericP commented on ARROW-2459:
--------------------------------

I can easily reproduce the issue on both Linux and macOS. The segfault happens 
in libarrow:
{noformat}
==20185== Process terminating with default action of signal 11 (SIGSEGV)
==20185==  Bad permissions for mapped region at address 0x536E696
==20185==    at 0xB7B36A6: arrow::ipc::Message::ReadFrom(std::shared_ptr const&, arrow::io::InputStream*, std::unique_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7B4490: arrow::ipc::ReadMessage(arrow::io::InputStream*, std::unique_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7B5A0C: arrow::ipc::InputStreamMessageReader::ReadNextMessage(std::unique_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7BDF41: arrow::ipc::ReadMessageAndValidate(arrow::ipc::MessageReader*, arrow::ipc::Message::Type, bool, std::unique_ptr*) [clone .constprop.261] (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C69E0: arrow::ipc::RecordBatchStreamReader::RecordBatchStreamReaderImpl::ReadSchema() (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C0EB5: arrow::ipc::RecordBatchStreamReader::Open(std::unique_ptr, std::shared_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C0FB3: arrow::ipc::RecordBatchStreamReader::Open(arrow::io::InputStream*, std::shared_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB3770C7: __pyx_pw_7pyarrow_3lib_18_RecordBatchReader_3_open(_object*, _object*) (in /usr/lib/python3.5/site-packages/pyarrow/lib.cpython-35m-x86_64-linux-gnu.so)
==20185==    by 0x288CAB: PyEval_EvalFrameEx (in /usr/bin/python3)
==20185==    by 0x28E0DE: PyEval_EvalCodeEx (in /usr/bin/python3)
==20185==    by 0x2CA5D2: ??? (in /usr/bin/python3)
==20185==    by 0x311646: PyObject_Call (in /usr/bin/python3)
{noformat}
