Farzad Abdolhosseini created ARROW-8868:
-------------------------------------------

             Summary: [Python] Feather format cannot store/retrieve lists correctly?
                 Key: ARROW-8868
                 URL: https://issues.apache.org/jira/browse/ARROW-8868
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
         Environment: Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
            Reporter: Farzad Abdolhosseini


I'm seeing very strange behavior when I try to store and retrieve a pandas 
DataFrame using the Feather format. Simplified example:
{code:python}
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
 scalar array
0     1   [1]
1     2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
  scalar                  array
0      1                   [16]
1      2  [1045468844972122628]
{code}
As you can see, the retrieved data is incorrect. I was originally using the 
`feather-format` package directly (without going through pandas), and that 
didn't work either.

By varying the DataFrame that is stored, I can also get different but still 
incorrect behavior, e.g. a larger list being returned, an error saying that 
the file size is incorrect, or simply a segmentation fault.


This is my first time using Feather/Arrow BTW.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
