[ https://issues.apache.org/jira/browse/ARROW-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-2135.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 1681
[https://github.com/apache/arrow/pull/1681]

> [Python] NaN values silently casted to int64 when passing explicit schema for
> conversion in Table.from_pandas
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2135
>                 URL: https://issues.apache.org/jira/browse/ARROW-2135
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Matthew Gilbert
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> If you create a {{Table}} from a {{DataFrame}} of ints that contains a NaN
> value, the NaN is improperly cast. Because pandas stores such a column as
> floats, the NaN is reinterpreted as an integer when the DataFrame is converted
> to a table with an explicit int64 schema. This seems like a bug, since a known
> limitation in pandas (the inability to hold null-valued integer data) is
> taking precedence over Arrow's ability to store these values as an Int64Array
> with nulls.
>
> {code}
> import pyarrow as pa
> import pandas as pd
> import numpy as np
>
> # pandas stores this column as float64 because of the NaN
> df = pd.DataFrame({"a": [1, 2, np.nan]})
> schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
> table = pa.Table.from_pandas(df, schema=schema)
> table[0]
> <pyarrow.lib.Column object at 0x7f2151d19c90>
> chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
> [
>   1,
>   2,
>   -9223372036854775808
> ]{code}
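
For illustration only, the behavior the report asks for (NaN becoming a null rather than an integer) can be sketched in pure Python. This is not pyarrow's implementation and not the change made in pull request 1681; the names `floats_to_nullable_ints` and `to_pylist` are hypothetical, and the pair of a values list and a validity list is only a simplified stand-in for an integer array with nulls. The constant shows that the value in the output above, -9223372036854775808, is the smallest 64-bit signed integer.

```python
import math
from typing import List, Optional, Tuple

# Smallest int64 value; this is the number that appears in place of NaN
# in the output quoted in the report.
INT64_MIN = -2**63


def floats_to_nullable_ints(column: List[float]) -> Tuple[List[int], List[bool]]:
    """Hypothetical helper: split a float column into integer values plus
    a validity list, marking NaN entries as null instead of casting them."""
    values: List[int] = []
    validity: List[bool] = []
    for item in column:
        x = float(item)
        if math.isnan(x):
            values.append(0)        # placeholder; ignored because the slot is null
            validity.append(False)
        elif x.is_integer():
            values.append(int(x))
            validity.append(True)
        else:
            # Refuse lossy conversions such as 1.5 or infinity.
            raise ValueError(f"cannot represent {x!r} as an integer")
    return values, validity


def to_pylist(values: List[int], validity: List[bool]) -> List[Optional[int]]:
    """Hypothetical helper: render nulls as None for inspection."""
    return [v if ok else None for v, ok in zip(values, validity)]


values, validity = floats_to_nullable_ints([1, 2, float("nan")])
print(to_pylist(values, validity))
```

With this approach the third entry is reported as a null (`None`) and no integer is ever derived from the NaN itself.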