[ https://issues.apache.org/jira/browse/ARROW-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-5682.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request 5333
[https://github.com/apache/arrow/pull/5333]

> [Python] from_pandas conversion casts values to string inconsistently
> ---------------------------------------------------------------------
>
>                 Key: ARROW-5682
>                 URL: https://issues.apache.org/jira/browse/ARROW-5682
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: Bryan Cutler
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> When calling {{pa.Array.from_pandas}} with primitive data as input and casting
> to string with "type=pa.string()", the resulting pyarrow Array can have
> inconsistent values. For most input types the result is an empty string;
> however, for some types (int32, int64) the values are '\x01' etc.
> {noformat}
> In [8]: s = pd.Series([1, 2, 3], dtype=np.uint8)
> In [9]: pa.Array.from_pandas(s, type=pa.string())
>
> Out[9]:
> <pyarrow.lib.StringArray object at 0x7f90b6091a48>
> [
>   "",
>   "",
>   ""
> ]
> In [10]: s = pd.Series([1, 2, 3], dtype=np.uint32)
>
> In [11]: pa.Array.from_pandas(s, type=pa.string())
>
> Out[11]:
> <pyarrow.lib.StringArray object at 0x7f9097efca48>
> [
>   "",
>   "",
>   ""
> ]
> {noformat}
> This came from the Spark discussion
> https://github.com/apache/spark/pull/24930/files#r296187903. Type casting
> this way in Spark is not supported, but it would be good to make the behavior
> consistent. Would it be better to raise an UnsupportedOperation error?

--
This message was sent by Atlassian Jira
(v8.3.2#803003)