[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?
    [ https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913756#comment-16913756 ]

Wes McKinney commented on ARROW-5295:
-------------------------------------

A "simple" workaround would be to invoke Scalars' {{as_py}} method if they're passed in. That would add perf overhead, though, since we'd need to do {{isinstance}} checks.

Another option is to "sanitize" inputs (using a helper function) only in the case of failure on the initial try. So the normal use case won't be affected.

> [Python] accept pyarrow values / scalars in constructor functions ?
> -------------------------------------------------------------------
>
>                 Key: ARROW-5295
>                 URL: https://issues.apache.org/jira/browse/ARROW-5295
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> Currently, functions like {{pyarrow.array}} don't accept pyarrow Arrays, or
> also not scalars of it:
> {code}
> In [42]: arr = pa.array([1, 2, 3])
> In [43]: pa.array(arr)
> ...
> ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not
> recognize Python value type when inferring an Arrow data type
> In [44]: pa.array(list(arr))
> ...
> ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not
> recognize Python value type when inferring an Arrow data type
> {code}
> Do we want to allow those / recognize those here? (the first case could even
> have a fastpath, as we don't need to do it element by element).
> Also scalars are not supported:
> {code}
> In [46]: type(arr.sum())
> Out[46]: pyarrow.lib.Int64Scalar
> In [47]: pa.array([arr.sum()])
> ...
> ArrowInvalid: Could not convert 6 with type pyarrow.lib.Int64Scalar: did not
> recognize Python value type when inferring an Arrow data type
> {code}
> And also in other functions we don't accept arrow scalars / values:
> {code}
> In [48]: string = pa.array(['a'])[0]
> In [49]: type(string)
> Out[49]: pyarrow.lib.StringValue
> In [50]: pa.field(string, pa.int64())
> ...
> TypeError: expected bytes, pyarrow.lib.StringValue found
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
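
The second option in Wes's comment (sanitize only after the first attempt fails) could look roughly like the sketch below. This is an illustration, not pyarrow code: `sanitize`, `construct_with_fallback`, `FakeScalar`, and `strict_constructor` are hypothetical names, and the stand-in classes are used so the example runs without pyarrow installed. The only thing assumed about a scalar is that it exposes an `as_py()` method, as mentioned in the comment.

```python
def sanitize(value):
    """Unwrap any object exposing as_py(); recurse into lists and tuples."""
    if hasattr(value, "as_py"):
        return value.as_py()
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


def construct_with_fallback(constructor, values, errors=(TypeError, ValueError)):
    """Call constructor(values); on failure, retry once with sanitized input."""
    try:
        # Fast path: plain Python input pays no per-element checks.
        return constructor(values)
    except errors:
        # Slow path: only reached when the first attempt was rejected.
        return constructor(sanitize(values))


class FakeScalar:
    """Hypothetical stand-in for an Arrow scalar (only as_py() is assumed)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


def strict_constructor(values):
    """Hypothetical stand-in for a constructor that rejects wrapped scalars."""
    if not all(isinstance(v, int) for v in values):
        raise ValueError("did not recognize Python value type")
    return list(values)


plain = construct_with_fallback(strict_constructor, [1, 2, 3])
mixed = construct_with_fallback(strict_constructor, [FakeScalar(6), 2])
```

The trade-off is the one described in the comment: inputs that already work are not slowed down, while inputs containing scalars cost one failed attempt plus a sanitizing pass. If the sanitized input is still invalid, the second exception propagates to the caller unchanged.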
[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?
    [ https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906193#comment-16906193 ]

Marcel Ackermann commented on ARROW-5295:
-----------------------------------------

This would be required for serializing dataframes that contain vectors: https://issues.apache.org/jira/browse/ARROW-6222
[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?
    [ https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906190#comment-16906190 ]

Joris Van den Bossche commented on ARROW-5295:
----------------------------------------------

Additional case (from ARROW-6222): pyarrow Arrays are also not recognized as list-like when converting/inferring a list array:

{code}
In [43]: pa.array([np.array([1, 1]), np.array([2, 2, 2])])
Out[43]:
[
  [
    1,
    1
  ],
  [
    2,
    2,
    2
  ]
]

In [44]: pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
----> 1 pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert [
  1,
  1
] with type pyarrow.lib.Int64Array: did not recognize Python value type when inferring an Arrow data type
{code}

So a list (or array) of numpy arrays works, but a list of pyarrow arrays does not. Again, this is not the most typical use case of pyarrow Arrays, so I am not sure we should add this capability. (Although we might want to find a general solution for array-like objects (e.g. pytorch.Tensors, see ARROW-6222), and a solution for that (somehow trying to coerce to a numpy array?) might also solve the case of a list of arrow arrays.)
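
For the list-of-arrays case in Joris's comment, one user-side way to get a general treatment of array-like elements is to coerce each element to a plain Python list before handing the data to a constructor. The sketch below is an assumption-laden illustration, not the approach proposed in the issue (which suggests coercing to a numpy array): `to_nested_list` and `FakeArray` are hypothetical names, and the helper relies only on duck typing, looking for a `to_pylist()` or `tolist()` method on each element.

```python
def to_nested_list(obj):
    """Coerce array-like objects to plain Python lists, recursively."""
    # Duck typing: try the conversion methods commonly found on array types.
    for method_name in ("to_pylist", "tolist"):
        convert = getattr(obj, method_name, None)
        if callable(convert):
            return convert()
    if isinstance(obj, (list, tuple)):
        return [to_nested_list(item) for item in obj]
    return obj


class FakeArray:
    """Hypothetical stand-in for an array type exposing to_pylist()."""

    def __init__(self, values):
        self._values = list(values)

    def to_pylist(self):
        return list(self._values)


nested = to_nested_list([FakeArray([1, 1]), FakeArray([2, 2, 2])])
untouched = to_nested_list([1, [2, 3]])
```

Going through Python lists copies every element, so this only makes sense as a fallback for small inputs; a built-in solution could avoid the copy for arrays that are already in Arrow format.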
[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?
    [ https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836524#comment-16836524 ]

Antoine Pitrou commented on ARROW-5295:
---------------------------------------

I think allowing Arrays is fine. I'm not sure we want to make Scalars first-class citizens like Numpy scalars, though. At least from an implementation POV, it may add a lot of code all over the place...