[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?

2019-08-22 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913756#comment-16913756
 ] 

Wes McKinney commented on ARROW-5295:
-

A "simple" workaround would be to invoke a Scalar's {{as_py}} method whenever 
one is passed in. That would add performance overhead, though, since we'd need 
to do {{isinstance}} checks. 

Another option is to "sanitize" inputs (using a helper function) only when the 
initial attempt fails, so the normal use case isn't affected.
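
A minimal sketch of that second option, under stated assumptions: the names 
sanitize, construct_with_fallback, FakeScalar and strict_constructor are all 
hypothetical and not part of pyarrow. Scalars are detected by duck typing (an 
as_py attribute) instead of isinstance, and the stand-in constructor raises 
TypeError, whereas pyarrow raises ArrowInvalid (a ValueError subclass) for 
the cases in this issue.

```python
def sanitize(values):
    """Replace every object that exposes ``as_py`` with its plain Python value."""
    return [v.as_py() if hasattr(v, "as_py") else v for v in values]


def construct_with_fallback(constructor, values, errors=(TypeError, ValueError)):
    """Try the constructor as-is; sanitize and retry only if it fails.

    Inputs that already work never pay for the sanitizing pass.
    """
    try:
        return constructor(values)
    except errors:
        return constructor(sanitize(values))


# --- Stand-ins for illustration only (assumptions, not pyarrow classes) ---

class FakeScalar:
    """Hypothetical stand-in for a pyarrow scalar: wraps a value, exposes as_py()."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


def strict_constructor(values):
    """Hypothetical constructor that only accepts plain Python ints."""
    for v in values:
        if not isinstance(v, int):
            raise TypeError(f"did not recognize value of type {type(v).__name__}")
    return tuple(values)


# The first attempt fails on FakeScalar(2); the retry sees [1, 2, 3].
result = construct_with_fallback(strict_constructor, [1, FakeScalar(2), 3])
```

The trade-off is that a failing call does the conversion work twice, which 
seems acceptable if passing scalars is the rare case.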

> [Python] accept pyarrow values / scalars in constructor functions ?
> ---
>
> Key: ARROW-5295
> URL: https://issues.apache.org/jira/browse/ARROW-5295
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Currently, functions like {{pyarrow.array}} don't accept pyarrow Arrays, nor 
> their elements:
> {code}
> In [42]: arr = pa.array([1, 2, 3])
> In [43]: pa.array(arr)
> ...
> ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not 
> recognize Python value type when inferring an Arrow data type
> In [44]: pa.array(list(arr))
> ...
> ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not 
> recognize Python value type when inferring an Arrow data type
> {code}
> Do we want to allow those / recognize those here? (the first case could even 
> have a fastpath, as we don't need to do it element by element).
> Also scalars are not supported:
> {code}
> In [46]: type(arr.sum())
> Out[46]: pyarrow.lib.Int64Scalar
> In [47]: pa.array([arr.sum()])
> ...
> ArrowInvalid: Could not convert 6 with type pyarrow.lib.Int64Scalar: did not 
> recognize Python value type when inferring an Arrow data type
> {code}
> Other functions don't accept arrow scalars / values either:
> {code}
> In [48]: string = pa.array(['a'])[0]
> In [49]: type(string)
> Out[49]: pyarrow.lib.StringValue
> In [50]: pa.field(string, pa.int64())
> ...
> TypeError: expected bytes, pyarrow.lib.StringValue found
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?

2019-08-13 Thread Marcel Ackermann (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906193#comment-16906193
 ] 

Marcel Ackermann commented on ARROW-5295:
-

This would be required for serializing dataframes that contain vectors: 
https://issues.apache.org/jira/browse/ARROW-6222






[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?

2019-08-13 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906190#comment-16906190
 ] 

Joris Van den Bossche commented on ARROW-5295:
--

Additional case (from ARROW-6222): pyarrow Arrays are also not recognized as 
list-like when converting/inferring a list array:

{code}
In [43]: pa.array([np.array([1, 1]), np.array([2, 2, 2])])
Out[43]: 

[
  [
    1,
    1
  ],
  [
    2,
    2,
    2
  ]
]

In [44]: pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])
---
ArrowInvalid  Traceback (most recent call last)
 in 
----> 1 pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert [
  1,
  1
] with type pyarrow.lib.Int64Array: did not recognize Python value type when 
inferring an Arrow data type
{code}

So a list (or array) of numpy arrays works, but a list of pyarrow arrays does 
not. Again, this is not the most typical use case for pyarrow Arrays, so I'm 
not sure we should add this capability.

(Although we might want to find a general solution for array-like objects (e.g. 
PyTorch Tensors, see ARROW-6222), and a solution for that (somehow trying to 
coerce to a numpy array?) might also solve the case of a list of arrow arrays.)
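
One way such a general coercion step could look, as a rough sketch only. It 
swaps in a different technique from the numpy coercion floated above: instead 
of converting to a numpy array, it duck-types on a to_pylist method (which 
pyarrow Arrays have) or a tolist method (which numpy arrays and PyTorch 
tensors have) and produces plain Python lists. The names to_nested_lists and 
FakeArray are hypothetical.

```python
def to_nested_lists(obj):
    """Recursively turn array-like objects into plain Python lists.

    Anything exposing ``to_pylist`` or ``tolist`` is treated as array-like;
    lists and tuples are walked element by element; everything else is
    returned unchanged.
    """
    for method_name in ("to_pylist", "tolist"):
        method = getattr(obj, method_name, None)
        if callable(method):
            return method()
    if isinstance(obj, (list, tuple)):
        return [to_nested_lists(item) for item in obj]
    return obj


class FakeArray:
    """Hypothetical stand-in for an array-like object with a to_pylist method."""

    def __init__(self, values):
        self._values = list(values)

    def to_pylist(self):
        return list(self._values)


# A list of array-likes becomes a list of lists that a sequence
# converter could then infer as a list array.
nested = to_nested_lists([FakeArray([1, 1]), FakeArray([2, 2, 2])])
```

Going through Python lists copies the data, so this would only make sense as 
a fallback; the Array-in, Array-out case could still take a fast path.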






[jira] [Commented] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?

2019-05-09 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836524#comment-16836524
 ] 

Antoine Pitrou commented on ARROW-5295:
---

I think allowing Arrays is fine. I'm not sure we want to make Scalars 
first-class citizens like Numpy scalars, though. At least from an 
implementation POV, it may add a lot of code all over the place...



