[jira] [Commented] (ARROW-7066) [Python] support returning ChunkedArray from __arrow_array__ ?

2019-11-11 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972155#comment-16972155
 ] 

Joris Van den Bossche commented on ARROW-7066:
--

I still don't fully like returning a chunked array from {{pa.array}}, but I 
also don't see another easy way to get the roundtrip working for, e.g., 
fletcher, which uses chunked arrays (an alternative would be to have an 
"internal" version of {{pa.array(..)}} that allows this and keep the public 
one strict, but that is also rather ugly).

I will add a documentation update to the currently open PR.
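
For illustration, the "internal lenient, public strict" alternative mentioned in this comment could be sketched roughly as follows. This is only a toy model: {{Array}}, {{ChunkedArray}}, {{_array_allow_chunked}} and {{ChunkedColumn}} are hypothetical stand-ins defined here, not pyarrow classes or functions.

```python
class Array:
    """Hypothetical stand-in for pyarrow.Array."""

    def __init__(self, values):
        self.values = list(values)


class ChunkedArray:
    """Hypothetical stand-in for pyarrow.ChunkedArray."""

    def __init__(self, chunks):
        self.chunks = list(chunks)


def _array_allow_chunked(obj):
    """Internal converter (assumed name): accepts Array or ChunkedArray."""
    if hasattr(obj, "__arrow_array__"):
        result = obj.__arrow_array__()
    else:
        result = Array(obj)
    if not isinstance(result, (Array, ChunkedArray)):
        raise TypeError("__arrow_array__ must return an Array or a ChunkedArray")
    return result


def array(obj):
    """Public converter: stays strict and only ever returns an Array."""
    result = _array_allow_chunked(obj)
    if isinstance(result, ChunkedArray):
        raise TypeError("array() does not return chunked results")
    return result


class ChunkedColumn:
    """Hypothetical column type that stores its data in chunks."""

    def __arrow_array__(self):
        return ChunkedArray([Array([1, 2]), Array([3])])


internal_result = _array_allow_chunked(ChunkedColumn())
try:
    array(ChunkedColumn())
    public_error = None
except TypeError as exc:
    public_error = exc
```

The cost of this split is the one noted above: two entry points with almost identical behavior.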

> [Python] support returning ChunkedArray from __arrow_array__ ?
> --
>
> Key: ARROW-7066
> URL: https://issues.apache.org/jira/browse/ARROW-7066
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{__arrow_array__}} protocol was added so that custom objects can 
> define how they should be converted to a pyarrow Array (similar to numpy's 
> {{__array__}}). It is also used to support converting pandas 
> DataFrames with columns using pandas' ExtensionArrays to a pyarrow Table (if 
> the pandas ExtensionArray, such as the nullable integer type, implements this 
> {{__arrow_array__}} method).
> This last use case could also be useful for fletcher 
> (https://github.com/xhochy/fletcher/, a package that implements pandas 
> ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a 
> pandas DataFrame).
> However, fletcher stores ChunkedArrays in its ExtensionArrays / the columns 
> of a pandas DataFrame (to have a better mapping with a Table, where the 
> columns also consist of chunked arrays), while we currently require that the 
> return value of {{__arrow_array__}} is a pyarrow.Array.
> So I was wondering: could we relax this constraint and also allow 
> ChunkedArray as a return value?
> However, this protocol is currently called in the {{pa.array(..)}} function, 
> which probably should keep returning an Array (and not a ChunkedArray in 
> certain cases).
> cc [~uwe]
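
To make the proposal in the description concrete, here is a minimal sketch of a converter that honors {{__arrow_array__}} and applies the relaxed return-type check. It deliberately does not import pyarrow: {{Array}}, {{ChunkedArray}}, {{array}} and {{FletcherLikeColumn}} are hypothetical stand-ins written for this example, and the {{type=None}} parameter is an assumption about how the hook would be called.

```python
class Array:
    """Hypothetical stand-in for pyarrow.Array: one contiguous run of values."""

    def __init__(self, values):
        self.values = list(values)


class ChunkedArray:
    """Hypothetical stand-in for pyarrow.ChunkedArray: a list of Array chunks."""

    def __init__(self, chunks):
        self.chunks = list(chunks)


def array(obj, type=None):
    """Toy converter: defer to __arrow_array__ if the object defines it."""
    if hasattr(obj, "__arrow_array__"):
        result = obj.__arrow_array__(type=type)
        # Relaxed constraint: either kind of array is an acceptable result.
        if not isinstance(result, (Array, ChunkedArray)):
            raise TypeError(
                "__arrow_array__ must return an Array or a ChunkedArray, "
                "got " + result.__class__.__name__
            )
        return result
    # Fallback for plain sequences.
    return Array(obj)


class FletcherLikeColumn:
    """Hypothetical extension array that keeps its data as several chunks."""

    def __init__(self, chunks):
        self._chunks = [Array(chunk) for chunk in chunks]

    def __arrow_array__(self, type=None):
        # Hand the chunks over unchanged instead of concatenating them.
        return ChunkedArray(self._chunks)


column = FletcherLikeColumn([[1, 2], [3]])
converted = array(column)
```

With the strict rule, {{FletcherLikeColumn}} would have to concatenate its chunks into a single array before returning, which is the step the relaxation avoids.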



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7066) [Python] support returning ChunkedArray from __arrow_array__ ?

2019-11-08 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970357#comment-16970357
 ] 

Uwe Korn commented on ARROW-7066:
-

I would also be OK with this, but we should make it clear somewhere in the 
documentation.

> [Python] support returning ChunkedArray from __arrow_array__ ?
> --
>
> Key: ARROW-7066
> URL: https://issues.apache.org/jira/browse/ARROW-7066
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Commented] (ARROW-7066) [Python] support returning ChunkedArray from __arrow_array__ ?

2019-11-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967922#comment-16967922
 ] 

Wes McKinney commented on ARROW-7066:
-

{{pyarrow.array}} actually returns a ChunkedArray in certain cases 
(BinaryArray overflows), so there is reasonable precedent for this. Relaxing 
the constraint to allow returning either type of array seems OK to me.
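
As background on the overflow case: a BinaryArray addresses its data with 32-bit offsets, so a single array can only hold about 2 GiB of bytes, and larger input has to be split across several chunks. The sketch below shows the general idea with a tiny, made-up limit. {{split_into_chunks}} and {{max_bytes}} are hypothetical names for this example, and this is not pyarrow's actual implementation.

```python
def split_into_chunks(values, max_bytes):
    """Greedily group byte strings so that each chunk stays within max_bytes."""
    chunks = []
    current = []
    current_size = 0
    for value in values:
        size = len(value)
        if size > max_bytes:
            raise ValueError("a single value exceeds the per-chunk capacity")
        # Start a new chunk when adding this value would overflow the current one.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(value)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# A limit of 8 bytes stands in for the real 32-bit offset capacity.
chunks = split_into_chunks([b"aaaa", b"bbbb", b"cc", b"dddd"], max_bytes=8)
```

A caller of a converter like this cannot know in advance whether it will get one chunk or several, which is why returning a chunked result from the same entry point already has precedent.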

> [Python] support returning ChunkedArray from __arrow_array__ ?
> --
>
> Key: ARROW-7066
> URL: https://issues.apache.org/jira/browse/ARROW-7066
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 1.0.0
>
>


