[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-06-08 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128568#comment-17128568
 ] 

Wes McKinney commented on ARROW-7012:
-

OK. I'll take care of this. 

> [C++] Clarify ChunkedArray chunking strategy and policy
> ---
>
>                 Key: ARROW-7012
>                 URL: https://issues.apache.org/jira/browse/ARROW-7012
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> See discussion on ARROW-6784 and [https://github.com/apache/arrow/pull/5686]. 
> Among the questions:
>  * Do Arrow users control the chunking, or is it an internal implementation 
> detail they should not manage?
>  * If users control it, how do they control it? E.g. if I call Take and use a 
> ChunkedArray for the indices to take, does the chunking follow how the 
> indices are chunked? Or should we attempt to preserve the mapping of data to 
> their chunks in the input table/chunked array?
>  * If it's an implementation detail, what is the optimal chunk size? And when 
> is it worth reshaping (concatenating, slicing) input data to attain this 
> optimal size? 
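
To make the second question concrete, here is a minimal sketch of one of the two options it raises: the output of a take operation follows the chunking of the indices. It does not use Arrow. The {{Chunked}} alias and the {{ValueAt}} and {{TakeFollowingIndexChunks}} helpers are hypothetical names introduced only for illustration, and nothing here claims to describe what Arrow's Take actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked array: a list of contiguous chunks.
using Chunked = std::vector<std::vector<int>>;

// Resolve a logical (chunk-agnostic) index by walking the chunks in order.
int ValueAt(const Chunked& values, std::size_t logical_index) {
  for (const auto& chunk : values) {
    if (logical_index < chunk.size()) return chunk[logical_index];
    logical_index -= chunk.size();
  }
  assert(false && "index out of bounds");
  return 0;
}

// One possible policy (an assumption, not Arrow's documented behavior):
// emit one output chunk per index chunk, ignoring how `values` is chunked.
Chunked TakeFollowingIndexChunks(
    const Chunked& values,
    const std::vector<std::vector<std::size_t>>& indices) {
  Chunked out;
  out.reserve(indices.size());
  for (const auto& index_chunk : indices) {
    std::vector<int> taken;
    taken.reserve(index_chunk.size());
    for (std::size_t i : index_chunk) taken.push_back(ValueAt(values, i));
    out.push_back(std::move(taken));
  }
  return out;
}
```

With values chunked as {10, 20}, {30}, {40, 50, 60} and indices chunked as {5, 0}, {2, 2, 3}, this policy yields two output chunks, {60, 10} and {30, 30, 40}. The alternative raised in the question, preserving the mapping of data to the input chunks, would need a different output layout.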



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-06-08 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128526#comment-17128526
 ] 

Neal Richardson commented on ARROW-7012:


IMO it should go in the main ChunkedArray docs, where the class is 
described/explained. There are multiple ways you could end up at the question 
of how you should chunk arrays, so I think compute is not quite the right 
place. Either way, we can move it or link to it from all relevant places.



[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-06-08 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128516#comment-17128516
 ] 

Wes McKinney commented on ARROW-7012:
-

Agreed. Where do you think this should go: in the docstrings for ChunkedArray, 
in arrow/compute/README.md, or somewhere else?



[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-06-08 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128474#comment-17128474
 ] 

Neal Richardson commented on ARROW-7012:


We may still want to document that fact.



[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-06-08 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128441#comment-17128441
 ] 

Wes McKinney commented on ARROW-7012:
-

I think we can close this. I don't think that users need to be especially 
concerned with this, though the chunk size will be publicly configurable for 
expression execution.



[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

2020-05-25 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116220#comment-17116220
 ] 

Wes McKinney commented on ARROW-7012:
-

In general, this is not something that users should be too concerned with. The 
new kernels framework provides a configuration knob 
({{ExecContext::exec_chunksize}}) for selecting the upper limit on the size of 
chunks that are processed.
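
As a rough illustration of what an upper limit on chunk size means in practice, the sketch below slices any chunk longer than the limit and leaves shorter chunks alone. It is a standalone model, not Arrow code: {{SplitByMaxChunksize}} is a hypothetical helper, chunks are represented only by their lengths, and the real executor may behave differently in details such as how it handles empty chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: each chunk is represented only by its length in rows.
// Returns the lengths of the pieces that would be processed when no piece
// may exceed max_chunksize. Chunks are sliced, never concatenated, so the
// limit acts as an upper bound and not as a target size.
std::vector<int64_t> SplitByMaxChunksize(
    const std::vector<int64_t>& chunk_lengths, int64_t max_chunksize) {
  assert(max_chunksize > 0);
  std::vector<int64_t> pieces;
  for (int64_t length : chunk_lengths) {
    assert(length >= 0);
    int64_t offset = 0;
    // A zero-length chunk produces no pieces in this sketch.
    while (offset < length) {
      const int64_t piece = std::min(max_chunksize, length - offset);
      pieces.push_back(piece);
      offset += piece;
    }
  }
  return pieces;
}
```

For input chunk lengths 10, 3, 0, 7 and a limit of 4, the pieces are 4, 4, 2, 3, 4, 3. Small chunks stay small, which is why an upper limit alone does not answer the question of when to concatenate inputs.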
