[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128568#comment-17128568 ]

Wes McKinney commented on ARROW-7012:
-------------------------------------

OK. I'll take care of this.

> [C++] Clarify ChunkedArray chunking strategy and policy
> -------------------------------------------------------
>
>                 Key: ARROW-7012
>                 URL: https://issues.apache.org/jira/browse/ARROW-7012
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
> See discussion on ARROW-6784 and [https://github.com/apache/arrow/pull/5686].
> Among the questions:
> * Do Arrow users control the chunking, or is it an internal implementation detail they should not manage?
> * If users control it, how do they control it? E.g. if I call Take and use a ChunkedArray for the indices to take, does the chunking follow how the indices are chunked? Or should we attempt to preserve the mapping of data to their chunks in the input table/chunked array?
> * If it's an implementation detail, what is the optimal chunk size? And when is it worth reshaping (concatenating, slicing) input data to attain this optimal size?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128526#comment-17128526 ]

Neal Richardson commented on ARROW-7012:
----------------------------------------

IMO it should go in the main ChunkedArray docs where the class is described/explained. There are multiple ways you could end up at the question of how you should chunk arrays, so I think compute is not quite the right place. Either way, we can move it/link to it from both/all relevant places.
[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128516#comment-17128516 ]

Wes McKinney commented on ARROW-7012:
-------------------------------------

Agreed. Where do you think this should go: in the docstrings for ChunkedArray, in arrow/compute/README.md, or somewhere else?
[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128474#comment-17128474 ]

Neal Richardson commented on ARROW-7012:
----------------------------------------

We still may want to document that fact.
[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128441#comment-17128441 ]

Wes McKinney commented on ARROW-7012:
-------------------------------------

I think we can close this. I don't think that users need to be especially concerned with this, though the chunk size will be publicly configurable for expression execution.
[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
[ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116220#comment-17116220 ]

Wes McKinney commented on ARROW-7012:
-------------------------------------

In general, this is not something that users should be too concerned with. The new kernels framework provides a configuration knob ({{ExecContext::exec_chunksize}}) that sets the upper limit on the size of the chunks that are processed.
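As a rough illustration of what an upper limit on chunk size means in practice, the sketch below splits a run of elements into consecutive slices that never exceed a given maximum. It is not Arrow's implementation and does not use the Arrow API: the names `ChunkSlice`, `SplitIntoChunks`, and `max_chunksize` are hypothetical, chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type, not part of Arrow: a half-open slice
// [offset, offset + length) over some logical array.
struct ChunkSlice {
  int64_t offset;
  int64_t length;
};

// Hypothetical helper, not part of Arrow: split `total_length` elements into
// consecutive slices of at most `max_chunksize` elements each. Only the last
// slice may be shorter than the limit.
std::vector<ChunkSlice> SplitIntoChunks(int64_t total_length,
                                        int64_t max_chunksize) {
  assert(max_chunksize > 0);  // the limit must be positive
  std::vector<ChunkSlice> slices;
  for (int64_t offset = 0; offset < total_length; offset += max_chunksize) {
    // Take a full chunk, or whatever remains at the end.
    slices.push_back({offset, std::min(max_chunksize, total_length - offset)});
  }
  return slices;
}
```

For example, `SplitIntoChunks(10, 4)` yields three slices of lengths 4, 4, and 2. The limit acts only as a cap: inputs shorter than the limit come back as a single slice.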