[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393495#comment-16393495 ] ASF GitHub Bot commented on ARROW-2262: --- wesm commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r173559111 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,52 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item +cdef int i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +elif isinstance(key, six.integer_types): +item = key +if item >= self.chunked_array.length() or item < 0: +return IndexError("ChunkedArray selection out of bounds") Review comment: Agreed, perhaps let's handle this as a follow up patch This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393497#comment-16393497 ] ASF GitHub Bot commented on ARROW-2262: --- wesm closed pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd index d95f01661..776b96531 100644 --- a/python/pyarrow/includes/libarrow.pxd +++ b/python/pyarrow/includes/libarrow.pxd @@ -387,6 +387,8 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: int num_chunks() shared_ptr[CArray] chunk(int i) shared_ptr[CDataType] type() +shared_ptr[CChunkedArray] Slice(int64_t offset, int64_t length) const +shared_ptr[CChunkedArray] Slice(int64_t offset) const cdef cppclass CColumn" arrow::Column": CColumn(const shared_ptr[CField]& field, diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi index c27c0edd9..94041e465 100644 --- a/python/pyarrow/table.pxi +++ b/python/pyarrow/table.pxi @@ -77,6 +77,52 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item +cdef int i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +elif isinstance(key, six.integer_types): +item = key +if item >= self.chunked_array.length() or item < 0: +return IndexError("ChunkedArray selection out of bounds") +for i in range(self.num_chunks): +if item < self.chunked_array.chunk(i).get().length(): +return self.chunk(i)[item] +else: +item -= self.chunked_array.chunk(i).get().length() +else: +raise TypeError("key must either be a slice or integer") + +def slice(self, offset=0, length=None): +""" +Compute zero-copy slice of this ChunkedArray + +Parameters +-- +offset : int, default 0 +Offset from start of array to slice +length : int, default None +Length of slice (default is until end of batch starting from +offset) + +Returns +--- +sliced : ChunkedArray +""" +cdef shared_ptr[CChunkedArray] result + +if offset < 0: +raise IndexError('Offset must be non-negative') + +if length is None: +result = self.chunked_array.Slice(offset) +else: +result = self.chunked_array.Slice(offset, length) + +return pyarrow_wrap_chunked_array(result) + @property def num_chunks(self): """ diff --git a/python/pyarrow/tests/test_table.py b/python/pyarrow/tests/test_table.py index e72761d32..356ecb7e0 100644 --- a/python/pyarrow/tests/test_table.py +++ b/python/pyarrow/tests/test_table.py @@ -24,6 +24,21 @@ import pyarrow as pa +def test_chunked_array_getitem(): +data = [ +pa.array([1, 2, 3]), +pa.array([4, 5, 6]) +] +data = pa.chunked_array(data) +assert data[1].as_py() == 2 + +data_slice = data[2:4] +assert data_slice.to_pylist() == [3, 4] + +data_slice = data[4:-1] +assert data_slice.to_pylist() == [5] + + def test_column_basics(): data = [ pa.array([-10, -5, 0, 5, 10]) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391042#comment-16391042 ] ASF GitHub Bot commented on ARROW-2262: --- pitrou commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r173120083 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,52 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item +cdef int i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +elif isinstance(key, six.integer_types): +item = key +if item >= self.chunked_array.length() or item < 0: +return IndexError("ChunkedArray selection out of bounds") Review comment: If we allow negative slice bounds, I would expect us to also allow negative indices. Seems like it's time for a `_normalize_index` function? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390056#comment-16390056 ] ASF GitHub Bot commented on ARROW-2262: --- xhochy commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r172953568 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,49 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item, i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +else: +item = int(key) +if item >= self.chunked_array.length() or item < 0: +return IndexError("ChunkedArray selection out of bounds") +for i in range(self.num_chunks): Review comment: Yes they would benefit from this API but it is much more complicated to implement there as `arrow::ChunkedArray` has no type. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390057#comment-16390057 ] ASF GitHub Bot commented on ARROW-2262: --- xhochy commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r172955517 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,49 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item, i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +else: +item = int(key) Review comment: Changed it This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389696#comment-16389696 ] ASF GitHub Bot commented on ARROW-2262: --- pitrou commented on issue #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#issuecomment-371177042 @xhochy actually, it's probably because `arrow::ChunkedArray::chunk()` takes a C int but you are trying to pass a int64_t. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389594#comment-16389594 ] ASF GitHub Bot commented on ARROW-2262: --- pitrou commented on issue #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#issuecomment-371149892 @xhochy what does lib.cxx say around the lines mentioned above? That's assuming your local cython instance produces exactly the same output as AppVeyor does... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389547#comment-16389547 ] ASF GitHub Bot commented on ARROW-2262: --- pitrou commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r172840532 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,49 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item, i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +else: +item = int(key) +if item >= self.chunked_array.length() or item < 0: +return IndexError("ChunkedArray selection out of bounds") +for i in range(self.num_chunks): Review comment: I'm curious, can't this be implemented on the C++ side instead? I suppose C++ users may benefit from such an API as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389545#comment-16389545 ] ASF GitHub Bot commented on ARROW-2262: --- pitrou commented on a change in pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#discussion_r172840289 ## File path: python/pyarrow/table.pxi ## @@ -77,6 +77,49 @@ cdef class ChunkedArray: self._check_nullptr() return self.chunked_array.null_count() +def __getitem__(self, key): +cdef int64_t item, i +self._check_nullptr() +if isinstance(key, slice): +return _normalize_slice(self, key) +else: +item = int(key) Review comment: You don't want to allow non-integer types such as float here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389543#comment-16389543 ] ASF GitHub Bot commented on ARROW-2262: --- xhochy commented on issue #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#issuecomment-371137584 Build fails with ``` C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45584): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj] 3701C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45669): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj] ``` A review that points me to the problematic code would be appreciated. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389544#comment-16389544 ] ASF GitHub Bot commented on ARROW-2262: --- xhochy commented on issue #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702#issuecomment-371137584 Build fails with ``` C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45584): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj] C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45669): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj] ``` A review that points me to the problematic code would be appreciated. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386453#comment-16386453 ] ASF GitHub Bot commented on ARROW-2262: --- xhochy opened a new pull request #1702: ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray URL: https://github.com/apache/arrow/pull/1702 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support slicing on pyarrow.ChunkedArray > > > Key: ARROW-2262 > URL: https://issues.apache.org/jira/browse/ARROW-2262 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)