[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-10 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452831292



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   _data, _coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-                    shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+    result.sum_duplicates()

Review comment:
   Oh, we're checking `SparseCOOTensor.has_canonical_format`, not
`scipy.sparse.coo_matrix.has_canonical_format`. Got it.
   What if it's not canonical? Then we return a non-canonical scipy
object? Seems good.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-10 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452824628



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   _data, _coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-                    shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+    result.sum_duplicates()

Review comment:
   Wouldn't
[sum_duplicates](https://github.com/scipy/scipy/blob/v1.5.1/scipy/sparse/coo.py#L526-L535)
 just return without doing anything when `self.has_canonical_format` is set?
   Sorry if I'm missing something here.
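
   For reference, a minimal sketch of how scipy's own flag drives the early
return in `sum_duplicates`. It assumes that a `coo_matrix` built from a
`(data, (row, col))` tuple starts with `has_canonical_format` unset, as in the
scipy source linked above; the sample values are made up for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two entries share the coordinate (0, 1); the values are illustrative only.
data = np.array([1.0, 2.0, 3.0])
row = np.array([0, 0, 1])
col = np.array([1, 1, 0])

m = coo_matrix((data, (row, col)), shape=(2, 2))
flag_before = m.has_canonical_format  # scipy's flag, not the Arrow tensor's
nnz_before = m.nnz                    # duplicates are stored as separate entries

m.sum_duplicates()                    # merges the two (0, 1) entries
flag_after = m.has_canonical_format
nnz_after = m.nnz

m.sum_duplicates()                    # flag is set now, so this call returns early
nnz_again = m.nnz
dense = m.toarray()
```

   If that assumption holds, the guard in the diff reads the Arrow tensor's
flag, while the freshly built scipy matrix still has its own flag unset, so the
call runs and sets scipy's flag.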









[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-06 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r450340186



##
File path: python/pyarrow/tensor.pxi
##
@@ -199,7 +202,13 @@ shape: {0.shape}""".format(self)
 for x in dim_names:
     c_dim_names.push_back(tobytes(x))
 
-coords = np.vstack([obj.row, obj.col]).T
+row = obj.row
+col = obj.col
+if obj.has_canonical_format:
+    order = np.lexsort((col, row))  # row-major order

Review comment:
   So column-major is scipy-canonical and row-major is TensorFlow-canonical?
   Would we then not rather just introduce a new property, e.g.
`has_duplicates`? I would find that less complex.
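
   To make the two orderings concrete, here is a small NumPy sketch.
`np.lexsort` treats the last key as the primary one, so `(col, row)` sorts by
row first (row-major) and `(row, col)` sorts by column first (column-major).
The sample coordinates are made up for illustration.

```python
import numpy as np

# Coordinates of four non-zero entries, deliberately out of order.
row = np.array([1, 0, 1, 0])
col = np.array([0, 1, 1, 0])

# Last key is the primary sort key.
order_row_major = np.lexsort((col, row))  # sort by row, then by col
order_col_major = np.lexsort((row, col))  # sort by col, then by row

# Reorder the coordinates into (nnz, 2) arrays under each ordering.
coords_row_major = np.vstack([row[order_row_major], col[order_row_major]]).T
coords_col_major = np.vstack([row[order_col_major], col[order_col_major]]).T
```

   Both orderings hold the same set of coordinates; only the storage order
differs, which is why an ordering flag and a duplicates flag describe
different things.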









[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-06 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r450325016



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   _data, _coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-                    shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+    result.sum_duplicates()

Review comment:
   @mrkn - wouldn't `is_canonical` guarantee no duplicates if the indices were
row- or column-ordered?
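
   As a sketch of why ordering and uniqueness are separate checks: sorted
indices can still contain duplicates, so a canonical flag would have to cover
both. The helper names `is_row_major_sorted` and `has_duplicates` below are
hypothetical and not part of Arrow or scipy.

```python
import numpy as np

def is_row_major_sorted(coords):
    """Return True if the (nnz, ndim) index array is in row-major order."""
    # lexsort uses the last key as primary, so reverse the columns to make
    # the first dimension the primary key. The sort is stable, so an already
    # sorted array (even one with repeated rows) yields the identity order.
    order = np.lexsort(coords.T[::-1])
    return bool(np.array_equal(order, np.arange(len(coords))))

def has_duplicates(coords):
    """Return True if any coordinate appears more than once."""
    return len(np.unique(coords, axis=0)) < len(coords)

# Sorted in row-major order, yet (0, 1) appears twice.
sorted_with_dup = np.array([[0, 1], [0, 1], [1, 0]])
# No duplicates, but not in row-major order.
unsorted_unique = np.array([[1, 0], [0, 1]])

checks = {
    "sorted_with_dup": (is_row_major_sorted(sorted_with_dup),
                        has_duplicates(sorted_with_dup)),
    "unsorted_unique": (is_row_major_sorted(unsorted_unique),
                        has_duplicates(unsorted_unique)),
}
```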




