[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r452846033 ## File path: python/pyarrow/tensor.pxi ## @@ -270,8 +279,10 @@ shape: {0.shape}""".format(self) &out_data, &out_coords)) data = PyObject_to_object(out_data) coords = PyObject_to_object(out_coords) -result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])), -shape=self.shape) +row, col = coords[:, 0], coords[:, 1] +result = coo_matrix((data[:, 0], (row, col)), shape=self.shape) +if self.has_canonical_format: +result.sum_duplicates() Review comment: > What about if the it's not canonical? Then we return noncanonical scipy object? Seems good. Yes, the canonicality is preserved. The returned scipy object is also noncanonical when the original `SparseCOOTensor` is noncanonical. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r452834004 ## File path: python/pyarrow/tensor.pxi ## @@ -270,8 +279,10 @@ shape: {0.shape}""".format(self) &out_data, &out_coords)) data = PyObject_to_object(out_data) coords = PyObject_to_object(out_coords) -result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])), -shape=self.shape) +row, col = coords[:, 0], coords[:, 1] +result = coo_matrix((data[:, 0], (row, col)), shape=self.shape) +if self.has_canonical_format: +result.sum_duplicates() Review comment: The `self` here is not an instance of `scipy.sparse.coo_matrix`, but it is a `SparseCOOTensor`. So `self.has_canonical_format == True` means the index of the `SparseCOOTensor`, that is `(row, col)` pair, is sorted in row-major order. Then `result.sum_duplicates` do sort `(row, col)` in column-major order. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r452771635 ## File path: python/pyarrow/tensor.pxi ## @@ -270,8 +279,10 @@ shape: {0.shape}""".format(self) &out_data, &out_coords)) data = PyObject_to_object(out_data) coords = PyObject_to_object(out_coords) -result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])), -shape=self.shape) +row, col = coords[:, 0], coords[:, 1] +result = coo_matrix((data[:, 0], (row, col)), shape=self.shape) +if self.has_canonical_format: +result.sum_duplicates() Review comment: No. As I described in the above comment, the purpose of this `sum_duplicates` is converting the order from row-major to column-major. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r452605554 ## File path: python/pyarrow/tensor.pxi ## @@ -270,8 +279,10 @@ shape: {0.shape}""".format(self) &out_data, &out_coords)) data = PyObject_to_object(out_data) coords = PyObject_to_object(out_coords) -result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])), -shape=self.shape) +row, col = coords[:, 0], coords[:, 1] +result = coo_matrix((data[:, 0], (row, col)), shape=self.shape) +if self.has_canonical_format: +result.sum_duplicates() Review comment: @rok Yes, if `is_canonical` is true, the indices matrix is sorted in row-major order and doesn't have duplicates. In other words, this flag has the same meaning as `has_canonical_format` of `scipy.sparse.coo_matrix` but the order is row-major, instead of column-major. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443990095 ## File path: cpp/src/arrow/sparse_tensor_test.cc ## @@ -49,7 +49,10 @@ static inline void AssertCOOIndex(const std::shared_ptr& sidx, const int } } -TEST(TestSparseCOOIndex, Make) { +//- Review comment: I will add these tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443990095 ## File path: cpp/src/arrow/sparse_tensor_test.cc ## @@ -49,7 +49,10 @@ static inline void AssertCOOIndex(const std::shared_ptr& sidx, const int } } -TEST(TestSparseCOOIndex, Make) { +//- Review comment: I will add these tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443982716 ## File path: python/pyarrow/tensor.pxi ## @@ -339,6 +350,15 @@ shape: {0.shape}""".format(self) def non_zero_length(self): return self.stp.non_zero_length() +@property +def has_canonical_format(self): +cdef: +_CSparseCOOIndexPtr csi + +csi = dynamic_cast[_CSparseCOOIndexPtr](self.stp.sparse_index().get()) Review comment: The reason is that I couldn't find normal Cython's cast when I wrote this code. Thank you for telling me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443928439 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names: c_dim_names.push_back(tobytes(x)) -coords = np.vstack([obj.row, obj.col]).T +row = obj.row +col = obj.col +if obj.has_canonical_format: Review comment: The attribute [actually exists](https://github.com/scipy/scipy/blob/master/scipy/sparse/coo.py#L137) but not documented. It may be an internal attribute, but it does not have `_` prefix, so I detect it can be used. I need to refer this attribute to recognize whether or not a given SciPy's coo_matrix has canonical format. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443922882 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names: c_dim_names.push_back(tobytes(x)) -coords = np.vstack([obj.row, obj.col]).T +row = obj.row +col = obj.col +if obj.has_canonical_format: +order = np.lexsort((col, row)) # row-major order Review comment: SciPy's coo_matrix employs column-major order, but our SparseCOOIndex employs row-major order. This is the reason why we need to sort the index here. The reason why our SparseCOOIndex should employ row-major order is that TensorFlow's SparseTensor employs row-major order. Converting from SciPy's coo_matrrix needs to copy the index matrix even if we employ column-major order, but converting from TensorFlow's SparseTensor does not need to copy the index matrix if we employ row-major order. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org