[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-10 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452846033



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   &out_data, &out_coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+result.sum_duplicates()

Review comment:
   > What about if the it's not canonical? Then we return noncanonical 
scipy object? Seems good.
   
   Yes, the canonicality is preserved.  The returned scipy object is also 
noncanonical when the original `SparseCOOTensor` is noncanonical.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-10 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452834004



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   &out_data, &out_coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+result.sum_duplicates()

Review comment:
   The `self` here is not an instance of `scipy.sparse.coo_matrix`, but it 
is a `SparseCOOTensor`.  So `self.has_canonical_format == True` means the index 
of the `SparseCOOTensor`, that is `(row, col)` pair, is sorted in row-major 
order.  Then `result.sum_duplicates` do sort `(row, col)` in column-major order.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-10 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452771635



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   &out_data, &out_coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+result.sum_duplicates()

Review comment:
   No.  As I described in the above comment, the purpose of this 
`sum_duplicates` is converting the order from row-major to column-major.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-09 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r452605554



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   &out_data, &out_coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+result.sum_duplicates()

Review comment:
   @rok Yes, if `is_canonical` is true, the indices matrix is sorted in 
row-major order and doesn't have duplicates.
   In other words, this flag has the same meaning as `has_canonical_format` of 
`scipy.sparse.coo_matrix` but the order is row-major, instead of column-major.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r443990095



##
File path: cpp/src/arrow/sparse_tensor_test.cc
##
@@ -49,7 +49,10 @@ static inline void AssertCOOIndex(const 
std::shared_ptr& sidx, const int
   }
 }
 
-TEST(TestSparseCOOIndex, Make) {
+//-

Review comment:
   I will add these tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r443990095



##
File path: cpp/src/arrow/sparse_tensor_test.cc
##
@@ -49,7 +49,10 @@ static inline void AssertCOOIndex(const 
std::shared_ptr& sidx, const int
   }
 }
 
-TEST(TestSparseCOOIndex, Make) {
+//-

Review comment:
   I will add these tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r443982716



##
File path: python/pyarrow/tensor.pxi
##
@@ -339,6 +350,15 @@ shape: {0.shape}""".format(self)
 def non_zero_length(self):
 return self.stp.non_zero_length()
 
+@property
+def has_canonical_format(self):
+cdef:
+_CSparseCOOIndexPtr csi
+
+csi = dynamic_cast[_CSparseCOOIndexPtr](self.stp.sparse_index().get())

Review comment:
   The reason is that I couldn't find normal Cython's cast when I wrote 
this code.  Thank you for telling me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r443928439



##
File path: python/pyarrow/tensor.pxi
##
@@ -199,7 +202,13 @@ shape: {0.shape}""".format(self)
 for x in dim_names:
 c_dim_names.push_back(tobytes(x))
 
-coords = np.vstack([obj.row, obj.col]).T
+row = obj.row
+col = obj.col
+if obj.has_canonical_format:

Review comment:
   The attribute [actually 
exists](https://github.com/scipy/scipy/blob/master/scipy/sparse/coo.py#L137) 
but not documented.  It may be an internal attribute, but it does not have `_` 
prefix, so I detect it can be used.  I need to refer this attribute to 
recognize whether or not a given SciPy's coo_matrix has canonical format.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox


mrkn commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r443922882



##
File path: python/pyarrow/tensor.pxi
##
@@ -199,7 +202,13 @@ shape: {0.shape}""".format(self)
 for x in dim_names:
 c_dim_names.push_back(tobytes(x))
 
-coords = np.vstack([obj.row, obj.col]).T
+row = obj.row
+col = obj.col
+if obj.has_canonical_format:
+order = np.lexsort((col, row))  # row-major order

Review comment:
   SciPy's coo_matrix employs column-major order, but our SparseCOOIndex 
employs row-major order.
   This is the reason why we need to sort the index here.
   
   The reason why our SparseCOOIndex should employ row-major order is that 
TensorFlow's SparseTensor employs row-major order.  Converting from SciPy's 
coo_matrrix needs to copy the index matrix even if we employ column-major 
order, but converting from TensorFlow's SparseTensor does not need to copy the 
index matrix if we employ row-major order.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org