Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1778#discussion_r17579667
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -390,6 +393,113 @@ class RowMatrix(
         new RowMatrix(AB, nRows, B.numCols)
       }
     
    +  /**
    +   * Compute all cosine similarities between columns of this matrix using the brute-force
    +   * approach of computing normalized dot products.
    +   *
    +   * @return An n x n sparse upper-triangular matrix of cosine similarities between
    +   *         columns of this matrix.
    +   */
    +  def columnSimilarities(): CoordinateMatrix = {
    +    similarColumns(0.0)
    +  }
    +
    +  /**
    +   * Compute all similarities between columns of this matrix using a sampling approach.
    +   *
    +   * The threshold parameter is a trade-off knob between estimate quality and computational cost.
    +   *
    +   * Setting a threshold of 0 guarantees deterministic correct results, but comes at exactly
    +   * the same cost as the brute-force approach. Setting the threshold to positive values
    +   * incurs strictly less computational cost than the brute-force approach, however the
    +   * similarities computed will be estimates.
    +   *
    +   * The sampling guarantees relative-error correctness for those pairs of columns that have
    +   * similarity greater than the given similarity threshold.
    +   *
    +   * To describe the guarantee, we set some notation:
    +   * Let A be the smallest in magnitude non-zero element of this matrix.
    +   * Let B be the largest  in magnitude non-zero element of this matrix.
    +   * Let L be the number of non-zeros per row.
    --- End diff --
    
    Is `L` the average or the maximum number of nonzeros per row?
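
    To illustrate why the distinction matters, here is a minimal sketch in plain Scala (no Spark). The `NonzerosPerRow` object and its toy `rows` data are hypothetical and not part of the PR; they only show that the two readings of "number of non-zeros per row" can differ considerably when row sparsity is skewed.

```scala
// Hypothetical toy data: each row is a sparse vector stored as (columnIndex, value) pairs.
object NonzerosPerRow {
  val rows: Seq[Seq[(Int, Double)]] = Seq(
    Seq((0, 1.0)),                                  // 1 nonzero
    Seq((2, 3.0)),                                  // 1 nonzero
    Seq((0, 4.0), (1, 5.0), (2, 6.0), (3, 7.0))     // 4 nonzeros
  )

  // Count the entries in each row whose stored value is actually nonzero.
  val nnzPerRow: Seq[Int] = rows.map(_.count(_._2 != 0.0))

  // The two candidate readings of "L".
  val maxNnz: Int = nnzPerRow.max
  val avgNnz: Double = nnzPerRow.sum.toDouble / nnzPerRow.size

  def main(args: Array[String]): Unit = {
    println(s"max nonzeros per row = $maxNnz")
    println(s"avg nonzeros per row = $avgNnz")
  }
}
```

    On this toy input the maximum is twice the average, so the doc comment should say which one the guarantee is stated in terms of.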

