I have a version that works well for Netflix data, but now I am validating
on internal datasets. This code will work on matrix factors and sparse
matrices that have rows = 100 * columns. If columns are much smaller than
rows, then the column-based flow works well. Basically, we need both flows.
I did
Right now DIMSUM is meant to be used for tall and skinny matrices, and so
columnSimilarities() returns similar columns, not rows. We are working on
adding an efficient row similarity as well, tracked by this JIRA:
https://issues.apache.org/jira/browse/SPARK-4823
Reza
On Mon, Apr 6, 2015 at 6:08
The example below illustrates how to use the DIMSUM algorithm to calculate
the similarity between each two rows and output row pairs with cosine
similarity that is not less than a threshold.
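
For reference, here is a minimal sketch of the computation being discussed, written in plain Python rather than against the Spark API. It is an exact brute-force version and does not do any DIMSUM sampling. The function names, the tiny `data` matrix, and the 0.5 threshold are all made up for illustration. It compares columns, the way columnSimilarities() is described above, and gets row similarities by transposing the matrix first.

```python
import math
from itertools import combinations


def column_similarities(matrix, threshold=0.0):
    """Exact cosine similarity between every pair of columns (i < j).

    Brute force, for illustration only: no DIMSUM sampling is done here.
    Pairs whose similarity is below `threshold` are dropped.
    """
    columns = list(zip(*matrix))  # transpose: one tuple per column
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    result = {}
    for i, j in combinations(range(len(columns)), 2):
        if norms[i] == 0.0 or norms[j] == 0.0:
            continue  # cosine is undefined for an all-zero column
        dot = sum(a * b for a, b in zip(columns[i], columns[j]))
        sim = dot / (norms[i] * norms[j])
        if sim >= threshold:
            result[(i, j)] = sim
    return result


def row_similarities(matrix, threshold=0.0):
    """Row similarities, obtained by comparing the columns of the transpose."""
    transposed = [list(col) for col in zip(*matrix)]
    return column_similarities(transposed, threshold)


# Hypothetical 3 x 2 matrix: rows 0 and 1 point the same way, row 2 does not.
data = [
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 3.0],
]
similar_rows = row_similarities(data, threshold=0.5)
print(similar_rows)
```

Transposing is only cheap when the matrix is small. For a tall and skinny matrix the transpose is short and wide, which is the case the column-based flow is not designed for, hence the need for a separate row-based flow.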