Re: Using DIMSUM with ids

2015-04-07 Thread Debasish Das
I have a version that works well for Netflix data, but now I am validating on internal datasets. This code will work on matrix factors and sparse matrices that have rows = 100 * columns. If columns are much smaller than rows, then the column-based flow works well... basically we need both flows... I did

Re: Using DIMSUM with ids

2015-04-06 Thread Reza Zadeh
Right now DIMSUM is meant to be used for tall and skinny matrices, and so columnSimilarities() returns similar columns, not rows. We are working on adding an efficient row similarity as well, tracked by this JIRA: https://issues.apache.org/jira/browse/SPARK-4823

Reza

On Mon, Apr 6, 2015 at 6:08
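
To make the "columns, not rows" point concrete, here is a minimal plain-Python sketch of what a column similarity computation returns. It is not Spark code and not the DIMSUM sampling scheme: it computes exact cosine similarities by brute force, and the function name `column_cosine_similarities` and the toy matrix are made up for illustration. The result is keyed by column index pairs (i, j) with i < j, so row identifiers never appear in the output.

```python
import math


def column_cosine_similarities(rows):
    """Exact cosine similarity for every column pair (i, j) with i < j.

    Hypothetical helper for illustration only; brute force, no sampling.
    """
    n_cols = len(rows[0])
    # Euclidean norm of each column.
    norms = [math.sqrt(sum(r[c] ** 2 for r in rows)) for c in range(n_cols)]
    sims = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if norms[i] == 0 or norms[j] == 0:
                continue  # cosine similarity is undefined for a zero column
            dot = sum(r[i] * r[j] for r in rows)
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


# Tall and skinny toy matrix (assumed data): 4 rows, 2 columns.
matrix = [[1.0, 2.0], [0.0, 0.0], [3.0, 6.0], [0.0, 0.0]]
col_sims = column_cosine_similarities(matrix)
```

Here the only key in `col_sims` is the column pair (0, 1). Getting row pairs out of a routine like this means treating the rows as columns, which gives up the tall-and-skinny shape the method is meant for.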

Using DIMSUM with ids

2015-04-06 Thread James
The example below illustrates how to use the DIMSUM algorithm to calculate the similarity between every pair of rows and output the row pairs whose cosine similarity is not less than a threshold.
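
The snippet above is cut off before its example, so what follows is not the original code. As a rough reference for the output being asked for, here is a plain-Python sketch that keeps an id with each sparse row and returns the row pairs whose cosine similarity is not less than a threshold. It is an exact brute-force computation rather than DIMSUM, and the function name `similar_row_pairs`, the ids, and the data are assumptions made for illustration.

```python
import math
from itertools import combinations


def similar_row_pairs(rows_by_id, threshold):
    """Return (id_a, id_b, similarity) for row pairs with similarity >= threshold.

    rows_by_id maps a row id to a sparse row {column_index: value}.
    Hypothetical helper: exact and quadratic in the number of rows.
    """
    norms = {
        rid: math.sqrt(sum(v * v for v in vec.values()))
        for rid, vec in rows_by_id.items()
    }
    result = []
    for id_a, id_b in combinations(sorted(rows_by_id), 2):
        if norms[id_a] == 0 or norms[id_b] == 0:
            continue  # skip empty rows, their similarity is undefined
        a, b = rows_by_id[id_a], rows_by_id[id_b]
        # Only columns present in both rows contribute to the dot product.
        dot = sum(v * b.get(col, 0.0) for col, v in a.items())
        sim = dot / (norms[id_a] * norms[id_b])
        if sim >= threshold:
            result.append((id_a, id_b, sim))
    return result


# Assumed toy data: three rows keyed by made-up ids.
rows = {
    "u1": {0: 1.0, 1: 1.0},
    "u2": {0: 2.0, 1: 2.0},
    "u3": {2: 5.0},
}
pairs = similar_row_pairs(rows, threshold=0.9)
```

With this data only the pair ("u1", "u2") passes the threshold, since "u3" shares no columns with the other rows. A sketch like this is useful mainly as a small-scale check against whatever a distributed implementation returns.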