Hi, I have a question about an oddity we encountered while running MLlib's columnSimilarities operation. When we examine the output, we find duplicate matrix entries (the same i, j pair appearing more than once). Sometimes the duplicates have the same value/similarity score, but frequently the scores differ.
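For concreteness, here is a minimal sketch of how the duplicates can be spotted. It assumes the entries of the returned CoordinateMatrix have already been collected to the driver as plain (i, j, value) tuples. The helper name find_duplicate_entries and the sample values are hypothetical, made up for illustration rather than taken from our actual output.

```python
from collections import defaultdict


def find_duplicate_entries(entries):
    """Group (i, j, value) tuples by (i, j); keep only pairs seen more than once."""
    by_pair = defaultdict(list)
    for i, j, value in entries:
        by_pair[(i, j)].append(value)
    return {pair: values for pair, values in by_pair.items() if len(values) > 1}


# Hypothetical sample entries (illustrative values, not real output).
entries = [(0, 1, 0.42), (0, 2, 0.35), (0, 1, 0.42), (1, 2, 0.31), (1, 2, 0.57)]
dups = find_duplicate_entries(entries)
for (i, j), values in sorted(dups.items()):
    kind = "identical" if len(set(values)) == 1 else "differing"
    print(f"({i}, {j}): {values} {kind}")
```

With these sample values, (0, 1) shows up as an identical duplicate and (1, 2) as a differing one, which mirrors the two cases described.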
Is this a known issue, or an artifact of the probabilistic nature of the output? When the duplicate scores differ, which one should we trust (the lower or the higher)? We're using a threshold of 0.3 and running Spark 1.3.1 on a 10-node cluster.

Thanks,
Rick

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-entries-in-output-of-mllib-column-similarities-tp22807.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.