Duplicate entries in output of mllib column similarities

rbolkey Thu, 07 May 2015 16:20:07 -0700

Hi,

I have a question about an oddity we encountered while running MLlib's
column similarities operation. When we examine the output, we find
duplicate matrix entries (the same (i, j) pair). Sometimes the duplicates
have the same similarity score, but frequently the scores differ.
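To make the symptom concrete, here is a minimal plain-Python sketch for spotting such duplicates. It assumes the entries have already been collected to the driver as (i, j, value) tuples, for example by converting each MatrixEntry from the returned CoordinateMatrix. The function name find_duplicate_entries and the sample data are hypothetical and only for illustration. The sketch does not say which score is correct. It only reports every (i, j) pair that appears more than once, along with its scores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of one output entry: (row i, column j, similarity score)
Entry = Tuple[int, int, float]


def find_duplicate_entries(entries: Iterable[Entry]) -> Dict[Tuple[int, int], List[float]]:
    """Group scores by (i, j) and return only the pairs that occur more than once."""
    scores: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for i, j, value in entries:
        scores[(i, j)].append(value)
    # Keep only the pairs with more than one entry, i.e. the duplicates
    return {pair: vals for pair, vals in scores.items() if len(vals) > 1}


if __name__ == "__main__":
    # Hypothetical sample: (0, 1) appears twice with different scores,
    # (2, 3) appears twice with the same score, (0, 2) appears once.
    sample = [(0, 1, 0.42), (0, 2, 0.35), (0, 1, 0.51), (2, 3, 0.60), (2, 3, 0.60)]
    for (i, j), vals in sorted(find_duplicate_entries(sample).items()):
        spread = max(vals) - min(vals)
        print(f"({i}, {j}): {len(vals)} entries, scores={vals}, spread={spread:.2f}")
```

A nonzero spread flags the case where the duplicated scores disagree.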


Is this a known issue, or an artifact of the probabilistic nature of the
output? When the scores differ, which one should we trust, the lower or the
higher? We're using a threshold of 0.3 and running Spark 1.3.1 on a 10-node
cluster.

Thanks
Rick



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-entries-in-output-of-mllib-column-similarities-tp22807.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
