[ https://issues.apache.org/jira/browse/SPARK-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
RJ Nowling updated SPARK-3384:
------------------------------
    Description:
In the KMeans clustering implementation, the Breeze vectors are accumulated using +=. For example,
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162

This is potentially a thread unsafe operation. (This is what I observed in local testing.) I suggest changing the += to + -- a new object will be allocated but it will be thread safe since it won't write to an old location accessed by multiple threads.

Further testing is required to reproduce and verify.

  was:
In the KMeans clustering implementation, the Breeze vectors are accumulated using +=:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162

This is potentially a thread unsafe operation. (This is what I observed in local testing.) I suggest changing the += to + -- a new object will be allocated but it will be thread safe since it won't write to an old location accessed by multiple threads.

Further testing is required to reproduce and verify.

> Potential thread unsafe Breeze vector addition in KMeans
> --------------------------------------------------------
>
>                 Key: SPARK-3384
>                 URL: https://issues.apache.org/jira/browse/SPARK-3384
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: RJ Nowling
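To make the distinction between += and + concrete, here is a minimal sketch. It does not use Breeze or the actual KMeans code: `addInPlace` and `addFresh` are hypothetical stand-ins, written over plain arrays, for in-place and allocating vector addition. It only illustrates the aliasing difference the description relies on; it does not reproduce or confirm the suspected race.

```scala
// Hypothetical stand-ins for in-place (+=) and allocating (+) vector
// addition, using plain arrays so the sketch has no dependencies.
object VectorAddSketch {
  // In-place: writes into `acc`, so every holder of `acc` sees the change.
  def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    require(acc.length == x.length, "dimension mismatch")
    var i = 0
    while (i < acc.length) { acc(i) += x(i); i += 1 }
    acc
  }

  // Allocating: leaves both inputs untouched and returns a fresh array.
  def addFresh(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "dimension mismatch")
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  def main(args: Array[String]): Unit = {
    val shared = Array(1.0, 2.0)   // imagine this is reachable from several threads
    val point  = Array(10.0, 20.0)

    val fresh = addFresh(shared, point)
    println(shared.mkString(","))  // shared still holds its original values
    println(fresh.mkString(","))   // the sum lives in a new array

    addInPlace(shared, point)
    println(shared.mkString(","))  // shared itself has been overwritten
  }
}
```

If an accumulator like `shared` really is visible to more than one thread, the in-place form performs unsynchronized read-modify-write on it, while the allocating form only reads it. Whether that sharing actually occurs at the linked line is what the issue says still needs to be verified.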