[ 
https://issues.apache.org/jira/browse/SPARK-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RJ Nowling updated SPARK-3384:
------------------------------
    Description: 
In the KMeans clustering implementation, the Breeze vectors are accumulated 
in place using +=.  For example:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162

This is potentially a thread-unsafe operation.  (This is what I observed in 
local testing.)  I suggest changing the += to +.  A new object will be 
allocated, but the addition will be thread safe because it won't write to an 
existing location that multiple threads can access.

Further testing is required to reproduce and verify.
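
To illustrate the difference between += and + described above, here is a 
minimal sketch.  It does not use Breeze: Vec is a hypothetical stand-in 
class, written only to show that in-place addition overwrites the left-hand 
operand's storage while allocating addition leaves both operands untouched.  
Whether the += in KMeans.scala is actually reached by several threads at once 
is not shown here and still needs to be verified.

```scala
// Hypothetical stand-in for a mutable dense vector; this is NOT the Breeze API.
final class Vec(val data: Array[Double]) {
  // In-place addition: writes into this vector's existing storage.
  def +=(other: Vec): Vec = {
    var i = 0
    while (i < data.length) { data(i) += other.data(i); i += 1 }
    this
  }

  // Allocating addition: reads both operands and writes only to a new array.
  def +(other: Vec): Vec = {
    val out = new Array[Double](data.length)
    var i = 0
    while (i < data.length) { out(i) = data(i) + other.data(i); i += 1 }
    new Vec(out)
  }
}

object AddDemo {
  def main(args: Array[String]): Unit = {
    val shared = new Vec(Array(1.0, 2.0))
    val point  = new Vec(Array(0.5, 0.5))

    val fresh = shared + point  // shared is left unchanged
    assert(shared.data.sameElements(Array(1.0, 2.0)))
    assert(fresh.data.sameElements(Array(1.5, 2.5)))

    shared += point             // shared's own storage is overwritten
    assert(shared.data.sameElements(Array(1.5, 2.5)))
  }
}
```

If shared were visible to more than one thread, the in-place form would let 
concurrent read-modify-write steps interleave on the same array, whereas the 
allocating form only reads it.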

  was:
In the KMeans clustering implementation, the Breeze vectors are accumulated 
using +=: 

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162

 This is potentially a thread unsafe operation.  (This is what I observed in 
local testing.)  I suggest changing the += to + -- a new object will be 
allocated but it will be thread safe since it won't write to an old location 
accessed by multiple threads.

Further testing is required to reproduce and verify.


> Potential thread unsafe Breeze vector addition in KMeans
> --------------------------------------------------------
>
>                 Key: SPARK-3384
>                 URL: https://issues.apache.org/jira/browse/SPARK-3384
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: RJ Nowling


