[GitHub] [spark] zhengruifeng commented on issue #26483: [SPARK-29823][MLLIB] Improper persist strategy in mllib.clustering.KMeans.run()

GitBox Sat, 28 Dec 2019 23:06:47 -0800

zhengruifeng commented on issue #26483: [SPARK-29823][MLLIB] Improper persist 
strategy in mllib.clustering.KMeans.run()
URL: https://github.com/apache/spark/pull/26483#issuecomment-569480945
 
 
   I mean, `input: RDD[Vector]` is likely to be cached already, outside of this method:
   1. it is cached in ml.KMeans;
   2. end users are likely to cache it before calling train/run, since the 
related docs suggest doing so.
   
   So if we cache `zippedData`, we actually cache the input twice.
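
   To illustrate the concern, here is a minimal sketch of one way to avoid the double caching. It is not the actual `KMeans.run()` code: `PersistSketch` and `zipWithNorms` are hypothetical names, and the `MEMORY_AND_DISK` level is an assumption. It checks `input.getStorageLevel` and persists the zipped pairs only when the input is not cached; otherwise it persists just the norms, which are much smaller than the vectors.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper for illustration only; not part of Spark.
object PersistSketch {

  /**
   * Pairs each vector with its L2 norm and returns (zipped, norms).
   * Exactly one of the two returned RDDs is persisted, depending on
   * whether the caller has already cached `input`.
   */
  def zipWithNorms(input: RDD[Vector]): (RDD[(Vector, Double)], RDD[Double]) = {
    val norms = input.map(v => Vectors.norm(v, 2.0))
    // zip is safe here: `norms` is a map over `input`, so both RDDs have
    // the same partitioning and the same number of elements per partition.
    val zipped = input.zip(norms)

    if (input.getStorageLevel == StorageLevel.NONE) {
      // Input is not cached anywhere: persisting the zipped pairs
      // stores the vectors exactly once.
      zipped.persist(StorageLevel.MEMORY_AND_DISK)
    } else {
      // Input is already cached: persisting `zipped` would store the
      // vectors a second time, so persist only the norms.
      norms.persist(StorageLevel.MEMORY_AND_DISK)
    }
    (zipped, norms)
  }
}
```

   The caller would unpersist whichever of the two RDDs was persisted once training finishes.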


