Github user frreiss commented on a diff in the pull request:
https://github.com/apache/spark/pull/11119#discussion_r55276968
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -169,12 +182,29 @@ object KMeansModel extends MLReadable[KMeansModel] {
/** [[MLWriter]] instance for [[KMeansModel]] */
private[KMeansModel] class KMeansModelWriter(instance: KMeansModel)
extends MLWriter {
+ import org.json4s.JsonDSL._
private case class Data(clusterCenters: Array[Vector])
override protected def saveImpl(path: String): Unit = {
- // Save metadata and Params
- DefaultParamsWriter.saveMetadata(instance, path, sc)
+ if (instance.isSet(instance.initialModel)) {
+ val initialModelPath = new Path(path, "initial-model").toString
+ val initialModel = instance.getInitialModel
+ initialModel.save(initialModelPath)
+
+ // Remove the initialModel temporarily
+ instance.clear(instance.initialModel)
--- End diff --
It's probably not a good idea for this serialization method to modify the
model. Two potential problem scenarios come to mind: (a) the call to
saveMetadata() below fails, leaving the KMeansModel object in an
inconsistent state with its initialModel param cleared; or (b) another
thread accesses the initialModel field while the current thread is inside
saveImpl().
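
One way to avoid both scenarios is to derive the data to be written from a filtered view of the params instead of clearing the param on the live instance. The snippet below is only a minimal sketch of that idea: `SketchModel` and `SketchWriter` are hypothetical stand-ins, not Spark classes, and the params are modelled as a plain immutable map rather than Spark's `ParamMap`.

```scala
// Hypothetical stand-in for a model whose params include a nested initial model.
// This is NOT Spark's KMeansModel; params are simplified to an immutable Map.
final case class SketchModel(params: Map[String, String])

object SketchWriter {
  // Compute the metadata to persist WITHOUT touching the model itself.
  // Excluded keys (e.g. "initialModel") are dropped from a fresh Map, so
  // a failure during writing or a concurrent reader never observes a
  // half-modified model.
  def metadataToSave(model: SketchModel, excluded: Set[String]): Map[String, String] =
    model.params.filter { case (key, _) => !excluded.contains(key) }
}
```

With this shape, the writer would pass `metadataToSave(model, Set("initialModel"))` to whatever performs the actual write, and the model still reports its initial model afterwards. Wrapping a clear-and-restore in `try`/`finally` would address scenario (a) but not scenario (b), since other threads could still see the param temporarily unset.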