[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

frreiss Mon, 07 Mar 2016 13:41:25 -0800

Github user frreiss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11119#discussion_r55276968
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -169,12 +182,29 @@ object KMeansModel extends MLReadable[KMeansModel] {
     
       /** [[MLWriter]] instance for [[KMeansModel]] */
       private[KMeansModel] class KMeansModelWriter(instance: KMeansModel) extends MLWriter {
    +    import org.json4s.JsonDSL._
     
         private case class Data(clusterCenters: Array[Vector])
     
         override protected def saveImpl(path: String): Unit = {
    -      // Save metadata and Params
    -      DefaultParamsWriter.saveMetadata(instance, path, sc)
    +      if (instance.isSet(instance.initialModel)) {
    +        val initialModelPath = new Path(path, "initial-model").toString
    +        val initialModel = instance.getInitialModel
    +        initialModel.save(initialModelPath)
    +
    +        // Remove the initialModel temporarily
    +        instance.clear(instance.initialModel)
    --- End diff --
    
    It's probably not a good idea for this serialization method to modify the model. Two potential problem scenarios come to mind: (a) the call to saveMetadata() below fails before initialModel is restored, leaving the KMeansModel object in an inconsistent state; or (b) another thread reads the initialModel param while the current thread is inside saveImpl() and finds it cleared.
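
    To illustrate the concern, here is a minimal, self-contained sketch. It does not use Spark: `ToyModel`, `copyWithout`, `SaveSketch`, and the `fail` flag are hypothetical stand-ins invented for this example, and `saveMetadata` here is a toy function, not `DefaultParamsWriter.saveMetadata`. It contrasts clearing a param on the live instance with writing metadata from a copy that omits the param.

```scala
// Hypothetical stand-in for a model with params; NOT Spark's Params API.
final class ToyModel(private var params: Map[String, String]) {
  def isSet(name: String): Boolean = params.contains(name)
  def clear(name: String): Unit = params -= name
  def paramNames: Set[String] = params.keySet
  // Returns an independent copy without the given param; `this` is untouched.
  def copyWithout(name: String): ToyModel = new ToyModel(params - name)
}

object SaveSketch {
  // Toy metadata writer that can be made to fail, simulating an I/O error.
  // It returns the param names it would have written.
  def saveMetadata(m: ToyModel, fail: Boolean): Set[String] = {
    if (fail) throw new RuntimeException("simulated write failure")
    m.paramNames
  }

  // Pattern under review: mutate the live instance, then write.
  // If saveMetadata throws, the caller's model has lost its param.
  def saveMutating(m: ToyModel, fail: Boolean): Set[String] = {
    m.clear("initialModel")
    saveMetadata(m, fail)
  }

  // Alternative: write from a copy, so the caller's instance is never
  // modified, whether or not the write succeeds.
  def saveFromCopy(m: ToyModel, fail: Boolean): Set[String] =
    saveMetadata(m.copyWithout("initialModel"), fail)
}
```

    With `saveMutating`, a failed write leaves `initialModel` cleared on the caller's object, which is scenario (a). With `saveFromCopy`, the original keeps its param regardless of the outcome, and other threads never observe a temporarily cleared value on the shared instance. Whether a copy-based approach fits the actual `KMeansModelWriter` is for the PR author to judge.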


