Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/9207#issuecomment-210472126
@holdenk I took a very quick pass and the basics look ok. I will do a more
detailed one soon.
Just looking at this, I was wondering whether we can make model export more
pluggable. For example, in DataFrames we
can do `df.write.format("json").save("...")`, and in ML we can do
`kmeans.write.save("...")`.
So how about `kmeans.write.pmml.save("...")`? I'm not sure we need a
full-blown generic implementation here (as for DataFrame), as it's a lot of
overhead for not that much gain: the only formats likely to be "1st class
citizens" are probably native Spark and PMML.
But we could do something simple, like making `PMMLExportable` interact with `MLWriter`:
```scala
/** [[MLWriter]] instance for [[KMeansModel]] */
private[KMeansModel] class KMeansModelWriter(instance: KMeansModel)
  extends MLWriter with PMMLWritable {

  private case class Data(clusterCenters: Array[Vector])

  override protected def saveImpl(path: String): Unit = {
    // Save metadata and Params
    DefaultParamsWriter.saveMetadata(instance, path, sc)
    // Save model data: cluster centers
    val data = Data(instance.clusterCenters)
    val dataPath = new Path(path, "data").toString
    sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
  }

  override def pmml: MLWriter = new KMeansModelPMMWriter(instance)
}

private class KMeansModelPMMWriter(instance: KMeansModel) extends MLWriter {

  override protected def saveImpl(path: String): Unit =
    instance.parentModel.toPMML(sc, path)
}
```
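To make the call chain concrete, here is a rough, Spark-free sketch of the same shape. Everything in it is a hypothetical stand-in, not Spark API: `SimpleWriter` plays the role of `MLWriter`, `ToyKMeansModel` the role of `KMeansModel`, and the XML it writes is only a placeholder, not valid PMML. It just shows how `write.save(...)` and `write.pmml.save(...)` could share one writer hierarchy.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical stand-in for MLWriter: save() simply delegates to saveImpl().
abstract class SimpleWriter {
  protected def saveImpl(path: String): Unit
  def save(path: String): Unit = saveImpl(path)
}

// Hypothetical mix-in: a writer that can hand back a PMML-flavoured writer.
trait PMMLWritable {
  def pmml: SimpleWriter
}

// Toy model holding only cluster centers (assumption: no Params, no metadata).
final case class ToyKMeansModel(clusterCenters: Seq[Seq[Double]]) {
  def write: ToyKMeansWriter = new ToyKMeansWriter(this)
}

// "Native" writer: one comma-separated line per cluster center.
class ToyKMeansWriter(instance: ToyKMeansModel)
  extends SimpleWriter with PMMLWritable {

  override protected def saveImpl(path: String): Unit = {
    val text = instance.clusterCenters.map(_.mkString(",")).mkString("\n")
    Files.write(Paths.get(path), text.getBytes(StandardCharsets.UTF_8))
  }

  override def pmml: SimpleWriter = new ToyKMeansPMMLWriter(instance)
}

// Alternate-format writer: placeholder XML, NOT a valid PMML document.
class ToyKMeansPMMLWriter(instance: ToyKMeansModel) extends SimpleWriter {

  override protected def saveImpl(path: String): Unit = {
    val clusters = instance.clusterCenters
      .map(c => s"<Cluster>${c.mkString(" ")}</Cluster>")
      .mkString
    val text = s"<ClusteringModel>$clusters</ClusteringModel>"
    Files.write(Paths.get(path), text.getBytes(StandardCharsets.UTF_8))
  }
}
```

With this shape, `model.write.save(path)` takes the native route and `model.write.pmml.save(path)` takes the alternate one, without a generic `format(...)` registry.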
@jkbradley @mengxr @srowen thoughts?