GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20594#discussion_r167762013
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
}
}
+
+ private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter {
+
+ override protected def saveImpl(path: String): Unit = {
+ // SPARK-23377: The default params will be saved and loaded as user-supplied params.
+ // Once `inputCols` is set, the default value of the `outputCol` param causes an error
+ // when checking exclusive params. As a temporary fix, we remove the default
+ // value of `outputCol` if `inputCols` is set before saving.
+ // TODO: If we modify the persistence mechanism later to better handle default params,
+ // we can get rid of this.
+ var removedOutputCol: Option[String] = None
+ if (instance.isSet(instance.inputCols)) {
+ removedOutputCol = instance.getDefault(instance.outputCol)
+ instance.clearDefault(instance.outputCol)
+ }
+ DefaultParamsWriter.saveMetadata(instance, path, sc)
+ // Add the default param back.
+ removedOutputCol.foreach(instance.setDefault(instance.outputCol, _))
--- End diff --
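To make the failure mode concrete, here is a minimal toy model in plain Scala. It is not Spark's `Params` API: `ToyParams`, `ToyPersistence`, and their methods are hypothetical stand-ins that assume, as the code comment above describes, that saved defaults come back as user-supplied values on load and that `inputCols` is exclusive with `outputCol`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a params instance; not Spark's Params API.
final class ToyParams {
  val defaults: mutable.Map[String, String] = mutable.Map.empty
  val userSet: mutable.Map[String, String] = mutable.Map.empty

  def isSet(name: String): Boolean = userSet.contains(name)
}

object ToyPersistence {
  // Models the problem: defaults are written together with user-set params,
  // so after loading they all look user-supplied.
  def save(p: ToyParams): Map[String, String] = (p.defaults ++ p.userSet).toMap

  def load(saved: Map[String, String]): ToyParams = {
    val p = new ToyParams
    p.userSet ++= saved
    p
  }

  // Models the exclusive-params check: inputCols and outputCol must not both be set.
  def checkExclusive(p: ToyParams): Either[String, Unit] =
    if (p.isSet("inputCols") && p.isSet("outputCol"))
      Left("inputCols and outputCol are mutually exclusive")
    else Right(())

  // Workaround: drop the outputCol default while saving, then always restore it.
  def saveWithWorkaround(p: ToyParams): Map[String, String] = {
    val removed = if (p.isSet("inputCols")) p.defaults.remove("outputCol") else None
    try save(p)
    finally removed.foreach(v => p.defaults.update("outputCol", v))
  }
}
```

In this model, `checkExclusive(load(save(p)))` fails for an instance with `inputCols` set and an `outputCol` default, while `checkExclusive(load(saveWithWorkaround(p)))` succeeds and `p` keeps its default. One difference from the diff: the sketch restores the default in a `finally` block, so an exception thrown while saving would not leave the instance without its `outputCol` default.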
Although the saving logic is the same as in `QuantileDiscretizerWriter`, I'm leaving the two duplicated for now since this is a quick fix. If there is a strong preference, I can factor it out into a common class.
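If the duplicated logic were factored out, one possible shape is a small helper that both writers call. The sketch below is hypothetical: `DefaultStore` stands in for a params instance, and `DefaultParamWorkaround.withoutDefault` is an invented name, not an existing Spark utility.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the default-param store of a params instance.
final class DefaultStore(initial: (String, String)*) {
  private val defaults = mutable.Map(initial: _*)
  def get(name: String): Option[String] = defaults.get(name)
  def clear(name: String): Option[String] = defaults.remove(name)
  def set(name: String, value: String): Unit = defaults.update(name, value)
}

// Hypothetical shared helper: when `condition` holds, run `body` with the
// default for `name` removed, and put the default back even if `body` throws.
object DefaultParamWorkaround {
  def withoutDefault[T](store: DefaultStore, name: String, condition: Boolean)(body: => T): T = {
    val removed = if (condition) store.clear(name) else None
    try body
    finally removed.foreach(store.set(name, _))
  }
}
```

Under that assumption, each writer's `saveImpl` could reduce to wrapping `DefaultParamsWriter.saveMetadata(instance, path, sc)` in such a helper, passing `instance.isSet(instance.inputCols)` as the condition, so the clear-and-restore logic lives in one place.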