[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

viirya Mon, 12 Feb 2018 21:10:55 -0800

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20594#discussion_r167762013
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
    @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
         }
       }
     
    +
    +  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      // SPARK-23377: The default params will be saved and loaded as user-supplied params.
    +      // Once `inputCols` is set, the default value of `outputCol` param causes the error
    +      // when checking exclusive params. As a temporary to fix it, we remove the default
    +      // value of `outputCol` if `inputCols` is set before saving.
    +      // TODO: If we modify the persistence mechanism later to better handle default params,
    +      // we can get rid of this.
    +      var removedOutputCol: Option[String] = None
    +      if (instance.isSet(instance.inputCols)) {
    +        removedOutputCol = instance.getDefault(instance.outputCol)
    +        instance.clearDefault(instance.outputCol)
    +      }
    +      DefaultParamsWriter.saveMetadata(instance, path, sc)
    +      // Add the default param back.
    +      removedOutputCol.map(instance.setDefault(instance.outputCol, _))
    --- End diff --
    
    Although the saving logic is the same as in `QuantileDiscretizerWriter`, I am leaving the two writers duplicated for now since this is a quick fix. If there is a strong preference, I can factor the logic out into a common class.
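
    For illustration only, the shared "hide a default, save, restore it" step could look roughly like the sketch below. It is a self-contained toy model, not Spark code: `ToyParams`, `DefaultHiding`, and `saveWithoutDefault` are hypothetical names, and the string-keyed maps only stand in for Spark's `Params`. Unlike the diff above, it restores the default in a `finally` block and uses `foreach` for the side effect, which is a design choice of this sketch rather than what the PR does.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a params holder; this is NOT Spark's Params API.
final class ToyParams {
  private val defaults = mutable.Map.empty[String, String]
  private val userSet = mutable.Map.empty[String, String]

  def setDefault(name: String, value: String): Unit = defaults(name) = value
  def getDefault(name: String): Option[String] = defaults.get(name)
  def clearDefault(name: String): Unit = defaults.remove(name)
  def set(name: String, value: String): Unit = userSet(name) = value
  def isSet(name: String): Boolean = userSet.contains(name)

  // Everything a writer would persist: defaults overlaid by user-set values.
  def snapshot: Map[String, String] = (defaults ++ userSet).toMap
}

object DefaultHiding {
  /**
   * Runs `save` with the default of `hidden` removed when `trigger` has been
   * set by the user, then puts the default back, even if `save` throws.
   */
  def saveWithoutDefault[A](p: ToyParams, trigger: String, hidden: String)(
      save: ToyParams => A): A = {
    val removed: Option[String] =
      if (p.isSet(trigger)) {
        val d = p.getDefault(hidden)
        p.clearDefault(hidden)
        d
      } else {
        None
      }
    try save(p)
    finally removed.foreach(p.setDefault(hidden, _))
  }
}
```

    A writer for either class would then call something like `DefaultHiding.saveWithoutDefault(params, "inputCols", "outputCol")(_.snapshot)`, with the real metadata-saving call in place of `_.snapshot`.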


