[GitHub] spark pull request #20442: [SPARK-23265][SQL]Update multi-column error handl...

huaxingao Wed, 31 Jan 2018 09:45:12 -0800

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20442#discussion_r165131413
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
    @@ -167,25 +167,36 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
       @Since("2.3.0")
       def setOutputCols(value: Array[String]): this.type = set(outputCols, value)
     
    -  private[feature] def getInOutCols: (Array[String], Array[String]) = {
    -    require((isSet(inputCol) && isSet(outputCol) && !isSet(inputCols) && !isSet(outputCols)) ||
    -      (!isSet(inputCol) && !isSet(outputCol) && isSet(inputCols) && isSet(outputCols)),
    -      "QuantileDiscretizer only supports setting either inputCol/outputCol or" +
    -        "inputCols/outputCols."
    -    )
    +  @Since("1.6.0")
    +  override def transformSchema(schema: StructType): StructType = {
    +    ParamValidators.checkSingleVsMultiColumnParams(this, Seq(outputCol),
    +      Seq(outputCols))
     
         if (isSet(inputCol)) {
    -      (Array($(inputCol)), Array($(outputCol)))
    -    } else {
    -      require($(inputCols).length == $(outputCols).length,
    -        "inputCols number do not match outputCols")
    -      ($(inputCols), $(outputCols))
    +      require(!isSet(numBucketsArray),
    +        s"numBucketsArray can't be set for single-column QuantileDiscretizer.")
    --- End diff --
    
    When I changed the code yesterday, I wondered whether I should add this
    check. If both numBucketsArray and numBuckets are set, the current code
    uses only numBucketsArray. Also, numBuckets always has a default value even
    when it is not set explicitly. So yesterday I decided not to add the check.
    But I think it is better to tighten the code so that users cannot set
    numBuckets explicitly when numBucketsArray is set. I will make the change to
    add the check.
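
    To illustrate the distinction between "explicitly set" and "has a default",
    here is a minimal, self-contained sketch. It is not the Spark code: the
    Settings case class, the validate helper, and the second error message are
    hypothetical stand-ins, with Option modelling isSet and a default of 2
    assumed for numBuckets.

```scala
object NumBucketsCheckSketch {
  // Hypothetical, simplified model of the params discussed above.
  // None means "not set explicitly" (the analogue of !isSet(param)).
  final case class Settings(
      inputCol: Option[String] = None,
      inputCols: Option[Seq[String]] = None,
      numBuckets: Option[Int] = None,
      numBucketsArray: Option[Seq[Int]] = None) {

    // Assumed default for illustration: numBuckets has a value even when unset.
    val defaultNumBuckets: Int = 2

    def effectiveNumBuckets: Int = numBuckets.getOrElse(defaultNumBuckets)
  }

  // Throws IllegalArgumentException (via require) on an invalid combination.
  def validate(s: Settings): Unit = {
    if (s.inputCol.isDefined) {
      // Single-column mode: the per-column bucket array makes no sense.
      require(s.numBucketsArray.isEmpty,
        "numBucketsArray can't be set for single-column QuantileDiscretizer.")
    }
    if (s.numBucketsArray.isDefined) {
      // The tightened check: reject an explicit numBuckets, but not the default.
      // Hypothetical message, not taken from the pull request.
      require(s.numBuckets.isEmpty,
        "numBuckets can't be set explicitly when numBucketsArray is set.")
    }
  }

  def main(args: Array[String]): Unit = {
    val multi = Settings(
      inputCols = Some(Seq("a", "b")),
      numBucketsArray = Some(Seq(3, 5)))
    validate(multi) // passes: numBuckets only has its default value
    println(s"effective numBuckets = ${multi.effectiveNumBuckets}")

    val conflicting = multi.copy(numBuckets = Some(4))
    println(scala.util.Try(validate(conflicting)).isFailure)
  }
}
```

    The check has to look at whether numBuckets was set explicitly rather than
    whether it has a value, because the default means it always has a value.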



