Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20442#discussion_r165131413

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
    @@ -167,25 +167,36 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
       @Since("2.3.0")
       def setOutputCols(value: Array[String]): this.type = set(outputCols, value)

    -  private[feature] def getInOutCols: (Array[String], Array[String]) = {
    -    require((isSet(inputCol) && isSet(outputCol) && !isSet(inputCols) && !isSet(outputCols)) ||
    -      (!isSet(inputCol) && !isSet(outputCol) && isSet(inputCols) && isSet(outputCols)),
    -      "QuantileDiscretizer only supports setting either inputCol/outputCol or" +
    -        "inputCols/outputCols."
    -    )
    +  @Since("1.6.0")
    +  override def transformSchema(schema: StructType): StructType = {
    +    ParamValidators.checkSingleVsMultiColumnParams(this, Seq(outputCol),
    +      Seq(outputCols))

         if (isSet(inputCol)) {
    -      (Array($(inputCol)), Array($(outputCol)))
    -    } else {
    -      require($(inputCols).length == $(outputCols).length,
    -        "inputCols number do not match outputCols")
    -      ($(inputCols), $(outputCols))
    +      require(!isSet(numBucketsArray),
    +        s"numBucketsArray can't be set for single-column QuantileDiscretizer.")
    --- End diff --

    I considered adding this check when I changed the code yesterday. If both numBucketsArray and numBuckets are set, the current code uses only numBucketsArray. Also, numBuckets always has a default value, even when it has not been set explicitly. For those reasons I decided yesterday not to add the check. On reflection, I think it is better to tighten the code so that users cannot set numBuckets explicitly when numBucketsArray is set. I will make the change to add the check.
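
    The difference between "explicitly set" and "has a default value" is what makes this check possible. Below is a minimal sketch of that idea. It is a toy model only and does not use Spark's Params API: the `Param` case class, the `BucketParamCheck` object, and the default of 2 are hypothetical names and values chosen for illustration.

```scala
// Toy model (NOT Spark's Params API): a parameter stores its explicit value
// separately from its default, so "is set" and "has a value" are distinct.
final case class Param[T](
    name: String,
    default: Option[T] = None,
    explicit: Option[T] = None) {

  // True only when the user supplied a value; a default alone does not count.
  def isSet: Boolean = explicit.isDefined

  // The effective value: the explicit one if present, otherwise the default.
  def value: Option[T] = explicit.orElse(default)

  // Returns a copy with an explicit value recorded.
  def set(v: T): Param[T] = copy(explicit = Some(v))
}

object BucketParamCheck {
  // Hypothetical validation: an explicitly set numBuckets may not be combined
  // with numBucketsArray. A numBuckets that only has its default is accepted.
  def validate(numBuckets: Param[Int], numBucketsArray: Param[Array[Int]]): Unit =
    require(!(numBuckets.isSet && numBucketsArray.isSet),
      "numBuckets and numBucketsArray cannot both be set explicitly.")
}
```

    In this sketch, a `numBuckets` that carries only its default passes validation alongside `numBucketsArray`, while calling `set` on it first makes `validate` throw an `IllegalArgumentException`.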