Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20442#discussion_r165131413
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -167,25 +167,36 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
@Since("2.3.0")
def setOutputCols(value: Array[String]): this.type = set(outputCols, value)
- private[feature] def getInOutCols: (Array[String], Array[String]) = {
- require((isSet(inputCol) && isSet(outputCol) && !isSet(inputCols) && !isSet(outputCols)) ||
- (!isSet(inputCol) && !isSet(outputCol) && isSet(inputCols) && isSet(outputCols)),
- "QuantileDiscretizer only supports setting either inputCol/outputCol or" +
- "inputCols/outputCols."
- )
+ @Since("1.6.0")
+ override def transformSchema(schema: StructType): StructType = {
+ ParamValidators.checkSingleVsMultiColumnParams(this, Seq(outputCol),
+ Seq(outputCols))
if (isSet(inputCol)) {
- (Array($(inputCol)), Array($(outputCol)))
- } else {
- require($(inputCols).length == $(outputCols).length,
- "inputCols number do not match outputCols")
- ($(inputCols), $(outputCols))
+ require(!isSet(numBucketsArray),
+ s"numBucketsArray can't be set for single-column QuantileDiscretizer.")
--- End diff --
When I changed the code yesterday, I considered whether to add this check.
If both numBucketsArray and numBuckets are set, the current code uses only
numBucketsArray. Also, numBuckets always has a default value even when it is
not set explicitly. For those reasons I decided yesterday not to add the check.
On reflection, I think it is better to tighten the code so that users cannot
set numBuckets explicitly when numBucketsArray is set. I will make the change
to add the check.
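
To make the intended behaviour concrete, here is a rough, self-contained sketch of the
check. It does not use Spark's Params API: `NumBucketsCheck`, `Settings`, `validate`,
and the second error message are hypothetical names made up for illustration, and
`Option` stands in for `isSet`. The default of 2 is only there to show that reading
the value of numBuckets cannot tell "explicitly set" apart from "defaulted".

```scala
object NumBucketsCheck {
  // Hypothetical stand-in for the relevant params. Some(...) means the user
  // set the param explicitly (what isSet would report); None means it was not set.
  final case class Settings(
      multiColumn: Boolean,
      numBuckets: Option[Int] = None,
      numBucketsArray: Option[Seq[Int]] = None) {
    // The value is always available because of the default (assumed to be 2 here),
    // so the check below must look at "explicitly set", not at the value.
    def effectiveNumBuckets: Int = numBuckets.getOrElse(2)
  }

  def validate(s: Settings): Unit = {
    if (!s.multiColumn) {
      // Same rule as the check in the diff above.
      require(s.numBucketsArray.isEmpty,
        "numBucketsArray can't be set for single-column QuantileDiscretizer.")
    } else {
      // The additional check discussed in this comment (message is illustrative).
      require(s.numBuckets.isEmpty || s.numBucketsArray.isEmpty,
        "numBuckets and numBucketsArray can't both be set for multi-column QuantileDiscretizer.")
    }
  }
}
```

With this sketch, a multi-column configuration that sets only numBucketsArray, or only
numBuckets, passes validation, while one that sets both explicitly fails with an
IllegalArgumentException from `require`.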