GitHub user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/17819#discussion_r152891005
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -108,26 +164,53 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
}
}
- val bucketizer: UserDefinedFunction = udf { (feature: Double) =>
-   Bucketizer.binarySearchForBuckets($(splits), feature, keepInvalid)
- }.withName("bucketizer")
+ val seqOfSplits = if (isBucketizeMultipleColumns()) {
+   $(splitsArray).toSeq
--- End diff --
I am interested in the motivation for using `.toSeq` and `Seq()` here.
---