Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/21081#discussion_r182216309
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFe
* @return output schema
*/
protected def validateAndTransformSchema(schema: StructType): StructType = {
- SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
+ val typeCandidates = List( new VectorUDT,
+ new ArrayType(DoubleType, true),
--- End diff --
Thinking about this, let's actually disallow nullable columns. KMeans
won't handle nulls properly.
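
A rough sketch of what that could look like, in case it helps. It is not the code in this PR: `FeaturesColCheck` and `checkFeaturesCol` are hypothetical names, and it uses the public `SQLDataTypes.VectorType` rather than `SchemaUtils`, so it can be compiled outside the `spark` package. The only point is that the array candidate is declared with `containsNull = false`, so an array column that may hold nulls does not match.

```scala
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType, StructField, StructType}

// Hypothetical helper, for illustration only.
object FeaturesColCheck {
  // Accepted types for the features column. ArrayType is a case class, so
  // ArrayType(DoubleType, containsNull = true) does not equal the candidate below.
  val typeCandidates: Seq[DataType] = Seq(
    SQLDataTypes.VectorType,
    ArrayType(DoubleType, containsNull = false))

  // Throws IllegalArgumentException if the column is missing or has an unsupported type.
  def checkFeaturesCol(schema: StructType, colName: String): Unit = {
    val actual = schema(colName).dataType
    require(
      typeCandidates.contains(actual),
      s"Column $colName must be one of ${typeCandidates.mkString("[", ", ", "]")} " +
        s"but was actually $actual.")
  }
}
```

With this, a schema whose features field is `ArrayType(DoubleType, containsNull = true)` fails the check, while the same field with `containsNull = false` passes.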
---