[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

jkbradley Tue, 17 Apr 2018 13:44:33 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182216309
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFe
        * @return output schema
        */
       protected def validateAndTransformSchema(schema: StructType): StructType = {
    -    SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
    +    val typeCandidates = List( new VectorUDT,
    +      new ArrayType(DoubleType, true),
    --- End diff --
    
    Thinking about this, let's actually disallow nullable columns.  KMeans won't handle nulls properly.
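
One possible reading of this suggestion, sketched under assumptions: the second argument of `ArrayType` is `containsNull`, so passing `false` instead of `true` would reject arrays whose elements may be null. The object `FeaturesColumnCheck` and the helper `checkFeaturesType` below are hypothetical names for illustration only, not the code in this PR, and the sketch uses the public `SQLDataTypes.VectorType` in place of `new VectorUDT`. It checks element nullability only, not whether the column itself is nullable.

```scala
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.sql.types._

// Hypothetical stand-in for the schema check discussed above.
object FeaturesColumnCheck {
  // Accepted types: ML vectors, or double arrays whose elements cannot be null.
  val typeCandidates: Seq[DataType] = Seq(
    VectorType,
    ArrayType(DoubleType, containsNull = false))

  // Fails with IllegalArgumentException if the column type is not a candidate.
  // ArrayType is a case class, so ArrayType(DoubleType, true) does not match.
  def checkFeaturesType(schema: StructType, colName: String): Unit = {
    val actual = schema(colName).dataType
    require(
      typeCandidates.contains(actual),
      s"Column $colName must be one of ${typeCandidates.mkString("[", ", ", "]")} " +
        s"but was actually $actual.")
  }
}
```

With this sketch, a column of type `ArrayType(DoubleType, containsNull = true)` is rejected at schema-validation time rather than failing later inside KMeans.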


