[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

jkbradley Mon, 16 Apr 2018 11:53:02 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r181847695
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -312,6 +329,8 @@ class KMeans @Since("1.5.0") (
         val handlePersistence = dataset.storageLevel == StorageLevel.NONE
         val instances: RDD[OldVector] = dataset.select(col($(featuresCol))).rdd.map {
           case Row(point: Vector) => OldVectors.fromML(point)
    +      case Row(point: Seq[_]) =>
    +        OldVectors.fromML(Vectors.dense(point.asInstanceOf[Seq[Double]].toArray))
    --- End diff --
    
    I'm not sure this will work with arrays of FloatType. Please make sure to test it.
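
    To illustrate the concern, here is a minimal plain-Scala sketch with no
    Spark dependency. It assumes the array elements arrive as boxed Floats,
    as they would in an ordinary Scala Seq[Float]. The object FloatArrayCast
    and both helper names are hypothetical and not part of the PR.
    unsafeToDoubles mirrors the cast in the diff; because of type erasure the
    cast itself is unchecked, so the failure (typically a ClassCastException)
    only surfaces when toArray unboxes each element. toDoubleArray is one
    possible alternative that converts per element.

```scala
object FloatArrayCast {
  // Mirrors the cast in the diff. The asInstanceOf is unchecked (erasure),
  // so it succeeds for any Seq; the element types only matter in toArray.
  def unsafeToDoubles(point: Seq[_]): Array[Double] =
    point.asInstanceOf[Seq[Double]].toArray

  // Hypothetical alternative: inspect each element's runtime type and widen
  // Float to Double explicitly instead of relying on an unchecked cast.
  def toDoubleArray(point: Seq[Any]): Array[Double] =
    point.map {
      case d: Double => d
      case f: Float  => f.toDouble
      case other =>
        throw new IllegalArgumentException(s"Unsupported element type: $other")
    }.toArray
}
```

    With these definitions, unsafeToDoubles(Seq(1.0, 2.5)) returns normally,
    while unsafeToDoubles(Seq(1.0f, 2.5f)) is expected to fail at runtime.
    toDoubleArray accepts both inputs.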



---

