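It sounds like your vectors may not all have the same dimension; that's
a decent guess, at least. Have a look at the require(...) assertions in
MLUtils.fastSquaredDistance (MLUtils.scala:271 in your trace), which is
where the exception is thrown.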
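
To test that guess, one option is to tally the declared size of every
vector before training. The sketch below is plain Python with no Spark
dependency, and it only illustrates the check: SparseVec, size_counts
and check_same_dimension are hypothetical names made up for this
example, not MLlib APIs. In a real job the same tally would be computed
over the declared size of each MLlib vector (the first argument to
Vectors.sparse).

```python
from collections import Counter
from typing import NamedTuple, Sequence


class SparseVec(NamedTuple):
    """Hypothetical stand-in for a sparse vector (not an MLlib class)."""
    size: int                  # declared dimension of the vector
    indices: Sequence[int]     # positions of the non-zero entries
    values: Sequence[float]    # the non-zero entries themselves


def size_counts(vectors):
    """Count how many vectors declare each dimension."""
    return Counter(v.size for v in vectors)


def check_same_dimension(vectors):
    """Raise ValueError if the vectors do not all declare the same size."""
    counts = size_counts(vectors)
    if len(counts) > 1:
        raise ValueError(f"mixed vector sizes: {dict(counts)}")


# Assumed sample data: the third vector was built with a different size.
data = [
    SparseVec(5, [0, 3], [1.0, 2.0]),
    SparseVec(5, [1], [4.0]),
    SparseVec(4, [2], [7.0]),
]

print(size_counts(data))
```

If more than one size shows up in the tally, the usual culprit is
deriving each vector's size from its own largest index instead of
using one fixed feature count for the whole data set.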

On Tue, Aug 12, 2014 at 4:44 AM, Ge, Yao (Y.) <y...@ford.com> wrote:
> I am trying to train a KMeans model with sparse vectors on Spark 1.0.1.
>
> When I run the training I get the following exception:
>
> java.lang.IllegalArgumentException: requirement failed
>         at scala.Predef$.require(Predef.scala:221)
>         at org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:271)
>         at org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:398)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:372)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:366)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.mllib.clustering.KMeans$.findClosest(KMeans.scala:366)
>         at org.apache.spark.mllib.clustering.KMeans$.pointCost(KMeans.scala:389)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:269)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:268)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.Range.foreach(Range.scala:141)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:268)
>         at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:267)
>
>
>
> What does this mean? How do I troubleshoot this problem?
>
> Thanks.
>
>
>
> -Yao
