I figured it out. My indices parameter for the sparse vector was messed up. It 
was a good lesson for me:
When using Vectors.sparse(int size, int[] indices, double[] values) to 
generate a vector, size is the size of the whole vector, not just the number of 
elements with nonzero values. The indices array also needs to be in ascending 
order. In many cases it is probably easier to use one of the other two forms of 
Vectors.sparse if the indices and values are not naturally sorted.

-Yao


From: Ge, Yao (Y.)
Sent: Monday, August 11, 2014 11:44 PM
To: 'u...@spark.incubator.apache.org'
Subject: KMeans - java.lang.IllegalArgumentException: requirement failed

I am trying to train a KMeans model with sparse vectors with Spark 1.0.1.
When I run the training I get the following exception:
java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:271)
    at org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:398)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:372)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:366)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.mllib.clustering.KMeans$.findClosest(KMeans.scala:366)
    at org.apache.spark.mllib.clustering.KMeans$.pointCost(KMeans.scala:389)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:269)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:268)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.Range.foreach(Range.scala:141)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:268)
    at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:267)

What does this mean? How do I troubleshoot this problem?
Thanks.

-Yao
