Could you check whether the vectors have the same size?
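For example, a minimal sanity check along these lines (a sketch, assuming docVectors is the JavaRDD<Vector> from your code below, using the Spark 1.0 Java API and a Java 7 anonymous class) should reveal any dimension mismatch:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;

// Collect the distinct vector dimensions in the RDD. KMeans requires
// every vector to have the same size, so seeing more than one value
// here would explain the "requirement failed" error.
JavaRDD<Integer> sizes = docVectors.map(new Function<Vector, Integer>() {
    @Override
    public Integer call(Vector v) {
        return v.size();
    }
});
System.out.println("distinct vector sizes: " + sizes.distinct().collect());

-Xiangrui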
On Wed, Jun 4, 2014 at 1:43 AM, bluejoe2008 <[email protected]> wrote:
> What does this exception mean?
>
> 14/06/04 16:35:15 ERROR executor.Executor: Exception in task ID 6
> java.lang.IllegalArgumentException: requirement failed
>     at scala.Predef$.require(Predef.scala:221)
>     at org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:271)
>     at org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:398)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:372)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:366)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.mllib.clustering.KMeans$.findClosest(KMeans.scala:366)
>     at org.apache.spark.mllib.clustering.KMeans$.pointCost(KMeans.scala:389)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:269)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$17$$anonfun$apply$7.apply(KMeans.scala:268)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.immutable.Range.foreach(Range.scala:141)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:268)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:267)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>     at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>     at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
> My Spark version: 1.0.0
> Java: 1.7
> My code:
>
> JavaRDD<Vector> docVectors = generateDocVector(...);
> int numClusters = 20;
> int numIterations = 20;
> KMeansModel clusters = KMeans.train(docVectors.rdd(), numClusters,
>     numIterations);
>
> Another strange thing is that the mapPartitionsWithIndex() method call in
> generateDocVector() is invoked 3 times...
>
> 2014-06-04
> ________________________________
> bluejoe2008
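[Note on the repeated mapPartitionsWithIndex() calls in the quoted message: KMeans.train makes several passes over the input (initialization plus each Lloyd iteration), and an uncached RDD is recomputed from its lineage on every pass, which would re-run generateDocVector's transformations. A minimal sketch of the usual remedy, reusing the variable names from the quoted code:

// Cache the computed vectors so KMeans' repeated passes reuse them
// instead of re-running the upstream lineage each time.
JavaRDD<Vector> docVectors = generateDocVector(...).cache();
KMeansModel clusters = KMeans.train(docVectors.rdd(), numClusters,
    numIterations);
]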
