Hi Burak. I see this error every time. I'm running the CDH 5.2 version of Spark 1.1.0 and load my data from HDFS; by the time it hits the recommender it has gone through many Spark operations.

On Oct 27, 2014 4:03 PM, "Burak Yavuz" <bya...@stanford.edu> wrote:
> Hi,
>
> I've come across this multiple times, but not in a consistent manner. I
> found it hard to reproduce. I have a jira for it: SPARK-3080
>
> Do you observe this error every single time? Where do you load your data
> from? Which version of Spark are you running?
> Figuring out the similarities may help in pinpointing the bug.
>
> Thanks,
> Burak
>
> ----- Original Message -----
> From: "Ilya Ganelin" <ilgan...@gmail.com>
> To: "user" <user@spark.apache.org>
> Sent: Monday, October 27, 2014 11:36:46 AM
> Subject: MLLib ALS ArrayIndexOutOfBoundsException with Scala Spark 1.1.0
>
> Hello all - I am attempting to run MLLib's ALS algorithm on a substantial
> test vector - approx. 200 million records.
>
> I have resolved a few issues I've had with regard to garbage collection,
> KryoSerialization, and memory usage.
>
> I have not been able to get around this issue I see below, however:
>
> > java.lang.ArrayIndexOutOfBoundsException: 6106
> > org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
> > scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> > org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)
> > org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)
> > org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)
> > org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
> > org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
> > scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> > scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> > org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:144)
> > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
> > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
> > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> > org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
> > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> > org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> > org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
> > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> > org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>
> I do not have any negative indices or indices that exceed Int.MaxValue.
> I have partitioned the input data into 300 partitions and my Spark config
> is below:
>
>     .set("spark.executor.memory", "14g")
>     .set("spark.storage.memoryFraction", "0.8")
>     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     .set("spark.kryo.registrator", "MyRegistrator")
>     .set("spark.core.connection.ack.wait.timeout","600")
>     .set("spark.akka.frameSize","50")
>     .set("spark.yarn.executor.memoryOverhead","1024")
>
> Does anyone have any suggestions as to why I'm seeing the above error or
> how to get around it?
> It may be possible to upgrade to the latest version of Spark, but the
> mechanism for doing so in our environment isn't obvious yet.
>
> -Ilya Ganelin
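
For context, here is a minimal, hypothetical sketch of how the quoted settings and the 300 input partitions would typically be wired into an ALS run. The app name, input path, record layout, rank, iteration count, and lambda are placeholders rather than values taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Build the SparkConf with the settings quoted above.
    val conf = new SparkConf()
      .setAppName("ALSExample") // placeholder application name
      .set("spark.executor.memory", "14g")
      .set("spark.storage.memoryFraction", "0.8")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")
      .set("spark.core.connection.ack.wait.timeout", "600")
      .set("spark.akka.frameSize", "50")
      .set("spark.yarn.executor.memoryOverhead", "1024")
    val sc = new SparkContext(conf)

    // Parse "user,product,rating" lines into Rating objects and repartition to 300,
    // matching the partition count mentioned above.
    val ratings = sc.textFile("hdfs:///path/to/ratings") // placeholder path
      .map { line =>
        val Array(u, p, r) = line.split(',')
        Rating(u.toInt, p.toInt, r.toDouble)
      }
      .repartition(300)

    // rank = 10, iterations = 10, lambda = 0.01 are illustrative defaults only.
    val model = ALS.train(ratings, 10, 10, 0.01)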
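
On the "no negative or overflowing indices" point: a small, hypothetical sanity check can confirm it, since a raw ID larger than Int.MaxValue wraps to a negative value when narrowed to Int, which is one way out-of-range indices can reach ALS. The input path and field layout below are assumptions:

    // Parse raw IDs as Longs first so overflow is visible before any .toInt narrowing.
    val rawIds = sc.textFile("hdfs:///path/to/ratings") // placeholder path
      .map { line =>
        val fields = line.split(',')
        (fields(0).toLong, fields(1).toLong)
      }

    // Count IDs that are negative or would not fit in an Int.
    val badIdCount = rawIds.filter { case (u, p) =>
      u < 0 || p < 0 || u > Int.MaxValue || p > Int.MaxValue
    }.count()
    println(s"ratings with out-of-range ids: $badIdCount")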