Thanks Sean... let me get the latest code. Do you know which PR it was? But will the executors run fine with, say, 32 GB or 64 GB of memory? Doesn't the JVM run into issues once the max heap goes beyond a certain limit?
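On the "issues beyond a certain limit" point, one well-known HotSpot threshold is roughly 32 GB: above it the JVM disables compressed ordinary object pointers, so every object reference doubles from 4 to 8 bytes and a slightly larger heap can actually hold fewer objects. A quick way to check what a given heap size gets you (the flag is standard HotSpot; the exact heap values here are just illustrative):

```shell
# Ask the JVM whether compressed oops are still in effect at a given -Xmx.
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
java -Xmx40g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
```

If the second command reports the flag as false, a 32-64 GB executor heap pays the uncompressed-pointer tax on top of longer GC pauses.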
Also, the failure is a GC overhead limit hit from inside jblas... and I was thinking that jblas is going to call native malloc, right? Maybe 64 GB is not a big deal then... I will try increasing to 32 and then 64:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
    org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
    org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
    com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
    com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
    scala.Array$.fill(Array.scala:267)
    com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
    com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
    com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
    org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
    org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
    org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
    scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
    org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
    org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:32)
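One correction to the malloc assumption: jblas's DoubleMatrix keeps its elements in a plain Java double[] on the JVM heap (native BLAS is only invoked for the arithmetic), so DoubleMatrix.zeros in the trace above allocates heap, which is exactly where "GC overhead limit exceeded" comes from. A back-of-envelope heap estimate for the factor matrices, assuming rank 50 since the thread does not state the rank:

```scala
// Rough heap sizing for ALS factors (20M users x 1M items from the thread;
// rank 50 is an assumption). Each factor row is `rank` doubles = rank * 8 bytes,
// held on the JVM heap by jblas, not in native memory.
object AlsMemoryEstimate {
  def factorBytes(rows: Long, rank: Int): Long = rows * rank * 8L

  def main(args: Array[String]): Unit = {
    val rank      = 50
    val userBytes = factorBytes(20000000L, rank) // user-factor matrix
    val itemBytes = factorBytes(1000000L, rank)  // item-factor matrix
    val totalGb   = (userBytes + itemBytes) / math.pow(1024, 3)
    println(f"factor matrices alone: $totalGb%.1f GB of heap across the cluster")
  }
}
```

So the factors alone are several GB cluster-wide before counting the rating blocks, shuffle buffers, and the temporary matrices Array.fill creates in updateBlock, and all of it is subject to GC.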
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
    org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
    org.apache.spark.scheduler.Task.run(Task.scala:53)
    org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)

On Sun, Mar 16, 2014 at 11:42 AM, Sean Owen <so...@cloudera.com> wrote:

> Are you using HEAD or 0.9.0? I know there was a memory issue fixed a few
> weeks ago that made ALS need a lot more memory than is needed.
>
> https://github.com/apache/incubator-spark/pull/629
>
> Try the latest code.
>
> --
> Sean Owen | Director, Data Science | London
>
>
> On Sun, Mar 16, 2014 at 11:40 AM, Debasish Das
> <debasish.da...@gmail.com> wrote:
>
>> Hi,
>>
>> I gave my Spark job 16 GB of memory and it is running on 8 executors.
>>
>> The job needs more memory due to ALS requirements (20M x 1M matrix).
>>
>> On each node I do have 96 GB of memory and I am using 16 GB of it. I
>> want to increase the memory but I am not sure what the right way to do
>> that is...
>>
>> On 8 executors, if I give 96 GB it might be an issue due to GC...
>>
>> Ideally on 8 nodes, I would run with 48 executors and each executor
>> would get 16 GB of memory. Total 48 JVMs...
>>
>> Is it possible to increase executors per node?
>>
>> Thanks.
>> Deb
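For the "more executors per node" question: in Spark 0.9 standalone mode a worker runs one executor per application, so the way to get several smaller JVMs on one machine is to run several worker instances per node via conf/spark-env.sh. A sketch for the 8-node, 96 GB layout described above (the 6/16g/4-core split is an illustrative assumption, not a tested recommendation):

```shell
# conf/spark-env.sh on each node -- illustrative values only.
# 6 workers per machine * 8 machines = 48 executor JVMs for one application.
export SPARK_WORKER_INSTANCES=6   # number of worker daemons per node
export SPARK_WORKER_MEMORY=16g    # memory each worker can grant to executors
export SPARK_WORKER_CORES=4       # assumed core split; adjust to the hardware
```

The application then requests a matching executor heap with spark.executor.memory=16g, keeping each JVM small enough that GC pauses stay manageable.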