Hello,

I have the same problem described in this thread, using spark-rowsimilarity. I have an input file of ~65k lines (each row has fewer than 300 items), and I run the job on a small cluster with 1 master and 2 workers; each machine has 15GB of RAM. I tried increasing the executor and driver memory:
--sparkExecutorMem 15g -D:spark.driver.memory=15g
but I get an OutOfMemoryError:

16/02/13 13:00:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.mahout.math.OrderedIntDoubleMapping.growTo(OrderedIntDoubleMapping.java:86)
at org.apache.mahout.math.OrderedIntDoubleMapping.set(OrderedIntDoubleMapping.java:118)
[...]

Thanks for any hint.
Angelo

On Fri, Feb 12, 2016 at 10:15 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> You have to set the executor memory. BTW, you have given the driver all the memory
> on the machine.
>
>> On Feb 10, 2016, at 9:30 AM, Jaume Galí <jg...@konodrac.com> wrote:
>>
>> Hi again,
>> (Sorry for the delay, but we didn't have a machine available to test your
>> suggestions about the memory issue.)
>>
>> The problem is still happening when testing with an input matrix of 100k rows
>> by 300 items. I increased the memory as you suggested, but nothing changed.
>> I have attached our spark-env.sh and the new machine specs.
>>
>> Machine specs:
>>
>> AWS m3.xlarge (Ivy Bridge, 15 GB RAM, 2 x 40 GB HD)
>>
>> This is my spark-env.sh:
>>
>> #!/usr/bin/env bash
>> # Licensed to ...
>>
>> export SPARK_HOME=${SPARK_HOME:-/usr/lib/spark}
>> export SPARK_LOG_DIR=${SPARK_LOG_DIR:-/var/log/spark}
>> export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
>> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
>> export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
>>
>> export STANDALONE_SPARK_MASTER_HOST=ip-10-12-17-235.eu-west-1.compute.internal
>> export SPARK_MASTER_PORT=7077
>> export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
>> export SPARK_MASTER_WEBUI_PORT=8080
>>
>> export SPARK_WORKER_DIR=${SPARK_WORKER_DIR:-/var/run/spark/work}
>> export SPARK_WORKER_PORT=7078
>> export SPARK_WORKER_WEBUI_PORT=8081
>>
>> export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0
>> export HIVE_SERVER2_THRIFT_PORT=10001
>>
>> export SPARK_DRIVER_MEMORY=15G
>> export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -XX:OnOutOfMemoryError='kill -9 %p'"
>>
>> Log:
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
>> to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure:
>> Lost task 0.0 in stage 12.0 (TID 24, localhost): java.lang.OutOfMemoryError:
>> GC overhead limit exceeded
>> ...
>>
>> Driver stacktrace:
>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>> ...
>>
>>
>> Thanks in advance
>>
>>> On 2/2/2016, at 7:48, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>> You probably need to increase your driver memory, and 8g will not work. 16g
>>> is probably the smallest standalone machine that will work, since the
>>> driver and executors both run on it.
>>>
>>>> On Feb 1, 2016, at 1:24 AM, jg...@konodrac.com wrote:
>>>>
>>>> Hello everybody,
>>>>
>>>> We are experiencing problems when we use the "mahout spark-rowsimilarity"
>>>> operation.
>>>> We have an input matrix with 100k rows and 100 items, and the process
>>>> throws "Exception in task 0.0 in stage 13.0 (TID 13)
>>>> java.lang.OutOfMemoryError: Java heap space". We tried increasing the Java
>>>> heap memory, the Mahout heap memory and spark.driver.memory.
>>>>
>>>> Environment versions:
>>>> Mahout: 0.11.1
>>>> Spark: 1.6.0
>>>>
>>>> Mahout command line:
>>>> /opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat \
>>>>   -o test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100 \
>>>>   --omitStrength --master local --sparkExecutorMem 8g
>>>>
>>>> This process is running on a machine with the following specifications:
>>>> RAM: 8 GB
>>>> CPU: 8 cores
>>>>
>>>> .profile file:
>>>> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
>>>> export HADOOP_HOME=/opt/hadoop-2.6.0
>>>> export SPARK_HOME=/opt/spark
>>>> export MAHOUT_HOME=/opt/mahout
>>>> export MAHOUT_HEAPSIZE=8192
>>>>
>>>> It throws this exception:
>>>>
>>>> 16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID
>>>> 13)
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66)
>>>> at
>>>> org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70)
>>>> at
>>>> org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59)
>>>> at
>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>> at
>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>> at
>>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>> at
>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message
>>>> = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver,
>>>> localhost, 42107))] in 1 attempts
>>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
>>>> seconds]. This timeout is controlled by spark.rpc.askTimeout
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>>>> at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>>>> at
>>>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>>>> at
>>>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>>>> at
>>>> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>> at
>>>> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>>>> at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message
>>>> = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver,
>>>> localhost, 42107))] in 1 attempts
>>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
>>>> seconds]. This timeout is controlled by spark.rpc.askTimeout
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>>>> at
>>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>>>> at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>>>> at
>>>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>>>> at
>>>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>>>> at
>>>> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>> at
>>>> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>> at
>>>> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>>>> at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
>>>> [120 seconds]
>>>> at
>>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>> at
>>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>> at
>>>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>> at scala.concurrent.Await$.result(package.scala:107)
>>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>>>> ...
>>>>
>>>> Can you please advise?
>>>>
>>>>
>>>> Thanks in advance.
>>>> Cheers.
>>>
>>
>
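The memory advice in this thread (the driver and executors on one machine share its RAM, so the driver should not be given all of it) is essentially a budget calculation. Below is a minimal sketch of that arithmetic, assuming the driver and executors run as separate JVMs on the same host. `memory_headroom` is a hypothetical helper, not part of Mahout or Spark, and the 2 GB reserve for the OS and JVM overhead is an illustrative assumption. In Spark's local mode (`--master local`, as in the original command) tasks run inside the driver JVM, so there only the driver heap matters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mahout or Spark): checks whether a driver
# heap and an executor heap fit on one machine after reserving some RAM for
# the OS, Spark daemons and JVM overhead. All sizes are whole GB.
memory_headroom() {
  local total_gb=$1     # physical RAM on the machine
  local reserve_gb=$2   # assumed RAM kept back for the OS and JVM overhead
  local driver_gb=$3    # driver heap (e.g. spark.driver.memory)
  local executor_gb=$4  # executor heap (e.g. --sparkExecutorMem)
  local left=$(( total_gb - reserve_gb - driver_gb - executor_gb ))
  echo "$left"          # GB left over; negative means over budget
  [ "$left" -ge 0 ]     # exit status 0 only when everything fits
}

# Giving the driver all 15 GB of a 15 GB machine leaves nothing for executors.
memory_headroom 15 2 15 4 || echo "15 GB machine: over budget"

# A hypothetical 16 GB machine with a 4 GB driver and an 8 GB executor.
memory_headroom 16 2 4 8 && echo "16 GB machine: fits"
```

With these assumed numbers the first call prints -6 followed by the over-budget message, and the second prints 2 followed by the fits message. The reserve is a parameter rather than a constant because the right value depends on what else runs on the host.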