You probably need to increase your driver memory, and 8g will not work on an 8 GB machine. A 16 GB machine is probably the smallest standalone setup that will work, since the driver and the executors all run on it.
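For reference, a minimal sketch of where that memory would be set, with illustrative values for a 16 GB box (spark.driver.memory and --driver-memory are standard Spark settings; MAHOUT_HEAPSIZE is the variable your .profile already exports, though exactly how bin/mahout maps it onto the driver JVM heap can vary by version):

    # Illustrative only -- leave a few GB of headroom for the OS.
    # With --master local the executors run inside the driver JVM,
    # so the driver heap is what the tasks actually get.
    export MAHOUT_HEAPSIZE=12288        # in MB, read by the bin/mahout launcher

    # If the job is driven through spark-submit instead, the equivalent knobs are:
    #   spark-submit --driver-memory 12g ...
    # or in $SPARK_HOME/conf/spark-defaults.conf:
    #   spark.driver.memory   12g

The heartbeat / RpcTimeout warnings further down are usually just a side effect of the heap thrashing, so they should disappear once the OOM is fixed.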
> On Feb 1, 2016, at 1:24 AM, jg...@konodrac.com wrote:
>
> Hello everybody,
>
> We are experiencing problems when we use the "mahout spark-rowsimilarity"
> operation. We have an input matrix with 100k rows and 100 items, and the process
> throws "Exception in task 0.0 in stage 13.0 (TID 13)
> java.lang.OutOfMemoryError: Java heap space". We have tried increasing the Java
> heap, the Mahout heap (MAHOUT_HEAPSIZE) and spark.driver.memory.
>
> Environment versions:
> Mahout: 0.11.1
> Spark: 1.6.0
>
> Mahout command line:
> /opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat -o
> test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100
> --omitStrength --master local --sparkExecutorMem 8g
>
> This process is running on a machine with the following specifications:
> RAM: 8 GB
> CPU: 8 cores
>
> .profile file:
> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
> export HADOOP_HOME=/opt/hadoop-2.6.0
> export SPARK_HOME=/opt/spark
> export MAHOUT_HOME=/opt/mahout
> export MAHOUT_HEAPSIZE=8192
>
> Thrown exception:
>
> 16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66)
>     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70)
>     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message =
> Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost,
> 42107))] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
> This timeout is controlled by spark.rpc.askTimeout
>     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message =
> Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost,
> 42107))] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
> This timeout is controlled by spark.rpc.askTimeout
>     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>     at scala.concurrent.Await$.result(package.scala:107)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>     ...
>
> Can you please advise?
>
> Thanks in advance.
> Cheers.