Hi.
I'm trying to run the sample connected components algorithm on a large data set
on a cluster, but I get a "java.lang.OutOfMemoryError: Java heap space" error.
The cluster has 16 nodes, and each node has 24 cores and 96GB of memory. I'm
using Hadoop-2.2.0-cdh5.0.0-beta2 and running Giraph 1.1.0-snapshot as an MR2
application.
I tried allocating more memory to the mappers by setting
mapreduce.map.java.opts in the Configuration object, but that didn't solve my
problem. Any suggestions for something else I could try?
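
One thing that may be worth ruling out is whether the option actually reaches the mapper JVM. Below is a minimal sketch of such a check, using only the standard library. The class name HeapCheck, the helper parseXmxBytes, and the sample value "-Xmx4g" are hypothetical and not part of Hadoop or Giraph. The sketch assumes that the last -Xmx flag in the option string is the one that takes effect. It compares the requested size with Runtime.getRuntime().maxMemory(), which can report slightly less than the -Xmx value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapCheck {
    // Matches flags such as -Xmx512m, -Xmx4g or -Xmx1048576 (plain bytes).
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /**
     * Returns the heap size in bytes requested by the last -Xmx flag in opts,
     * or -1 if the string contains no -Xmx flag.
     */
    public static long parseXmxBytes(String opts) {
        Matcher m = XMX.matcher(opts);
        long bytes = -1;
        while (m.find()) {
            long value = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase()) {
                case "k": value *= 1024L; break;
                case "m": value *= 1024L * 1024L; break;
                case "g": value *= 1024L * 1024L * 1024L; break;
                default: break; // no suffix: the value is already in bytes
            }
            bytes = value; // assumption: a later flag overrides an earlier one
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Hypothetical sample value; pass the real option string as args[0].
        String opts = args.length > 0 ? args[0] : "-Xmx4g";
        long requested = parseXmxBytes(opts);
        long actual = Runtime.getRuntime().maxMemory();
        System.out.printf("requested=%d bytes, JVM max heap=%d bytes%n",
                requested, actual);
    }
}
```

Logging the same maxMemory() value from inside the computation (for example in the first superstep) would show whether the mapper is really running with the larger heap. If the two numbers differ widely, the setting is probably being overridden or dropped before the task JVM is launched.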
Here is the mapper exception:
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:96)
    at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:974)
    at org.apache.giraph.utils.ByteArrayVertexIdData.initialize(ByteArrayVertexIdData.java:85)
    at org.apache.giraph.utils.ByteArrayVertexIdMessages.initialize(ByteArrayVertexIdMessages.java:88)
    at org.apache.giraph.comm.SendVertexIdDataCache.getPartitionData(SendVertexIdDataCache.java:124)
    at org.apache.giraph.comm.SendVertexIdDataCache.addData(SendVertexIdDataCache.java:76)
    at org.apache.giraph.comm.SendMessageCache.addMessage(SendMessageCache.java:97)
    at org.apache.giraph.comm.SendMessageCache.sendMessageRequest(SendMessageCache.java:157)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendMessageRequest(NettyWorkerClientRequestProcessor.java:179)
    at org.apache.giraph.graph.AbstractComputation.sendMessage(AbstractComputation.java:163)
    at com.sas.analytics.giraph.connectedcomponents.ConnectedComponentsComputation.compute(ConnectedComponentsComputation.java:73)
    at org.apache.giraph.graph.ComputeCallable.computePartition(ComputeCallable.java:247)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:168)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:71)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Thanks for your help.
Stefan