That error comes from the master dying (most likely as a consequence of
another worker dying). Can you do a rough calculation of how much data
you expect to load and check whether there is enough memory for it?
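
As a starting point for that calculation, here is a minimal Python sketch. It takes the last progress line in your log (9,232,537 vertices loaded, Memory free/total = 607.82M / 3564.69M) and extrapolates to the full graph. It assumes that vertices are spread evenly across workers, that the run used 10 workers, and that heap in use grows linearly with vertices loaded. Heap in use also includes garbage that has not been collected yet, so treat the result as a rough upper estimate. The function names are only illustrative and are not part of Giraph.

```python
# Rough per-worker memory estimate for loading a graph into Giraph.
# Values marked "from the thread" are copied from the message and log;
# the even spread across workers and the linear growth are assumptions.

MB = 1024 * 1024


def observed_bytes_per_vertex(used_heap_mb, vertices_loaded):
    """Heap in use divided by vertices loaded so far (edges included)."""
    return used_heap_mb * MB / vertices_loaded


def estimated_heap_mb(total_vertices, workers, bytes_per_vertex):
    """Heap needed per worker if vertices are spread evenly."""
    return total_vertices / workers * bytes_per_vertex / MB


# From the thread: last progress line before the failure.
total_mb, free_mb = 3564.69, 607.82
loaded = 9_232_537
per_vertex = observed_bytes_per_vertex(total_mb - free_mb, loaded)

# From the thread: 128,000,000 vertices; 10 workers is an assumption.
needed = estimated_heap_mb(128_000_000, 10, per_vertex)

print(f"~{per_vertex:.0f} bytes/vertex, ~{needed:.0f} MB needed per worker "
      f"vs {total_mb:.0f} MB max heap")
```

With these inputs the estimate comes out above the maximum heap reported in the log, which would be consistent with the workers running out of memory during loading. Plug in your own vertex and edge value sizes if you know them.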
On 8/30/13 11:19 AM, Yasser Altowim wrote:
Guys,
Can someone please help me with this issue? Thanks.
Best,
Yasser
*From:* Yasser Altowim
*Sent:* Thursday, August 29, 2013 11:16 AM
*To:* [email protected]
*Subject:* Exception with Large Graphs
Hi,
I am implementing an algorithm using Giraph, and I was able
to run it on relatively small datasets (64,000,000 vertices
and 128,000,000 edges). However, when I increase the size of the
dataset to 128,000,000 vertices and 256,000,000 edges, the job takes
a very long time to load the vertices and then fails with the following
exception.
I have tried increasing the heap size and the task timeout
value in the mapred-site.xml configuration file, and even varying the
number of workers from 1 to 10, but I still get the same exceptions.
I have a cluster of 10 nodes, and each node has 4 GB of RAM. Thanks
in advance.
2013-08-29 10:22:53,150 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not
ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:22:53,151 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:23:07,938 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 7769685 vertices at 14250.953615591572
vertices/sec 15539370 edges at 28500.77593053654 edges/sec Memory
(free/total/max) = 680.21M / 3207.44M / 3555.56M
2013-08-29 10:23:14,538 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 8019685 vertices at 14533.557468366102
vertices/sec 16039370 edges at 29065.97491865343 edges/sec Memory
(free/total/max) = 906.80M / 3242.75M / 3555.56M
2013-08-29 10:23:21,888 INFO
org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit:
Finished loading
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/9 (v=1212852,
e=2425704)
2013-08-29 10:23:37,911 INFO
org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit:
Reserved input split path
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19, overall
roughly 7.518797% input splits reserved
2013-08-29 10:23:37,923 INFO
org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 from
ZooKeeper and got input split
'org.apache.giraph.io.formats.multi.InputSplitWithInputFormatIndex@24004559'
2013-08-29 10:23:44,313 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 8482537 vertices at 14585.340134636266
vertices/sec 16965074 edges at 29169.59449002283 edges/sec Memory
(free/total/max) = 538.93M / 3186.13M / 3555.56M
2013-08-29 10:23:49,963 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 8732537 vertices at 14870.726503632277
vertices/sec 17465074 edges at 29740.356341344923 edges/sec Memory
(free/total/max) = 489.84M / 3222.56M / 3555.56M
2013-08-29 10:34:28,371 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not
ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:34:34,847 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:34:34,850 INFO
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server
window metrics MBytes/sec sent = 0, MBytes/sec received = 0.0161,
MBytesSent = 0.0002, MBytesReceived = 12.3175, ave sent req MBytes =
0, ave received req MBytes = 0.0587, secs waited = 765.881
2013-08-29 10:34:35,698 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 649805ms for
sessionid 0x140cb1140540006, closing socket connection and attempting
reconnect
2013-08-29 10:34:42,471 WARN org.apache.giraph.bsp.BspService:
process: Disconnected from ZooKeeper (will automatically try to
recover) WatchedEvent state:Disconnected type:None path:null
2013-08-29 10:34:42,472 WARN
org.apache.giraph.worker.InputSplitsHandler: process: Problem with
zookeeper, got event with path null, state Disconnected, event type None
2013-08-29 10:34:43,819 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server slave5.ericsson-magic.net/10.126.72.165:22181
2013-08-29 10:34:44,077 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to
slave5.ericsson-magic.net/10.126.72.165:22181, initiating session
2013-08-29 10:34:44,220 WARN org.apache.giraph.bsp.BspService:
process: Got unknown null path event WatchedEvent state:Expired
type:None path:null
2013-08-29 10:34:44,220 WARN
org.apache.giraph.worker.InputSplitsHandler: process: Problem with
zookeeper, got event with path null, state Expired, event type None
2013-08-29 10:34:44,221 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2013-08-29 10:34:44,240 INFO org.apache.zookeeper.ClientCnxn: Unable
to reconnect to ZooKeeper service, session 0x140cb1140540006 has
expired, closing socket connection
2013-08-29 10:35:35,442 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not
ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:35:35,443 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:35:42,161 INFO
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server
window metrics MBytes/sec sent = 0, MBytes/sec received = 0.1059,
MBytesSent = 0.0001, MBytesReceived = 7.1305, ave sent req MBytes = 0,
ave received req MBytes = 0.0291, secs waited = 67.311
2013-08-29 10:35:48,659 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 8982537 vertices at 6882.0673288665985
vertices/sec 17965074 edges at 13763.906358998607 edges/sec Memory
(free/total/max) = 1040.32M / 3537.00M / 3555.56M
2013-08-29 10:36:14,680 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 9232537 vertices at 6931.612280518087
vertices/sec 18465074 edges at 13862.99925688887 edges/sec Memory
(free/total/max) = 607.82M / 3564.69M / 3564.69M
2013-08-29 10:36:35,690 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not
ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:36:35,690 INFO
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:36:47,220 INFO
org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit:
Finished loading
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 (v=1191319,
e=2382638)
2013-08-29 10:36:47,667 ERROR
org.apache.giraph.utils.LogStacktraceCallable: Execution of callable
failed
java.lang.IllegalStateException: markInputSplitPathFinished:
KeeperException on
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19/_vertexInputSplitFinished
at
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
at
org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:272)
at
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
at
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at
org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19/_vertexInputSplitFinished
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
at
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
... 9 more
2013-08-29 10:36:50,349 ERROR
org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got
failure, unregistering health on
/_hadoopBsp/job_201308290837_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/slave8.ericsson-magic.net_5
on superstep -1
2013-08-29 10:36:52,498 ERROR
org.apache.giraph.graph.GraphTaskManager: run: Worker failure failed
on another RuntimeException, original expection will be rethrown
java.lang.IllegalStateException: unregisterHealth: KeeperException -
Couldn't delete
/_hadoopBsp/job_201308290837_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/slave8.ericsson-magic.net_5
at
org.apache.giraph.worker.BspServiceWorker.unregisterHealth(BspServiceWorker.java:654)
at
org.apache.giraph.worker.BspServiceWorker.failureCleanup(BspServiceWorker.java:662)
at
org.apache.giraph.graph.GraphTaskManager.workerFailureCleanup(GraphTaskManager.java:897)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:100)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201308290837_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/slave8.ericsson-magic.net_5
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
at
org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:302)
at
org.apache.giraph.worker.BspServiceWorker.unregisterHealth(BspServiceWorker.java:648)
... 10 more
2013-08-29 10:36:54,571 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-29 10:37:15,417 INFO org.apache.hadoop.io.nativeio.NativeIO:
Initialized cache for UID to User mapping with a cache timeout of
14400 seconds.
2013-08-29 10:37:15,456 INFO org.apache.hadoop.io.nativeio.NativeIO:
Got UserName bigdatauser for UID 1007 from the native implementation
2013-08-29 10:37:16,047 WARN org.apache.hadoop.mapred.Child: Error
running child
java.lang.IllegalStateException: run: Caught an unrecoverable
exception waitFor: ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor:
ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
at
org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
at
org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
at
org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
at
org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
at
org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
at
org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:279)
at
org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
at
org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:504)
at
org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:246)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
... 7 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.IllegalStateException: markInputSplitPathFinished:
KeeperException on
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19/_vertexInputSplitFinished
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at
org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
at
org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
... 16 more
Caused by: java.lang.IllegalStateException:
markInputSplitPathFinished: KeeperException on
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19/_vertexInputSplitFinished
at
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
at
org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:272)
at
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
at
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at
org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19/_vertexInputSplitFinished
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
at
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
... 9 more
2013-08-29 10:37:17,481 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task
Best,
Yasser