Hi Tripti,
Is there a chance you can use higher memory machines so you don't run
out of core? We do it this way at Facebook. We haven't tested the
out-of-core option.
Avery
On 8/31/14, 2:34 PM, Tripti Singh wrote:
Hi,
I am able to successfully build the hadoop_yarn profile for running
Giraph 1.1.
I am also able to run Connected Components as a test on a small dataset.
However, I am seeing 2 issues while running on a bigger dataset with
400 mappers:
1. I am unable to use the out-of-core graph option. It errors out saying
that it cannot read the INIT partition. (Sorry, I don’t have the log
currently, but I will share it after I run that again.)
I am expecting that if the out-of-core option is fixed, I should
be able to run the workflow with fewer mappers.
2. In order to run the workflow anyhow, I removed the out-of-core
option and adjusted the heap size. This also runs with a smaller
dataset but fails with the huge dataset.
Worker logs are mostly empty. Non-empty logs end like this:
*mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
[STATUS: task-374] setup: Beginning worker setup.
setup: Log level remains at info
[STATUS: task-374] setup: Initializing Zookeeper services.
mapred.job.id is deprecated. Instead, use mapreduce.job.id
job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
[STATUS: task-374] setup: Setting up Zookeeper manager.
createCandidateStamp: Made the directory
_bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614
createCandidateStamp: Made the directory
_bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_zkServer
createCandidateStamp: Creating my filestamp
_bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_task/gsta33201.tan.ygrid.yahoo.com
374
getZooKeeperServerList: For task 374, got file 'null' (polling
period is 3000) *
The master log has statements for launching the container, opening the
proxy, and processing events, like this:
*Opening proxy : gsta31118.tan.ygrid.yahoo.com:8041
Processing Event EventType: QUERY_CONTAINER for Container
container_1407992474095_708614_01_000314
……*
I am not using SASL authentication.
Any idea what might be wrong?
Thanks,
Tripti.