Re: Connected components using giraph hadoop on amazon EC2

Vishal Patel Tue, 07 Aug 2012 23:03:58 -0700

Awesome! That works. Thank you Avery.

Vishal



On Tue, Aug 7, 2012 at 6:39 PM, Avery Ching <[email protected]> wrote:

>  Hi Vishal,
>
> This issue is caused by Hadoop's limit on the number of job counters.
>
> org.apache.hadoop.mapred.Counters$CountersExceededException: Error:
> Exceeded limits on number of counters - Counters=120 Limit=120
>
> You can either disable Giraph's per-superstep counters
> (-Dgiraph.useSuperstepCounters=false) or try increasing Hadoop's counter
> limit.
>
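To illustrate why a job can die only after many supersteps, here is a toy Java sketch of a capped counter registry. It is not Hadoop's or Giraph's code: the class and method names are hypothetical, and the assumptions of one new counter per superstep and 20 fixed counters are made up for the example. Only the limit of 120 comes from the error message above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of a capped counter registry; NOT Hadoop's actual Counters class. */
public class CounterLimitSketch {

    /** Registers counter names and refuses to grow past a fixed limit. */
    static final class CappedCounters {
        private final int limit;
        private final Set<String> names = new LinkedHashSet<>();

        CappedCounters(int limit) {
            this.limit = limit;
        }

        /** Looks up a counter by name, creating it only if the limit allows. */
        void findCounter(String name) {
            if (names.contains(name)) {
                return; // existing counters can always be reused
            }
            if (names.size() >= limit) {
                throw new IllegalStateException(
                    "Exceeded limits on number of counters - Counters="
                        + names.size() + " Limit=" + limit);
            }
            names.add(name);
        }
    }

    /**
     * Simulates a job and returns the first superstep whose counter could not
     * be created, or -1 if every superstep completed.
     * Assumption (hypothetical): one new counter per superstep when enabled.
     */
    static int firstFailingSuperstep(int limit, int fixedCounters,
                                     int supersteps, boolean perSuperstepCounters) {
        CappedCounters counters = new CappedCounters(limit);
        for (int i = 0; i < fixedCounters; i++) {
            counters.findCounter("fixed-" + i);
        }
        for (int s = 0; s < supersteps; s++) {
            if (!perSuperstepCounters) {
                continue; // models -Dgiraph.useSuperstepCounters=false
            }
            try {
                counters.findCounter("superstep-" + s);
            } catch (IllegalStateException e) {
                return s;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 120 is the limit from the error; 20 and 200 are made-up inputs.
        System.out.println(firstFailingSuperstep(120, 20, 200, true));  // prints 100
        System.out.println(firstFailingSuperstep(120, 20, 200, false)); // prints -1
    }
}
```

With per-superstep counters on, the registry fills up once the fixed and per-superstep counters together reach the limit. With them off, the count no longer grows with the number of supersteps. For the second option, I believe Hadoop 1.x reads the limit from the cluster-side property mapreduce.job.counters.limit, but that is worth checking against the Hadoop version in use.
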
> Hope that helps,
>
> Avery
>
>
> On 8/7/12 6:05 PM, Vishal Patel wrote:
>
> Hi,
>
> Using a pseudo-distributed Hadoop install I am able to run the connected
> components example without problems. However, I keep getting the error
> below on a real cluster. After several failed attempts on our in-house
> Hadoop install, I decided to move to Amazon EC2's MapReduce.
>
>
> I created a small two-node cluster with:
>
> Namenode: m1.small
> Slaves (1): m1.2xlarge (I tried m1.large, but it complained about the
> heap size)
>
> I downloaded Giraph, compiled it with Maven 3.0+, and ran the Connected
> Components example. I keep getting the same error I saw previously. Am I
> missing something? Do I have to configure ZooKeeper?
>
>  *Mapper 0 returns the following:*
> java.lang.Throwable: Child Error
>  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
>  *Mapper 1 and 2 return the following error:*
>
> java.lang.IllegalStateException: startSuperstep: KeeperException getting assignments
>       at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:929)
>       at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:550)
>       at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:681)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201208080037_0001/_applicationAttemptsDir/0/_superstepDir/99/_partitionAssignments
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>       at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:910)
>       ... 9 more
>
> I also checked the job configuration to make sure that
> *mapred.tasktracker.map.tasks = 4*.
>
> *Also, the last few lines from attempt_201208080037_0002_m_000000_0:*
>
>  2012-08-08 00:59:03,262 INFO org.apache.giraph.graph.MasterThread
> (org.apache.giraph.graph.MasterThread): masterThread: Coordination of
> superstep 95 took 0.116 seconds ended
> 2012-08-08 00:59:03,265 INFO
> org.apache.giraph.graph.partition.PartitionBalancer
> (org.apache.giraph.graph.MasterThread): balancePartitionsAcrossWorkers:
> Using algorithm stati
> 2012-08-08 00:59:03,265 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Vertices -
> Mean: 10000, Min: Work
> 2012-08-08 00:59:03,265 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Edges -
> Mean: 19996, Min: Worker(
> 2012-08-08 00:59:03,269 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): barrierOnWorkerList: 0 out of 1
> workers finished on superstep 96
> 2012-08-08 00:59:03,279 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): aggregateWorkerStats: Aggregation
> found (vtx=10000,finVtx=10000,
> 2012-08-08 00:59:03,280 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): coordinateSuperstep: Cleaning up
> old Superstep /_hadoopBsp/job_2
> 2012-08-08 00:59:03,321 INFO org.apache.giraph.graph.MasterThread
> (org.apache.giraph.graph.MasterThread): masterThread: Coordination of
> superstep 96 took 0.059 seconds ended
> 2012-08-08 00:59:03,387 INFO
> org.apache.giraph.graph.partition.PartitionBalancer
> (org.apache.giraph.graph.MasterThread): balancePartitionsAcrossWorkers:
> Using algorithm stati
> 2012-08-08 00:59:03,387 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Vertices -
> Mean: 10000, Min: Work
> 2012-08-08 00:59:03,387 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Edges -
> Mean: 19996, Min: Worker(
> 2012-08-08 00:59:03,398 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): barrierOnWorkerList: 0 out of 1
> workers finished on superstep 97
> 2012-08-08 00:59:03,411 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): aggregateWorkerStats: Aggregation
> found (vtx=10000,finVtx=10000,
> 2012-08-08 00:59:03,426 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): coordinateSuperstep: Cleaning up
> old Superstep /_hadoopBsp/job_2
> 2012-08-08 00:59:03,451 INFO org.apache.giraph.graph.MasterThread
> (org.apache.giraph.graph.MasterThread): masterThread: Coordination of
> superstep 97 took 0.13 seconds ended w
> 2012-08-08 00:59:03,454 INFO
> org.apache.giraph.graph.partition.PartitionBalancer
> (org.apache.giraph.graph.MasterThread): balancePartitionsAcrossWorkers:
> Using algorithm stati
> 2012-08-08 00:59:03,454 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Vertices -
> Mean: 10000, Min: Work
> 2012-08-08 00:59:03,454 INFO
> org.apache.giraph.graph.partition.PartitionUtils
> (org.apache.giraph.graph.MasterThread): analyzePartitionStats: Edges -
> Mean: 19996, Min: Worker(
> 2012-08-08 00:59:03,458 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): barrierOnWorkerList: 0 out of 1
> workers finished on superstep 98
> 2012-08-08 00:59:03,536 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): aggregateWorkerStats: Aggregation
> found (vtx=10000,finVtx=10000,
> 2012-08-08 00:59:03,539 INFO org.apache.giraph.graph.BspServiceMaster
> (org.apache.giraph.graph.MasterThread): coordinateSuperstep: Cleaning up
> old Superstep /_hadoopBsp/job_2
> 2012-08-08 00:59:03,579 INFO org.apache.giraph.graph.MasterThread
> (org.apache.giraph.graph.MasterThread): masterThread: Coordination of
> superstep 98 took 0.128 seconds ended
> 2012-08-08 00:59:03,579 FATAL org.apache.giraph.graph.GraphMapper (org.apache.giraph.graph.MasterThread): uncaughtException: OverrideExceptionHandler on thread org.apache.gir
> org.apache.hadoop.mapred.Counters$CountersExceededException: Error: Exceeded limits on number of counters - Counters=120 Limit=120
>         at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:316)
>         at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:450)
>         at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:601)
>         at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:541)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.getCounter(TaskInputOutputContext.java:88)
>         at org.apache.giraph.graph.MasterThread.run(MasterThread.java:131)
> 2012-08-08 00:59:03,580 WARN org.apache.giraph.zk.ZooKeeperManager
> (Thread-14): onlineZooKeeperServers: Forced a shutdown hook kill of the
> ZooKeeper process
>
>
>
>  Vishal
>
>
>
