Re: What is the best configuration for Cluster hama distributed mode

Suraj Menon Wed, 26 Dec 2012 05:54:01 -0800

Hi Francis,

> First I would like to know if anyone has some more comprehensive
> documentation on Hama cluster configuration?
>


On AWS EC2, you can use Apache Whirr to configure the cluster. Edward may
share his procedures on maintaining his Hama cluster on Oracle BDA.


> I would also like some information about Hama cluster configuration:
>
> 1) I have a cluster of 12 computers running HDFS. What is the optimal
> replication setting? It is configured to create 3 replicas of each file;
> is this the best?
>
>
This depends on your availability requirements and the capacity of your
cluster. A replication factor of 3 is a good choice if you cannot tolerate
data loss, but every extra replica multiplies the storage used, so you have
to weigh it against the size of your data and the capacity of your cluster.
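
For reference, the replication factor is controlled by the dfs.replication
property in hdfs-site.xml. A minimal sketch, using 3 only because that is
the value you already have:

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <!-- Number of copies kept of each block; 3 matches your current setup -->
  <value>3</value>
</property>
```

This setting applies to files created after the change. Files already in
HDFS keep their replication factor unless you change it explicitly, for
example with "hadoop fs -setrep".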


> 2) In my hama-site.xml, what is the best value for the parameter
> hama.zookeeper.quorum? 1 node, 2 nodes, or 3 nodes?
>

Once again, this depends on your availability requirements and on how the
cluster is used.
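
As a rough illustration, a three-server quorum in hama-site.xml could look
like the sketch below. The host names are placeholders I made up, not
anything from your cluster:

```xml
<!-- hama-site.xml -->
<property>
  <name>hama.zookeeper.quorum</name>
  <!-- Comma-separated ZooKeeper hosts; these names are hypothetical -->
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```

ZooKeeper stays available only while a majority of its servers are up, so 3
servers survive the loss of one, while 2 servers tolerate no more failures
than a single server does. That is why odd sizes are the usual choice.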


> 3) When I process my graph with just over 65 000 vertices, I get the
> following error:
> attempt_201212260904_0005_000031_0: Exception in thread "pool-2-thread-1"
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> attempt_201212260904_0005_000031_0: Exception in thread "Thread-1"
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> Is there any parameter I can change to increase the memory limit? Or will
> my cluster not be able to process this amount of information? With
> smaller graphs it works correctly. I'm working with the all-pairs problem.
>

As other users have reported recently, Hama is facing scalability issues. I
am trying to close https://issues.apache.org/jira/browse/HAMA-559 along with
some other message object lifecycle issues. (Today we create a new Writable
object for every message read and received.) Also, we keep all the
vertices in memory.

However, you can change the JVM arguments of the BSP tasks. Take a look at
the configuration parameter bsp.child.java.opts; its default value can be
found in hama-default.xml.
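
A sketch of such an override in hama-site.xml follows. The 2 GB heap is
only an illustrative value, not a recommendation; pick something that fits
the physical memory of each node and the number of tasks it runs:

```xml
<!-- hama-site.xml: overrides the value shipped in hama-default.xml -->
<property>
  <name>bsp.child.java.opts</name>
  <!-- Illustrative assumption: 2 GB maximum heap per BSP child JVM -->
  <value>-Xmx2048m</value>
</property>
```

Keep in mind that a larger heap only postpones the problem while all the
vertices are still held in memory.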

Regards,
Suraj
