Hi list,

I am using the hama-0.6.0 release to run graph jobs on various input graphs on an EC2-based cluster of 12 nodes. However, as I see in the logs, not every node in the cluster contributes to the job (some have no tasklog/job<ID> dir and sit idle). Theoretically, distributing 1 million vertices across 12 buckets should hit every node at least once, so I suspect a configuration problem. So far I have experimented with these settings:
bsp.max.tasks.per.job
bsp.local.tasks.maximum
bsp.tasks.maximum
bsp.child.java.opts

Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12 did not have the desired effect. I also split the input into 12 files (to work around an issue in 0.5 that was fixed in 0.6). Could you recommend some settings, or walk me through how the system decides on the partitioning? My understanding of the pipeline is: input -> input splits derived from the input and the max* conf values -> number of tasks, and then HashPartition.class distributes the vertex IDs across that number of tasks.

Thanks,
Benedikt
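P.S. For completeness, this is roughly what the relevant part of my hama-site.xml looks like (only the two properties I gave values for above; names as I found them, so please correct me if either is wrong):

```xml
<configuration>
  <property>
    <name>bsp.local.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>bsp.max.tasks.per.job</name>
    <value>12</value>
  </property>
</configuration>
```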
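P.P.S. To sanity-check my "12 buckets should all be hit" assumption, I ran this little simulation. It is my reading of hash partitioning, not code copied from Hama: I assume integer vertex IDs and bucket = |hashCode| mod numTasks.

```java
import java.util.Arrays;

// Quick simulation: do all 12 buckets receive at least one of
// 1,000,000 vertex IDs under hash partitioning?
public class PartitionCheck {

    // Assumed HashPartition-style assignment (my reading, not the
    // actual Hama source): bucket = |hash| mod numTasks.
    static long[] bucketCounts(int numVertices, int numTasks) {
        long[] counts = new long[numTasks];
        for (int id = 0; id < numVertices; id++) {
            // Integer IDs hash to themselves in Java.
            int bucket = Math.abs(Integer.hashCode(id) % numTasks);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Every bucket ends up with roughly 83k IDs, so an idle node
        // should not be explained by the partitioning itself.
        System.out.println(Arrays.toString(bucketCounts(1_000_000, 12)));
    }
}
```

So with 12 tasks actually running, every task should receive vertices, which is why I suspect the number of tasks, not the partitioner.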
