So it will spawn 12 tasks. If this doesn't saturate the load on your machines, try using a smaller block size.
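Since the number of input splits follows the number of HDFS blocks, a smaller block size on the input files yields more splits and therefore more tasks. A minimal sketch of the relevant Hadoop property (the 16 MB value is only an illustrative assumption, not a recommendation):

```xml
<!-- hdfs-site.xml (Hadoop 1.x era): shrink the default block size so that
     the same input produces more blocks, hence more BSP tasks.
     16777216 B = 16 MB is an assumed example value. -->
<property>
  <name>dfs.block.size</name>
  <value>16777216</value>
</property>
```

Note that the block size only takes effect for files written after the change; existing files keep their original block layout and would need to be re-uploaded.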
2012/12/5 Benedikt Elser <[email protected]>

> Hi,
>
> thanks for your reply!
>
> Total size: 49078776 B
> Total dirs: 1
> Total files: 12
> Total blocks (validated): 12 (avg. block size 4089898 B)
>
> Benedikt
>
> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>
> > So how many blocks does your data have in HDFS?
> >
> > 2012/12/5 Benedikt Elser <[email protected]>
> >
> >> Hi List,
> >>
> >> I am using the hama-0.6.0 release to run graph jobs on various input
> >> graphs in an EC2-based cluster of size 12. However, as I see in the
> >> logs, not every node in the cluster contributes to that job (they have
> >> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
> >> 1 million nodes across 12 buckets should hit every node at least once.
> >> Therefore I think it's a configuration problem. So far I have
> >> experimented with these settings:
> >>
> >> <name>bsp.max.tasks.per.job</name>
> >> <name>bsp.local.tasks.maximum</name>
> >> <name>bsp.tasks.maximum</name>
> >> <name>bsp.child.java.opts</name>
> >>
> >> Setting bsp.local.tasks.maximum to 1 and bsp.tasks.maximum.per.job to
> >> 12 did not have the desired effect. I also split the input into 12
> >> files (because of something in 0.5 that was fixed in 0.6).
> >>
> >> Could you recommend some settings or guide me through the system's
> >> partitioning decision? I thought it would be:
> >>
> >> Input -> input split based on the input and the max* conf values ->
> >> number of tasks; HashPartition.class distributes IDs across that
> >> number of tasks.
> >>
> >> Thanks,
> >>
> >> Benedikt
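The partitioning scheme mentioned in the thread (HashPartition.class mapping vertex IDs onto the task count) boils down to a modulo over a non-negative hash. A minimal sketch of that idea; the class and method names here are illustrative assumptions, not Hama's actual API:

```java
// Sketch of hash-based partitioning as discussed above: a vertex ID is
// assigned to one of numTasks buckets. With ~1 million IDs and 12 tasks,
// every task should receive work. Names are hypothetical.
public class HashPartitionSketch {

    // Mask out the sign bit so negative hashCodes still map to a
    // valid partition index in [0, numTasks).
    static int partition(String vertexId, int numTasks) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 12;
        // Every ID lands deterministically in one of the 12 buckets.
        System.out.println(partition("vertex-42", numTasks));
    }
}
```

The key consequence for the thread's question: the number of partitions is fixed by the number of tasks (i.e. by the split/block count), so if only a few tasks are spawned, hashing alone cannot spread work onto the idle nodes.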
