Ah, local mode, bingo! About the communication costs: yes, I am aware of these; however, this is exactly what I want to test in the first place :) Hence I would need something like a bsp.distributed.tasks.maximum.
Thanks for the clarifications,
Benedikt

On Dec 5, 2012, at 12:05 PM, Thomas Jungblut wrote:

> Because the property is called "local". This doesn't affect the distributed
> mode.
> Note that it is really bad if you compute multiple tasks on different host
> machines, because this increases your communication costs.
>
> 2012/12/5 Benedikt Elser <[email protected]>
>
>> Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1,
>> why doesn't it distribute one task to each machine?
>>
>> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
>>
>>> So it will spawn 12 tasks. If this doesn't satisfy the load on your
>>> machines, try to use smaller block sizes.
>>>
>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> thanks for your reply!
>>>>
>>>> Total size: 49078776 B
>>>> Total dirs: 1
>>>> Total files: 12
>>>> Total blocks (validated): 12 (avg. block size 4089898 B)
>>>>
>>>> Benedikt
>>>>
>>>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>>>>
>>>>> So how many blocks does your data have in HDFS?
>>>>>
>>>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>>>
>>>>>> Hi List,
>>>>>>
>>>>>> I am using the hama-0.6.0 release to run graph jobs on various input
>>>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
>>>>>> logs, not every node in the cluster contributes to that job (they have
>>>>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
>>>>>> 1 million nodes across 12 buckets should hit every node at least once.
>>>>>> Therefore I think it's a configuration problem. So far I have
>>>>>> experimented with these settings:
>>>>>>
>>>>>> <name>bsp.max.tasks.per.job</name>
>>>>>> <name>bsp.local.tasks.maximum</name>
>>>>>> <name>bsp.tasks.maximum</name>
>>>>>> <name>bsp.child.java.opts</name>
>>>>>>
>>>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>>>>>> did not have the desired effect.
>>>>>> I also split the input into 12 files (because of something in 0.5
>>>>>> that was fixed in 0.6).
>>>>>>
>>>>>> Could you recommend some settings or guide me through the system's
>>>>>> partitioning decision? I thought it would be:
>>>>>>
>>>>>> Input -> input split based on the input and the max* conf values ->
>>>>>> number of tasks; HashPartition.class distributes the IDs across that
>>>>>> number of tasks.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Benedikt
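[Editor's note: for reference, the task-limit properties discussed in this thread would normally go into hama-site.xml. A minimal sketch, with illustrative values only (the values are not recommendations, and behavior may differ between Hama releases):]

```xml
<configuration>
  <!-- Maximum number of BSP tasks a single groom server will run. -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>1</value>
  </property>
  <!-- Upper bound on the number of tasks for a single job;
       12 matches the 12 HDFS blocks reported in the thread. -->
  <property>
    <name>bsp.max.tasks.per.job</name>
    <value>12</value>
  </property>
</configuration>
```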
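[Editor's note: the partitioning step described at the end of the thread — HashPartition.class distributing vertex IDs across the number of tasks — can be sketched as below. This is a hypothetical illustration of the general hash-partitioning pattern, not Hama's actual implementation; the class and method names are invented for the example.]

```java
// Sketch of hash partitioning: each vertex ID is assigned to one of
// numTasks partitions via its hash code, so ~1 million IDs spread
// roughly evenly across 12 tasks. Illustrative only, not Hama's API.
public class HashPartitionSketch {

    // Taking the modulus before Math.abs avoids the Integer.MIN_VALUE
    // overflow corner case of Math.abs(hashCode()).
    static int partitionFor(String vertexId, int numTasks) {
        return Math.abs(vertexId.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 12; // e.g. one task per HDFS block, as in the thread
        int[] counts = new int[numTasks];
        for (int i = 0; i < 1_000_000; i++) {
            int p = partitionFor("vertex-" + i, numTasks);
            counts[p]++;
        }
        // With a million IDs and a reasonable hash, every task receives work.
        for (int p = 0; p < numTasks; p++) {
            System.out.println("task " + p + ": " + counts[p] + " vertices");
        }
    }
}
```

With this scheme, an idle node would indicate that no task was scheduled on it at all, not that the partitioner skipped it — which is why the thread focuses on the task-count settings rather than on the partitioner.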
