Exactly. Maybe you want to first read up on all the different modes and how they are configured:
http://wiki.apache.org/hama/GettingStarted#Modes

We also have some nice documentation as PDF, which you can get here:
http://wiki.apache.org/hama/GettingStarted#Hama_0.6.0

The configuration property that changes the number of tasks on every host is
"bsp.tasks.maximum", which is described as "The maximum number of BSP tasks
that will be run simultaneously by a groom server." Setting this to 1 on every
host where a groom server starts, and then restarting your cluster, should do
what you want to achieve. I can recommend Puppet for maintaining these kinds
of configurations.

If you need a more formal complexity model for BSP applications, let me know;
I have derived one from Rob Bisseling's BSP model that fits Apache Hama's
style of computation better.

2012/12/5 Benedikt Elser <[email protected]>

> Ah, local mode, Bingo!
>
> About the communication costs: Yes, I am aware of these, however this is
> exactly what I want to test in the first place :) Hence I would need a
> bsp.distributed.tasks.maximum
>
> Thanks for the clarifications,
>
> Benedikt
>
> On Dec 5, 2012, at 12:05 PM, Thomas Jungblut wrote:
>
> > Because the property is called "local". This doesn't affect the
> > distributed mode.
> > Note that it is really bad if you compute multiple tasks on different
> > host machines, because this increases your communication costs.
> >
> > 2012/12/5 Benedikt Elser <[email protected]>
> >
> >> Thank you, I will try that. However, if I set bsp.local.tasks.maximum
> >> to 1, why doesn't it distribute one task to each machine?
> >>
> >> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
> >>
> >>> So it will spawn 12 tasks. If this doesn't satisfy the load on your
> >>> machines, try to use smaller block sizes.
> >>>
> >>> 2012/12/5 Benedikt Elser <[email protected]>
> >>>
> >>>> Hi,
> >>>>
> >>>> thanks for your reply!
> >>>>
> >>>> Total size: 49078776 B
> >>>> Total dirs: 1
> >>>> Total files: 12
> >>>> Total blocks (validated): 12 (avg. block size 4089898 B)
> >>>>
> >>>> Benedikt
> >>>>
> >>>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
> >>>>
> >>>>> So how many blocks has your data in HDFS?
> >>>>>
> >>>>> 2012/12/5 Benedikt Elser <[email protected]>
> >>>>>
> >>>>>> Hi List,
> >>>>>>
> >>>>>> I am using the hama-0.6.0 release to run graph jobs on various
> >>>>>> input graphs in an EC2-based cluster of size 12. However, as I see
> >>>>>> in the logs, not every node in the cluster contributes to that job
> >>>>>> (they have no tasklog/job<ID> dir and are idle). Theoretically, a
> >>>>>> distribution of 1 million nodes across 12 buckets should hit every
> >>>>>> node at least once. Therefore I think it's a configuration problem.
> >>>>>> So far I have messed around with these settings:
> >>>>>>
> >>>>>> <name>bsp.max.tasks.per.job</name>
> >>>>>> <name>bsp.local.tasks.maximum</name>
> >>>>>> <name>bsp.tasks.maximum</name>
> >>>>>> <name>bsp.child.java.opts</name>
> >>>>>>
> >>>>>> Setting bsp.local.tasks.maximum to 1 and bsp.tasks.maximum.per.job
> >>>>>> to 12 had not the desired effect. I also split the input into 12
> >>>>>> files (because of something in 0.5 that was fixed in 0.6).
> >>>>>>
> >>>>>> Could you recommend some settings or guide me through the system's
> >>>>>> partition decision? I thought it would be:
> >>>>>>
> >>>>>> Input -> input split based on input and max* conf values -> number
> >>>>>> of tasks; HashPartition.class distributes IDs across that number
> >>>>>> of tasks.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Benedikt
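P.S. For reference, a minimal sketch of how the groom-side setting discussed
above could look on each host. The property name and its description are taken
from this thread; the file name (hama-site.xml) and the surrounding layout
follow the usual Hadoop-style configuration format, so adjust to your setup:

```xml
<?xml version="1.0"?>
<!-- hama-site.xml on each host that runs a groom server -->
<configuration>
  <property>
    <name>bsp.tasks.maximum</name>
    <!-- The maximum number of BSP tasks that will be run
         simultaneously by a groom server. -->
    <value>1</value>
  </property>
</configuration>
```

After pushing this file to every host (e.g. via Puppet) and restarting the
cluster, each groom server should run at most one task concurrently.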
