Ah, local mode, bingo! About the communication costs: yes, I am aware of these; however, this is exactly what I want to test in the first place :) Hence I would need something like a bsp.distributed.tasks.maximum.
Thanks for the clarifications,
Benedikt

On Dec 5, 2012, at 12:05 PM, Thomas Jungblut wrote:

> Because the property is called "local". This doesn't affect the distributed
> mode.
> Note that it is really bad if you compute multiple tasks on different host
> machines, because this increases your communication costs.
>
> 2012/12/5 Benedikt Elser <[email protected]>
>
>> Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1,
>> why doesn't it distribute one task to each machine?
>>
>> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
>>
>>> So it will spawn 12 tasks. If this doesn't satisfy the load on your
>>> machines, try to use smaller block sizes.
>>>
>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> thanks for your reply!
>>>>
>>>> Total size: 49078776 B
>>>> Total dirs: 1
>>>> Total files: 12
>>>> Total blocks (validated): 12 (avg. block size 4089898 B)
>>>>
>>>> Benedikt
>>>>
>>>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>>>>
>>>>> So how many blocks does your data have in HDFS?
>>>>>
>>>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>>>
>>>>>> Hi List,
>>>>>>
>>>>>> I am using the hama-0.6.0 release to run graph jobs on various input
>>>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
>>>>>> logs, not every node in the cluster contributes to that job (they have
>>>>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
>>>>>> 1 million nodes across 12 buckets should hit every node at least once.
>>>>>> Therefore I think it's a configuration problem. So far I have
>>>>>> experimented with these settings:
>>>>>>
>>>>>> <name>bsp.max.tasks.per.job</name>
>>>>>> <name>bsp.local.tasks.maximum</name>
>>>>>> <name>bsp.tasks.maximum</name>
>>>>>> <name>bsp.child.java.opts</name>
>>>>>>
>>>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>>>>>> did not have the desired effect.
>>>>>> I also split the input into 12 files (because of something in 0.5
>>>>>> that was fixed in 0.6).
>>>>>>
>>>>>> Could you recommend some settings or guide me through the system's
>>>>>> partitioning decision? I thought it would be:
>>>>>>
>>>>>> Input -> input split based on the input and the max* conf values ->
>>>>>> number of tasks; HashPartition.class distributes the IDs across that
>>>>>> number of tasks.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Benedikt
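[Editor's note: for reference, the task-limit properties discussed in this thread would normally go into hama-site.xml. A minimal sketch, with illustrative values only (the values are not recommendations, and behavior may differ between Hama releases):]

```xml
<configuration>
  <!-- Maximum number of BSP tasks a single groom server will run. -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>1</value>
  </property>
  <!-- Upper bound on the number of tasks for a single job;
       12 matches the 12 HDFS blocks reported in the thread. -->
  <property>
    <name>bsp.max.tasks.per.job</name>
    <value>12</value>
  </property>
</configuration>
```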
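[Editor's note: the partitioning step described at the end of the thread — HashPartition.class distributing vertex IDs across the number of tasks — can be sketched as below. This is a hypothetical illustration of the general hash-partitioning pattern, not Hama's actual implementation; the class and method names are invented for the example.]

```java
// Sketch of hash partitioning: each vertex ID is assigned to one of
// numTasks partitions via its hash code, so ~1 million IDs spread
// roughly evenly across 12 tasks. Illustrative only, not Hama's API.
public class HashPartitionSketch {

    // Taking the modulus before Math.abs avoids the Integer.MIN_VALUE
    // overflow corner case of Math.abs(hashCode()).
    static int partitionFor(String vertexId, int numTasks) {
        return Math.abs(vertexId.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 12; // e.g. one task per HDFS block, as in the thread
        int[] counts = new int[numTasks];
        for (int i = 0; i < 1_000_000; i++) {
            int p = partitionFor("vertex-" + i, numTasks);
            counts[p]++;
        }
        // With a million IDs and a reasonable hash, every task receives work.
        for (int p = 0; p < numTasks; p++) {
            System.out.println("task " + p + ": " + counts[p] + " vertices");
        }
    }
}
```

With this scheme, an idle node would indicate that no task was scheduled on it at all, not that the partitioner skipped it — which is why the thread focuses on the task-count settings rather than on the partitioner.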
