So it will spawn 12 tasks. If this doesn't saturate the load on your machines, try using a smaller block size.
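Since the number of input splits follows the number of HDFS blocks, a smaller block size on the input files yields more splits and therefore more tasks. A minimal sketch of the relevant Hadoop property (the 16 MB value is only an illustrative assumption, not a recommendation):

```xml
<!-- hdfs-site.xml (Hadoop 1.x era): shrink the default block size so that
     the same input produces more blocks, hence more BSP tasks.
     16777216 B = 16 MB is an assumed example value. -->
<property>
  <name>dfs.block.size</name>
  <value>16777216</value>
</property>
```

Note that the block size only takes effect for files written after the change; existing files keep their original block layout and would need to be re-uploaded.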
2012/12/5 Benedikt Elser <[email protected]>

> Hi,
>
> thanks for your reply!
>
> Total size: 49078776 B
> Total dirs: 1
> Total files: 12
> Total blocks (validated): 12 (avg. block size 4089898 B)
>
> Benedikt
>
> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>
> > So how many blocks does your data have in HDFS?
> >
> > 2012/12/5 Benedikt Elser <[email protected]>
> >
> >> Hi List,
> >>
> >> I am using the hama-0.6.0 release to run graph jobs on various input
> >> graphs in an EC2-based cluster of size 12. However, as I see in the
> >> logs, not every node in the cluster contributes to that job (they have
> >> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
> >> 1 million nodes across 12 buckets should hit every node at least once.
> >> Therefore I think it's a configuration problem. So far I have
> >> experimented with these settings:
> >>
> >> <name>bsp.max.tasks.per.job</name>
> >> <name>bsp.local.tasks.maximum</name>
> >> <name>bsp.tasks.maximum</name>
> >> <name>bsp.child.java.opts</name>
> >>
> >> Setting bsp.local.tasks.maximum to 1 and bsp.tasks.maximum.per.job to
> >> 12 did not have the desired effect. I also split the input into 12
> >> files (because of something in 0.5 that was fixed in 0.6).
> >>
> >> Could you recommend some settings or guide me through the system's
> >> partitioning decision? I thought it would be:
> >>
> >> Input -> input split based on the input and the max* conf values ->
> >> number of tasks; HashPartition.class distributes IDs across that
> >> number of tasks.
> >>
> >> Thanks,
> >>
> >> Benedikt
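The partitioning scheme mentioned in the thread (HashPartition.class mapping vertex IDs onto the task count) boils down to a modulo over a non-negative hash. A minimal sketch of that idea; the class and method names here are illustrative assumptions, not Hama's actual API:

```java
// Sketch of hash-based partitioning as discussed above: a vertex ID is
// assigned to one of numTasks buckets. With ~1 million IDs and 12 tasks,
// every task should receive work. Names are hypothetical.
public class HashPartitionSketch {

    // Mask out the sign bit so negative hashCodes still map to a
    // valid partition index in [0, numTasks).
    static int partition(String vertexId, int numTasks) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 12;
        // Every ID lands deterministically in one of the 12 buckets.
        System.out.println(partition("vertex-42", numTasks));
    }
}
```

The key consequence for the thread's question: the number of partitions is fixed by the number of tasks (i.e. by the split/block count), so if only a few tasks are spawned, hashing alone cannot spread work onto the idle nodes.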
