Issue in Aggregator

Puneet Agarwal Sat, 08 Nov 2014 06:05:52 -0800

Hi All,In my algo, I use an Aggregator which takes a Text value. I have written 
my custom aggregator class for this, as given below.


public class MyAgg extends BasicAggregator<Text> {...}

This works fine when running on my laptop with one worker.However, when running 
it on the cluster, sometimes it does not return the correctly aggregated 
value.It seems it is returning the locally aggregated value of one of the 
workers.While it should have used my logic to decide which of the aggregated 
values sent by various worker should be chosen as finally aggregated 
values.(But in fact I have not written such a code anywhere, it is therefore 
doing the best it could)

Following is how is my analysis about this issue.a.    I guess every worker 
aggregates the values locally.b.    then there is a global aggregation step, 
which simply compares the values sent by various aggregators.c.    For global 
aggregation it uses Text.compareTo() method. This method Text.compareTo() is a 
default Hadoop implementation and does not include the logic of my program.d.   
 It seem it is because of the above the value returned by my aggregator in the 
cluster is actually not globally aggregated, but the locally aggregated value 
of one of the worker gets taken.
If the above analysis is correct, following is how I think I can solve this.I 
should write my own class that implements Writable interface. In this class I 
would also write a compareTo method as a result things will start working fine.
If it was using class MyAgg itself, to decide which of the values returned by 
various workers should be taken as globally aggregated value then this problem 
would not have occurred.

I seek your guidance whether my analysis is correct.
- PuneetIIT Delhi, India

Issue in Aggregator

Reply via email to