Hello,

sum and threshold are both Java ints.
For the threshold variable, I first add a new resource to the configuration with conf.addResource( ... );

Later I read the threshold value from the configuration.

Code
#####################################

private int threshold;

@Override
public void setup( Context context ) {

          // getInt falls back to -1 if "threshold" is not set in the configuration.
          Configuration conf = context.getConfiguration();
          threshold = conf.getInt( "threshold", -1 );

}

#####################################
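
One possible cause worth ruling out, assuming the driver is unchanged from the tutorial: the WordCount v1.0 driver calls job.setCombinerClass(IntSumReducer.class), so the reducer class, including the threshold check, would also run as a combiner on each map task's partial output. A key whose partial sums are each at or below the threshold would then be dropped before the reduce step ever sees it. The sketch below simulates this in plain Java without Hadoop. The class name, method names and sample splits are hypothetical, and the combiner is modeled as running exactly once per split, whereas Hadoop may run it zero or more times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop dependency); all names are hypothetical.
class CombinerFilterDemo {

    // Same logic as the filtering reducer in the question:
    // sum the values of each key and emit the pair only if sum > threshold.
    static Map<String, Integer> sumAndFilter(Map<String, List<Integer>> grouped, int threshold) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            if (sum > threshold) {
                out.put(e.getKey(), sum);
            }
        }
        return out;
    }

    // Runs map, combine (once per split) and reduce over the given input splits.
    // If combinerFilters is true, the combiner applies the threshold as well,
    // which mirrors registering the filtering reducer as the combiner.
    static Map<String, Integer> run(List<List<String>> splits, int threshold, boolean combinerFilters) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            // Map phase: emit (word, 1) for every word in this split.
            Map<String, List<Integer>> mapOutput = new TreeMap<>();
            for (String word : split) {
                mapOutput.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
            // Combine phase: sees only this split's values, not the global ones.
            int combinerThreshold = combinerFilters ? threshold : Integer.MIN_VALUE;
            Map<String, Integer> partial = sumAndFilter(mapOutput, combinerThreshold);
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        // Reduce phase: sums the partial sums and applies the threshold.
        return sumAndFilter(shuffled, threshold);
    }

    public static void main(String[] args) {
        // "a" occurs once in each split, so its total is 2.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b", "b"),
                Arrays.asList("a", "c"));
        System.out.println(run(splits, 1, true));   // prints {b=2}
        System.out.println(run(splits, 1, false));  // prints {a=2, b=2}
    }
}
```

In the first run the key "a" is missing although its total of 2 exceeds the threshold of 1, because each partial sum of 1 was filtered out in the combine step. If this turns out to be the cause, one option would be to remove the setCombinerClass call, or to register a separate combiner class that only sums and leave the threshold check to the reducer.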


Best,
Peter


On 11.05.2015 19:26, Shahab Yunus wrote:
What is the type of the threshold variable? sum I believe is a Java int.

Regards,
Shahab

On Mon, May 11, 2015 at 1:08 PM, Peter Ruch wrote:

    Hi,

    I am currently playing around with Hadoop and have some problems
    when trying to filter in the Reducer.

    I extended the WordCount v1.0 example from the 2.7 MapReduce
    Tutorial with some additional functionality
    and added the possibility to filter by the specific value of each
    key - e.g. only output the key-value pairs where [[ value >
    threshold ]].

    Filtering Code in Reducer
    #####################################

    int sum = 0;
    for (IntWritable val : values) {
         sum += val.get();
    }
    if ( sum > threshold ) {
         result.set(sum);
         context.write(key, result);
    }

    #####################################

    For a threshold smaller than every value, the above code works as
    expected and the output contains all key-value pairs.
    If I increase the threshold to 1, some pairs are missing from the
    output although their value is larger than the
    threshold.

    I tried to work out the error myself, but I could not get it to
    work as intended. I use the exact Tutorial setup with Oracle JDK 8
    on a CentOS 7 machine.

    As far as I understand, the respective Iterable<...> in the
    Reducer already contains all the observed values for a specific key.
    Why is it possible that I am missing some of these key-value pairs
    then? It only fails in very few cases. The input file is pretty
    large (250 MB),
    so I also tried to increase the memory for the map and
    reduce steps, but it did not help (I tried a lot of different
    things without success).

    Maybe someone already experienced similar problems / is more
    experienced than I am.


    Thank you,

    Peter


