Re: Re: Filtering by value in Reducer

Drake 민영근 Tue, 12 May 2015 03:37:58 -0700

Hi, Peter

Are the missing records simply gone, with nothing in the logs? What do your
reduce task logs show?
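
One thing that may be worth ruling out; this is a guess, not a diagnosis. The
WordCount v1.0 driver in the tutorial registers the reducer class as the
combiner as well (job.setCombinerClass(IntSumReducer.class)). If that line is
still in your driver, the threshold check would also run on partial sums on
the map side. The plain-Java sketch below only simulates that effect without
any Hadoop classes. The class name, the method names and the sample data are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the suspected effect; no Hadoop classes are used.
// All names here are hypothetical and only serve the illustration.
class CombinerFilterSketch {

    // Same logic as the reducer in the thread: sum the values and
    // emit the sum only if it exceeds the threshold (null = nothing written).
    static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    // Each inner list holds the counts that one map task produced for one key.
    // With filterInCombiner == true, the filtering logic is applied per map
    // task first, as it would be if the reducer class doubled as the combiner.
    static Integer run(List<List<Integer>> perMapTask, int threshold,
                       boolean filterInCombiner) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapOutput : perMapTask) {
            if (filterInCombiner) {
                Integer partial = sumAndFilter(mapOutput, threshold);
                if (partial != null) {
                    reducerInput.add(partial);
                }
            } else {
                reducerInput.addAll(mapOutput);
            }
        }
        if (reducerInput.isEmpty()) {
            return null; // the reducer never sees this key
        }
        return sumAndFilter(reducerInput, threshold);
    }

    public static void main(String[] args) {
        // One key that occurs once in each of three map tasks.
        List<List<Integer>> splits =
                Arrays.asList(Arrays.asList(1), Arrays.asList(1), Arrays.asList(1));
        System.out.println(run(splits, 1, false)); // 3: filter only in the reducer
        System.out.println(run(splits, 1, true));  // null: partial sums of 1 are dropped
        System.out.println(run(splits, -1, true)); // 3: threshold below every value
    }
}
```

If this matches your setup, removing the setCombinerClass call, or using a
separate sum-only class as the combiner, would be the first thing to try.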


Thanks

Drake 민영근 Ph.D
kt NexR

On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <[email protected]> wrote:

>  Hello,
>
> sum and threshold are both integers.
> For the threshold variable, I first add a new resource to the configuration
> with conf.addResource( ... );
>
> Later, I read the threshold value from the configuration.
>
> Code
> #####################################
>
> private int threshold;
>
> // Read the threshold once per task; -1 is the fallback if the key is unset.
> @Override
> public void setup( Context context ) {
>     Configuration conf = context.getConfiguration();
>     threshold = conf.getInt( "threshold", -1 );
> }
>
> #####################################
>
>
> Best,
> Peter
>
>
>
> On 11.05.2015 19:26, Shahab Yunus wrote:
>
> What is the type of the threshold variable? sum I believe is a Java int.
>
>  Regards,
> Shahab
>
> On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <[email protected]>
> wrote:
>
>>   Hi,
>>
>>  I am currently playing around with Hadoop and have some problems when
>> trying to filter in the Reducer.
>>
>> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial
>> with some additional functionality
>> and added the possibility to filter by the specific value of each key -
>> e.g. only output the key-value pairs where [[ value > threshold ]].
>>
>>  Filtering Code in Reducer
>>  #####################################
>>
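>> // Sum all counts observed for this key.
>> for (IntWritable val : values) {
>>     sum += val.get();
>> }
>> // Write the pair only when the total exceeds the threshold.
>> if ( sum > threshold ) {
>>     result.set(sum);
>>     context.write(key, result);
>> }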
>>
>> #####################################
>>
>>  For a threshold smaller than every value, the above code works as expected
>> and the output contains all key-value pairs.
>>  If I increase the threshold to 1, some pairs are missing from the output
>> although their values are larger than the threshold.
>>
>>  I tried to work out the error myself, but I could not get it to work as
>> intended. I use the exact Tutorial setup with Oracle JDK 8
>>  on a CentOS 7 machine.
>>
>>  As far as I understand, the respective Iterable<...> in the Reducer
>> already contains all the observed values for a specific key.
>>  Why, then, am I missing some of these key-value pairs? It only fails in
>> very few cases. The input file is fairly large (250 MB),
>>  so I also tried increasing the memory for the map and reduce
>> steps, but it did not help (I tried many different settings without success).
>>
>>  Maybe someone has already run into a similar problem or is more
>> experienced than I am.
>>
>>
>>  Thank you,
>>
>>  Peter
>>
>
>
>
