Hi, Peter

Are the missing records simply gone, with nothing in the logs? What do your
reduce task logs show?
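
One thing that may be worth ruling out, assuming your job still registers the
reducer as its combiner (the tutorial's WordCount v1.0 does this with
job.setCombinerClass(IntSumReducer.class)): the filter would then also run on
each map task's partial sums, before the reducer ever sees them. Below is a
minimal plain-Java sketch of that effect, with no Hadoop dependency. The class
and method names (CombinerFilterDemo, sumAndFilter, sumOnly, run) are
hypothetical, and the lists only stand in for the per-mapper values of one key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerFilterDemo {

    /** Mimics the posted reduce logic: sum the values, emit only if sum > threshold. */
    public static List<Integer> sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        List<Integer> out = new ArrayList<>();
        if (sum > threshold) {
            out.add(sum);
        }
        return out;
    }

    /** A combiner that only pre-aggregates and never filters. */
    public static List<Integer> sumOnly(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        List<Integer> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    /**
     * Simulates one key: each inner list holds the counts produced by one map task.
     * The combine step runs per map task, then the reduce step runs on what is left.
     */
    public static List<Integer> run(List<List<Integer>> mapOutputs, int threshold,
                                    boolean filterInCombiner) {
        List<Integer> shuffled = new ArrayList<>();
        for (List<Integer> perMapper : mapOutputs) {
            shuffled.addAll(filterInCombiner
                    ? sumAndFilter(perMapper, threshold)
                    : sumOnly(perMapper));
        }
        return sumAndFilter(shuffled, threshold);
    }

    public static void main(String[] args) {
        // One word that occurs once in each of two input splits (total count 2).
        List<List<Integer>> mapOutputs = Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println(run(mapOutputs, 1, true));   // filtering combiner: prints []
        System.out.println(run(mapOutputs, 1, false));  // summing-only combiner: prints [2]
    }
}
```

If this matches your setup, removing the setCombinerClass call, or using a
separate combiner class that only sums, should bring the missing pairs back.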

Thanks

Drake 민영근 Ph.D
kt NexR

On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <[email protected]> wrote:

>  Hello,
>
> sum and threshold are both integers.
> For the threshold variable, I first add a new resource to the configuration
> - conf.addResource( ... );
>
> Later I read the threshold value from the configuration.
>
> Code
> #####################################
>
> private int threshold;
>
> public void setup( Context context ) {
>
>           Configuration conf = context.getConfiguration();
>           threshold = conf.getInt( "threshold", -1 );
>
> }
>
> #####################################
>
>
> Best,
> Peter
>
>
>
> On 11.05.2015 19:26, Shahab Yunus wrote:
>
> What is the type of the threshold variable? sum I believe is a Java int.
>
>  Regards,
> Shahab
>
> On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <[email protected]>
> wrote:
>
>>   Hi,
>>
>>  I am currently playing around with Hadoop and have some problems when
>> trying to filter in the Reducer.
>>
>> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial
>> with some additional functionality
>> and added the possibility to filter by the specific value of each key -
>> e.g. only output the key-value pairs where [[ value > threshold ]].
>>
>>  Filtering Code in Reducer
>>  #####################################
>>
>>  for (IntWritable val : values) {
>>      sum += val.get();
>> }
>> if ( sum > threshold ) {
>>      result.set(sum);
>>      context.write(key, result);
>> }
>>
>> #####################################
>>
>>  For a threshold smaller than any value, the above code works as expected and
>> the output contains all key-value pairs.
>>  If I increase the threshold to 1, some pairs are missing from the output
>> although their respective values are larger than the threshold.
>>
>>  I tried to work out the error myself, but I could not get it to work as
>> intended. I use the exact Tutorial setup with Oracle JDK 8
>>  on a CentOS 7 machine.
>>
>>  As far as I understand the respective Iterable<...>  in the Reducer
>> already contains all the observed values for a specific key.
>>  Why is it possible that I am missing some of these key-value pairs
>> then? It only fails in very few cases. The input file is pretty large (250 MB),
>>  so I also tried to increase the memory for the mapping and reduction
>> steps, but it did not help (I tried a lot of different things without success).
>>
>>  Maybe someone already experienced similar problems / is more
>> experienced than I am.
>>
>>
>>  Thank you,
>>
>>  Peter
>>
>
>
>
