Hi Peter,

Did the missing records simply disappear, with nothing about them in the logs? What do your reduce task logs show?
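
One thing that may be worth ruling out, assuming your job setup is otherwise unchanged from the tutorial: WordCount v1.0 registers the reducer class as the combiner as well (job.setCombinerClass(IntSumReducer.class)). If the threshold filter lives in that class, it would also run on the partial sums of each map task, before the totals are known. The sketch below is a plain-Java simulation with no Hadoop classes; the class name, method names and input splits are made up for illustration and are not taken from your code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation (no Hadoop dependency) of a filtering reducer that is
// run either as the reducer only, or as both combiner and reducer.
class CombinerFilterDemo {

    // Same logic as the reducer in the thread: sum the values and emit the sum
    // only if it exceeds the threshold. Returns null when nothing is emitted.
    static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? Integer.valueOf(sum) : null;
    }

    // Each inner list stands for the words seen by one map task (assumed input).
    static Map<String, Integer> run(List<List<String>> splits, int threshold,
                                    boolean filterAsCombiner) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            // Map phase: emit (word, 1) for every word in this split.
            Map<String, List<Integer>> mapOutput = new TreeMap<>();
            for (String word : split) {
                mapOutput.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
            for (Map.Entry<String, List<Integer>> e : mapOutput.entrySet()) {
                if (filterAsCombiner) {
                    // Combine phase: the filter sees only this split's partial sum.
                    Integer partial = sumAndFilter(e.getValue(), threshold);
                    if (partial != null) {
                        shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                                .add(partial);
                    }
                } else {
                    // No combiner: every (word, 1) pair reaches the reducer.
                    shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                            .addAll(e.getValue());
                }
            }
        }
        // Reduce phase: sum and filter whatever survived the shuffle.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            Integer total = sumAndFilter(e.getValue(), threshold);
            if (total != null) {
                output.put(e.getKey(), total);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "b", "c"));
        System.out.println(run(splits, 1, false)); // {a=3, b=2}
        System.out.println(run(splits, 1, true));  // {a=2}
    }
}
```

With a threshold of 0 or -1 both variants give the same output, because every partial sum is at least 1. That would match what you describe: everything is fine for small thresholds, and pairs go missing from threshold 1 upwards. If this turns out to be the cause, removing the setCombinerClass line, or using a separate summing class without the filter as the combiner, would be the thing to try.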

Thanks,
Drake 민영근 Ph.D
kt NexR

On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <[email protected]> wrote:

> Hello,
>
> sum and threshold are both Integers.
> For the threshold variable I first add a new resource to the configuration
> - conf.addResource( ... );
>
> later I get the threshold value from the configuration.
>
> Code
> #####################################
>
> private int threshold;
>
> public void setup( Context context ) {
>
>     Configuration conf = context.getConfiguration();
>     threshold = conf.getInt( "threshold", -1 );
>
> }
>
> #####################################
>
>
> Best,
> Peter
>
>
> On 11.05.2015 19:26, Shahab Yunus wrote:
>
> What is the type of the threshold variable? sum I believe is a Java int.
>
> Regards,
> Shahab
>
> On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <[email protected]> wrote:
>
>> Hi,
>>
>> I am currently playing around with Hadoop and have some problems when
>> trying to filter in the Reducer.
>>
>> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial
>> with some additional functionality
>> and added the possibility to filter by the specific value of each key -
>> e.g. only output the key-value pairs where [[ value > threshold ]].
>>
>> Filtering Code in Reducer
>> #####################################
>>
>> for (IntWritable val : values) {
>>     sum += val.get();
>> }
>> if ( sum > threshold ) {
>>     result.set(sum);
>>     context.write(key, result);
>> }
>>
>> #####################################
>>
>> For a threshold smaller than any value the above code works as expected
>> and the output contains all key-value pairs.
>> If I increase the threshold to 1 some pairs are missing in the output
>> although the respective value would be larger than the threshold.
>>
>> I tried to work out the error myself, but I could not get it to work as
>> intended. I use the exact Tutorial setup with Oracle JDK 8
>> on a CentOS 7 machine.
>>
>> As far as I understand the respective Iterable<...> in the Reducer
>> already contains all the observed values for a specific key.
>> Why is it possible that I am missing some of these key-value pairs
>> then? It only fails in very few cases. The input file is pretty large -
>> 250 MB - so I also tried to increase the memory for the mapping and
>> reduction steps but it did not help ( tried a lot of different stuff
>> without success )
>>
>> Maybe someone already experienced similar problems / is more
>> experienced than I am.
>>
>>
>> Thank you,
>>
>> Peter
