Hi,
I already skimmed through the logs but I could not find anything special.
I am just really confused why I am having this problem.
If the Iterable<...> for a specific key contains all of the observed
values (and it seems to, otherwise the program would not work correctly
in the standard case with [[ threshold = -1 ]]),
it should also work when I only write the key-value pairs that satisfy
the condition [[ sum > threshold ]] to the output file.
Did I miss something? Maybe I have to handle these cases in a specific
way, but I did not find anything about that online.
Thank you for your help,
Peter
On 12.05.2015 12:35, Drake민영근 wrote:
Hi, Peter
Are the missing records simply gone, without anything in the logs? What
do your reduce task logs show?
Thanks
Drake 민영근 Ph.D
kt NexR
On Tue, May 12, 2015 at 5:18 AM, Peter Ruch wrote:
Hello,
sum and threshold are both ints.
For the threshold variable, I first add a new resource to the
configuration with conf.addResource( ... );
and later read the threshold value from the configuration.
Code
#####################################
private int threshold;

public void setup( Context context ) {
    Configuration conf = context.getConfiguration();
    // Falls back to -1 if "threshold" is not set in the configuration.
    threshold = conf.getInt( "threshold", -1 );
}
#####################################
Best,
Peter
On 11.05.2015 19:26, Shahab Yunus wrote:
What is the type of the threshold variable? I believe sum is a
Java int.
Regards,
Shahab
On Mon, May 11, 2015 at 1:08 PM, Peter Ruch wrote:
Hi,
I am currently playing around with Hadoop and have some
problems when trying to filter in the Reducer.
I extended the WordCount v1.0 example from the 2.7 MapReduce
Tutorial with some additional functionality
and added the possibility to filter by the specific value of
each key - e.g. only output the key-value pairs where [[
value > threshold ]].
Filtering Code in Reducer
#####################################
// Sum all counts observed for this key.
for (IntWritable val : values) {
    sum += val.get();
}
// Only emit the pair if the sum exceeds the threshold.
if ( sum > threshold ) {
    result.set(sum);
    context.write(key, result);
}
#####################################
For a threshold smaller than every value, the above code works as
expected and the output contains all key-value pairs.
If I increase the threshold to 1, some pairs are missing from
the output although their values are larger than
the threshold.
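One thing that may explain these symptoms: the tutorial's WordCount v1.0 driver registers the reducer class as the combiner as well, via job.setCombinerClass(IntSumReducer.class). If the extended job still does that (an assumption, since the driver is not shown in this thread), the filter is also applied to the partial sums of each map task, and partial sums at or below the threshold are dropped before they ever reach the reducer. The sketch below is a plain-Java simulation without any Hadoop classes. The class name CombinerFilterDemo, the helper methods, and the two sample splits are hypothetical, and it simplifies by running the combiner exactly once per map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of a filtering reducer reused as a combiner.
class CombinerFilterDemo {

    // Same logic as the reducer: sum the values, keep the sum only if it
    // exceeds the threshold. Returns null when the pair would be filtered out.
    static Integer filteringReduce(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    // Each inner list stands for the words seen by one map task (hypothetical splits).
    static Map<String, Integer> run(List<List<String>> splits, int threshold, boolean useCombiner) {
        // Values grouped by key as they would arrive at the reducer.
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            // Map phase: emit (word, 1) for every word in this split.
            Map<String, List<Integer>> mapOutput = new TreeMap<>();
            for (String word : split) {
                mapOutput.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
            for (Map.Entry<String, List<Integer>> e : mapOutput.entrySet()) {
                if (useCombiner) {
                    // Combiner: the filtering logic runs on this map task's partial values.
                    Integer partial = filteringReduce(e.getValue(), threshold);
                    if (partial != null) {
                        shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(partial);
                    }
                } else {
                    shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
                }
            }
        }
        // Reduce phase: apply the same filtering logic to the grouped values.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            Integer sum = filteringReduce(e.getValue(), threshold);
            if (sum != null) {
                output.put(e.getKey(), sum);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "b", "c"));
        System.out.println(run(splits, -1, true));  // prints {a=3, b=2, c=1}
        System.out.println(run(splits, 1, false));  // prints {a=3, b=2}
        System.out.println(run(splits, 1, true));   // prints {a=2}
    }
}
```

With threshold = -1 the combiner never drops anything, which would match the job working in the standard case. With threshold = 1 the key "b" disappears and "a" is undercounted, because the per-task partial sums of 1 are filtered before the final sum is formed. If the assumption about the driver holds, removing the job.setCombinerClass(...) call, or using a combiner class that only sums and does not filter, may be worth trying.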
I tried to work out the error myself, but I could not get it
to work as intended. I use the exact Tutorial setup with
Oracle JDK 8
on a CentOS 7 machine.
As far as I understand the respective Iterable<...> in the
Reducer already contains all the observed values for a
specific key.
How can some of these key-value pairs be missing then?
It only fails in very few cases. The input file
is fairly large (250 MB),
so I also tried increasing the memory for the map and
reduce steps, but it did not help. I tried many
other things as well, without success.
Maybe someone already experienced similar problems / is more
experienced than I am.
Thank you,
Peter