Hi,
No, I did not create any custom logs; I was only looking through the
"standard" logs.
I just started out with Hadoop and did not think of explicitly logging
that part of the code,
as I thought I was simply missing a small detail that one of you
might spot.
But I will definitely look into custom logging and post my findings.
@ Shahab and Drake: Thank you very much for your help.
Best,
Peter
On 12.05.2015 14:57, Shahab Yunus wrote:
Have you tried explicitly printing or logging in your reducer around
the code that compares and then outputs the values? Maybe that will
give you a clue about what is happening. Also debug the threshold value
that you get in the reducer and check whether it is what you have set
(in the case where you set it to greater than -1).
You can also try using the compare method for comparing IntWritables,
though I doubt that would make any difference.
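A minimal sketch of what such logging could look like. It is plain Java so that it runs without Hadoop on the classpath: Integer stands in for IntWritable, Integer.compare stands in for a compare call on IntWritables, and the class and method names (LoggedFilterSketch, shouldEmit) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Plain-Java stand-in for the reducer logic; Integer replaces IntWritable
// so the sketch runs without Hadoop. All names here are hypothetical.
final class LoggedFilterSketch {
    private static final Logger LOG =
            Logger.getLogger(LoggedFilterSketch.class.getName());

    // Sums the values for one key, logs everything the decision depends on,
    // and reports whether the key-value pair would be written.
    static boolean shouldEmit(String key, Iterable<Integer> values, int threshold) {
        int sum = 0;
        int count = 0;
        for (int val : values) {
            sum += val;
            count++;
        }
        // Integer.compare(...) > 0 is equivalent to sum > threshold.
        boolean emit = Integer.compare(sum, threshold) > 0;
        LOG.info(String.format("key=%s values=%d sum=%d threshold=%d emit=%b",
                key, count, sum, threshold, emit));
        return emit;
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(shouldEmit("hadoop", ones, 1));            // sum 3 vs. threshold 1
        System.out.println(shouldEmit("rare", Arrays.asList(1), 1));  // sum 1 vs. threshold 1
    }
}
```

Logging the number of values per key next to the sum makes it visible whether a call really saw every occurrence of that key or only part of them.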
Shahab
On May 12, 2015 8:17 AM, "Peter Ruch" <[email protected]> wrote:
Hi,
I already skimmed through the logs but could not find anything
unusual.
I am just really confused about why I am having this problem.
If the Iterable<...> for a specific key contains all of the
observed values (and it seems to, since
otherwise the program would not work correctly in the standard case
with [[ threshold = -1 ]]),
it should also work when I only write the key-value pairs that
satisfy the condition [[ sum > threshold ]] to the output file.
Did I miss something? Maybe I have to handle these cases in a
specific way, but I did not find anything about that online.
Thank you for your help,
Peter
On 12.05.2015 12:35, Drake민영근 wrote:
Hi, Peter
The missing records, are they just gone without any logs? What
about your reduce task logs?
Thanks
Drake 민영근 Ph.D
kt NexR
On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <[email protected]> wrote:
Hello,
sum and threshold are both integers.
For the threshold variable, I first add a new resource to the
configuration with conf.addResource( ... );
later I read the threshold value from the configuration.
Code
#####################################
private int threshold;

public void setup( Context context ) {
    Configuration conf = context.getConfiguration();
    // Falls back to -1 (no filtering) if "threshold" is not set.
    threshold = conf.getInt( "threshold", -1 );
}
#####################################
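The lookup above falls back to -1 without any warning when the key is absent, so printing the value once in setup would confirm that the added resource is actually visible to the task. Below is a small sketch of that fallback behaviour. java.util.Properties is only a stand-in for Hadoop's Configuration, and ThresholdLookupSketch and its getInt helper are hypothetical names, not Hadoop API.

```java
import java.util.Properties;

// Stand-in for a getInt(name, default) lookup; not Hadoop code.
final class ThresholdLookupSketch {
    // Returns the default when the key is absent, otherwise the parsed value.
    static int getInt(Properties conf, String name, int defaultValue) {
        String raw = conf.getProperty(name);
        if (raw == null) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Key absent: the caller cannot tell "not set" from "set to -1".
        System.out.println("threshold = " + getInt(conf, "threshold", -1));
        conf.setProperty("threshold", "1");
        // Key present: the configured value is used.
        System.out.println("threshold = " + getInt(conf, "threshold", -1));
    }
}
```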
Best,
Peter
On 11.05.2015 19:26, Shahab Yunus wrote:
What is the type of the threshold variable? sum I believe is
a Java int.
Regards,
Shahab
On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <[email protected]> wrote:
Hi,
I am currently playing around with Hadoop and have some
problems when trying to filter in the Reducer.
I extended the WordCount v1.0 example from the 2.7
MapReduce Tutorial with some additional functionality
and added the possibility to filter by the specific
value of each key - e.g. only output the key-value pairs
where [[ value > threshold ]].
Filtering Code in Reducer
#####################################
for (IntWritable val : values) {
    sum += val.get();
}
if ( sum > threshold ) {
    result.set(sum);
    context.write(key, result);
}
#####################################
For a threshold smaller than any value, the above code works as
expected and the output contains all key-value pairs.
If I increase the threshold to 1, some pairs are missing
from the output although their values are
larger than the threshold.
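To make the expected behaviour concrete, here is a plain-Java sketch of the same sum-then-filter step applied to complete groups of values. The words and counts are made-up illustration data, Integer stands in for IntWritable, and ExpectedOutputSketch and reduceAll are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: what the filter should produce when every key's
// full list of values is available.
final class ExpectedOutputSketch {
    // Sums each key's values and keeps the key only if sum > threshold.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped,
                                          int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            if (sum > threshold) {
                out.put(e.getKey(), sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1, 1, 1));
        grouped.put("b", Arrays.asList(1, 1));
        grouped.put("c", Arrays.asList(1));
        System.out.println(reduceAll(grouped, -1)); // {a=3, b=2, c=1}
        System.out.println(reduceAll(grouped, 1));  // {a=3, b=2}
    }
}
```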
I tried to work out the error myself, but I could not
get it to work as intended. I use the exact Tutorial
setup with Oracle JDK 8
on a CentOS 7 machine.
As far as I understand the respective Iterable<...> in
the Reducer already contains all the observed values for
a specific key.
Why is it possible that I am missing some of these
key-value pairs then? It only fails in very few cases.
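One thing that might be worth ruling out here: the tutorial's WordCount v1.0 driver also registers the reducer class as a combiner (job.setCombinerClass(IntSumReducer.class)). Assuming that line is still in the driver, the filter would also run on partial, per-map-task groups before the final reduce. The plain-Java sketch below only illustrates that effect; it is not Hadoop code, and PartialGroupSketch, sumAndFilter and viaPartialGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Illustration of running the same sum-then-filter step on partial groups.
final class PartialGroupSketch {
    // The filtering step: an empty result means "nothing written".
    static OptionalInt sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? OptionalInt.of(sum) : OptionalInt.empty();
    }

    // Applies the step once per partial group (as a combiner would), then
    // once more over the partial sums that survived.
    static OptionalInt viaPartialGroups(List<List<Integer>> partials, int threshold) {
        List<Integer> survivors = new ArrayList<>();
        for (List<Integer> part : partials) {
            OptionalInt partial = sumAndFilter(part, threshold);
            if (partial.isPresent()) {
                survivors.add(partial.getAsInt());
            }
        }
        // A key with no surviving values never reaches the final step.
        if (survivors.isEmpty()) {
            return OptionalInt.empty();
        }
        return sumAndFilter(survivors, threshold);
    }

    public static void main(String[] args) {
        // A word seen once in each of two input splits, threshold 1.
        List<List<Integer>> partials =
                Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println(sumAndFilter(Arrays.asList(1, 1), 1).isPresent()); // true
        System.out.println(viaPartialGroups(partials, 1).isPresent());        // false
        System.out.println(viaPartialGroups(partials, -1).isPresent());       // true
    }
}
```

If this turns out to be the cause, it would fit the symptoms: with threshold -1 every partial sum survives, while with threshold 1 only keys whose per-task counts are all 1 or less get lost.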
The input file is pretty large (250 MB),
so I also tried increasing the memory for the map
and reduce steps, but it did not help (I tried a lot of
different things without success).
Maybe someone has already run into a similar problem or is
more experienced than I am.
Thank you,
Peter