Hi,
I am completely new to Hadoop, and I am trying to address the following
simple application; I apologize if this sounds trivial.
I have multiple log files. I need to read them, collect the entries that
meet some conditions, and write those entries back to files for further
processing. (In other words, I need to filter out some events.)
I am using the WordCount example as a starting point.
public static class Map extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (-1 != meetConditions(value)) {
            context.write(value, one);
        }
    }
}

public static class Reduce extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        context.write(key, new IntWritable(1));
    }
}
The problem is that this prints the value 1 after each entry.
Hence my question: what is the best trivial implementation of the map and
reduce functions for the use case above?
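From what I have read, a map-only job might be what I want; here is a sketch of what I am considering (not verified), reusing my existing meetConditions helper and assuming NullWritable can be used to suppress the trailing value in the output:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only filter: emit the matching log entry itself, with no count.
    public static class FilterMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // meetConditions is my existing helper; -1 means "drop this entry".
            if (-1 != meetConditions(value)) {
                // NullWritable so only the entry appears in the output file.
                context.write(value, NullWritable.get());
            }
        }
    }

    // In the driver, skip the reduce phase entirely:
    //   job.setMapperClass(FilterMapper.class);
    //   job.setNumReduceTasks(0);  // map output is written directly

Is setting the number of reduce tasks to zero the right way to avoid the reduce step, or is an identity reducer preferred?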
Thank you greatly for your help.