Thanks, all, for replying to my thread.

I have investigated further and found that Hadoop is not running/respecting my reduce logic for any of my jobs, irrespective of whether the job is plain MapReduce or uses the HBase MapReduce API.

I am pasting the word count example that I ran, along with the input and output files, for reference. Please let me know if anybody can spot an issue in my code:
*Job Config Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            System.out.println("usage: [inputdir] [outputdir]");
            System.exit(-1);
        }
        String inputdir = args[0].trim();
        String outputdir = args[1].trim();

        Configuration config = new Configuration();
        Job job = new Job(config, "Word Count");
        job.setJarByClass(WordCountMapper.class);
        FileInputFormat.setInputPaths(job, new Path(inputdir));
        FileOutputFormat.setOutputPath(job, new Path(outputdir));
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        boolean b2 = job.waitForCompletion(true);
        if (!b2) {
            throw new IOException("error with job!");
        }
    }
}
================================================
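(Side note: the console output below warns "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same." My understanding is that the driver is conventionally written against the Tool/ToolRunner API, roughly as below. This is only a sketch of that convention, with a hypothetical WordCountTool class name, and as far as I can tell it is unrelated to my reduce problem:)
================================================
package com.test.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical Tool-based driver: ToolRunner strips generic options
// (-D, -files, etc.) before handing the remaining args to run().
public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: [inputdir] [outputdir]");
            return -1;
        }
        // getConf() returns the Configuration already populated by ToolRunner.
        Job job = new Job(getConf(), "Word Count");
        job.setJarByClass(WordCountTool.class);
        FileInputFormat.setInputPaths(job, new Path(args[0].trim()));
        FileOutputFormat.setOutputPath(job, new Path(args[1].trim()));
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
    }
}
================================================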
*Mapper Class:*
================================================
package com.test.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value,
            org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
================================================
*Reducer Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    protected void reduce(Text key, Iterable<IntWritable> values,
            org.apache.hadoop.mapreduce.Reducer.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
================================================
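One thing I am now wondering about: in the reducer I typed the third parameter as the raw org.apache.hadoop.mapreduce.Reducer.Context, and the method carries no @Override annotation. Every example I have seen declares it with the inherited nested Context type, as in the sketch below, so that the compiler verifies the method really overrides reduce(). If my version does not override, the framework would silently fall back to the base class's identity reduce, which would match my output. This is how I understand the usual declaration (a sketch of the conventional form, not my tested code):
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Context here is the inherited Reducer<Text, IntWritable, Text, IntWritable>.Context,
    // so this signature matches the generic reduce() and @Override compiles.
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
================================================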
*Input File:*
================================================
-bash-4.1$ cat /tmp/testfile.txt
This is an example to test Hadoop so as to test if this example works fine or not.
================================================
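For reference, this is the output I expected (counts tallied by hand from the single input line above: 15 distinct words, with "to", "test", and "example" each occurring twice, and "This"/"this" counted separately because the mapper does not lowercase):
================================================
Hadoop 1
This 1
an 1
as 1
example 2
fine 1
if 1
is 1
not. 1
or 1
so 1
test 2
this 1
to 2
works 1
================================================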
*Mapreduce Console Output:*
================================================
-bash-4.1$ hadoop jar /tmp/WordCount.jar com.test.hadoop.WordCountJob /tmp/wc/input /tmp/wc/output
14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process : 1
14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available
14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
14/08/01 20:52:42 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 20:52:54 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 20:53:02 INFO mapred.JobClient: map 100% reduce 33%
14/08/01 20:53:04 INFO mapred.JobClient: map 100% reduce 100%
14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090
14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
14/08/01 20:53:05 INFO mapred.JobClient: Job Counters
14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719
14/08/01 20:53:05 INFO mapred.JobClient: File Output Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119
14/08/01 20:53:05 INFO mapred.JobClient: FileSystemCounters
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119
14/08/01 20:53:05 INFO mapred.JobClient: File Input Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83
14/08/01 20:53:05 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1
14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36
14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155
14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770
14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage (bytes)=398393344
14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0
14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15
14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0
14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=385605632
14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18
14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2707595264
14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18
================================================
*Generated Output File:*
================================================
-bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-00000
Hadoop 1
This 1
an 1
as 1
example 1
example 1
fine 1
if 1
is 1
not. 1
or 1
so 1
test 1
test 1
this 1
to 1
to 1
works 1
================================================
Notice that duplicate words such as "test", "example", and "to" each appear twice with a count of 1, instead of once with a summed count of 2. The counters above confirm that a reduce task was launched and saw 15 input groups, yet it emitted 18 output records, one per input record, exactly what the framework's default identity reduce would produce.
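To narrow this down without a cluster, I am thinking of testing the reducer in isolation. Would a unit test along these lines catch whether my reduce() is ever actually invoked? This is only a sketch: it assumes Apache MRUnit (ReduceDriver from org.apache.hadoop.mrunit.mapreduce, MRUnit 0.9 or later) and JUnit 4 are on the classpath, and WordCountReducerTest is just a name I made up.
================================================
package com.test.hadoop;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReducerTest {

    @Test
    public void sumsDuplicateWordCounts() throws Exception {
        // One key with two values: a reducer that really runs must emit ("test", 2).
        // If the base class's identity reduce runs instead, it emits ("test", 1)
        // twice and this test fails.
        ReduceDriver.newReduceDriver(new WordCountReducer())
                .withInput(new Text("test"),
                        Arrays.asList(new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("test"), new IntWritable(2))
                .runTest();
    }
}
================================================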
Regards,
Parkirat Bagga