Thanks, all, for replying to my thread.

I have investigated further and found that Hadoop is not running/respecting my reduce logic for any of my jobs, irrespective of whether the job is plain MapReduce or uses the HBase MapReduce API.

I am pasting the word count example that I ran, along with the input and output files, for reference. Please let me know if anybody can spot an issue in my code:
*Job Config Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            System.out.println("usage: [inputdir] [outputdir]");
            System.exit(-1);
        }
        String inputdir = args[0].trim();
        String outputdir = args[1].trim();

        Configuration config = new Configuration();
        Job job = new Job(config, "Word Count");
        job.setJarByClass(WordCountMapper.class);
        FileInputFormat.setInputPaths(job, new Path(inputdir));
        FileOutputFormat.setOutputPath(job, new Path(outputdir));
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        boolean b2 = job.waitForCompletion(true);
        if (!b2) {
            throw new IOException("error with job!");
        }
    }
}
================================================
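(Side note: the console output below warns "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same." My understanding is that the driver is conventionally written against the Tool/ToolRunner API, roughly as below. This is only a sketch of that convention, with a hypothetical WordCountTool class name, and as far as I can tell it is unrelated to my reduce problem:)
================================================
package com.test.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical Tool-based driver: ToolRunner strips generic options
// (-D, -files, etc.) before handing the remaining args to run().
public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: [inputdir] [outputdir]");
            return -1;
        }
        // getConf() returns the Configuration already populated by ToolRunner.
        Job job = new Job(getConf(), "Word Count");
        job.setJarByClass(WordCountTool.class);
        FileInputFormat.setInputPaths(job, new Path(args[0].trim()));
        FileOutputFormat.setOutputPath(job, new Path(args[1].trim()));
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
    }
}
================================================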
*Mapper Class:*
================================================
package com.test.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value,
            org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
================================================
*Reducer Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    protected void reduce(Text key, Iterable<IntWritable> values,
            org.apache.hadoop.mapreduce.Reducer.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
================================================
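One thing I am now wondering about: in the reducer I typed the third parameter as the raw org.apache.hadoop.mapreduce.Reducer.Context, and the method carries no @Override annotation. Every example I have seen declares it with the inherited nested Context type, as in the sketch below, so that the compiler verifies the method really overrides reduce(). If my version does not override, the framework would silently fall back to the base class's identity reduce, which would match my output. This is how I understand the usual declaration (a sketch of the conventional form, not my tested code):
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Context here is the inherited Reducer<Text, IntWritable, Text, IntWritable>.Context,
    // so this signature matches the generic reduce() and @Override compiles.
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
================================================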
*Input File:*
================================================
-bash-4.1$ cat /tmp/testfile.txt
This is an example to test Hadoop so as to test if this example works fine or not.
================================================
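For reference, this is the output I expected (counts tallied by hand from the single input line above: 15 distinct words, with "to", "test", and "example" each occurring twice, and "This"/"this" counted separately because the mapper does not lowercase):
================================================
Hadoop 1
This 1
an 1
as 1
example 2
fine 1
if 1
is 1
not. 1
or 1
so 1
test 2
this 1
to 2
works 1
================================================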
*Mapreduce Console Output:*
================================================
-bash-4.1$ hadoop jar /tmp/WordCount.jar com.test.hadoop.WordCountJob /tmp/wc/input /tmp/wc/output
14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process : 1
14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available
14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
14/08/01 20:52:42 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 20:52:54 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 20:53:02 INFO mapred.JobClient: map 100% reduce 33%
14/08/01 20:53:04 INFO mapred.JobClient: map 100% reduce 100%
14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090
14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
14/08/01 20:53:05 INFO mapred.JobClient: Job Counters
14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719
14/08/01 20:53:05 INFO mapred.JobClient: File Output Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119
14/08/01 20:53:05 INFO mapred.JobClient: FileSystemCounters
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119
14/08/01 20:53:05 INFO mapred.JobClient: File Input Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83
14/08/01 20:53:05 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1
14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36
14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155
14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770
14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage (bytes)=398393344
14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0
14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15
14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0
14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=385605632
14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18
14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2707595264
14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18
================================================
*Generated Output File:*
================================================
-bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-00000
Hadoop 1
This 1
an 1
as 1
example 1
example 1
fine 1
if 1
is 1
not. 1
or 1
so 1
test 1
test 1
this 1
to 1
to 1
works 1
================================================
Notice that duplicate words such as "test", "example", and "to" each appear twice with a count of 1, instead of once with a summed count of 2. The counters above confirm that a reduce task was launched and saw 15 input groups, yet it emitted 18 output records, one per input record, exactly what the framework's default identity reduce would produce.
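To narrow this down without a cluster, I am thinking of testing the reducer in isolation. Would a unit test along these lines catch whether my reduce() is ever actually invoked? This is only a sketch: it assumes Apache MRUnit (ReduceDriver from org.apache.hadoop.mrunit.mapreduce, MRUnit 0.9 or later) and JUnit 4 are on the classpath, and WordCountReducerTest is just a name I made up.
================================================
package com.test.hadoop;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReducerTest {

    @Test
    public void sumsDuplicateWordCounts() throws Exception {
        // One key with two values: a reducer that really runs must emit ("test", 2).
        // If the base class's identity reduce runs instead, it emits ("test", 1)
        // twice and this test fails.
        ReduceDriver.newReduceDriver(new WordCountReducer())
                .withInput(new Text("test"),
                        Arrays.asList(new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("test"), new IntWritable(2))
                .runTest();
    }
}
================================================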
Regards,
Parkirat Bagga