Just reviewed the code again: you are not really using map-reduce. You are reading all the files inside a single map() call, which is not how a normal map-reduce job works. Let the framework split the input into records, and take each record's file name from the input split instead of opening HDFS yourself inside the mapper.
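Something like the sketch below (untested; the class name FileWordCountMapper is just for illustration) is closer to how a normal job does it. It keeps the stock TextInputFormat, so each map() call receives one line from one file, and it reads that file's name from the input split via the old mapred API:

=====
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileWordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // With TextInputFormat each split is a FileSplit, so the split
    // tells us which file this line came from (equivalently, read the
    // "map.input.file" property from the JobConf in configure()).
    String fileName = ((FileSplit) reporter.getInputSplit()).getPath().getName();

    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      // Key on "fileName word" so counts never merge across files.
      word.set(fileName + " " + tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}
======

Your existing Reduce class can stay as it is: it will sum per (file, word) key and give output like "vinitha java 4", with no counts leaking between files.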
Regards,
*Stanley Shi,*

On Thu, Mar 20, 2014 at 1:50 PM, Ranjini Rathinam <[email protected]> wrote:

> Hi,
>
> If we give the below code,
> =======================
> word.set("filename"+" "+tokenizer.nextToken());
> output.collect(word,one);
> ======================
>
> the output is wrong, because it shows
>
> filename    word      occurrence
> vinitha     java      4
> vinitha     oracle    3
> sony        java      4
> sony        oracle    3
>
> Here vinitha does not have the word oracle. Similarly, sony does not have
> the word java. The file names are getting merged across all words.
>
> I need the output as given below:
>
> filename    word      occurrence
> vinitha     java      4
> vinitha     C++       3
> sony        ETL       4
> sony        oracle    3
>
> I need the fileName along with the words of that particular file only. No
> merging should happen.
>
> Please help me out with this issue.
>
> Thanks in advance.
>
> Ranjini
>
> On Thu, Mar 20, 2014 at 10:56 AM, Ranjini Rathinam <[email protected]> wrote:
>
>> ---------- Forwarded message ----------
>> From: Stanley Shi <[email protected]>
>> Date: Thu, Mar 20, 2014 at 7:39 AM
>> Subject: Re: Need FileName with Content
>> To: [email protected]
>>
>> You want to do a word count for each file, but the code gives you a word
>> count for all the files together, right?
>>
>> =====
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>> ======
>> change it to:
>> word.set("filename"+" "+tokenizer.nextToken());
>> output.collect(word,one);
>>
>> Regards,
>> *Stanley Shi,*
>>
>> On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a folder named INPUT.
>>>
>>> Inside INPUT there are 5 resumes.
>>>
>>> hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
>>> Found 5 items
>>> -rw-r--r--   1 hduser supergroup   5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
>>> -rw-r--r--   1 hduser supergroup   6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
>>> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
>>> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
>>> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
>>> hduser@localhost:~/Ranjini$
>>>
>>> I have to process the folder and its content.
>>>
>>> I need output as:
>>>
>>> filename    word      occurrence
>>> vinitha     java      4
>>> sony        oracle    3
>>>
>>> But I am not getting the filename. As the input file contents are
>>> merged, the file name is not coming out correctly.
>>>
>>> Please help me fix this issue.
>>> I have given my code below:
>>>
>>> import java.io.BufferedReader;
>>> import java.io.IOException;
>>> import java.io.InputStreamReader;
>>> import java.util.*;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FSDataInputStream;
>>> import org.apache.hadoop.fs.FileStatus;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.*;
>>> import org.apache.hadoop.mapred.*;
>>>
>>> public class WordCount {
>>>
>>>   public static class Map extends MapReduceBase
>>>       implements Mapper<LongWritable, Text, Text, IntWritable> {
>>>
>>>     private final static IntWritable one = new IntWritable(1);
>>>     private Text word = new Text();
>>>
>>>     public void map(LongWritable key, Text value,
>>>         OutputCollector<Text, IntWritable> output, Reporter reporter)
>>>         throws IOException {
>>>       FSDataInputStream fs = null;
>>>       FileSystem hdfs = null;
>>>       String line = value.toString();
>>>       int i = 0, k = 0;
>>>       try {
>>>         // Re-opens the whole INPUT directory on every map() call
>>>         Configuration configuration = new Configuration();
>>>         configuration.set("fs.default.name", "hdfs://localhost:4440/");
>>>
>>>         Path srcPath = new Path("/user/hduser/INPUT/");
>>>
>>>         hdfs = FileSystem.get(configuration);
>>>         FileStatus[] status = hdfs.listStatus(srcPath);
>>>         fs = hdfs.open(srcPath);
>>>         BufferedReader br = new BufferedReader(
>>>             new InputStreamReader(hdfs.open(srcPath)));
>>>
>>>         String[] splited = line.split("\\s+");
>>>         for (i = 0; i < splited.length; i++) {
>>>           String sp[] = splited[i].split(",");
>>>           for (k = 0; k < sp.length; k++) {
>>>             if (!sp[k].isEmpty()) {
>>>               // Count only the skill keywords C and JAVA
>>>               StringTokenizer tokenizer = new StringTokenizer(sp[k]);
>>>               if (sp[k].equalsIgnoreCase("C")) {
>>>                 while (tokenizer.hasMoreTokens()) {
>>>                   word.set(tokenizer.nextToken());
>>>                   output.collect(word, one);
>>>                 }
>>>               }
>>>               if (sp[k].equalsIgnoreCase("JAVA")) {
>>>                 while (tokenizer.hasMoreTokens()) {
>>>                   word.set(tokenizer.nextToken());
>>>                   output.collect(word, one);
>>>                 }
>>>               }
>>>             }
>>>           }
>>>         }
>>>       } catch (IOException e) {
>>>         e.printStackTrace();
>>>       }
>>>     }
>>>   }
>>>
>>>   public static class Reduce extends MapReduceBase
>>>       implements Reducer<Text, IntWritable, Text, IntWritable> {
>>>     public void reduce(Text key, Iterator<IntWritable> values,
>>>         OutputCollector<Text, IntWritable> output, Reporter reporter)
>>>         throws IOException {
>>>       int sum = 0;
>>>       while (values.hasNext()) {
>>>         sum += values.next().get();
>>>       }
>>>       output.collect(key, new IntWritable(sum));
>>>     }
>>>   }
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     JobConf conf = new JobConf(WordCount.class);
>>>     conf.setJobName("wordcount");
>>>     conf.setOutputKeyClass(Text.class);
>>>     conf.setOutputValueClass(IntWritable.class);
>>>     conf.setMapperClass(Map.class);
>>>     conf.setCombinerClass(Reduce.class);
>>>     conf.setReducerClass(Reduce.class);
>>>     conf.setInputFormat(TextInputFormat.class);
>>>     conf.setOutputFormat(TextOutputFormat.class);
>>>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>     JobClient.runJob(conf);
>>>   }
>>> }
>>>
>>> Please help.
>>>
>>> Thanks in advance.
>>>
>>> Ranjini