I've written two blog posts on how to capture directory context in a Hadoop mapper:
http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/

Cheers,
Felix

On Mar 19, 2014, at 10:50 PM, Ranjini Rathinam <ranjinibe...@gmail.com> wrote:

> Hi,
>
> If we use the code below,
>
> =======================
> word.set("filename" + " " + tokenizer.nextToken());
> output.collect(word, one);
> ======================
>
> the output is wrong, because it shows
>
> filename    word      occurrence
> vinitha     java      4
> vinitha     oracle    3
> sony        java      4
> sony        oracle    3
>
> Here vinitha does not contain the word oracle, and similarly sony does
> not contain the word java. The file names are being merged across all
> words.
>
> I need the output given below:
>
> filename    word      occurrence
> vinitha     java      4
> vinitha     C++       3
> sony        ETL       4
> sony        oracle    3
>
> I need the fileName along with the words occurring in that particular
> file only. No merging should happen.
>
> Please help me out with this issue.
>
> Thanks in advance,
> Ranjini
>
> On Thu, Mar 20, 2014 at 10:56 AM, Ranjini Rathinam <ranjinibe...@gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: Stanley Shi <s...@gopivotal.com>
> Date: Thu, Mar 20, 2014 at 7:39 AM
> Subject: Re: Need FileName with Content
> To: user@hadoop.apache.org
>
> You want a word count for each file, but the code gives you a word
> count across all the files, right?
>
> =====
> word.set(tokenizer.nextToken());
> output.collect(word, one);
> ======
>
> Change it to:
>
> word.set("filename" + " " + tokenizer.nextToken());
> output.collect(word, one);
>
> Regards,
> Stanley Shi
>
> On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam <ranjinibe...@gmail.com> wrote:
>
> Hi,
>
> I have a folder named INPUT. Inside INPUT there are 5 resumes.
> hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
> Found 5 items
> -rw-r--r--   1 hduser supergroup   5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
> -rw-r--r--   1 hduser supergroup   6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
> -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
> hduser@localhost:~/Ranjini$
>
> I have to process the folder and its content.
>
> I need output such as:
>
> filename    word      occurrence
> vinitha     java      4
> sony        oracle    3
>
> But I am not getting the filename. As the contents of the input files
> are merged, the file name does not come out correctly.
>
> Please help me fix this issue. I have given my code below:
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.util.*;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapred.*;
> import org.apache.hadoop.util.*;
>
> public class WordCount {
>
>   public static class Map extends MapReduceBase
>       implements Mapper<LongWritable, Text, Text, IntWritable> {
>
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>
>     public void map(LongWritable key, Text value,
>         OutputCollector<Text, IntWritable> output, Reporter reporter)
>         throws IOException {
>       FSDataInputStream fs = null;
>       FileSystem hdfs = null;
>       String line = value.toString();
>       int i = 0, k = 0;
>       try {
>         Configuration configuration = new Configuration();
>         configuration.set("fs.default.name", "hdfs://localhost:4440/");
>
>         Path srcPath = new Path("/user/hduser/INPUT/");
>
>         hdfs = FileSystem.get(configuration);
>         FileStatus[] status = hdfs.listStatus(srcPath);
>         fs = hdfs.open(srcPath);
>         BufferedReader br =
>             new BufferedReader(new InputStreamReader(hdfs.open(srcPath)));
>
>         String[] splited = line.split("\\s+");
>         for (i = 0; i < splited.length; i++) {
>           String sp[] = splited[i].split(",");
>           for (k = 0; k < sp.length; k++) {
>             if (!sp[k].isEmpty()) {
>               StringTokenizer tokenizer = new StringTokenizer(sp[k]);
>               if (sp[k].equalsIgnoreCase("C")) {
>                 while (tokenizer.hasMoreTokens()) {
>                   word.set(tokenizer.nextToken());
>                   output.collect(word, one);
>                 }
>               }
>               if (sp[k].equalsIgnoreCase("JAVA")) {
>                 while (tokenizer.hasMoreTokens()) {
>                   word.set(tokenizer.nextToken());
>                   output.collect(word, one);
>                 }
>               }
>             }
>           }
>         }
>       } catch (IOException e) {
>         e.printStackTrace();
>       }
>     }
>   }
>
>   public static class Reduce extends MapReduceBase
>       implements Reducer<Text, IntWritable, Text, IntWritable> {
>     public void reduce(Text key, Iterator<IntWritable> values,
>         OutputCollector<Text, IntWritable> output, Reporter reporter)
>         throws IOException {
>       int sum = 0;
>       while (values.hasNext()) {
>         sum += values.next().get();
>       }
>       output.collect(key, new IntWritable(sum));
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     JobConf conf = new JobConf(WordCount.class);
>     conf.setJobName("wordcount");
>     conf.setOutputKeyClass(Text.class);
>     conf.setOutputValueClass(IntWritable.class);
>     conf.setMapperClass(Map.class);
>     conf.setCombinerClass(Reduce.class);
>     conf.setReducerClass(Reduce.class);
>     conf.setInputFormat(TextInputFormat.class);
>     conf.setOutputFormat(TextOutputFormat.class);
>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>     JobClient.runJob(conf);
>   }
> }
>
> Please help.
>
> Thanks in advance,
> Ranjini
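The merging described in the thread happens because the map key is only the word, so counts from every file land under the same key. The fix both replies point at is a composite key of file name plus word; in the old mapred API the real file name comes from the mapper's input split, via `((FileSplit) reporter.getInputSplit()).getPath().getName()`. Below is a minimal, Hadoop-free sketch of the composite-key idea; the file names and contents are made up for illustration, and the class name `PerFileWordCount` is my own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerFileWordCount {
  // Simulates the map and reduce phases: the key is "<filename>\t<word>",
  // so counts from different files can never merge in the reducer.
  public static Map<String, Integer> countPerFile(Map<String, String> files) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Map.Entry<String, String> file : files.entrySet()) {
      // In a real mapper, file.getKey() would instead come from
      // ((FileSplit) reporter.getInputSplit()).getPath().getName().
      for (String word : file.getValue().split("\\s+")) {
        if (word.isEmpty()) continue;
        String key = file.getKey() + "\t" + word;  // composite key
        counts.merge(key, 1, Integer::sum);        // reducer-side sum
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, String> files = new LinkedHashMap<>();
    // Hypothetical file contents, mirroring the counts asked for in the thread.
    files.put("vinitha.txt", "java java java java C++ C++ C++");
    files.put("sony.txt", "ETL ETL ETL ETL oracle oracle oracle");
    countPerFile(files).forEach(
        (k, v) -> System.out.println(k.replace('\t', ' ') + " " + v));
    // prints, one per line:
    // vinitha.txt java 4
    // vinitha.txt C++ 3
    // sony.txt ETL 4
    // sony.txt oracle 3
  }
}
```

Because the reducer only ever groups identical keys, "vinitha.txt java" and "sony.txt java" stay separate, which is exactly the "no merge" behaviour requested.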