Hi Rishi,
But I don't want the word count of all the words. In the JSON, there is a field "text", and those are the words I wish to count.
On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav <[email protected]> wrote:

> Hi Jamal,
>
> I took your input and put it in a sample wordcount program, and it's working
> just fine, giving this output:
>
> author 3
> foo234 1
> text 3
> foo 1
> foo123 1
> hello 3
> this 1
> world 2
>
> When we split using
>
> String[] words = input.split("\\W+");
>
> it takes care of all non-alphanumeric characters.
>
> Thanks and Regards,
>
> Rishi Yadav
>
> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[email protected]> wrote:
>
>> Hi,
>> I am stuck again. :(
>> My input data is in HDFS. I am again trying to do wordcount, but there is a
>> slight difference.
>> The data is in JSON format.
>> So each line of data is:
>>
>> {"author":"foo", "text": "hello"}
>> {"author":"foo123", "text": "hello world"}
>> {"author":"foo234", "text": "hello this world"}
>>
>> So I want to do wordcount for the "text" part.
>> I understand that in the mapper I just have to parse this line as JSON,
>> extract "text", and the rest of the code is the same, but I am trying to
>> switch from Python to Java for Hadoop.
>> How do I do this?
>> Thanks
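For what it's worth, a minimal sketch of the idea: inside the mapper's map() method you would pull out only the "text" field before splitting on \W+. The sketch below is plain Java (no Hadoop classes) so it can be run standalone; the extractText helper is a naive, hypothetical regex-based extraction that works for flat one-line records like the ones above — in a real job you would use a proper JSON library (e.g. Jackson or org.json) instead, and emit (word, 1) pairs from map() rather than aggregating in a Map.

```java
import java.util.*;
import java.util.regex.*;

public class TextWordCount {

    // Naive extraction of the "text" field from one flat JSON line.
    // Hypothetical helper for illustration only; a real mapper should
    // parse the line with a JSON library such as Jackson.
    static String extractText(String jsonLine) {
        Matcher m = Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"").matcher(jsonLine);
        return m.find() ? m.group(1) : "";
    }

    // Counts words only in the "text" field of each line, using the same
    // split("\\W+") tokenization as the original wordcount.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : extractText(line).split("\\W+")) {
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}");
        System.out.println(countWords(input));
        // prints {hello=3, this=1, world=2} — author names no longer counted
    }
}
```

In the actual MapReduce job, extractText(value.toString()) would replace the raw input string in the mapper, and everything downstream (the split, the (word, 1) emits, the summing reducer) stays unchanged.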
