Re: Modeling WordCount in a different way

2009-04-07 Thread Aayush Garg
directly from HDFS, and since it would be a sorted file you could just walk it and merge the counts in a single pass in the reduce function. - Sharad -- Aayush Garg, Phone: +41 764822440
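The single-pass merge the reply describes can be sketched in plain Java (outside Hadoop; the class and method names are illustrative, and records are modeled as "word\tcount" lines): when the input is already sorted by word, consecutive equal words are summed while walking the stream once, with no in-memory table of all words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a single-pass merge over a word-sorted count file.
public class SortedMerge {
    static List<String> mergeSorted(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentWord = null;
        int currentCount = 0;
        for (String line : sortedLines) {
            String[] parts = line.split("\t");
            String word = parts[0];
            int count = Integer.parseInt(parts[1]);
            if (word.equals(currentWord)) {
                currentCount += count;              // same word: accumulate the run
            } else {
                if (currentWord != null) {
                    out.add(currentWord + "\t" + currentCount); // emit finished run
                }
                currentWord = word;                 // new word: start a new run
                currentCount = count;
            }
        }
        if (currentWord != null) {
            out.add(currentWord + "\t" + currentCount); // emit the final run
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("apple\t2", "apple\t1", "pear\t4");
        System.out.println(mergeSorted(sorted)); // [apple	3, pear	4]
    }
}
```

The sorted order is what makes one pass sufficient: a word's records are guaranteed to be adjacent, so a run can be emitted as soon as the word changes.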

Re: Modeling WordCount in a different way

2009-04-07 Thread Aayush Garg
Burger norbert.bur...@gmail.com wrote: Aayush, out of curiosity, why do you want to model wordcount this way? What benefit do you see? Norbert On 4/6/09, Aayush Garg aayush.g...@gmail.com wrote: Hi, I want to make experiments with the wordcount example in a different way. Suppose we have very

Modeling WordCount in a different way

2009-04-06 Thread Aayush Garg
Hi, I want to make experiments with the wordcount example in a different way. Suppose we have very large data. Instead of splitting all the data at once, we want to feed some splits into the map-reduce job at a time. I want to model the Hadoop job like this: suppose a batch of inputsplits arrives in

Optimized way

2008-12-04 Thread Aayush Garg
Hi, I have a 5-node cluster for Hadoop usage. All nodes are multi-core. I am running a shell command in the Map function of my program, and this shell command takes one file as input. Many such files are copied into HDFS. So in summary, the map function will run a command like ./run file1
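The per-file shell invocation a map function like this would perform can be sketched in plain Java with `ProcessBuilder`. This is a hedged sketch, not the poster's code: `./run` is the poster's command, so `echo` stands in for it here so the sketch runs anywhere; in a real mapper the file name would come from the input key/value and stdout would go to the OutputCollector.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch of running an external command on one input file, as the
// poster's map function does with "./run file1".
public class RunPerFile {
    static String runCommand(String command, String fileArg) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command, fileArg);
            pb.redirectErrorStream(true);          // fold stderr into stdout
            Process p = pb.start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line);              // collect command output
                }
            }
            p.waitFor();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "echo" stands in for the poster's "./run" binary.
        System.out.println(runCommand("echo", "file1"));
    }
}
```

Since each map call is independent, Hadoop will naturally spread these invocations across the multi-core nodes; the main tuning knob is the number of map tasks per node.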

Re: Error in start up

2008-04-23 Thread Aayush Garg
I set my username to R61neptun as you suggested, but I am still getting that error: localhost: starting datanode, logging to /home/garga/Documents/hadoop-0.15.3/bin/../logs/hadoop-garga-datanode-R61neptun.out localhost: starting secondarynamenode, logging to

Re: Splitting in various files

2008-04-21 Thread Aayush Garg
Could anyone please tell? On Sat, Apr 19, 2008 at 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote: Hi, I have written the following code for writing my key,value pairs to a file, and this file is then read by another MR. Path pth = new Path("./dir1/dir2/filename"); FileSystem fs

Re: Splitting in various files

2008-04-21 Thread Aayush Garg
I just tried the same thing (mapred.task.id) as you suggested, but I am getting one file named null in my directory. On Mon, Apr 21, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED] wrote: Aayush Garg wrote: Could anyone please tell? On Sat, Apr 19, 2008 at 1:33 PM, Aayush Garg [EMAIL PROTECTED
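A file literally named "null" usually means the property lookup happened where `mapred.task.id` is not set (e.g. in the driver rather than inside a running task), so `get()` returned null and Java string concatenation rendered it as the text "null". A minimal sketch of that failure mode, using `java.util.Properties` as a stand-in for Hadoop's JobConf (the task-id value below is a made-up example):

```java
import java.util.Properties;

// Sketch of how an unset configuration property turns into a file
// literally named "null" via string concatenation.
public class NullFileName {
    static String taskOutputName(Properties conf) {
        String taskId = conf.getProperty("mapred.task.id"); // null when unset
        return "./dir1/dir2/" + taskId;                     // null -> "null"
    }

    public static void main(String[] args) {
        Properties outsideTask = new Properties();          // property never set
        System.out.println(taskOutputName(outsideTask));    // ./dir1/dir2/null

        Properties insideTask = new Properties();           // hypothetical task id
        insideTask.setProperty("mapred.task.id", "task_0001_r_000000_0");
        System.out.println(taskOutputName(insideTask));
    }
}
```

The fix is to read the property from the JobConf that the framework hands to the task (e.g. in `configure()`), where the value is actually populated.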

Re: Error in start up

2008-04-21 Thread Aayush Garg
Could anyone please help me with the error below? I am not able to start HDFS because of it. Thanks, On Sat, Apr 19, 2008 at 7:25 PM, Aayush Garg [EMAIL PROTECTED] wrote: I have my hadoop-site.xml correct, but it still produces the error this way On Sat, Apr 19, 2008 at 6:35 PM, Stuart Sierra

Error in start up

2008-04-19 Thread Aayush Garg
Hi, I am getting the following error on starting up Hadoop in pseudo-distributed mode: bin/start-all.sh localhost: starting datanode, logging to /home/garga/Documents/hadoop-0.15.3/bin/../logs/hadoop-root-datanode-R61-neptun.out localhost: starting secondarynamenode, logging to

Re: Error in start up

2008-04-19 Thread Aayush Garg
My hadoop-site.xml is correct, but it still produces the error this way On Sat, Apr 19, 2008 at 6:35 PM, Stuart Sierra [EMAIL PROTECTED] wrote: On Sat, Apr 19, 2008 at 9:53 AM, Aayush Garg [EMAIL PROTECTED] wrote: I am getting the following error on starting up Hadoop in pseudo-distributed

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
a global picture? I guess not. In general, it is better to not try to communicate between map and reduce except via the expected mechanisms. On 4/16/08 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote: We can not read HashMap in the configure method of the reducer because

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
The file can be very big ... so can I write it in such a manner that the file is distributed and I can read it easily in the next MapReduce phase? Alternatively, can I split the file when it becomes greater than a certain size? Thanks, Aayush On Thu, Apr 17, 2008 at 1:01 PM, Aayush Garg [EMAIL PROTECTED] wrote

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
getting FileSystem create error? Thanks, On Thu, Apr 17, 2008 at 5:54 PM, Ted Dunning [EMAIL PROTECTED] wrote: Don't assume that any variables are shared between reducers or between maps, or between maps and reducers. If you want to share data, put it into HDFS. On 4/17/08 4:01 AM, Aayush Garg

Re: Map reduce classes

2008-04-16 Thread Aayush Garg
should I choose? Is this design and approach ok? } public static void main() {} } I hope you have got my question. Thanks, On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED] wrote: Aayush Garg wrote: Hi, Are you sure that another MR is required for eliminating some rows

Re: Map reduce classes

2008-04-16 Thread Aayush Garg
format you like. On 4/16/08 12:41 PM, Aayush Garg [EMAIL PROTECTED] wrote: Hi, the current structure of my program is: Upper class{ class Reduce{ reduce function(K1,V1,K2,V2){ // I count the frequency for each key // Add output in HashMap(Key,value) instead

Re: Map reduce classes

2008-04-16 Thread Aayush Garg
On 4/16/08 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote: We cannot read the HashMap in the configure method of the reducer because it is called before the reduce job. I need to eliminate rows from the HashMap when all the keys are read. Also my concern is, if the dataset is large, will this HashMap thing
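The dataset-wide HashMap this thread worries about is usually unnecessary: Hadoop already groups all values for one key into a single reduce() call, so the frequency can be computed and emitted per key with constant memory. A plain-Java sketch of that reduce logic (class and method names are illustrative, outside Hadoop):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch of the standard reduce-side frequency count: sum the values
// the framework has already grouped under one key, then emit.
public class FreqReduce {
    // Plain-Java stand-in for reduce(key, values, output, reporter).
    static int reduce(String key, Iterator<Integer> values) {
        int freq = 0;
        while (values.hasNext()) {
            freq += values.next();   // accumulate this key's counts only
        }
        // In Hadoop this would be:
        // output.collect(new Text(key), new IntWritable(freq));
        return freq;
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println("word1\t" + reduce("word1", ones.iterator()));
    }
}
```

Because each reduce() call sees only one key's values, nothing spanning the whole dataset needs to be held; filtering rows (e.g. dropping low-frequency words) can likewise be a per-key decision at emit time.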

Re: Sorting the OutputCollector

2008-04-09 Thread Aayush Garg
But the problem is that I need to sort according to freq, which is part of my value field... Any inputs? Could you provide a small piece of code of your thought? On Wed, Apr 9, 2008 at 9:45 AM, Owen O'Malley [EMAIL PROTECTED] wrote: On Apr 8, 2008, at 4:54 AM, Aayush Garg wrote: I construct

Sorting the OutputCollector

2008-04-08 Thread Aayush Garg
Hi, I have implemented Key and value pairs in the following way: Key (Text class) Value (Custom class) word1 word2 class Custom{ int freq; TreeMap<String, ArrayList<String>> } I construct this type of key, value pairs in the outputcollector of reduce phase. Now I want to SORT this
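The usual trick for sorting by a value field like freq is a second pass whose map emits (freq, word) instead of (word, freq), so the framework's sort-by-key orders records by frequency. A plain-Java sketch of what that shuffle produces, simulated with a TreeMap and a reversed comparator (names are illustrative, not the poster's code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the key/value swap pass: group words under their frequency
// so iterating the keys in descending order yields a freq-sorted list.
public class SortByFreq {
    static List<String> sortByFreqDescending(Map<String, Integer> counts) {
        // (freq -> words), descending, as a second MR's shuffle would
        // produce with a reversed key comparator.
        TreeMap<Integer, List<String>> byFreq =
                new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byFreq.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        List<String> ordered = new ArrayList<>();
        for (List<String> words : byFreq.values()) {
            Collections.sort(words);   // deterministic order within equal freq
            ordered.addAll(words);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("word1", 3);
        counts.put("word2", 7);
        counts.put("word3", 3);
        System.out.println(sortByFreqDescending(counts)); // [word2, word1, word3]
    }
}
```

In real Hadoop this is a trivial second job: the map swaps key and value, the sort comparator on the new (IntWritable) key is reversed, and an identity reduce writes the result.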

Re: Hadoop: Multiple map reduce or some better way

2008-04-04 Thread Aayush Garg
Dempsey (new to the list) On Apr 4, 2008, at 5:36 PM, Ted Dunning wrote: See Nutch. See Nutch run. http://en.wikipedia.org/wiki/Nutch http://lucene.apache.org/nutch/ -- Aayush Garg, Phone: +41 76 482 240

Re: Hadoop: Multiple map reduce or some better way

2008-04-03 Thread Aayush Garg
[EMAIL PROTECTED] wrote: On Wed, 26 Mar 2008, Aayush Garg wrote: Hi, I am developing a simple inverted index program with Hadoop. My map function has the output: word, doc and the reducer has: word, list(docs) Now I want to use one more mapreduce to remove stop and scrub words
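The two stages this thread discusses can be sketched in plain Java (outside Hadoop; names and the stop-word set are illustrative): stage one builds the inverted index (word -> list of docs), and stage two filters stop words, which in Hadoop would be a second MapReduce, though a simple membership check inside the first map often suffices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch of an inverted index build plus a stop-word filtering pass.
public class InvertedIndex {
    // Stage 1: map emits (word, doc); reduce collects (word, list(docs)).
    static Map<String, List<String>> index(Map<String, String> docs) {
        Map<String, List<String>> idx = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\s+")) {
                idx.computeIfAbsent(word, k -> new ArrayList<>())
                   .add(doc.getKey());
            }
        }
        return idx;
    }

    // Stage 2: drop stop-word keys, as the second MR would.
    static Map<String, List<String>> dropStopWords(
            Map<String, List<String>> idx, Set<String> stopWords) {
        Map<String, List<String>> filtered = new TreeMap<>(idx);
        filtered.keySet().removeAll(stopWords);
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "the quick fox");
        docs.put("doc2", "the lazy dog");
        Map<String, List<String>> idx = index(docs);
        Map<String, List<String>> clean = dropStopWords(idx, Set.of("the"));
        System.out.println(clean.keySet()); // "the" is gone
    }
}
```

If the stop-word list is small and known up front, filtering inside the first map (skipping the emit) avoids the second job entirely; the second MR only earns its keep when the filter depends on aggregate statistics such as document frequency.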

Hadoop: Multiple map reduce or some better way

2008-03-26 Thread Aayush Garg
associated with every word. How should I design my program from this stage? I mean, how would I apply multiple mapreduce jobs to this? What would be the better way to perform this? Thanks, Regards, - Aayush Garg, Phone: +41 76 482 240