Programming Multiple rounds of mapreduce

2011-06-13 Thread Arko Provo Mukherjee
Hello, I am trying to write a program where I need to write multiple rounds of map and reduce. The output of the last round of map-reduce must be fed into the input of the next round. Can anyone please guide me to any link or material that can teach me how I can achieve this. Thanks a lot

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Arko Provo Mukherjee
…On Jun 13, 2011 at 5:46 PM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote: >> Hello, >> I am trying to write a program where I need to write multiple rounds of map and reduce. >> The output of the last round of map-reduce must be fed into the input of the next round…
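
The approach that works, per the later thread "output from one map reduce job as the input to another" below, is to run the jobs sequentially from the driver, feeding each round's output path in as the next round's input path. A minimal sketch with the classic org.apache.hadoop.mapred API; the class name and argument indices are illustrative:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class ChainDriver {
      public static void main(String[] args) throws Exception {
        // Round 1: reads the original input, writes an intermediate directory.
        JobConf job1 = new JobConf(ChainDriver.class);
        job1.setJobName("round-1");
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path(args[1]));
        JobClient.runJob(job1); // blocks until round 1 completes

        // Round 2: reads round 1's output as its input.
        JobConf job2 = new JobConf(ChainDriver.class);
        job2.setJobName("round-2");
        FileInputFormat.addInputPath(job2, new Path(args[1]));
        FileOutputFormat.setOutputPath(job2, new Path(args[2]));
        JobClient.runJob(job2);
      }
    }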

Compiling programs with Hadoop 0.21.0

2011-08-31 Thread Arko Provo Mukherjee
Hello, I am trying to learn Hadoop and doing a project on it. I need to update some files in my project and hence wanted to use version 0.21.0. However, I am confused as to how I can compile my programs on version 0.21.0 as it doesn't have any hadoop-core-0.21.0.jar file. What option should I have…

Re: Compiling programs with Hadoop 0.21.0

2011-08-31 Thread Arko Provo Mukherjee
…0.21.0.jar for accessing the mapreduce APIs. I cannot really comment further on compilation errors without seeing the code/error messages. > --Bobby Evans > On 8/31/11 4:34 PM, "Arko Provo Mukherjee" wrote: > Hello, > I am trying to learn…
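
For context, 0.21.0 split the former hadoop-core jar into per-project jars, so all of them typically go on the compile classpath. A hedged example command; the jar names are assumed from the 0.21.0 release layout:

    javac -classpath hadoop-common-0.21.0.jar:hadoop-hdfs-0.21.0.jar:hadoop-mapred-0.21.0.jar \
          -d build/classes MyJob.java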

Re: Compiling programs with Hadoop 0.21.0

2011-09-01 Thread Arko Provo Mukherjee
…class in an IDE such as Eclipse, you'll see that when you restrict the org.apache.hadoop.* import only to packages you need, that indeed you are using hdfs classes. > Thanks, > Joep > From: Arko Provo Mukherjee [mailto:arkoprovomukher...@gmail.com]…

Reducers without output files

2011-09-14 Thread Arko Provo Mukherjee
Hello Everyone, I have a small issue with my Reducer that I am trying to figure out and wanted some advice. In the reducer, when writing to the output file declared in FileOutputFormat.setOutputPath(), I want to write only the key and not the value when calling output.collect(). Is there…

Re: Reducers without output files

2011-09-14 Thread Arko Provo Mukherjee
> Hi Arko, > You can achieve the same within the existing mapreduce framework itself. Give a NullWritable in place of the reducer output value in the reduce function. In your driver class as well, mention the output value type as NullWritable. > --Original Message-…

Re: Reducers without output files

2011-09-14 Thread Arko Provo Mukherjee
…> Writables in hadoop. When you need to use a NullWritable instance you can give NullWritable.get(), which would do the job. I.e. > output.collect( NullWritable.get(), new Text(output_string) ); > Regards > Bejoy K S > -Original Message- > From: Arko Provo…
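
Combining the two replies, a minimal sketch of a reducer that emits keys only (old mapred API; the input/output types are illustrative):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class KeyOnlyReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, NullWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, NullWritable> output, Reporter reporter)
          throws IOException {
        // NullWritable.get() suppresses the value column in the output file.
        output.collect(key, NullWritable.get());
      }
    }
    // Driver side, so the output format agrees:
    //   jobconf.setOutputValueClass(NullWritable.class);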

Passing a Global Variable into a Mapper

2011-09-15 Thread Arko Provo Mukherjee
Hi, Is there a way to pass some data from the driver class to the Mapper class without going through the HDFS? Does the API provide us with some functionality to pass some variables? Thanks a lot in advance! Warm regards Arko

Re: Passing a Global Variable into a Mapper

2011-09-15 Thread Arko Provo Mukherjee
…conf.set(…); > somevar = conf.get(…); > On Thu, Sep 15, 2011 at 11:13 PM, Arko Provo Mukherjee wrote: >> Hi, >> Is there a way to pass some data from the driver class to the Mapper class without going through the HDFS? >> Does the API provide us with some functionality to pass some variables?…
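
Fleshing out that reply, a sketch of passing a small value through the job configuration instead of HDFS; the property name "my.global.var" and the types are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Driver side, before submitting the job:
    //   JobConf conf = new JobConf(MyDriver.class);
    //   conf.set("my.global.var", "some-value");

    public class MyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private String someVar;

      @Override
      public void configure(JobConf job) {
        someVar = job.get("my.global.var"); // runs once per task, before map()
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        output.collect(new Text(someVar), value);
      }
    }
    // With the new API, the equivalent read is
    // context.getConfiguration().get("my.global.var") inside setup().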

System.out.println in Map / Reduce

2011-09-26 Thread Arko Provo Mukherjee
Hi, I am writing some Map Reduce programs in pseudo-distributed mode. I am getting some errors in my program and would like to debug it. For that I want to embed some print statements in my Map / Reduce. But when I am running the mappers, the prints don't seem to show up in the terminal. Does…

Re: System.out.println in Map / Reduce

2011-09-27 Thread Arko Provo Mukherjee
…The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker and logged to ${HADOOP_LOG_DIR}/userlogs. > Regards, > Subroto Sanyal > From: Arko Provo Mukherjee [mailto:arkoprovomukher...@gmail.com] > Sent: Tuesday, September 27, 2011…
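
In other words, the prints do work; they just land in per-task log files instead of the terminal. A sketch of where to look afterwards, assuming the default log layout:

    // Inside map() or reduce():
    System.out.println("DEBUG: saw key " + key);

    // After the job, on the node that ran the task (or via the JobTracker web UI):
    //   ${HADOOP_LOG_DIR}/userlogs/<task-attempt-id>/stdout
    //   ${HADOOP_LOG_DIR}/userlogs/<task-attempt-id>/stderr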

Re: output from one map reduce job as the input to another map reduce job?

2011-09-27 Thread Arko Provo Mukherjee
Hi, I am not sure how you can avoid the filesystem, however, I did it as follows:

    // For Job 1
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, new Path(args[1]));
    // For Job 2
    FileInputFormat.addInputPath(job2, new Path(args[1]));
    FileOutputFormat.setOutputPath(job2, …

Iterative MR issue

2011-10-11 Thread Arko Provo Mukherjee
Hello Everyone, I have a particular situation where I am trying to run Iterative Map-Reduce, where the output files of one iteration are the input files for the next. It stops when there are no new files created in the output. Code snippet: int round = 0; JobConf jobconf = new JobConf(ne…

Re: Iterative MR issue

2011-10-12 Thread Arko Provo Mukherjee
Hi, I solved it by creating a new JobConf instance for each iteration in the loop. Thanks & regards Arko On Oct 12, 2011, at 1:54 AM, Arko Provo Mukherjee wrote: > Hello Everyone, > I have a particular situation where I am trying to run Iterative Map-Reduce, where the…
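
A sketch of that fix: build a fresh JobConf inside the loop and derive each round's paths from the round number. The stop condition shown, checking for non-empty part files, is illustrative:

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class IterativeDriver {
      public static void main(String[] args) throws Exception {
        int round = 0;
        boolean moreWork = true;
        while (moreWork) {
          // New JobConf each iteration; reusing one instance carries over
          // the previous round's input/output settings.
          JobConf jobconf = new JobConf(IterativeDriver.class);
          jobconf.setJobName("iteration-" + round);
          Path out = new Path("rounds/" + (round + 1));
          FileInputFormat.addInputPath(jobconf, new Path("rounds/" + round));
          FileOutputFormat.setOutputPath(jobconf, out);
          JobClient.runJob(jobconf);

          // Continue only if this round produced non-empty output files.
          FileSystem fs = FileSystem.get(jobconf);
          moreWork = false;
          for (FileStatus s : fs.listStatus(out)) {
            if (s.getPath().getName().startsWith("part-") && s.getLen() > 0) {
              moreWork = true;
            }
          }
          round++;
        }
      }
    }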

Mappers getting killed

2011-10-27 Thread Arko Provo Mukherjee
Hi, I have a situation where I have to read a large file into every mapper. Since it's a large HDFS file that is needed to work on each input to the mapper, it is taking a lot of time to read the data into memory from HDFS. Thus the system is killing all my Mappers with the following message:…

Re: Mappers getting killed

2011-10-28 Thread Arko Provo Mukherjee
Thanks! I will try and let you know. Warm regards Arko On Oct 27, 2011, at 8:19 AM, Brock Noland wrote: > Hi, > On Thu, Oct 27, 2011 at 3:22 AM, Arko Provo Mukherjee wrote: >> Hi, >> I have a situation where I have to read a large file into every mapper.…

Re: Mappers getting killed

2011-10-31 Thread Arko Provo Mukherjee
…You could add some context.progress() or context.setStatus("status") in your map method from time to time (at least once every 600 seconds, to not get the timeout). > Regards, > Lucian > On Thu, Oct 27, 2011 at 11:22 AM, Arko Provo Mukherjee <ark…
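
A sketch of Lucian's suggestion with the new org.apache.hadoop.mapreduce API; the 600-second figure is the default mapred.task.timeout, and the class itself is illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class KeepAliveMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        context.setStatus("loading shared data");
        // ... long read of the large HDFS file; call context.progress()
        //     periodically so the framework knows the task is alive ...
        context.progress();
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        context.progress(); // heartbeat during slow per-record work
        // ... actual processing ...
      }
    }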

Sharing data in a mapper for all values

2011-10-31 Thread Arko Provo Mukherjee
Hello, I have a situation where I am reading a big file from HDFS and then comparing all the data in that file with each input to the mapper. Now since my mapper is trying to read the entire HDFS file for each of its inputs, the amount of data it is having to read and keep in memory is becoming large…
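
One standard pattern for this (an assumption on my part, not something confirmed in the thread) is to load the shared file once per task in configure() and keep it in memory, so map() only does the comparisons. The property "shared.file.path" is illustrative:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class CompareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private final List<String> shared = new ArrayList<String>();

      @Override
      public void configure(JobConf job) {
        try {
          // Read the big HDFS file once per mapper task, not once per record.
          FileSystem fs = FileSystem.get(job);
          BufferedReader br = new BufferedReader(new InputStreamReader(
              fs.open(new Path(job.get("shared.file.path")))));
          String line;
          while ((line = br.readLine()) != null) shared.add(line);
          br.close();
        } catch (IOException e) {
          throw new RuntimeException("failed to preload shared file", e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        for (String s : shared) {
          // ... compare s against value and emit matches ...
        }
      }
    }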

Issues with Distributed Caching

2011-11-07 Thread Arko Provo Mukherjee
Hello, I am having the following problem with Distributed Caching. In the driver class, I am doing the following (/home/arko/MyProgram/data is a directory created as an output of another map-reduce):

    FileSystem fs = FileSystem.get(jobconf_seed);
    String init_path = "/home/arko/MyProgram/data"…
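
For comparison, the usual DistributedCache flow in the old API looks roughly like this. One caveat that may be relevant here: files are registered individually, so a directory produced by a previous job is typically added part-file by part-file. Paths are illustrative:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheExample {
      public static void registerCache(JobConf jobconf) throws Exception {
        // Driver side: register each HDFS file before submitting the job.
        DistributedCache.addCacheFile(
            new URI("/home/arko/MyProgram/data/part-00000"), jobconf);
      }
      public static Path[] localCopies(JobConf job) throws Exception {
        // Task side (e.g. in configure()): resolve the node-local copies.
        return DistributedCache.getLocalCacheFiles(job);
      }
    }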

Re: how to overwrite output in HDFS?

2012-04-03 Thread Arko Provo Mukherjee
Hi, Check the links below. Read from HDFS: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs Write to HDFS: https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop Hope they help! Thanks & regards Arko On Tue, Apr 3, 2012 at…
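
On the overwrite part of the question specifically, a common pattern (my assumption, not something the reply states) is to delete the existing output directory from the driver before submitting, since Hadoop refuses to write into a path that already exists:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class OverwriteHelper {
      /** Delete an existing output dir so a new job can write to the same path. */
      public static void clearOutput(JobConf conf, Path out) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(out)) {
          fs.delete(out, true); // recursive delete of the old job output
        }
      }
    }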

Re: Reducer not firing

2012-04-17 Thread Arko Provo Mukherjee
…output into the Job output path. > Thanks > Devaraj > From: Arko Provo Mukherjee [arkoprovomukher...@gmail.com] > Sent: Tuesday, April 17, 2012 10:32 AM > To: mapreduce-user@hadoop.apache.org > Subject: Reducer not firing…

Re: Reducer not firing

2012-04-17 Thread Arko Provo Mukherjee
…the reduce phase. By default, task attempt logs are present in $HADOOP_LOG_DIR/userlogs//. There could be some bug in your reducer which is leading to this output. > Thanks > Devaraj > From: Arko Provo Mukherjee [arkoprovomukher...@gmail.com] > Sent: Tue…

Re: Reducer not firing

2012-04-17 Thread Arko Provo Mukherjee
Hello George, It worked. Thanks so much!! Bad typo while porting :( Thanks again to everyone who helped!! Warm regards Arko On Tue, Apr 17, 2012 at 6:59 PM, George Datskos wrote: > Arko, > Change Iterator to Iterable > George > On 2012/04/18 8:…
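
For anyone else hitting this symptom: with the new org.apache.hadoop.mapreduce API, a reduce method declared with Iterator instead of Iterable does not override Reducer.reduce(), so the default identity implementation runs and the map output passes through unchanged. A sketch with word-count-style types, purely illustrative; adding @Override makes the compiler catch the mismatch:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      // Wrong (silently never called):
      //   public void reduce(Text key, Iterator<IntWritable> values, Context context)
      @Override  // with Iterable, this genuinely overrides Reducer#reduce
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
      }
    }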

Increasing Java Heap Space in Slave Nodes

2013-09-06 Thread Arko Provo Mukherjee
Hello All, I am running my job on a Hadoop Cluster and it fails due to insufficient Java heap memory. I searched on Google and found that I need to add the following into the conf files: mapred.child.java.opts = -Xmx2000m. However, I don't want to request the administrator to change…

Re: Increasing Java Heap Space in Slave Nodes

2013-09-07 Thread Arko Provo Mukherjee
…of your job (jobConf.set(…) or job.getConfiguration().set(…)). Alternatively, if you implement Tool, and use its grabbed Configuration, you can also pass it via a -Dname=value argument when running the job (the option has to precede any custom options). > On Sat, Sep 7, 20…
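
A sketch of both routes Harsh describes; the property name comes from the original post, and everything else is illustrative:

    // Per-job, in the driver, with no cluster-wide config change:
    JobConf jobconf = new JobConf(MyDriver.class);
    jobconf.set("mapred.child.java.opts", "-Xmx2000m");

    // Or, if the driver implements Tool, on the command line,
    // before any custom arguments:
    //   hadoop jar myjob.jar MyDriver -Dmapred.child.java.opts=-Xmx2000m <input> <output>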

Re: Increasing Java Heap Space in Slave Nodes

2013-09-07 Thread Arko Provo Mukherjee
…concurrently, and check you're not offering each more memory than the machine has spare. > Hope this helps, > Tim > On Sat, Sep 7, 2013 at 8:20 PM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote: >> Hi Harsh, >> Thanks for…