Re: Help: How to change number of mappers in Hadoop streaming?
On Oct 26, 2008, at 8:38 AM, chaitanya krishna wrote:
> I forgot to mention that although the number of map tasks is set in the code as I mentioned before, the actual number of map tasks is not necessarily exactly that number, though it is usually very close to it.

The number of reduces is precisely the one configured by the job. The number of maps depends on the InputFormat selected. For FileInputFormats, which include TextInputFormat and SequenceFileInputFormat, the formula is complicated, but it usually defaults to the greater of the number requested and the number of HDFS blocks in the input.

-- Owen
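A rough sketch of the split-size math behind that behavior, based on how FileInputFormat of that era computes splits (the config names, constants, and the single-file simplification below are assumptions for illustration, not something stated in this thread):

public class SplitCountSketch {
  public static void main(String[] args) {
    long totalSize = 10L * 1024 * 1024 * 1024; // total input bytes (example value)
    long blockSize = 128L * 1024 * 1024;       // HDFS block size of the input files
    long minSize = 1;                          // mapred.min.split.size (default 1)
    int requestedMaps = 40;                    // e.g. -jobconf mapred.map.tasks=40

    // The requested map count only sets a "goal" size per split; the block
    // size and minimum split size bound it, which is why the actual number
    // of maps lands near max(requested maps, number of HDFS blocks).
    long goalSize = totalSize / Math.max(requestedMaps, 1);
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
    long numSplits = (totalSize + splitSize - 1) / splitSize;
    System.out.println("approximate number of map tasks: " + numSplits);
  }
}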
lots of small jobs
Hi,

I have lots of small jobs and would like to compute the aggregate running time of all the mappers and reducers in my job history, rather than tally the numbers by hand through the web interface. I know that the Reporter object can be used to output performance numbers for a single job, but is there a mechanism to do so across multiple jobs?

Thank you,
Shirley
Re: Is there a way to know the input filename in Hadoop Streaming?
Each mapper works on only one file split, which is either from file1 or file2 in your case. So the value of map.input.file gives you exactly the information you need.

Runping

On 10/23/08 11:09 AM, Steve Gao [EMAIL PROTECTED] wrote:
> Thanks, Amogh. But my case is slightly different. The command line inputs are 2 files: file1 and file2. I need to tell in the mapper which line is from which file:
>
> # In mapper
> while (<STDIN>) {
>   # how to tell whether the current line is from file1 or file2?
> }
>
> The jobconf's map.input.file param does not help in this case because file1 and file2 are both inputs.
> -Steve

--- On Thu, 10/23/08, Amogh Vasekar [EMAIL PROTECTED] wrote:
From: Amogh Vasekar [EMAIL PROTECTED]
Subject: RE: Is there a way to know the input filename in Hadoop Streaming?
To: [EMAIL PROTECTED]
Date: Thursday, October 23, 2008, 12:11 AM

Personally I haven't worked with streaming, but I guess your jobconf's map.input.file param should do it for you.

-----Original Message-----
From: Steve Gao [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 23, 2008 7:26 AM
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Subject: Is there a way to know the input filename in Hadoop Streaming?

I am using Hadoop Streaming. The input is multiple files. Is there a way to get the current filename in the mapper? For example:

$HADOOP_HOME/bin/hadoop \
    jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

In the mapper:
while (<STDIN>) {
  # how to tell whether the current line is from file1 or file2?
}
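The reason map.input.file works per task in streaming is that the jobconf is exported to the mapper process as environment variables, with dots replaced by underscores. A minimal mapper sketch along those lines (written in Java only for illustration; any executable can be a streaming mapper, and the variable name map_input_file is the translation I would expect for map.input.file on this Hadoop version):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Streaming mapper that tags every line with the file its split came from.
public class FileTaggingMapper {
  public static void main(String[] args) throws Exception {
    // Streaming exposes jobconf values as environment variables,
    // so map.input.file becomes map_input_file.
    String inputFile = System.getenv("map_input_file"); // e.g. hdfs://.../file1
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      // key = source file, value = original line (tab-separated, as streaming expects)
      System.out.println(inputFile + "\t" + line);
    }
  }
}

The same idea carries over to whatever language the real mapper is written in: read the environment variable once at startup, then branch on it for every input line.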
Re: writable class to be used to read floating point values from input?
Thanks. I converted the Text to a String and then to a Float. I am trying to calculate the average of a very large set of numbers. You are right: I plan to use a dummy key (it's not null, as I said before) as input to the reduce. Then in the reduce, once sorted, I will have a single record of the form key, n1, n2, n3, ... which I will use to calculate the average.

Regards,
Pavan

From: Owen O'Malley [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Sunday, 26 October, 2008 1:24:43 AM
Subject: Re: writable class to be used to read floating point values from input?

On Oct 25, 2008, at 8:32 PM, pols cut wrote:
> I am trying to write a map reduce job which takes the following types of key/value pairs.
> Map function: should read floating point values (I don't really care about the key); it should output null, FloatWritable.

If the input is stored in a text file, using TextInputFormat is right. Your map inputs will be: LongWritable, Text. Just use the Text and convert it to a Double.

> Reduce: input null, FloatWritable; output null, FloatWritable.

This doesn't make any sense. How should the input to the reduce be sorted? By the float? In that case, it would be: FloatWritable, NullWritable. You will get one call to the reduce for each distinct float value the maps generate. The reduce can iterate through the NullWritables to see how many times that key was generated.

-- Owen
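A minimal sketch of the single-dummy-key plan Pavan describes, using the old mapred API of this Hadoop generation (the class names, the "all" key, and the input parsing are illustrative assumptions, not code from this thread):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AverageSketch {
  // Map: parse each input line as a float and emit it under one dummy key,
  // so every value ends up in the same reduce call.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, FloatWritable> {
    private static final Text DUMMY = new Text("all");
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, FloatWritable> out, Reporter reporter)
        throws IOException {
      out.collect(DUMMY, new FloatWritable(Float.parseFloat(value.toString().trim())));
    }
  }

  // Reduce: with a single key there is exactly one reduce call; it keeps a
  // running sum and count and emits the average.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, FloatWritable, NullWritable, FloatWritable> {
    public void reduce(Text key, Iterator<FloatWritable> values,
                       OutputCollector<NullWritable, FloatWritable> out, Reporter reporter)
        throws IOException {
      double sum = 0;
      long count = 0;
      while (values.hasNext()) {
        sum += values.next().get();
        count++;
      }
      out.collect(NullWritable.get(), new FloatWritable((float) (sum / count)));
    }
  }
}

Note that funnelling everything through one key means a single reducer does all the work; a combiner that emits partial (sum, count) pairs would ease that, but that is beyond this sketch.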
local bytes written/read much higher than the hdfs bytes
I run a map/reduce job through streaming and notice that the local bytes written/read in my job are always many times higher than the HDFS bytes. But if I run the job directly in Java, this problem goes away. Why does this happen? Is it because the JVM memory is not enough and the disk is used as a cache? Or did I not configure Hadoop correctly? How should I configure it? By the way, my Hadoop version is 0.17.1.

Thanks,
xiao xinyan
[ANNOUNCE] Apache ZooKeeper 3.0.0
The Apache ZooKeeper team is proud to announce our first official Apache release, version 3.0.0 of ZooKeeper.

ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.

Version 3.0.0 is a major version upgrade from our previous 2.2.1 release on SourceForge; it addresses over 100 issues. If you are upgrading, be sure to review the 3.0.0 release notes for migration instructions.

For ZooKeeper release details and downloads, visit:
http://hadoop.apache.org/zookeeper/releases.html

ZooKeeper 3.0.0 Release Notes and Migration Instructions are at:
http://hadoop.apache.org/zookeeper/docs/r3.0.0/releasenotes.html

Regards,
The ZooKeeper Team
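To give a flavor of the primitives mentioned above, here is a small sketch of a presence/group-membership pattern with the ZooKeeper Java client: each member registers an ephemeral node under a group znode, and the node vanishes automatically when that member's session ends. The /mygroup path, member prefix, and connection string are example values, and /mygroup is assumed to already exist as a persistent znode.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class GroupMember {
  public static void main(String[] args) throws Exception {
    // Connect to a ZooKeeper ensemble (address and session timeout are example values).
    ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, new Watcher() {
      public void process(WatchedEvent event) {
        // Connection and session events arrive here; ignored in this sketch.
      }
    });

    // Register this process under the group. EPHEMERAL_SEQUENTIAL means the
    // node is deleted automatically when the session dies, which is what
    // turns this into a presence protocol.
    String me = zk.create("/mygroup/member-", new byte[0],
                          Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    System.out.println("registered as " + me);

    // List the members currently alive in the group.
    for (String child : zk.getChildren("/mygroup", false)) {
      System.out.println("member: " + child);
    }
  }
}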