Re: Help: How to change number of mappers in Hadoop streaming?

2008-10-26 Thread Owen O'Malley


On Oct 26, 2008, at 8:38 AM, chaitanya krishna wrote:

I forgot to mention that although the number of map tasks is set in the
code as I mentioned before, the actual number of map tasks is not
necessarily the same number, though it is very close to it.


The number of reduces is precisely the one configured by the job. The
number of maps depends on the InputFormat selected. For
FileInputFormats, which include TextInputFormat and
SequenceFileInputFormat, the formula is complicated, but it usually
defaults to the greater of the number requested or the number of HDFS
blocks in the input.
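
In streaming, the same knobs are passed as -jobconf options (e.g.
-jobconf mapred.map.tasks=10 -jobconf mapred.reduce.tasks=5). For a Java
job on the 0.17-era API, a minimal sketch of the hint-versus-exact
distinction (MyJob is a hypothetical driver class):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class MyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MyJob.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setNumMapTasks(10);    // a hint: the InputFormat decides the split count
    conf.setNumReduceTasks(5);  // exact: the job runs precisely five reduces
    // ... set mapper, reducer, and input/output paths, then:
    JobClient.runJob(conf);
  }
}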


-- Owen


lots of small jobs

2008-10-26 Thread Shirley Cohen

Hi,

I have lots of small jobs and would like to compute the aggregate
running time of all the mappers and reducers in my job history, rather
than tally the numbers by hand through the web interface. I know that
the Reporter object can be used to output performance numbers for a
single job, but is there a mechanism to do so across multiple jobs?


Thank you,

Shirley




Re: Is there a way to know the input filename in Hadoop Streaming?

2008-10-26 Thread Runping Qi

Each mapper works on only one file split, which is either from file1 or
file2 in your case. So the value for map.input.file gives you the exact
information you need.


Runping
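
Streaming exports jobconf properties into the mapper's environment with
dots replaced by underscores, so a script can read the map_input_file
environment variable. In a Java job on the 0.17-era API, the same
property can be read in configure(); a minimal sketch, with
FileTaggingMapper as a hypothetical name:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Tags every output line with the file its split came from.
public class FileTaggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Text inputFile;

  public void configure(JobConf job) {
    // Each map task processes a single split, so this names one file.
    inputFile = new Text(job.get("map.input.file"));
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(inputFile, value);
  }
}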
 


On 10/23/08 11:09 AM, Steve Gao [EMAIL PROTECTED] wrote:

 Thanks, Amogh. But my case is slightly different. The command line inputs are
 2 files: file1 and file2. I need to tell in the mapper which line is from
 which file:
  # In mapper
  while (my $line = <STDIN>) {
    # how to tell whether the current line is from file1 or file2?
  }
 
 The jobconf's map.input.file param does not help in this case, because
 file1 and file2 are both inputs.
 
 -Steve
 
 --- On Thu, 10/23/08, Amogh Vasekar [EMAIL PROTECTED] wrote:
 From: Amogh Vasekar [EMAIL PROTECTED]
 Subject: RE: Is there a way to know the input filename in Hadoop Streaming?
 To: [EMAIL PROTECTED]
 Date: Thursday, October 23, 2008, 12:11 AM
 
 Personally I haven't worked with streaming, but I guess the jobconf's
 map.input.file param should do it for you.
 -Original Message-
 From: Steve Gao [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 23, 2008 7:26 AM
 To: core-user@hadoop.apache.org
 Cc: [EMAIL PROTECTED]
 Subject: Is there a way to know the input filename in Hadoop Streaming?
 
 I am using Hadoop Streaming. The input are multiple files.
 Is there a way to get the current filename in mapper?
 
 For example:
 $HADOOP_HOME/bin/hadoop  \
 jar $HADOOP_HOME/hadoop-streaming.jar \
 -input file1 \
 -input file2 \
 -output myOutputDir \
 -mapper mapper \
 -reducer reducer
 
 In mapper:
 while (my $line = <STDIN>) {
   # how to tell whether the current line is from file1 or file2?
 }
 



Re: writable class to be used to read floating point values from input?

2008-10-26 Thread pols cut
Thanks.

I converted the Text to a String, and then to a Float.

I am trying to calculate the average of a very large set of numbers. You
are right: I plan to use a dummy key (it's not null as I said before) as
input to reduce. Then in reduce, when sorted, I will have a single record
as key, n1, n2, n3..., which I will use to calculate the avg.
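
A minimal sketch of that plan on the 0.17-era Java API (the class name is
made up): every map output shares the one dummy key, so a single reduce
call sees all of the numbers and can average them in one pass.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AvgReducer extends MapReduceBase
    implements Reducer<Text, FloatWritable, Text, FloatWritable> {

  public void reduce(Text key, Iterator<FloatWritable> values,
                     OutputCollector<Text, FloatWritable> output,
                     Reporter reporter) throws IOException {
    double sum = 0.0;   // accumulate in double to limit rounding error
    long count = 0;
    while (values.hasNext()) {
      sum += values.next().get();
      count++;
    }
    output.collect(key, new FloatWritable((float) (sum / count)));
  }
}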

Regards,

Pavan


From: Owen O'Malley [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Sunday, 26 October, 2008 1:24:43 AM
Subject: Re: writable class to be used to read floating point values from input?


On Oct 25, 2008, at 8:32 PM, pols cut wrote:

 I am trying to write a map/reduce function which takes the following
 types of key/value pairs:

 Map function -- should read floating point values (I don't really care
 about the key); it should output null, FloatWritable.

If the input is stored in a text file, using TextInputFormat is right.  
Your map inputs will be:

LongWritable, Text

Just use the Text and convert it to a Double.

 reduce -- input: null, FloatWritable
    output: null, FloatWritable

This doesn't make any sense. How should the input to the reduce be  
sorted? By the float? In that case, it would be:

FloatWritable, NullWritable

You will get one call to the reduce for each distinct float value the  
maps generate. The reduce can iterate through the NullWritables to see  
how many times that key was generated.
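
A minimal sketch of that reduce on the 0.17-era Java API (the class name
is made up):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class FloatCountReducer extends MapReduceBase
    implements Reducer<FloatWritable, NullWritable,
                       FloatWritable, IntWritable> {

  // Called once per distinct float; the iterator holds one
  // NullWritable per time the maps emitted that value.
  public void reduce(FloatWritable key, Iterator<NullWritable> values,
                     OutputCollector<FloatWritable, IntWritable> output,
                     Reporter reporter) throws IOException {
    int count = 0;
    while (values.hasNext()) {
      values.next();
      count++;
    }
    output.collect(key, new IntWritable(count));
  }
}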

-- Owen




local bytes written/read much higher than the HDFS bytes

2008-10-26 Thread 肖欣延
I run a map/reduce job through streaming, and notice that the local bytes
written/read in my job are always many times higher than the HDFS bytes.
But if I run the job directly through Java, this problem goes away.

Why does this happen? Is it because JVM memory is not enough, so the disk
is used as a cache? Or did I not configure Hadoop correctly, and if so,
how should I configure it? By the way, my Hadoop version is 0.17.1.


Thanks

xiao xinyan



[ANNOUNCE] Apache ZooKeeper 3.0.0

2008-10-26 Thread Patrick Hunt
The Apache ZooKeeper team is proud to announce our first official Apache 
release, version 3.0.0 of ZooKeeper.


ZooKeeper is a high-performance coordination service for distributed 
applications. It exposes common services - such as naming, configuration 
management, synchronization, and group services - in a simple interface 
so you don't have to write them from scratch. You can use it 
off-the-shelf to implement consensus, group management, leader election, 
and presence protocols. And you can build on it for your own, specific 
needs.
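
A minimal, hedged sketch of the 3.0.0 Java client used for shared
configuration data (the connect string, path, and value are made up):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigExample {
  public static void main(String[] args) throws Exception {
    // Connect to a local ZooKeeper server.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, new Watcher() {
      public void process(WatchedEvent event) { /* connection events */ }
    });
    // Publish a configuration value as a persistent znode...
    zk.create("/myapp-config", "v1".getBytes(),
              Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    // ...and read it back.
    byte[] data = zk.getData("/myapp-config", false, null);
    System.out.println(new String(data));
    zk.close();
  }
}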


Version 3.0.0 is a major version upgrade from our previous 2.2.1 release
on SourceForge; it addresses over 100 issues. If you are upgrading, be
sure to review the 3.0.0 release notes for migration instructions.


For ZooKeeper release details and downloads, visit:
http://hadoop.apache.org/zookeeper/releases.html

ZooKeeper 3.0.0 Release Notes and Migration Instructions are at:
http://hadoop.apache.org/zookeeper/docs/r3.0.0/releasenotes.html

Regards,

The ZooKeeper Team