Re: Is there a way to know the input filename at Hadoop Streaming?

2008-10-26 Thread Runping Qi

Each mapper works on only one file split, which is either from file1 or
file2 in your case. So the value for map.input.file gives you the exact
information you need.


Runping
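
Because a whole split comes from a single file, a mapper only needs to work out its source once per task, not once per line. Below is a minimal Python sketch of that idea. It assumes the path reaches the streaming process as the map_input_file environment variable (as mentioned elsewhere in this thread) and that the inputs are literally named file1 and file2. The helper names source_label and label_for_task are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse


def source_label(input_path, known_names=("file1", "file2")):
    """Return which known input a split path belongs to, or None."""
    # Drop any scheme/host (e.g. hdfs://HOST) and keep the last path component.
    name = posixpath.basename(urlparse(input_path).path)
    return name if name in known_names else None


def label_for_task(env):
    """Resolve the label once per task from the task's environment mapping."""
    # Assumption: Streaming exports map.input.file as map_input_file.
    return source_label(env.get("map_input_file", ""))
```

In the mapper you would call label_for_task(os.environ) once before the read loop and reuse the result for every line.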
 


On 10/23/08 11:09 AM, Steve Gao [EMAIL PROTECTED] wrote:

 Thanks, Amogh. But my case is slightly different. The command line inputs are
 2 files: file1 and file2. I need to tell in the mapper which line is from
 which file:
 #In mapper
 while (<STDIN>) {
   # how to tell whether the current line is from file1 or file2?
 }
 
 The jobconf's map.input.file param does not help in this case
 because file1 and file2 are both inputs.
 
 -Steve
 



[Help needed] Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
Sorry for the email. Thanks for any help or hint.

    I am using Hadoop Streaming. The inputs are multiple files.
    Is there a way to get the current filename in mapper?

    For example:
    $HADOOP_HOME/bin/hadoop  \
    jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

    In mapper:
    while (<STDIN>) {
      # how to tell whether the current line is from file1 or file2?
    }



  

RE: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
Thanks, Amogh. But my case is slightly different. The command line inputs are 2 
files: file1 and file2. I need to tell in the mapper which line is from which 
file:
#In mapper
while (<STDIN>) {
  # how to tell whether the current line is from file1 or file2?
}

The jobconf's map.input.file param does not help in this case
because file1 and file2 are both inputs.

-Steve

--- On Thu, 10/23/08, Amogh Vasekar [EMAIL PROTECTED] wrote:
From: Amogh Vasekar [EMAIL PROTECTED]
Subject: RE: Is there a way to know the input filename at Hadoop Streaming?
To: [EMAIL PROTECTED]
Date: Thursday, October 23, 2008, 12:11 AM

Personally I haven't worked with streaming, but I guess your jobconf's
map.input.file param should do it for you.

Re: [Help needed] Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Zhengguo 'Mike' SUN
I guess one trick you can do without the help of Hadoop is to encode the file
identifier inside the file itself. For example, each line of file1 could start
with "1", then a space, then the content of the original line.
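
The tagging idea above could be sketched in Python roughly as follows. This is only an illustration: the helper names tag_line and split_tag are made up here, and the identifier is assumed to contain no spaces. The tagging would be done once before the files are handed to the job, and the mapper would then strip the tag from each line it reads.

```python
def tag_line(file_id, line):
    """Prefix a line with its file identifier and a single space."""
    return f"{file_id} {line}"


def split_tag(tagged_line):
    """Split a tagged line back into (file_id, original_line)."""
    # partition splits at the first space only, so spaces in the content survive.
    file_id, _, content = tagged_line.partition(" ")
    return file_id, content
```

The cost of this approach is an extra pass over the input to rewrite it, which the environment-variable route avoids.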




Re: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Rick Cox
On Wed, Oct 22, 2008 at 18:55, Steve Gao [EMAIL PROTECTED] wrote:
 I am using Hadoop Streaming. The inputs are multiple files.
 Is there a way to get the current filename in mapper?


Streaming map tasks should have a map_input_file environment
variable like the following:

map_input_file=hdfs://HOST/path/to/file

rick
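
For instance, a Python mapper could read that variable and prefix each output line with it. This is a sketch and has not been checked against any particular Hadoop version. The names tag_lines and run are hypothetical, and the "unknown" fallback is an assumption for runs outside Hadoop, where the variable would not be set.

```python
def tag_lines(lines, source):
    """Yield 'source<TAB>line' for each input line, without its trailing newline."""
    for line in lines:
        text = line.rstrip("\n")
        yield f"{source}\t{text}"


def run(stdin, stdout, env):
    """Copy stdin to stdout, tagging every line with the task's input file."""
    # Assumption: the variable is absent when run outside a streaming task.
    source = env.get("map_input_file", "unknown")
    for out in tag_lines(stdin, source):
        stdout.write(out + "\n")
```

To use this as the -mapper script, the file would import os and sys and end with a call to run(sys.stdin, sys.stdout, os.environ).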








Is there a way to know the input filename at Hadoop Streaming?

2008-10-22 Thread Steve Gao
I am using Hadoop Streaming. The inputs are multiple files.
Is there a way to get the current filename in mapper?

For example:
$HADOOP_HOME/bin/hadoop  \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer

In mapper:
while (<STDIN>) {
  # how to tell whether the current line is from file1 or file2?
}