Re: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Rick Cox
On Wed, Oct 22, 2008 at 18:55, Steve Gao [EMAIL PROTECTED] wrote:
> I am using Hadoop Streaming. The input are multiple files. Is there a way
> to get the current filename in mapper?

Streaming map tasks should have a map_input_file environment variable like the following:
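A minimal sketch of how a streaming mapper might read that variable (the script itself and its output format are illustrative, not from the thread): Hadoop Streaming exports job configuration values into the mapper's environment, so the input filename shows up as `map_input_file`.

```python
#!/usr/bin/env python
# Sketch of a streaming mapper that tags each record with its source file.
# map_input_file is set in the environment by the streaming framework for
# each map task; the "unknown" fallback is for running outside Hadoop.
import os
import sys

def main(stream=sys.stdin, out=sys.stdout):
    filename = os.environ.get("map_input_file", "unknown")
    for line in stream:
        # Emit the source filename as a tab-separated prefix on each line.
        out.write("%s\t%s" % (filename, line))

if __name__ == "__main__":
    main()
```

Run locally with `map_input_file=/tmp/test python mapper.py < input.txt` to see the tagged output before submitting the job.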

Re: process limits for streaming jar

2008-06-27 Thread Rick Cox
On Fri, Jun 27, 2008 at 08:57, Chris Anderson [EMAIL PROTECTED] wrote:
> The problem is that when there are a large number of map tasks to
> complete, Hadoop doesn't seem to obey the map.tasks.maximum. Instead, it
> is spawning 8 map tasks per tasktracker (even when I change the
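A likely explanation, assuming the Hadoop-of-this-era property name `mapred.tasktracker.map.tasks.maximum`: that limit is read by each tasktracker daemon when it starts, so setting it per job has no effect; it has to go in the daemon's own configuration, roughly like this sketch:

```xml
<!-- conf/hadoop-site.xml on each tasktracker node (illustrative value).
     Read at tasktracker startup; cannot be overridden per job. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```

The tasktrackers must be restarted for the change to take effect.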

Re: Streaming and subprocess error code

2008-05-14 Thread Rick Cox
Thanks for the quick response! I see this feature is in trunk and not available in the last stable release. Anyway, I will try whether it works for me from trunk, and will also check whether it catches segmentation faults.

Rick Cox wrote:
> Try -jobconf stream.non.zero.exit.status.is.failure=true

Re: Streaming and subprocess error code

2008-05-13 Thread Rick Cox
Try -jobconf stream.non.zero.exit.status.is.failure=true. That will tell streaming that a non-zero exit code from the subprocess is a task failure. To turn that into an immediate whole-job failure, I think configuring zero task retries (mapred.map.max.attempts=1 and mapred.reduce.max.attempts=1) will be sufficient.

rick
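A sketch of a mapper written to take advantage of that setting (the validation rule and script are illustrative assumptions): with `stream.non.zero.exit.status.is.failure=true`, exiting non-zero on a bad record fails the task, and with `mapred.map.max.attempts=1` the first failed attempt fails the job rather than being retried.

```python
#!/usr/bin/env python
# Sketch: a streaming mapper that aborts on malformed input.
import sys

def run(stream=sys.stdin, out=sys.stdout):
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            # Malformed record: report it and exit non-zero, which
            # streaming counts as a task failure under the setting above.
            sys.stderr.write("bad record: %r\n" % line)
            sys.exit(1)
        out.write("%s\t%s\n" % (fields[0], fields[1]))

if __name__ == "__main__":
    run()
```

A segmentation fault kills the subprocess with a signal rather than an exit code, so whether that path is caught the same way is worth testing, as the poster intends.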

Re: New user, several questions/comments (MaxMapTaskFailuresPercent in particular)

2008-04-08 Thread Rick Cox
On Tue, Apr 8, 2008 at 12:36 PM, Ian Tegebo [EMAIL PROTECTED] wrote:
> My original question was about specifying MaxMapTaskFailuresPercent as a
> job conf parameter on the command line for streaming jobs. Is there a
> conf setting like the following? mapred.taskfailure.percent

The job
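For reference, the job property behind `JobConf.setMaxMapTaskFailuresPercent` in Hadoop of this era is, as best I can tell (worth verifying against your release's mapred-default.xml), `mapred.max.map.failures.percent`, which would be passed on a streaming command line roughly like this (paths and job arguments are placeholders):

```sh
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input in -output out \
    -mapper ./mapper.py \
    -jobconf mapred.max.map.failures.percent=10
```

With that set, the job tolerates up to 10% of map tasks failing before the job as a whole is declared failed.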