The patch for the parse timeout is better in a way, as it won't restart the
task. The options you specified will retry the map or reduce a couple of times
before skipping the problematic entries. It is good practice to use them for
the parsing anyway.
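The description of mapred.skip.map.max.skip.records says the framework narrows
the skipped range by retrying until the threshold is met or the attempts run
out. The Hadoop MapReduce tutorial describes this as a binary search-like
approach that splits the range in half on each retry. Below is a toy Java model
of that idea. It is not Hadoop code: the class, method and parameter names are
hypothetical, it assumes exactly one bad record, and attemptsLeft only stands
in for the task attempts the job still has. It shows why a value of 1 can cost
many extra task attempts.

```java
/**
 * Toy model (hypothetical, not Hadoop code) of narrowing a skipped range by
 * halving it on each retry, assuming a single bad record.
 */
public class SkipRangeNarrowing {

    /** Attempts spent narrowing and the width of the range finally skipped. */
    public static final class Result {
        public final int attempts;
        public final long skippedWidth;

        Result(int attempts, long skippedWidth) {
            this.attempts = attempts;
            this.skippedWidth = skippedWidth;
        }
    }

    /**
     * @param lo             first suspect record (inclusive)
     * @param hi             end of the suspect range (exclusive)
     * @param badRecord      index of the single bad record, must be in [lo, hi)
     * @param maxSkipRecords acceptable skipped width (plays the role of
     *                       mapred.skip.map.max.skip.records; must be >= 1 here)
     * @param attemptsLeft   retries still available (hypothetical stand-in)
     */
    public static Result narrow(long lo, long hi, long badRecord,
                                long maxSkipRecords, int attemptsLeft) {
        if (maxSkipRecords < 1) {
            throw new IllegalArgumentException("0 disables skipping; use >= 1");
        }
        if (badRecord < lo || badRecord >= hi) {
            throw new IllegalArgumentException("bad record outside range");
        }
        int attempts = 0;
        // Keep retrying while the range is wider than acceptable and
        // attempts remain.
        while (hi - lo > maxSkipRecords && attempts < attemptsLeft) {
            long mid = lo + (hi - lo) / 2;
            attempts++;
            // Run only the first half [lo, mid): if it fails, the bad record
            // is there; if it succeeds, the bad record is in [mid, hi).
            if (badRecord < mid) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return new Result(attempts, hi - lo);
    }

    public static void main(String[] args) {
        Result r = narrow(0, 1000, 737, 1, 100);
        // Prints "10 attempts, 1 record(s) skipped"
        System.out.println(r.attempts + " attempts, "
                + r.skippedWidth + " record(s) skipped");
    }
}
```

In this model, narrowing 1,000 suspect records down to one takes 10 extra
attempts (about log2 of the range), and each attempt is a rerun of the task. If
the attempts run out first, a wider range is skipped. Setting the threshold to
Long.MAX_VALUE skips the whole range without narrowing.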

On 13 July 2010 19:32, brad <b...@bcs-mail.net> wrote:

> This may be a dumb question, but can you accomplish the same thing by
> placing the following code in mapred-site.xml? Or did I misunderstand the
> fix...
>
> <property>
>   <name>mapred.skip.attempts.to.start.skipping</name>
>   <value>2</value>
>   <!-- default: 2 -->
>   <description>
>     The number of Task attempts AFTER which skip mode will be kicked off.
>     When skip mode is kicked off, the task reports the range of records
>     which it will process next to the TaskTracker, so that on failures
>     the TT knows which ones are possibly the bad records. On further
>     executions, those are skipped.
>   </description>
> </property>
>
> <property>
>   <name>mapred.skip.map.max.skip.records</name>
>   <value>1</value>
>   <!-- default: 0 -->
>   <description>
>     The number of acceptable skip records surrounding the bad record PER bad
>     record in mapper. The number includes the bad record as well. To turn
>     the feature of detection/skipping of bad records off, set the value to 0.
>     The framework tries to narrow down the skipped range by retrying until
>     this threshold is met OR all attempts get exhausted for this task. Set
>     the value to Long.MAX_VALUE to indicate that the framework need not try
>     to narrow down. Whatever records (depends on application) get skipped
>     are acceptable.
>   </description>
> </property>
>
> Brad


-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
