The patch for the parse timeout is better in one respect: it won't restart the task. The options you specified will retry the map or reduce task a couple of times before skipping the problematic entries. It is good practice to use them for parsing anyway.
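
For the reduce side, a rough sketch of the counterpart to mapred.skip.map.max.skip.records could look like the fragment below. It assumes the Hadoop 0.20-era property name mapred.skip.reduce.max.skip.groups; the value 1 is only illustrative and I have not tested it on this setup.

```xml
<!-- Sketch only: reduce-side counterpart of mapred.skip.map.max.skip.records. -->
<!-- Assumes the 0.20-era property name; the value 1 is illustrative. -->
<property>
  <name>mapred.skip.reduce.max.skip.groups</name>
  <value>1</value>
  <!-- default: 0 (skipping disabled for reducers) -->
</property>
```

The reducer skips whole key groups rather than single records, so the threshold counts groups around the bad one.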

On 13 July 2010 19:32, brad <b...@bcs-mail.net> wrote:

> This may be a dumb question, but can you accomplish the same thing by
> placing the following code in mapred-site.xml. Or did I misunderstand the
> fix...
>
> <property>
>   <name>mapred.skip.attempts.to.start.skipping</name>
>   <value>2</value>
>   <!-- default: 2 -->
>   <description>
>     The number of Task attempts AFTER which skip mode will be kicked
>     off. When skip mode is kicked off, the tasks reports the range of
>     records which it will process next, to the TaskTracker. So that on
>     failures, TT knows which ones are possibly the bad records. On
>     further executions, those are skipped.
>   </description>
> </property>
>
> <property>
>   <name>mapred.skip.map.max.skip.records</name>
>   <value>1</value>
>   <!-- default: 0 -->
>   <description>
>     The number of acceptable skip records surrounding the bad record PER
>     bad record in mapper. The number includes the bad record as well. To
>     turn the feature of detection/skipping of bad records off, set the
>     value to 0. The framework tries to narrow down the skipped range by
>     retrying until this threshold is met OR all attempts get exhausted
>     for this task. Set the value to Long.MAX_VALUE to indicate that
>     framework need not try to narrow down. Whatever records(depends on
>     application) get skipped are acceptable.
>   </description>
> </property>
>
> Brad
>

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com