So, to give a little more detail: Pig currently fails an entire 1 PB MapReduce job if a single record is malformed. In most use cases, that is insane behavior. The ON ERROR proposal lets you handle errors in a reasonable manner: you specify error thresholds at which the job should fail, and errant records are split off into a separate relation so you can study them later.
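The behavior described above can be illustrated with a small, single-process Python sketch. It is not Pig code and not the proposal's actual syntax (ON ERROR is only a proposal at this point). `run_with_on_error`, `max_errors`, `max_error_fraction` and `ErrorThresholdExceeded` are hypothetical names chosen for illustration. Each record either passes through cleanly or is diverted, together with its error message, into a separate error collection, and the run fails only once an error threshold is exceeded.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Optional, Tuple


class ErrorThresholdExceeded(Exception):
    """Hypothetical: raised when too many records fail (the analogue of failing the job)."""


@dataclass
class OnErrorResult:
    good: List[Any] = field(default_factory=list)               # records processed cleanly
    errors: List[Tuple[Any, str]] = field(default_factory=list)  # (bad record, error message)


def run_with_on_error(
    records: Iterable[Any],
    process: Callable[[Any], Any],
    max_errors: Optional[int] = None,
    max_error_fraction: Optional[float] = None,
) -> OnErrorResult:
    """Hypothetical sketch: apply `process` to every record, splitting failures off
    instead of aborting on the first bad record."""
    result = OnErrorResult()
    total = 0
    for record in records:
        total += 1
        try:
            result.good.append(process(record))
        except Exception as exc:
            # Divert the bad record into the "error relation" rather than failing.
            result.errors.append((record, str(exc)))
            # An absolute threshold can be enforced as soon as it is crossed.
            if max_errors is not None and len(result.errors) > max_errors:
                raise ErrorThresholdExceeded(
                    f"{len(result.errors)} errors exceeds max_errors={max_errors}"
                )
    # A fractional threshold needs the total record count, so check it at the end.
    if max_error_fraction is not None and total > 0:
        fraction = len(result.errors) / total
        if fraction > max_error_fraction:
            raise ErrorThresholdExceeded(
                f"error fraction {fraction:.3f} exceeds max_error_fraction={max_error_fraction}"
            )
    return result


def parse(line: str) -> Tuple[str, int]:
    """Example record parser: 'name,age' -> (name, age). Raises on malformed input."""
    name, age = line.split(",")
    return name, int(age)


if __name__ == "__main__":
    lines = ["alice,30", "bob,notanumber", "carol,41"]
    result = run_with_on_error(lines, parse, max_errors=5)
    print(result.good)         # [('alice', 30), ('carol', 41)]
    print(len(result.errors))  # 1
    try:
        run_with_on_error(lines, parse, max_error_fraction=0.1)  # 1 of 3 bad > 10%
    except ErrorThresholdExceeded as exc:
        print("job failed:", exc)
```

In a real distributed job the error counts would presumably have to be aggregated across all map and reduce tasks before a fractional threshold could be checked; the sketch sidesteps that by running in one process.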
On Friday, December 20, 2013, Russell Jurney wrote:
> http://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/PIG-2620
>
> On Friday, December 20, 2013, Ruslan Al-Fakikh wrote:
>
>> Hi Russell,
>>
>> Could you be more specific. What would this operator do?
>> Does it have something to do with control logic? (Like IF/ELSE, WHILE,
>> etc)
>> AFAIK, those are not present in Pig because it would make Pig less clean.
>>
>> Thanks
>>
>> On Sat, Dec 21, 2013 at 12:31 AM, Russell Jurney
>> <[email protected]> wrote:
>>
>> > Does anyone think ON ERROR will ever get built into Pig? Would be so cool,
>> > put pig above all other data flow tools in sophistication for large ETL.
>> >
>> > I would work on that, if someone would pay me to do it.
>> >
>> > --
>> > Russell Jurney twitter.com/rjurney [email protected]
>> > datasyndrome.com
