I think what makes sense is to give the UDF control of the error handling by providing three possible outcomes, signaled via return code or exception:
(1) SUCCESS - all is good; keep going.
(2) FAILURE - processing failed for the current input. Log a warning, replace the result with NULL, and keep going.
(3) FATAL FAILURE - processing failed in such a way that there is no sense in continuing. Log an error and abort the processing.

One thing that we found running production systems is that bad data is a fact of everyday life, and the system needs to be able to continue processing in this case.

Olga

> -----Original Message-----
> From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 03, 2008 10:48 AM
> To: [email protected]
> Subject: Re: Proposal for error handling in Pig
>
> Hi,
> clearly A is the only working solution.
> If the user wants that his function can fail on some record
> the user simply catch and handle the exception inside the
> user defined function.
> In case a user defined function throws an exception we have
> to fail the entire job.
>
> Stefan
>
> On Feb 29, 2008, at 4:21 PM, Chris Olston wrote:
>
> > The trickiest case is where a user-defined function fails on some, but
> > not all inputs. Do you:
> >
> > a) fail the entire job
> > b) log the errors, omit those inputs, and keep on processing
> >    (questionable course of action in the case of non-monotonic queries/
> >    programs e.g. set difference, as well as aggregation like COUNT)
> > c) log the errors, and insert NULLs into the stream (perhaps the best
> >    option, especially given that we're going to support nulls anyway)
> > d) support some subset of (a,b,c) and permit the user to choose on a
> >    per-job basis
> >
> > ?
> >
> > -Chris
> >
> > On Feb 29, 2008, at 11:10 AM, Olga Natkovich wrote:
> >
> >>> -----Original Message-----
> >>> From: Benjamin Francisoud [mailto:[EMAIL PROTECTED]
> >>> Sent: Friday, February 29, 2008 7:05 AM
> >>> To: [email protected]
> >>> Subject: Re: Proposal for error handling in Pig
> >>>
> >>> About Internal Errors, do you consider such code to be part of them?
> >>>
> >>> public void something(Object object) {
> >>>     if (object == null) {
> >>>         throw new IllegalArgumentException("Object can't be null");
> >>>     }
> >>>     ...
> >>> }
> >>>
> >>> class StateMachine {
> >>>     public void start() {...}
> >>>     public void end() {
> >>>         if (startCalled == false) {
> >>>             throw new IllegalStateException("You didn't call start()");
> >>>         }
> >>>     }
> >>> }
> >>>
> >>> About user errors, how should we handle them ?
> >>> The way I proposed in PIG-100 (1) ?
> >>
> >> Yes, that's fine. I personally don't see a strong reason to log the
> >> exception stack in this case but I am fine with doing it if others
> >> find it helpful. I will update the doc to include this information.
> >>
> >>>
> >>> try {
> >>>     plan = parser.Parse();
> >>> } catch (ParseException e) {
> >>>     log.error(e.getMessage());
> >>>     log.debug(e);
> >>> }
> >>>
> >>> [1] https://issues.apache.org/jira/browse/PIG-100?focusedCommentId=12573218#action_12573218
> >>>
> >>> Olga Natkovich wrote:
> >>>> Pig developers,
> >>>>
> >>>> We had many patches submitted that are trying to improve error handling.
> >>>> This is really great as many users ask exactly for that. So it seems
> >>>> timely to establish some guidelines on how errors should be handled,
> >>>> propagated, delivered, etc.
> >>>>
> >>>> I put together a proposal to start the discussion. Please, review and
> >>>> comment. Once we have an agreement we would need to add the missing
> >>>> pieces to deploy it into Pig and then review the existing patches to
> >>>> make sure they follow the proposed practice.
> >>>>
> >>>> http://wiki.apache.org/pig/PigDeveloperCookbook
> >>>>
> >>>> I have also started a general document called Pig Developer Cookbook
> >>>> where we can keep track of development patterns we as a community want
> >>>> to follow.
> >>>>
> >>>> Thanks again for everybody's contributions!
> >>>>
> >>>> Olga
> >
> > --
> > Christopher Olston, Ph.D.
> > Sr. Research Scientist
> > Yahoo! Research
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 101tec Inc.
> Menlo Park, California, USA
> http://www.101tec.com
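
For illustration only, here is a minimal sketch of how the three outcomes proposed at the top of this message might map onto exceptions. Everything in it is an assumption made for the example: UdfErrorSketch, RecordFailureException, FatalFailureException, applyToAll and parseStrict are hypothetical names, not part of Pig, and it uses java.util.function.Function rather than any Pig UDF interface. Treating every other unchecked exception as fatal is also a choice of the sketch, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the three-outcome contract; not Pig code. */
public class UdfErrorSketch {

    /** (2) FAILURE: the current input could not be processed; keep going. */
    static class RecordFailureException extends RuntimeException {
        RecordFailureException(String msg) { super(msg); }
    }

    /** (3) FATAL FAILURE: there is no sense in continuing; abort. */
    static class FatalFailureException extends RuntimeException {
        FatalFailureException(String msg) { super(msg); }
    }

    /** Applies the UDF to each input and acts on the outcome it signals. */
    static <I, O> List<O> applyToAll(Function<I, O> udf, List<I> inputs) {
        List<O> results = new ArrayList<>();
        for (I input : inputs) {
            try {
                // (1) SUCCESS: keep the value and keep going.
                results.add(udf.apply(input));
            } catch (RecordFailureException e) {
                // (2) FAILURE: log a warning, substitute NULL, keep going.
                System.err.println("WARN: " + e.getMessage());
                results.add(null);
            } catch (FatalFailureException e) {
                // (3) FATAL FAILURE: log an error and abort by rethrowing.
                System.err.println("ERROR: " + e.getMessage());
                throw e;
            }
            // Any other unchecked exception also propagates and aborts.
        }
        return results;
    }

    /** Example UDF: bad data is reported as a per-record failure. */
    static Integer parseStrict(String s) {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            throw new RecordFailureException("bad input: " + s);
        }
    }

    public static void main(String[] args) {
        List<Integer> out =
            applyToAll(UdfErrorSketch::parseStrict, Arrays.asList("1", "oops", "3"));
        System.out.println(out); // prints [1, null, 3]
    }
}
```

With this shape the UDF author decides which problems are survivable (bad data in one record) and which are not, and the framework only has to translate the signal into "substitute NULL" or "abort".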
