I think it makes sense to give the UDF control over error handling 
by letting it signal one of three outcomes via return code or exception.

(1) SUCCESS - all is good; keep going.
(2) FAILURE - processing failed for the current input. Log a warning, 
replace the result with NULL, and keep going.
(3) FATAL FAILURE - processing failed in such a way that it makes no 
sense to continue. Log an error and abort the processing.
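
To make the three outcomes concrete, here is a minimal sketch in Java of how
a caller might map them to actions. It is only an illustration under stated
assumptions: the names RecordFailureException, FatalFailureException,
UdfOutcomeSketch, applyUdf and parseNumber are hypothetical and are not part
of Pig's API, and it uses exceptions rather than return codes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical: thrown by a UDF for outcome (2), failure on the current input. */
class RecordFailureException extends RuntimeException {
    RecordFailureException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical: thrown by a UDF for outcome (3), no sense in continuing. */
class FatalFailureException extends RuntimeException {
    FatalFailureException(String message) {
        super(message);
    }
}

class UdfOutcomeSketch {
    private static final Logger LOG = Logger.getLogger("UdfOutcomeSketch");

    /** Applies the UDF to each input and maps the three outcomes to actions. */
    static <I, O> List<O> applyUdf(Function<I, O> udf, List<I> inputs) {
        List<O> results = new ArrayList<>();
        for (I input : inputs) {
            try {
                results.add(udf.apply(input));   // (1) SUCCESS: keep going
            } catch (RecordFailureException e) {
                // (2) FAILURE: log a warning, replace with NULL, keep going
                LOG.warning("UDF failed on input " + input + ": " + e.getMessage());
                results.add(null);
            } catch (FatalFailureException e) {
                // (3) FATAL FAILURE: log an error and abort the processing
                LOG.severe("UDF failed fatally: " + e.getMessage());
                throw e;
            }
        }
        return results;
    }

    /** Example UDF: a malformed number is bad data, so it fails only this record. */
    static Integer parseNumber(String text) {
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            throw new RecordFailureException("not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        List<Integer> out =
            applyUdf(UdfOutcomeSketch::parseNumber, Arrays.asList("1", "oops", "3"));
        System.out.println(out);   // prints [1, null, 3]
    }
}
```

In this sketch any other unchecked exception escaping the UDF is not caught,
so it also aborts the run; whether that should count as FAILURE or FATAL
FAILURE is a choice left open here.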

One thing we found from running production systems is that bad data is a fact 
of everyday life, and the system needs to be able to continue processing when 
it encounters it.

Olga 

> -----Original Message-----
> From: Stefan Groschupf [mailto:[EMAIL PROTECTED] 
> Sent: Monday, March 03, 2008 10:48 AM
> To: [email protected]
> Subject: Re: Proposal for error handling in Pig
> 
> Hi,
> clearly A is the only working solution.
> If the user wants his function to be able to fail on some records, 
> the user simply catches and handles the exception inside the 
> user-defined function.
> If a user-defined function throws an exception, we have 
> to fail the entire job.
> 
> Stefan
> 
> On Feb 29, 2008, at 4:21 PM, Chris Olston wrote:
> 
> > The trickiest case is where a user-defined function fails on some, but 
> > not all inputs. Do you:
> >
> > a) fail the entire job
> > b) log the errors, omit those inputs, and keep on processing 
> > (questionable course of action in the case of non-monotonic queries/
> > programs e.g. set difference, as well as aggregation like COUNT)
> > c) log the errors, and insert NULLs into the stream (perhaps the best 
> > option, especially given that we're going to support nulls anyway)
> > d) support some subset of (a,b,c) and permit the user to choose on a 
> > per-job basis
> >
> > ?
> >
> > -Chris
> >
> > On Feb 29, 2008, at 11:10 AM, Olga Natkovich wrote:
> >
> >>
> >>
> >>> -----Original Message-----
> >>> From: Benjamin Francisoud [mailto:[EMAIL PROTECTED]
> >>> Sent: Friday, February 29, 2008 7:05 AM
> >>> To: [email protected]
> >>> Subject: Re: Proposal for error handling in Pig
> >>>
> >>> About Internal Errors, do you consider such code to be part of them?
> >>>
> >>> public void something(Object object) {
> >>>    if (object == null) {
> >>>        throw new IllegalArgumentException("Object can't be null");
> >>>    }
> >>>    ...
> >>> }
> >>>
> >>> class StateMachine {
> >>>    public void start() {...}
> >>>    public void end() {
> >>>        if (!startCalled) {
> >>>            throw new IllegalStateException("You didn't call start()");
> >>>        }
> >>>    }
> >>> }
> >>>
> >>> About user errors, how should we handle them?
> >>> The way I proposed in PIG-100 [1]?
> >>
> >> Yes, that's fine. I personally don't see a strong reason to log the 
> >> exception stack in this case but I am fine with doing it if others 
> >> find it helpful. I will update the doc to include this information.
> >>
> >>>
> >>> try {
> >>>    plan = parser.Parse();
> >>> } catch (ParseException e) {
> >>>    log.error(e.getMessage());
> >>>    log.debug(e);
> >>> }
> >>>
> >>>
> >>>
> >>> [1]
> >>> https://issues.apache.org/jira/browse/PIG-100?focusedCommentId=12573218#action_12573218
> >>>
> >>> Olga Natkovich wrote:
> >>>> Pig developers,
> >>>>
> >>>> We had many patches submitted that are trying to improve error
> >>>> handling. This is really great as many users ask exactly for that.
> >>>> So it seems timely to establish some guidelines on how errors should
> >>>> be handled, propagated, delivered, etc.
> >>>>
> >>>> I put together a proposal to start the discussion. Please, review and
> >>>> comment. Once we have an agreement we would need to add the missing 
> >>>> pieces to deploy it into Pig and then review the existing patches to
> >>>> make sure they follow the proposed practice.
> >>>>
> >>>> http://wiki.apache.org/pig/PigDeveloperCookbook
> >>>>
> >>>> I have also started a general document called Pig Developer Cookbook
> >>>> where we can keep track of development patterns we as a community
> >>>> want to follow.
> >>>>
> >>>> Thanks again for everybody's contributions!
> >>>>
> >>>> Olga
> >>>>
> >>>>
> >>>
> >>>
> >
> > --
> > Christopher Olston, Ph.D.
> > Sr. Research Scientist
> > Yahoo! Research
> >
> >
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 101tec Inc.
> Menlo Park, California, USA
> http://www.101tec.com
> 
> 
> 
