The following page has been changed by Shravan Narayanamurthy:
http://wiki.apache.org/pig/PigReporting

New page:
= Reporting in Pig =
Hadoop assumes that a job that does not report progress may be stuck, and 
kills it after a timeout. To carry this over to Pig Map Reduce, where we 
execute operator plans, we need to integrate progress reporting into the Pig 
operators. To support this, we propose a reporter interface that any backend 
in Pig can use. The only thing a reporter needs to do for now is report 
progress, optionally with a status message, so the following should be 
enough. We can add other methods as the need arises.

{{{
public interface PigProgressable {
    // Call this just to signal that you are still alive.
    public void progress();

    // Call this when you also have a status message to report.
    public void progress(String msg);
}
}}}
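As a minimal sketch of what an implementation for a backend without a Hadoop reporter (local execution, unit tests) might look like, the class below just records the calls it receives. The name `RecordingReporter` and its accessors are hypothetical and not part of the proposal; the interface is repeated only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// The interface as proposed on this page, repeated for self-containment.
interface PigProgressable {
    void progress();
    void progress(String msg);
}

// Hypothetical reporter that only records what it is told.
// Assumption: a non-Hadoop backend has nothing to forward progress to.
class RecordingReporter implements PigProgressable {
    private int heartbeats = 0;
    private final List<String> messages = new ArrayList<String>();

    // Plain "I am alive" signal.
    public void progress() {
        heartbeats++;
    }

    // "I am alive" plus a status message.
    public void progress(String msg) {
        heartbeats++;
        messages.add(msg);
    }

    int getHeartbeats() {
        return heartbeats;
    }

    List<String> getMessages() {
        return messages;
    }
}

class RecordingReporterDemo {
    public static void main(String[] args) {
        RecordingReporter reporter = new RecordingReporter();
        reporter.progress();
        reporter.progress("filtering tuples");
        System.out.println("heartbeats: " + reporter.getHeartbeats());
        System.out.println("messages: " + reporter.getMessages());
    }
}
```

Counting both calls as heartbeats is a design choice of this sketch: a status message is treated as implying liveness.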

== Changes to current code ==
In the mapReduceLayer, we have to implement this interface with a class that 
wraps the Hadoop reporter. To minimize changes to the code, I plan to add a 
static variable to the PhysicalOperator class, which the map and reduce 
functions will set when they start. Each currently implemented operator would 
then call the progress method on this static variable as soon as it starts 
processing a tuple. For example, in POFilter, the current code would be 
changed as follows:

Current:
{{{
while (true) {
    inp = processInput();
    if (inp.returnStatus == POStatus.STATUS_EOP
            || inp.returnStatus == POStatus.STATUS_ERR)
        break;
...
}}}

Changed to:
{{{
while (true) {
    inp = processInput();
    // Report progress for every tuple so Hadoop does not time the job out.
    PhysicalOperator.reporter.progress();
    if (inp.returnStatus == POStatus.STATUS_EOP
            || inp.returnStatus == POStatus.STATUS_ERR)
        break;
...
}}}
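The wrapper class and the static variable described in this section could be sketched roughly as below. This is an illustration under stated assumptions, not the actual patch: `HadoopReporterStandIn` stands in for the `progress()` and `setStatus(String)` calls of Hadoop's `org.apache.hadoop.mapred.Reporter` so the example compiles without Hadoop, and the names `ProgressableReporter`, `PhysicalOperatorSketch`, `MapSketch`, and `setUp` are hypothetical.

```java
// The interface as proposed on this page, repeated for self-containment.
interface PigProgressable {
    void progress();
    void progress(String msg);
}

// Stand-in for the two Hadoop Reporter calls this sketch relies on.
// Assumption: real code would take org.apache.hadoop.mapred.Reporter instead.
interface HadoopReporterStandIn {
    void progress();
    void setStatus(String status);
}

// Hypothetical wrapper: forwards Pig progress calls to the Hadoop reporter.
class ProgressableReporter implements PigProgressable {
    private final HadoopReporterStandIn rep;

    ProgressableReporter(HadoopReporterStandIn rep) {
        this.rep = rep;
    }

    public void progress() {
        rep.progress();
    }

    public void progress(String msg) {
        // Set the status, then signal liveness explicitly as well.
        rep.setStatus(msg);
        rep.progress();
    }
}

// Reduced stand-in for PhysicalOperator, showing only the static variable.
abstract class PhysicalOperatorSketch {
    static PigProgressable reporter;
}

class MapSketch {
    // What the map and reduce functions would do before processing input.
    static void setUp(HadoopReporterStandIn hadoopReporter) {
        PhysicalOperatorSketch.reporter = new ProgressableReporter(hadoopReporter);
    }
}
```

Because the variable is static, every operator in the plan sees the same reporter without any change to constructors or plan-building code, which is the point of the approach described above; the trade-off is that it is shared state set once per map or reduce invocation.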
