Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by SanthoshSrinivasan:
http://wiki.apache.org/pig/PigErrorHandlingDesign

------------------------------------------------------------------------------
   1. retriable: A boolean variable to indicate if an exception is retriable or not
   2. errorSource: An enum to represent the source of the error. The enum can be extended in the future to include more information like sub-component. The values for this enum type will be:
        i. User input
-       ii. Bug
+       i. Bug
-       iii. User environment
+       i. User environment
-       iv. Remote environment
+       i. Remote environment
   3. errorCode: An integer that represents the error. Used for documentation and automation
   4. detailedMsg: A string that holds detailed information that is pertinent to the Pig developer. It will contain details that are not required by the user
@@ -45, +45 @@
  As mentioned earlier, the source of the exception is classified into the four categories. Each exception should report the appropriate source based on the context. Absence of the source will be treated as a bug by default.

  === Back-end ===
- Hadoop reports errors via strings instead of exceptions. Since Hadoop does not support any APIs to query the exception thrown, Pig will make an assumption on the format of the string. The error messages contain the string format of an exception, i.e., name of the exception along with an error message followed by the stack trace. Pig will parse these error messages and report the appropriate error message and source (i.e., stack trace). An example of an error string reported by Hadoop is shown below.
+ Hadoop reports errors via strings instead of exceptions. Since Hadoop does not support any APIs to query the exception thrown, Pig will make an assumption about the format of the string. The error messages contain the string form of an exception, i.e., the name of the exception along with an error message followed by the stack trace. Pig will parse these error messages and report the appropriate error message and source (i.e., stack trace). An example of an error string reported by Hadoop is shown below. The distinction between a Hadoop error and a Pig error will be made by looking for the source in the first element of the stack trace.

  {{{#!java
  java.lang.RuntimeException: Unexpected data type 74 found in stream.
@@ -62, +62 @@
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
  }}}

- Hadoop reports multiple failures due to retries. The retries have to be ignored and only the final failure has to be reported.
+ Hadoop reports multiple failures due to retries. The retries have to be ignored and only the final failure has to be reported. The number of retries will be reported.

  == Warning message aggregation ==
  === Back-end ===
- Hadoop provides the ability to aggregate counters for the entire application. The change in counter values has to be performed via the Hadoop reporter. A new interface, `PigLogger` will be used to abstract logging of warning messages. A back-end specific `PigHadoopLogger` will implement this interface and provide the functionality of warning message aggregation using Hadoop counters if the warning message aggregation is turned on. If the warning message aggregation is turned off, the warning messages are sent to STDERR which will appear in Hadoop's STDERR logs.
+ Hadoop provides the ability to aggregate counters for the entire application. The change in counter values has to be performed via the Hadoop reporter.
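For illustration only, here is a minimal sketch of how a warning could be folded into a Hadoop counter through the reporter, in the spirit of the aggregation just described. Only `Reporter.incrCounter` is Hadoop API; the class name, the enum of warning keys, and the on/off switch are assumptions made for this sketch, not part of the design.

{{{#!java
import org.apache.hadoop.mapred.Reporter;

// Hypothetical helper (not part of the design): folds a warning into a
// Hadoop counter so that Hadoop sums it across all tasks of the job.
public class WarningAggregator {

    // assumed stand-in for a set of pre-defined warning keys
    public enum PigWarningKey { DIVIDE_BY_ZERO, UDF_WARNING }

    private final Reporter reporter;   // handed to the task by the Hadoop framework
    private final boolean aggregate;   // whether warning aggregation is turned on

    public WarningAggregator(Reporter reporter, boolean aggregate) {
        this.reporter = reporter;
        this.aggregate = aggregate;
    }

    public void warn(PigWarningKey key, String msg) {
        if (aggregate) {
            // one increment per occurrence; Hadoop aggregates the counts per job
            reporter.incrCounter(key, 1L);
        } else {
            // aggregation off: the message goes to the task's STDERR log
            System.err.println(key + ": " + msg);
        }
    }
}
}}}

Because Hadoop sums counters across all map and reduce tasks, a single warning key yields one aggregated count for the whole job.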
+ A new interface, `PigLogger` will be used to abstract logging of warning messages. A back-end specific `PigHadoopLogger` will implement this interface and provide the functionality of warning message aggregation using Hadoop counters if the warning message aggregation is turned on. The `EvalFunc` class will include a new `PigHadoopLogger` attribute to allow UDF authors to log and aggregate warning messages. The key for the warning aggregation will be one of many pre-defined keys in Pig. If the warning message aggregation is turned off, the warning messages are sent to STDERR, which will appear in Hadoop's STDERR logs.

  attachment:PigLogger.jpg
+ 
+ {{{#!java
+ ...
+ public abstract class EvalFunc<T> {
+     // UDFs must use this to report progress
+     // if the exec is taking more than 300 ms
+     protected PigProgressable reporter;
+ 
+     // UDFs must use this to log and aggregate
+     // warning messages
+     protected PigHadoopLogger log;
+ ...
+ }
+ }}}

  === Front-end ===
@@ -79, +93 @@
  == Open questions ==
- 1. `ParseException` is throw by the parser. Ensuring that `ParseException` is subclassed from `FrontendException` requires the generated `ParseException.java` file to be checked into the source repository.
+ 1. `ParseException` is thrown by the parser. Ensuring that `ParseException` is subclassed from `FrontendException` requires the generated `ParseException.java` file to be checked into the source repository. Instead, the `ParseException` is wrapped inside `PigParseException` and rethrown (see the sketch at the end of this message).
- 2. Lexical errors in Grunt will result in a `TokenMgrError`, resulting in an exit from Grunt. What should be the strategy around lexical error handling.
+ 2. Lexical errors in Grunt will result in a `TokenMgrError`, resulting in an exit from Grunt.
+ 3. Error messages reported by the Parser will not be overridden with custom error messages until we move to a bottom-up parser.

  == References ==
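As an aside on open question 1 above, here is a minimal sketch of how the generated `ParseException` could be caught at the parser boundary and rethrown as the `PigParseException` mentioned there. The local classes are stand-ins so the example compiles on its own; the constructor signatures and the `parse` method are assumptions, not the actual Pig code.

{{{#!java
// Local stand-ins so the sketch compiles on its own; the real classes live
// in the Pig code base and in the JavaCC-generated parser.
class FrontendException extends Exception {
    FrontendException(String msg, Throwable cause) { super(msg, cause); }
}

class PigParseException extends FrontendException {
    PigParseException(String msg, Throwable cause) { super(msg, cause); }
}

class ParseException extends Exception {
    ParseException(String msg) { super(msg); }
}

public class ParserBoundary {
    // Callers only ever see FrontendException subtypes; the generated
    // ParseException never escapes this method.
    public void parse(String query) throws PigParseException {
        try {
            runGeneratedParser(query);   // hypothetical call into the generated parser
        } catch (ParseException pe) {
            throw new PigParseException(pe.getMessage(), pe);
        }
    }

    // placeholder standing in for the generated parser's entry point
    private void runGeneratedParser(String query) throws ParseException {
        throw new ParseException("Encountered errors while parsing: " + query);
    }
}
}}}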
