The following page has been changed by SanthoshSrinivasan:

New page:
#format wiki
#language en


This document describes the design for the error handling feature in Pig. The 
design follows the [#errReq requirements] and [#errFuncSpec functional 
specification]. The error handling design is influenced by Mika Raento's 
excellent online [#mika resource].

Pig's architecture is designed to support several back-ends. Currently, the 
only supported back-end is Hadoop. The front-end is ideally back-end agnostic. 
For a breakdown of the front-end components refer to the [#errFuncSpec 
functional specification].

== Error Handling ==

=== Front-end ===

`PigException` will serve as the base class for all the front-end exceptions. 
`PigException` will also be the exception thrown by Pig to external systems. 
Since the Pig APIs currently throw `IOException`, `PigException` will extend 
`IOException` in order to maintain backward compatibility. 
`FrontendException` will extend `PigException` and serve as the umbrella for 
all front-end exceptions. The task-specific exceptions from the front-end 
components will subclass `FrontendException` to ensure clarity and enable 
extensions in the future.
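The hierarchy described above, and why extending `IOException` keeps existing callers working, can be sketched as follows. Only `PigException` and `FrontendException` are names from this design; `TypeCheckerException` and `LegacyCaller` are hypothetical classes used for illustration.

```java
import java.io.IOException;

// Base class; extends IOException so existing "throws IOException"
// signatures and "catch (IOException e)" blocks keep working.
class PigException extends IOException {
    PigException(String message) {
        super(message);
    }
}

// Umbrella for all front-end exceptions.
class FrontendException extends PigException {
    FrontendException(String message) {
        super(message);
    }
}

// Hypothetical task-specific exception from one front-end component.
class TypeCheckerException extends FrontendException {
    TypeCheckerException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for an existing Pig API declared to throw IOException.
class LegacyCaller {
    static void compile() throws IOException {
        throw new TypeCheckerException("Incompatible types in comparison");
    }

    // A caller written against the old API: it only knows about IOException,
    // yet still receives the more specific front-end exception.
    static String run() {
        try {
            compile();
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

Because the subclass relationship is preserved at every level, code that later wants finer handling can catch `FrontendException` or a task-specific subclass without breaking callers that still catch `IOException`.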

`PigException` contains the following attributes and methods.

==== Attributes ====

   1. retriable: A boolean variable that indicates whether an exception is retriable or not.
   2. errorSource: An enum that represents the source of the error. The enum can be extended in the future to include more information, such as the sub-component. The values for this enum type will be:
      i. User input
      ii. Bug
      iii. User environment
      iv. Remote environment
   3. errorCode: An integer that represents the error. It is used for documentation and automation.
   4. detailedMsg: A string that holds detailed information that is pertinent to the Pig developer. It contains details that are not required by the user.

==== Methods ====
   1. retriable: Returns true if the exception is retriable; false otherwise.
   2. getErrorCode: Returns the error code associated with this exception.
   3. getDetailedMessage: Returns the string detailedMsg.
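The attributes and methods listed above can be sketched as a single class. Field names follow the design; the `ErrorSource` enum name, its constant names, the constructor signature, and the `getErrorSource()` accessor are assumptions made for this example, not part of the specification.

```java
import java.io.IOException;

class PigException extends IOException {

    // The four error sources named in the design (constant names assumed).
    // The enum may later be extended, e.g. with a sub-component.
    enum ErrorSource { USER_INPUT, BUG, USER_ENVIRONMENT, REMOTE_ENVIRONMENT }

    private final boolean retriable;
    private final ErrorSource errorSource;
    private final int errorCode;
    private final String detailedMsg;

    PigException(String message, int errorCode, ErrorSource errorSource,
                 boolean retriable, String detailedMsg) {
        super(message); // user-facing text, returned by getMessage()
        this.errorCode = errorCode;
        this.errorSource = errorSource;
        this.retriable = retriable;
        this.detailedMsg = detailedMsg;
    }

    // True if the failed operation may be retried.
    boolean retriable() {
        return retriable;
    }

    // Error code used for documentation and automation.
    int getErrorCode() {
        return errorCode;
    }

    // Accessor assumed for this sketch; not in the method list above.
    ErrorSource getErrorSource() {
        return errorSource;
    }

    // Developer-oriented details that the user does not need to see.
    String getDetailedMessage() {
        return detailedMsg;
    }
}
```

Keeping the user-facing text in the inherited message and the developer detail in a separate field lets a front-end print `getMessage()` by default and reserve `getDetailedMessage()` for logs.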

==== Methods of interest from IOException and its superclasses ====
   1. getMessage: The user-facing message is reported using getMessage().
   2. getCause: When exceptions are chained, the cause of each exception is retrieved using getCause().
   3. getStackTrace: Useful for determining the root cause; contains the details required by the developer.
   4. initCause: Used for chaining exceptions.
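Chaining with the inherited `initCause()` and `getCause()` methods can be sketched as below. `PigException` is reduced to a single constructor here, and `Chaining`, `wrap()`, and `rootCause()` are hypothetical helpers, not part of the design.

```java
import java.io.IOException;

class PigException extends IOException {
    PigException(String message) {
        super(message); // cause is left unset, so initCause() may be called once
    }
}

final class Chaining {
    private Chaining() {}

    // Wrap a low-level failure: the user sees the PigException message,
    // while the original exception is preserved as its cause.
    static PigException wrap(String userMessage, Throwable lowLevel) {
        PigException pe = new PigException(userMessage);
        pe.initCause(lowLevel);
        return pe;
    }

    // Follow getCause() to the innermost exception; its stack trace
    // usually points the developer at the root cause.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

`initCause()` can only be called once and only when no cause was passed to the constructor, which is why the sketch uses the message-only constructor.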

As mentioned earlier, the source of an exception is classified into four 
categories. Each exception should report the appropriate source based on the 
context. If no source is specified, the error is treated as a bug by default.
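The "bug by default" rule can be enforced in the constructors. In this sketch `PigException` is reduced to its error-source field, and the enum and constant names are assumptions.

```java
import java.io.IOException;

class PigException extends IOException {
    enum ErrorSource { USER_INPUT, BUG, USER_ENVIRONMENT, REMOTE_ENVIRONMENT }

    private final ErrorSource errorSource;

    // No source given: treat the error as a bug.
    PigException(String message) {
        this(message, ErrorSource.BUG);
    }

    PigException(String message, ErrorSource errorSource) {
        super(message);
        // An explicit null is also treated as a bug rather than left unset.
        this.errorSource = (errorSource == null) ? ErrorSource.BUG : errorSource;
    }

    ErrorSource getErrorSource() {
        return errorSource;
    }
}
```

Defaulting to a bug errs on the side of surfacing the problem to Pig developers; code that knows better must state the source explicitly.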

=== Back-end ===
Hadoop reports errors via strings instead of exceptions. Since Hadoop does 
not provide any API to query the exception that was thrown, Pig will make an 
assumption about the format of the string. The error messages contain the string 
form of an exception, i.e., the name of the exception along with an error 
message, followed by the stack trace. Pig will parse these error messages and 
report the appropriate error message and its origin (i.e., the stack trace). An 
example of an error string reported by Hadoop is shown below.

{{{
java.lang.RuntimeException: Unexpected data type 74 found in stream.
        at org.apache.pig.builtin.BinStorage.bytesToInteger(
}}}
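Parsing a string in the assumed format (a first line of the form `<exception class>: <message>`, followed by `at ...` stack-frame lines) might look like the sketch below. `ParsedError` and `parse()` are hypothetical names, and the sketch ignores any `Caused by:` sections a real error string may contain.

```java
import java.util.ArrayList;
import java.util.List;

final class ParsedError {
    final String exceptionClass;
    final String message;
    final List<String> stackFrames;

    ParsedError(String exceptionClass, String message, List<String> stackFrames) {
        this.exceptionClass = exceptionClass;
        this.message = message;
        this.stackFrames = stackFrames;
    }

    static ParsedError parse(String errorString) {
        String[] lines = errorString.split("\\r?\\n");
        String header = lines[0].trim();

        // Class names never contain ": ", so the first occurrence separates
        // the class from the message. With no separator, the whole header
        // is taken as the class name (an exception without a message).
        int sep = header.indexOf(": ");
        String cls = (sep < 0) ? header : header.substring(0, sep);
        String msg = (sep < 0) ? "" : header.substring(sep + 2);

        // Remaining lines that start with "at " are stack frames.
        List<String> frames = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.startsWith("at ")) {
                frames.add(line.substring(3));
            }
        }
        return new ParsedError(cls, msg, frames);
    }
}
```

The message can be shown to the user, while the class name and frames feed the developer-facing report.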

Because Hadoop retries failed tasks, it reports multiple failures for the same 
task. The failures from earlier attempts have to be ignored; only the final 
failure has to be reported.
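Filtering out retried failures can be sketched as keeping, for each task, only the failure from its highest attempt number. This assumes each reported failure can be attributed to a task and an attempt number; `TaskFailure` and `finalFailures()` are hypothetical names, not Hadoop or Pig APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TaskFailure {
    final String taskId;
    final int attempt;
    final String errorString;

    TaskFailure(String taskId, int attempt, String errorString) {
        this.taskId = taskId;
        this.attempt = attempt;
        this.errorString = errorString;
    }

    // Returns one failure per task: the one from the latest attempt.
    // Failures from earlier attempts are retries and are dropped.
    static Map<String, TaskFailure> finalFailures(List<TaskFailure> reported) {
        Map<String, TaskFailure> last = new LinkedHashMap<>(); // keeps task order
        for (TaskFailure f : reported) {
            TaskFailure seen = last.get(f.taskId);
            if (seen == null || f.attempt > seen.attempt) {
                last.put(f.taskId, f);
            }
        }
        return last;
    }
}
```

Comparing attempt numbers rather than relying on arrival order keeps the result correct even if failures are reported out of order.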

== Warning message aggregation ==

=== Back-end ===

Hadoop provides the ability to aggregate counters across the entire application. 
Counter values must be updated via the Hadoop reporter. A new interface, 
`PigLogger`, will be used to abstract the logging of warning messages. A 
back-end-specific `PigHadoopLogger` will implement this interface and, when 
warning message aggregation is turned on, aggregate warning messages using 
Hadoop counters. When warning message aggregation is turned off, the warning 
messages are sent to STDERR, where they appear in Hadoop's STDERR logs.
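The split between the interface and its Hadoop implementation can be sketched as below. `PigLogger` and `PigHadoopLogger` are names from this design, but the `warn()` signature is an assumption, and `CounterSink` and `WarningKind` are hypothetical. `CounterSink` stands in for the Hadoop reporter so the example runs without Hadoop; the old `org.apache.hadoop.mapred.Reporter` interface offers an `incrCounter(Enum<?>, long)` method of the same shape.

```java
import java.io.PrintStream;

// Hypothetical warning categories, used as counter keys.
enum WarningKind { DIVIDE_BY_ZERO, FIELD_DISCARDED }

// Back-end-agnostic logging abstraction (method signature assumed).
interface PigLogger {
    void warn(Object source, String msg, Enum<?> warningKind);
}

// Hypothetical stand-in for the counter-updating part of the Hadoop reporter.
interface CounterSink {
    void incrCounter(Enum<?> key, long amount);
}

class PigHadoopLogger implements PigLogger {
    private final CounterSink reporter;
    private final PrintStream stderr;
    private final boolean aggregate;

    PigHadoopLogger(CounterSink reporter, PrintStream stderr, boolean aggregate) {
        this.reporter = reporter;
        this.stderr = stderr;
        this.aggregate = aggregate;
    }

    @Override
    public void warn(Object source, String msg, Enum<?> warningKind) {
        if (aggregate && reporter != null) {
            // Aggregation on: bump a counter; Hadoop sums counters across tasks.
            reporter.incrCounter(warningKind, 1);
        } else {
            // Aggregation off: the message ends up in the task's STDERR log.
            stderr.println(source.getClass().getName() + ": " + msg);
        }
    }
}
```

Code that emits warnings depends only on `PigLogger`, so a different back-end can supply its own implementation without changing the callers.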

=== Front-end ===

Currently, the type checker uses a collector to collect error and warning 
messages. The use of the collector has to be extended for each subsystem in the 
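A collector of the kind mentioned above lets a subsystem record every error and warning it finds instead of stopping at the first one. A minimal sketch, in which `MessageCollector`, `Kind`, and all method names are hypothetical rather than the type checker's actual classes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MessageCollector {
    enum Kind { ERROR, WARNING }

    static final class Message {
        final Kind kind;
        final String text;

        Message(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private final List<Message> messages = new ArrayList<>();

    // Called by a subsystem for each problem it detects.
    void collect(Kind kind, String text) {
        messages.add(new Message(kind, text));
    }

    int count(Kind kind) {
        int n = 0;
        for (Message m : messages) {
            if (m.kind == kind) {
                n++;
            }
        }
        return n;
    }

    // Checked after a pass; the caller can then raise a single exception.
    boolean hasErrors() {
        return count(Kind.ERROR) > 0;
    }

    List<Message> getMessages() {
        return Collections.unmodifiableList(messages);
    }
}
```

After a pass completes, the caller can report all collected warnings and, if `hasErrors()` is true, throw one front-end exception summarizing the errors.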

== Open questions ==

   1. `ParseException` is thrown by the parser. Ensuring that `ParseException` is subclassed from `FrontendException` requires the generated `` file to be checked into the source repository.
   2. Lexical errors in Grunt result in a `TokenMgrError`, which causes Grunt to exit. What should the strategy for handling lexical errors be?

== References ==

 1. [[Anchor(errReq)]] Santhosh Srinivasan, "Pig Error Handling Requirements", October 30, 2008.
 2. [[Anchor(errFuncSpec)]] Santhosh Srinivasan, "Pig Error Handling Functional Specification", November 26, 2008.
 3. [[Anchor(mika)]] Mika Raento, "What should Exceptions look like?", July 30,
