Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by SanthoshSrinivasan:
http://wiki.apache.org/pig/PigErrorHandling

------------------------------------------------------------------------------
  
   1. Environment issues: file not found, out of disk space, etc.
   2. Bugs in the software: null pointer exceptions, core dumps, out-of-bounds 
access, etc.
-  3. Programmer error: Syntax errors, divide by zero, incorrect use of casts, 
etc.
+  3. User/Programmer error: Syntax errors, divide by zero, incorrect use of 
casts, etc.
  
  Users rely on error messages to identify the source of an error and to 
decide on a corrective course of action. While most errors cannot be handled 
in the system, they should at least be reported in a reliable and readable 
manner.
  
@@ -18, +18 @@

  
  Using the approach mentioned in ref1, Pig can be divided into three 
components for the purpose of error handling. A schematic view of the system is 
illustrated via the diagram.
  
- attachment:SchematicDiagaramOfPig
- 
   1. The user interface. This could be the Grunt shell, command-line 
execution of a script, or use of Pig via the Java APIs
   2. Pig
   3. The backend execution framework, i.e., Hadoop
+ 
+ attachment:Schematic.jpg
  
  Grunt is an interactive shell that allows users to submit Pig commands. The 
command line offers a mechanism for batch mode execution via scripts. The Java 
APIs provide a programmatic mechanism for accessing Pig. Irrespective of the 
mechanism, control and data flow through Pig, which in turn uses Hadoop as 
the execution framework. Errors could occur within each system and across 
systems.
  
@@ -31, +31 @@

  
  === Early error detection ===
  
- Errors that occur in each system should be caught as early as possible. A few 
examples that demonstrate this behavior are: 
+ Errors that occur in each system should be caught as early as possible. Pig 
relies on Hadoop for run time execution. Detecting and reporting errors early 
will improve turnaround time by avoiding Hadoop invocations until most errors 
are fixed. A few examples that demonstrate this behavior are: 
  
   1. Syntax errors. E.g.: Missing ';'
   2. Semantic errors. E.g.: Mismatch in cogroup arity
   3. Validation errors. E.g.: Type mismatch when trying to add a string to an 
integer
- 
- Pig relies on Hadoop for run time execution. Detection and reporting errors 
early will improve turnaround time.
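
The early-detection idea above can be sketched in a few lines. This is only 
an illustration, not Pig's implementation: check_syntax, run_script and the 
submit callback are hypothetical names, and the single syntax rule (a 
statement must end with ';') is a toy stand-in for a real parser. Semantic 
and validation checks would be added to CHECKS in the same way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckError:
    stage: str      # e.g. "syntax", "semantic", "validation"
    line_no: int    # 1-based position of the statement in the script
    message: str


def check_syntax(statement: str) -> Optional[str]:
    # Toy rule standing in for a real parser: statements end with ';'.
    if not statement.rstrip().endswith(";"):
        return "missing ';' at end of statement"
    return None


# Hypothetical registry of front-end checks, run in order.
CHECKS = [("syntax", check_syntax)]


def run_script(statements: List[str],
               submit: Callable[[List[str]], None]) -> List[CheckError]:
    """Run all front-end checks; call submit only if none of them fail."""
    errors = []
    for line_no, statement in enumerate(statements, start=1):
        for stage, check in CHECKS:
            problem = check(statement)
            if problem is not None:
                errors.append(CheckError(stage, line_no, problem))
    if errors:
        return errors       # the backend is never invoked
    submit(statements)      # stands in for handing the job to Hadoop
    return []
```

The point of the sketch is the ordering: every check runs before submit is 
called, so a script with a missing ';' is rejected without starting a job.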
  
  === Error reporting ===
  
@@ -47, +45 @@

  
   1. Users are responsible for purging error logs
   2. Users will be able to switch on/off the detailed error messages on STDERR.
-  3. Since Pig depends on Hadoop for execution, Hadoop error messages will be 
reported by Pig
+  3. Since Pig depends on Hadoop for execution, Hadoop error messages will be 
reported by Pig. An error during execution due to a bug in Pig will be shown 
differently from an error in Hadoop itself.
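
One way to picture items 2 and 3 is a formatter that tags each message with 
its origin and appends the detailed text only on request. This is a sketch 
under assumptions: ErrorSource, format_error and the bracketed prefixes are 
hypothetical and do not reflect Pig's actual message format.

```python
from enum import Enum


class ErrorSource(Enum):
    # Hypothetical labels; the page only says the two cases must look different.
    PIG = "Pig internal error"
    HADOOP = "Hadoop error"


def format_error(source: ErrorSource, message: str,
                 detail: str = "", verbose: bool = False) -> str:
    """Return the text to print on STDERR for one error."""
    text = f"[{source.value}] {message}"
    # Detailed output (e.g. a stack trace) is included only when switched on.
    if verbose and detail:
        text += "\n" + detail
    return text
```

For example, format_error(ErrorSource.HADOOP, "job failed", "trace") yields a 
single line, while passing verbose=True appends the detail on a second line.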
  
  
  === Warning message aggregation ===
