Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by Arun C Murthy:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  
  ===== 4.1.1 Logging =====
  
- Users will have control over handling of `stderr` of their streaming 
application. By default, in case of errors, the full error information would be 
brought to the client and stored in the client side log.
+ Users will have control over the handling of `stderr` of their streaming 
application by requesting that the `stderr` be stored in DFS for both successful 
and failed jobs. This is done by adding a `stderr spec` to the streaming command 
declaration:
  
- In addition, a user can request the `stderr` is stored in DFS both for 
successful and failed jobs. This is done by adding `stderr spec` to the 
streaming command declaration:
- 
  {{{
- define CMD `stream.pl` stderr('stream.stderr')  
+ define CMD `stream.pl` stderr('<dir>' limit 100)  
  }}}
  
- In this case, the streaming `stderr` will be stored in _logs directory in the 
jobs output directory. Note that the same Pig job can have multiple streaming 
applications associated with it. It would be up to the user to make sure that 
different names are used for this to avoid conflicts.
+ In this case, the streaming `stderr` will be stored in the `_logs/<dir>` 
directory under the job's output directory. Note that the same Pig job can have 
multiple streaming applications associated with it; it is up to the user to 
avoid conflicts by passing a distinct directory to each command's `stderr spec`.
  
- Pig would store up to '''500''' logs per streaming job in this location. The 
limit is imposed to make sure that we don't create a large number of small 
files in DFS and waste space and name node resources. The user can specify a 
smaller number via `limit` keyword in the `stderr` specL
+ Pig would store logs of up to '''100''' tasks per streaming job in this 
location (so up to 100 * 4 = 400 logs, assuming 4 retries per task). The limit 
is imposed to make sure that we don't create a large number of small files in 
HDFS and waste space and name node resources. The user can specify a smaller 
number via the `limit` keyword in the `stderr` spec:
  
  {{{
- define CMD `stream.pl` stderr('stream.stderr' limit 100)  
+ define CMD `stream.pl` stderr('CMD_logs' limit 100)  
  }}}
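+ For illustration, a complete script using such a declaration might look like 
the following (the input path, output path, and relation names are hypothetical 
and only sketch the intended usage):
+ 
+ {{{
+ define CMD `stream.pl` stderr('CMD_logs' limit 100);
+ A = load 'input_data';
+ B = stream A through CMD;
+ store B into 'output_data';
+ }}}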
  
- The logs would only contain stderr information from the streaming 
application. The content will include a header and a footer. The header will 
include task name, start time, input size, input file and input range if 
available. The footer will contain result code, end time, and primary output 
size.  
+ The logs will contain only the stderr information from the streaming 
application. The content will include a header and a footer. The header will 
include the task name, start time, input size, and the input file and input 
range if available. The footer will contain the result code, end time, and the 
sizes of the outputs.
  
  ===== 4.1.2 Error Handling =====
  
