The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigPerformance

------------------------------------------------------------------------------
        * by using tools
        * through code inspection
  
+ === Pig Streaming Performance ===
+ 
+ This section was added on 5/30/08 to provide initial performance numbers 
for the newly implemented streaming feature. These tests used a different setup 
from the other tests:
+ 
+ (1) The same type of data as in the other tests, but 100 GB in size
+ (2) The tests ran on 100 machines with 2 map and 2 reduce slots and 500 MB 
per task
+ (3) The tests ran against a Hadoop 16 cluster
+ 
+ See PigStreamingFunctionalSpec for details of streaming.
+ 
+ ==== Test Cases ====
+ 
+ The following scripts and the corresponding Hadoop streaming jobs were executed.
+ 
+ ===== Load/Store =====
+ 
+ This test just establishes a baseline:
+ 
+ {{{
+ IP = load '/pig/in'; 
+ store IP into '/pig/out';
+ }}}
+ 
+ With binary optimization turned on:
+ 
+ {{{
+ IP = load '/pig/in' split by file; 
+ store IP into '/pig/out';
+ }}}
+ 
+ ===== Load/Stream/Store =====
+ 
+ {{{
+ define CMD `filter.pl` ship('./filter.pl'); 
+ IP = load '/pig/in'; 
+ OP = stream IP through CMD; 
+ store OP into '/pig/out';
+ }}}
+ 
+ `filter.pl` implements the same filtering as the 
Load/Filter/Stream/Store test case.
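
The source of `filter.pl` is not shown on this page. As a rough illustration of what such a streaming filter might look like, here is a minimal sketch written in Python rather than Perl. It assumes tab-delimited input records and that the filter keeps rows whose second field compares greater than the string '0', mirroring `$1 > '0'` in the Load/Filter/Stream/Store script. The function name `keep` is hypothetical.

```python
import sys


def keep(line: str) -> bool:
    """Return True when the second tab-separated field is greater than '0'.

    Assumption: the comparison is a string comparison, as suggested by the
    quoted constant in the Pig script; records with fewer than two fields
    are dropped.
    """
    fields = line.rstrip("\n").split("\t")
    return len(fields) > 1 and fields[1] > "0"


def main() -> None:
    # Streaming contract: read records from stdin, write kept records to stdout.
    for line in sys.stdin:
        if keep(line):
            sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

A script like this would be shipped and invoked the same way as `filter.pl` in the script above; only the command name in the `define` statement would change.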
+ 
+ We also ran this with optimization turned on:
+ 
+ {{{
+ define CMD `filter.pl` ship('./filter.pl'); 
+ IP = load '/pig/in' split by file; 
+ OP = stream IP through CMD; 
+ store OP into '/pig/out';
+ }}}
+ 
+ 
+ ===== Load/Filter/Stream/Store =====
+ 
+ {{{
+ IP = load '/pig/in';
+ FILTERED_DATA = filter IP by $1 > '0';
+ OP = stream FILTERED_DATA through `perl -ne 'print $_;'`; 
+ store OP into '/pig/out';
+ }}}
+ 
+ ===== Hadoop Streaming =====
+ 
+ The Hadoop streaming code mimicked the behavior of Load/Filter/Stream/Store.
+ 
+ ==== Performance Numbers ====
+ 
+ || Test || Time (sec) ||
+ || Load/Store || 1464 ||
+ || Load/Store optimized || 423 ||
+ || Load/Stream/Store || 1683 ||
+ || Load/Stream/Store optimized || 773 ||
+ || Load/Filter/Stream/Store || 1673 ||
+ || Hadoop || 810 ||
+ 
+ Note that the last 4 test cases produce exactly the same data, so their timings 
can be directly compared.
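
To make the comparison concrete, the ratios between the timings in the table can be worked out with a few lines of Python. This is only arithmetic on the reported numbers; the dictionary keys copy the table labels, and the helper name `ratio` is introduced here for illustration.

```python
# Timings in seconds, copied from the table above.
timings = {
    "Load/Store": 1464,
    "Load/Store optimized": 423,
    "Load/Stream/Store": 1683,
    "Load/Stream/Store optimized": 773,
    "Load/Filter/Stream/Store": 1673,
    "Hadoop": 810,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return how many times longer `numerator` took than `denominator`."""
    return timings[numerator] / timings[denominator]


if __name__ == "__main__":
    # Effect of the optimization on the baseline and on the streaming script.
    print(f"Load/Store speedup: {ratio('Load/Store', 'Load/Store optimized'):.2f}x")
    print(
        "Load/Stream/Store speedup: "
        f"{ratio('Load/Stream/Store', 'Load/Stream/Store optimized'):.2f}x"
    )
    # Each comparable Pig test case relative to the Hadoop streaming job.
    for name in (
        "Load/Stream/Store",
        "Load/Stream/Store optimized",
        "Load/Filter/Stream/Store",
    ):
        print(f"{name} vs Hadoop: {ratio(name, 'Hadoop'):.2f}x")
```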
+ 
