Prasenjit,
Whether this is executed in the map or the reduce phase, it will only produce a 'local' sum, i.e. one count per task. To produce a global sum you should be able to do something like this:

A = LOAD ...;
DEFINE CMD `script` SHIP('/a/b/script');
B = STREAM A THROUGH CMD AS (count: long);
C = GROUP B ALL;
D = FOREACH C GENERATE 'Num Rows', SUM(B.count);

Notice that the GROUP ALL after the streaming step is what ships the counts computed by your python script in each mapper to a single reducer, where they are summed to produce the global sum.
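
To make the local vs. global distinction concrete, here is a minimal sketch in plain Python that only mimics the data flow (it does not run Pig). The three input splits and the function names local_count and global_sum are made up for illustration: each call to local_count plays the role of one streaming task, and global_sum plays the role of the single reducer that GROUP ... ALL feeds.

```python
def local_count(rows):
    """Mimic one streaming task: count the rows in its own input split."""
    return sum(1 for _ in rows)


def global_sum(local_counts):
    """Mimic the single reducer after GROUP ... ALL: add the per-task counts."""
    return sum(local_counts)


# Hypothetical input splits, one per mapper (made-up data for illustration).
splits = [["a", "b", "c"], ["d", "e"], ["f"]]

local_counts = [local_count(split) for split in splits]
total = global_sum(local_counts)

print(local_counts, total)
```

Without the final summing step you would be left with one count per task rather than a single number for the whole relation.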

Hope that helps

-...@nkur

prasenjit mukherjee wrote:
Apologies if I was not clear enough.

Can I use the following python script in my DEFINE command to compute the
number of rows in my relation (basically the same as the SUM command):

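#!/usr/bin/python
import sys

# Count the rows this task receives on stdin (one row per line).
my_sum = 0
for line in sys.stdin:
    my_sum += 1
# write() needs a string; the trailing newline ends the single output row.
sys.stdout.write(str(my_sum) + "\n")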

-Prasen

On Thu, Feb 18, 2010 at 2:56 PM, Ankur Goel <[email protected]> wrote:

Depending upon where it is placed in your pig script, it will be invoked in
either the map or the reduce phase.
To get a better understanding of your pig script's execution plan, you can run
this from the grunt shell:

explain -script <your-script> -dot -out <dot-output-file>

You can then feed the dot output file to a dot renderer (Graphviz, for
example) to generate the DAG in jpg/gif format.
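
One way to do that last step is the Graphviz dot command, which accepts an output format via -T and an output file via -o. The sketch below assumes Graphviz is installed and on the PATH; the file names plan.dot and plan.gif and the helper names dot_command and render_plan are hypothetical, with plan.dot standing for whatever file explain wrote.

```python
import subprocess


def dot_command(dot_file, image_file, fmt="gif"):
    """Build the Graphviz command line that renders dot_file as an image."""
    return ["dot", "-T" + fmt, dot_file, "-o", image_file]


def render_plan(dot_file, image_file, fmt="gif"):
    """Run Graphviz; raises CalledProcessError if dot reports a failure."""
    subprocess.check_call(dot_command(dot_file, image_file, fmt))


# Hypothetical file names; plan.dot stands for the file written by explain.
cmd = dot_command("plan.dot", "plan.gif")
print(" ".join(cmd))
```

Calling render_plan("plan.dot", "plan.gif") would actually run the command; only the command line is built here so the sketch has no side effects.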

-...@nkur


prasenjit mukherjee wrote:

Just wondering if I can use the DEFINE command to write my custom
mapper/reducer functions. A mapper (I believe) I can, but I am not sure
about a reducer. I guess this depends on how the DEFINE commands are invoked.

-Prasen


