Prasenjit,
Whether this is executed in the map or the reduce phase, it
will only produce a 'local' sum, one per task. To produce a global sum you
should be able to do something like this:
A = LOAD ...;
DEFINE CMD `script` SHIP('/a/b/script');
B = STREAM A THROUGH CMD AS (count: long);
C = GROUP B ALL;
D = FOREACH C GENERATE 'Num Rows', SUM(B.count);
Note that the GROUP ... ALL after the streaming step is what ships the counts
computed by your python script in each mapper to a single reducer, where
they are summed to produce the global sum.
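To picture the local vs. global distinction outside of Pig, here is a rough sketch in plain Python. It only mimics the data flow; the function names and the three-way split of the input are made up for illustration and are not anything Pig exposes.

```python
# Illustration only: mimics the data flow, this is not Pig code.
# count_rows, global_count and the three-way split are hypothetical.

def count_rows(lines):
    """What the streamed script does inside ONE task: count its own input."""
    total = 0
    for _ in lines:
        total += 1
    return total


def global_count(splits):
    """GROUP ... ALL followed by SUM: collect every local count, then add them."""
    local_counts = [count_rows(split) for split in splits]
    return local_counts, sum(local_counts)


# Pretend the input relation was split across three map tasks.
splits = [["r1", "r2", "r3"], ["r4", "r5"], ["r6"]]
local_counts, total = global_count(splits)
print(local_counts)  # [3, 2, 1]  one 'local' sum per task
print(total)         # 6          the global sum
```

Without the final summing step you would get one number per task rather than one number overall.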
Hope that helps
-...@nkur
prasenjit mukherjee wrote:
Apologies if I was not clear enough.
Can I use the following python script in my DEFINE command to compute
the number of rows in my relation (basically the same as the SUM command):
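#!/usr/bin/python
import sys

# Count the lines this task receives on stdin.
my_sum = 0
for line in sys.stdin:
    my_sum += 1
# write() needs a string; the newline terminates the output record.
sys.stdout.write(str(my_sum) + "\n")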
-Prasen
On Thu, Feb 18, 2010 at 2:56 PM, Ankur Goel wrote:
Depending on where it is placed in your Pig script, it will be invoked in
either the map or the reduce phase.
To get a better understanding of your Pig script's execution plan, you can
run this from the grunt shell:
explain -script <your-script> -dot -out <dot-output-file>
You can then feed the dot output file to a dot renderer such as Graphviz to
generate the DAG in jpg/gif format.
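For the rendering step, assuming Graphviz is installed and its dot program is on your PATH, something along these lines should work. The helper names and the plan.dot / plan.gif file names are placeholders, not anything Pig produces by default.

```python
import subprocess


def dot_command(dot_file, image_file, fmt="gif"):
    """Build the Graphviz command line: dot -T<fmt> <dot_file> -o <image_file>."""
    return ["dot", "-T" + fmt, dot_file, "-o", image_file]


def render(dot_file, image_file, fmt="gif"):
    """Run Graphviz; raises CalledProcessError if dot exits with an error."""
    subprocess.run(dot_command(dot_file, image_file, fmt), check=True)


# Placeholder file names; call render("plan.dot", "plan.gif") to actually run it.
print(" ".join(dot_command("plan.dot", "plan.gif")))
```

Which output formats are available depends on how your Graphviz was built, so png may be a safer choice than gif or jpg on some machines.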
-...@nkur
prasenjit mukherjee wrote:
Just wondering if I can use the DEFINE command to write my custom
mapper/reducer functions. A mapper (I believe) I can, but I am not sure
about a reducer. I guess this depends on how the DEFINE commands are invoked.
-Prasen