Re: General Pig store questions

Prashant Kommireddi Fri, 22 Mar 2013 13:34:14 -0700

Hi Mark,

It depends on the operations. For example, you might want to aggregate
based on a certain field. In M/R you would implement that by writing
out a key/value pair from the mapper and implementing the aggregation
function (say Count or Sum) in the reducer, keyed on that field.
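A rough way to picture the M/R side: below is a minimal Python simulation (not Hadoop code) of the map, shuffle and reduce phases. The function names and NUM_REDUCERS are hypothetical, and the crc32-modulo partitioner only mimics the idea behind Hadoop's default HashPartitioner, which routes each key by its hash modulo the number of reduce tasks. That deterministic routing is what guarantees all values for one key land on the same reducer.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3  # hypothetical number of reduce tasks


def mapper(record):
    """Emit one (key, value) pair per record, keyed on the user field."""
    user, _accnt = record
    yield user, 1


def partition(key, num_reducers=NUM_REDUCERS):
    """Hash the key and take it modulo the reducer count. The hash is
    deterministic, so a given key is always routed to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers=NUM_REDUCERS):
    """Send each pair to its reducer and group the values by key there."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    return reducers


def reducer(key, values):
    """Aggregate every value seen for one key (here a Sum of the emitted 1s)."""
    return key, sum(values)


def run_job(records, num_reducers=NUM_REDUCERS):
    """Run map -> shuffle -> reduce and collect the per-key results."""
    pairs = (pair for record in records for pair in mapper(record))
    results = {}
    for reducer_input in shuffle(pairs, num_reducers):
        for key, values in reducer_input.items():
            key, total = reducer(key, values)
            results[key] = total
    return results


records = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"),
           ("carol", "c1"), ("alice", "a3")]
print(run_job(records))  # per-user counts
```

Swapping sum() in the reducer for another aggregate keeps the same shape; only the choice of key decides which reducer sees which values.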

To answer your question: you would typically "group by" a certain
field in the tuple, and that field becomes the key on which the
reducers operate. For example:

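A = load 'input' as (user, accnt);       -- two fields per input record
B = group A by user;                     -- user is the key the reducers see
C = foreach B generate group, COUNT(A);  -- group holds the user value, A is the bag of its tuples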

In this example the user field is the key. Generally speaking, it's
equivalent to calling context.write(user, 1) in the map function of a
plain MR job, so all tuples for a given user end up at the same reducer.
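To see what the Pig statements above build, here is a hypothetical Python model (plain lists and dicts, not Pig's API; pig_group and foreach_generate_count are made-up names) of the GROUP and FOREACH steps. GROUP yields one tuple per key whose first field is named group and whose second is a bag holding every matching A tuple; COUNT(A) then counts that bag. The sketch ignores null handling (Pig's COUNT skips tuples whose first field is null, while COUNT_STAR does not).

```python
from collections import Counter, defaultdict


def pig_group(relation, key_index):
    """Mimic 'B = group A by user': one (group, bag) tuple per distinct key,
    where the bag holds every input tuple that has that key."""
    bags = defaultdict(list)
    for tup in relation:
        bags[tup[key_index]].append(tup)
    return [(key, bag) for key, bag in bags.items()]


def foreach_generate_count(grouped):
    """Mimic 'C = foreach B generate group, COUNT(A)': count tuples per bag."""
    return [(group, len(bag)) for group, bag in grouped]


A = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"), ("carol", "c1")]
B = pig_group(A, key_index=0)
C = foreach_generate_count(B)
print(sorted(C))  # [('alice', 2), ('bob', 1), ('carol', 1)]

# Map-side view: emitting (user, 1) per record and summing per key
# yields the same counts as counting each grouped bag.
mr_counts = Counter()
for user, _accnt in A:
    mr_counts[user] += 1
assert dict(C) == dict(mr_counts)
```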


On Mar 22, 2013, at 12:39 PM, Mark wrote:

> In map/reduce all values for 1 key are guaranteed to go to the same reducer. 
> Is there something analogous to this in Pig? If so, what determines the key 
> when I output a bunch of tuples?
