Hi Mark,

It depends on the operation. For example, you might want to aggregate on a certain field. In M/R you would implement that by emitting a key/value pair from the mapper, keyed on that field, and implementing the aggregation function (say Count or Sum) in the reducer.

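A rough sketch of that M/R pattern, in plain Python rather than the Hadoop API. The function names and the sample records are made up for illustration, and the shuffle step here only imitates what the framework does between map and reduce:

```python
from collections import defaultdict

def mapper(record):
    # Emit (key, 1) for each input record; the key is the user id field.
    userid, _accnt = record
    yield userid, 1

def shuffle(pairs):
    # Collect all values for a key in one place, the way the framework
    # routes every value for a given key to the same reducer.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Aggregate per key; summing the emitted ones gives a count.
    return key, sum(values)

def run_job(records):
    pairs = [pair for record in records for pair in mapper(record)]
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input: (userid, accnt) tuples.
records = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
print(run_job(records))  # {'alice': 2, 'bob': 1}
```

Swapping sum(values) for another function in reducer() gives a different aggregate over the same key.
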
To answer your question, you would typically "group by" a certain field in the tuple, and that field would be the key on which the reducers operate. For example:

    A = load 'input' as (userid, accnt);
    B = group A by userid;
    C = foreach B generate group, COUNT(A);

In this example the userid field is the key. It's equivalent to a context.write(userid, 1) in the map function of plain MR (generally speaking).

Sent from my iPhone

On Mar 22, 2013, at 12:39 PM, Mark <[email protected]> wrote:

> In map/reduce all values for 1 key are guaranteed to go to the same reducer.
> Is there something analogous to this in Pig? If so, what determines the key
> when I output a bunch of tuples?
