Thanks Jacob for the response.

If I run the UDF on each tuple, how can I preserve the state of the rank
variable? I mean, the UDF won't be able to save the rank value between calls,
right? Correct me if I am wrong in assuming that the UDF would be
invoked once per tuple.

What I am looking for in my output is an additional column indicating the
rank, something like:

Hick    35    1
Jimmy   30    2
Jack    25    3
Tampa   22    4
Sam     20    5
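
To make the question concrete, here is a rough sketch of the logic I imagine
such a UDF would need if it received the whole relation as a single bag (as
after GROUP ALL) rather than one tuple per call. It is plain Python, not Pig:
the name rank_bag is made up for illustration and nothing here is a Pig API.
It only shows that the rank counter can live inside one call, so no state has
to survive between calls. It assumes the scores are distinct, so ties are not
handled.

```python
def rank_bag(bag):
    """Return (name, score, rank) tuples; rank 1 is the highest score.

    `bag` stands in for the bag a UDF would receive after GROUP ALL:
    a list of (name, score) tuples. Hypothetical helper, not a Pig API.
    """
    # Sort by score, highest first, so list position determines rank.
    ordered = sorted(bag, key=lambda t: t[1], reverse=True)
    # The rank counter exists only within this single call.
    return [(name, score, rank)
            for rank, (name, score) in enumerate(ordered, start=1)]


sample = [("Jack", 25), ("Jimmy", 30), ("Sam", 20), ("Hick", 35), ("Tampa", 22)]
for row in rank_bag(sample):
    print("\t".join(str(field) for field in row))
```

The catch, as far as I understand it, is that this needs every tuple in one
place at once, which is exactly the concern Jacob raised.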

Thanks.

Arun


On Tue, Apr 26, 2011 at 7:18 PM, Jacob Perkins <[email protected]> wrote:

> The question is, do you need the entire relation all at once to assign a
> rank? If so then map-reduce may not be the answer. If not, why not just
> run the UDF on each tuple of the relation, one at a time, with a
> projection?
>
> If you need some global information, such as the max and min score, then
> you might look at the MAX and MIN operations. They do require a GROUP
> ALL but are algebraic so it's not actually going to bring all the data
> to one machine as it otherwise would.
>
> --jacob
> @thedatachef
>
>
> On Tue, 2011-04-26 at 19:07 -0700, Arun A K wrote:
> > Hi
> >
> > I have the following input relation:
> > Name Score
> > Jack    25
> > Jimmy   30
> > Sam     20
> > Hick    35
> > Tampa   22
> >
> > My goal is to rank the tuples by score.
> >
> > Pig script:
> >
> > sample_data = LOAD 'sample.txt' USING PigStorage()   AS (name:chararray,
> > score:int);
> > sample_data_group = GROUP sample_data BY score;
> > sample_data_count = FOREACH sample_data_group GENERATE group AS score,
> > COUNT(sample_data.name) AS countVal;
> > sample_data_order = ORDER sample_data_count BY score DESC;
> > sample_data_group_all = GROUP sample_data_order all;
> > sample_data_project = FOREACH sample_data_group_all GENERATE
> > FLATTEN(myUDF.Rank(sample_data_order));
> > dump sample_data_project;
> >
> > Can someone please point me to a UDF example where a relation is read in and
> > iterated over all its tuples? I plan to iterate over the tuples and assign a
> > rank to each of them based on the score value.
> >
> > Is there any other way to generate rank?
> >
> > Thanks much.
> >
> > Arun
>
>
>
