Hi,

Nested RANK is not supported yet; however, it is easy to implement as a UDF. Just sort the records within each group and let the UDF assign an increasing counter. We will probably add support for nested RANK in the next release.
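To make the "sort and assign a counter" idea concrete, here is a minimal sketch of the logic in plain Python. It is not a tested Pig UDF: the function name `rank_within_group`, its parameters, and the assumption that one group's bag arrives as a list of tuples shaped like `(overall_rank, name, industry, income)` are all illustrative, and the wiring needed to register it with Pig (registration, output schema) is left out.

```python
def rank_within_group(records, key_index=3, descending=False):
    """Sort one group's records by the field at key_index and insert a
    1-based counter right after the first field (the overall rank).

    Hypothetical helper: assumes each record is a tuple whose first
    field is the overall rank. Ties get distinct consecutive numbers.
    """
    ordered = sorted(records, key=lambda r: r[key_index], reverse=descending)
    ranked = []
    for i, r in enumerate(ordered):
        # i is 0-based, so the counter within the group is i + 1.
        ranked.append((r[0], i + 1) + tuple(r[1:]))
    return ranked


# One group, shaped like the output of `rank a by income` in the thread.
banking = [(5, "Jane", "Banking", 35000), (1, "John", "Banking", 20000)]
print(rank_within_group(banking))
```

For the Banking group this yields `(1, 1, 'John', 'Banking', 20000)` followed by `(5, 2, 'Jane', 'Banking', 35000)`, which is the shape asked for in the quoted message. Passing `descending=True` would number the highest income first instead.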
Cheers,
--
Gianmarco

On Mon, Apr 15, 2013 at 11:10 PM, M G <[email protected]> wrote:

> Hi Johnny Zhang:
>
> What I am looking for is overall rank and rank within each group. Sorry if
> I was not clear.
>
> What I am looking to get is something like this.
>
> (1, 1, John, Banking, 20000)
> (5, 2, Jane, Banking, 35000)
> (3, 1, Asha, Technology, 26000)
> (2, 1, Hari, Real Estate, 22000)
> (4, 2, Chen, Real Estate, 30000)
>
> Thanks,
> Mythili
>
> On Mon, Apr 15, 2013 at 1:58 PM, Johnny Zhang <[email protected]> wrote:
>
> > Hi, M G:
> > for input data
> > John Banking 20000
> > Jane Banking 35000
> > Chen Real Estate 30000
> > Hari Real Estate 22000
> > Asha Technology 26000
> >
> > a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
> > b = rank a by income;
> > c = group b by industry;
> > d = foreach c generate flatten(b);
> > dump d;
> >
> > output is:
> > (1,John,Banking,20000)
> > (5,Jane,Banking,35000)
> > (3,Asha,Technology,26000)
> > (2,Hari,Real Estate,22000)
> > (4,Chen,Real Estate,30000)
> >
> > Johnny
> >
> > On Mon, Apr 15, 2013 at 1:25 PM, M G <[email protected]> wrote:
> >
> > > Is there a way to do RANK within a group in PIG 0.11.1?
> > >
> > > In the following sample dataset, I would like to Rank DESC by Income, and
> > > further RANK by Income for each Industry.
> > >
> > > Name Industry Income
> > >
> > > John,Banking, 20,000
> > > Jane, Banking, 35,000
> > > Chen,Real Estate, 30,000
> > > Hari, Real Estate, 22,000
> > > Asha, Technology, 26,000
> > >
> > > I tried something like this, but I get syntax error.
> > >
> > > names_by_ind = group names by industry;
> > >
> > > rank_by_ind = foreach names_by_ind {
> > >   results = RANK names BY income DESC;
> > >   GENERATE flatten(results);
> > > }
