Sure, just join your total counts with your partials on gender.

D
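
A minimal sketch of that join, reusing the schema from the quoted message below. The aliases by_region_gender, partial_sums, by_gender, gender_sums, joined and share are made up for illustration, and the casts to double are an assumption that a fractional share is wanted rather than integer division:

```pig
-- Schema as in the quoted message; alias names below are hypothetical.
A = LOAD 'student' USING PigStorage()
    AS (name:chararray, region:chararray, gender:chararray, iq:int);

-- Partials: one row per (region, gender).
by_region_gender = GROUP A BY (region, gender);
partial_sums = FOREACH by_region_gender GENERATE
    FLATTEN(group) AS (region, gender),
    SUM(A.iq) AS iq_per_region_per_gender;

-- Totals: one row per gender.
by_gender = GROUP A BY gender;
gender_sums = FOREACH by_gender GENERATE
    group AS gender,
    SUM(A.iq) AS iq_per_gender;

-- Join partials to totals on gender, then divide.
joined = JOIN partial_sums BY gender, gender_sums BY gender;
share = FOREACH joined GENERATE
    partial_sums::region AS region,
    partial_sums::gender AS gender,
    (double)partial_sums::iq_per_region_per_gender
        / (double)gender_sums::iq_per_gender AS iq_share;

DUMP share;
```

After the JOIN, fields from both inputs carry the same names, so the alias::field prefix is needed to tell them apart.
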
On Tue, Oct 11, 2011 at 11:58 PM, Marco Cadetg <[email protected]> wrote:

> D'oh I just see that unfortunately my example was a bit oversimplified. The
> total needs to be grouped by another field like below.
>
> A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
> gender:chararray, iq:int);
> DUMP A;
> (Eva, There, Female,500)
> (John, There, Male, 10)
> (Alf, There, Male, 10)
> (ET, There, Male, 10)
> (Mary, Here, Female, 80)
> (Bill, Here, Male, 100)
> (Joe, Here, Male, 150)
>
> total_iq_per_region = GROUP A BY (region, gender);
>
> total_iq_per_region_per_gender = FOREACH total_iq_per_region
> {
>   GENERATE FLATTEN(group),
>   SUM(A.iq) AS iq_per_region_per_gender;
> }
>
> total_iq_per_gender = GROUP A BY (gender);
>
> total_iq_per_gender = FOREACH A
> {
>   GENERATE FLATTEN(group),
>   SUM(A.iq) AS iq_per_gender;
> }
>
> Now I guess I could use JOIN to combine both bags(?) by gender but somehow I
> don't get it.
>
> Thanks
> -Marco
>
> On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg <[email protected]> wrote:
>
> > Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> >
> > -Marco
> >
> > On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan <[email protected]> wrote:
> >
> >> total_iq = foreach (group A by all) generate SUM(A.iq) as total;
> >>
> >> total_iq_per_region = FOREACH total_iq_per_region
> >> {
> >>   GENERATE FLATTEN(group),
> >>   SUM(A.iq)/total_iq.total AS iq_per_region;
> >> }
> >>
> >> Shawn
> >>
> >>
> >> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg <[email protected]> wrote:
> >> > Hi there,
> >> >
> >> > I would need to do something like this:
> >> >
> >> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
> >> > region:chararray, iq:int);
> >> > DUMP A;
> >> > (John, There, 10)
> >> > (Alf, There, 10)
> >> > (ET, There, 10)
> >> > (Mary, Here, 80)
> >> > (Bill, Here, 100)
> >> > (Joe, Here, 150)
> >> >
> >> > total_iq_per_region = GROUP A BY (region);
> >> >
> >> > total_iq_per_region = FOREACH total_iq_per_region
> >> > {
> >> >   GENERATE FLATTEN(group),
> >> >   SUM(A.iq) AS iq_per_region;
> >> > }
> >> >
> >> > total_iq = FOREACH A
> >> > {
> >> >   GENERATE SUM(iq) AS total_iq;
> >> > }
> >> >
> >> > Now I would like to retrieve the percentage of the region e.g.
> >> > iq_per_region / total_iq and store the result. How can I achieve that?
> >> > I hope my example is not too confusing.
> >> >
> >> > Cheers
> >> > -Marco
