Sure, just join your per-gender totals with your per-region, per-gender partials on gender.
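
A minimal sketch of that join, based on the schema in your mail. The alias names (by_region_gender, partials, by_gender, totals, joined, shares), the iq_share field, and the output path are all made up for illustration. The casts to double are there because SUM over an int field returns a long, and long / long would truncate the ratio to 0.

```pig
-- Sketch only: alias names, iq_share, and the output path are illustrative assumptions.
A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
                                          gender:chararray, iq:int);

-- Partials: one row per (region, gender) pair.
by_region_gender = GROUP A BY (region, gender);
partials = FOREACH by_region_gender GENERATE
    FLATTEN(group) AS (region, gender),
    SUM(A.iq) AS iq_per_region_per_gender;

-- Totals: one row per gender.
by_gender = GROUP A BY gender;
totals = FOREACH by_gender GENERATE
    group AS gender,
    SUM(A.iq) AS iq_per_gender;

-- Join the two relations on gender; fields are disambiguated with alias::field.
joined = JOIN partials BY gender, totals BY gender;

-- Cast before dividing so the result is a fraction rather than a truncated long.
shares = FOREACH joined GENERATE
    partials::region AS region,
    partials::gender AS gender,
    (double)partials::iq_per_region_per_gender / (double)totals::iq_per_gender AS iq_share;

STORE shares INTO 'iq_share_per_region_per_gender';
```

Since totals has only one row per gender, adding USING 'replicated' to the JOIN (with totals listed last) might be worth trying, but the plain join above should be enough to get going.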

D

On Tue, Oct 11, 2011 at 11:58 PM, Marco Cadetg <[email protected]> wrote:

> D'oh, I just noticed that my example was a bit oversimplified. The total
> needs to be grouped by another field, as below.
>
> A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
> gender:chararray, iq:int);
> DUMP A;
> (Eva, There, Female, 500)
> (John, There, Male, 10)
> (Alf, There, Male, 10)
> (ET, There, Male, 10)
> (Mary, Here, Female, 80)
> (Bill, Here, Male, 100)
> (Joe, Here, Male, 150)
>
> total_iq_per_region = GROUP A BY (region, gender);
>
> total_iq_per_region_per_gender = FOREACH total_iq_per_region
> {
>  GENERATE FLATTEN(group),
>  SUM(A.iq) AS iq_per_region_per_gender;
> }
>
> total_iq_per_gender = GROUP A BY (gender);
>
> total_iq_per_gender = FOREACH total_iq_per_gender
> {
>  GENERATE FLATTEN(group),
>  SUM(A.iq) AS iq_per_gender;
> }
>
> Now I guess I could use JOIN to combine both bags(?) by gender, but
> somehow I don't get it.
>
> Thanks
> -Marco
>
> On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg <[email protected]> wrote:
>
> > Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> > -Marco
> >
> > On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan <[email protected]> wrote:
> >
> >> total_iq = foreach (group A by all) generate SUM(A.iq) as total;
> >>
> >> total_iq_per_region = FOREACH total_iq_per_region
> >> {
> >>  GENERATE FLATTEN(group),
> >>  SUM(A.iq)/total_iq.total AS iq_per_region;
> >> }
> >>
> >> Shawn
> >>
> >>
> >> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg <[email protected]> wrote:
> >> > Hi there,
> >> >
> >> > I would need to do something like this:
> >> >
> >> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
> >> > region:chararray, iq:int);
> >> > DUMP A;
> >> > (John, There, 10)
> >> > (Alf, There, 10)
> >> > (ET, There, 10)
> >> > (Mary, Here, 80)
> >> > (Bill, Here, 100)
> >> > (Joe, Here, 150)
> >> >
> >> > total_iq_per_region = GROUP A BY (region);
> >> >
> >> > total_iq_per_region = FOREACH total_iq_per_region
> >> > {
> >> >  GENERATE FLATTEN(group),
> >> >  SUM(A.iq) AS iq_per_region;
> >> > }
> >> >
> >> > total_iq = FOREACH A
> >> > {
> >> >  GENERATE SUM(iq) AS total_iq;
> >> > }
> >> >
> >> > Now I would like to compute each region's share of the total, e.g.
> >> > iq_per_region / total_iq, and store the result. How can I achieve that?
> >> > I hope my example is not too confusing.
> >> >
> >> > Cheers
> >> > -Marco
> >> >
> >>
> >
> >
>
