Just to confirm... you want your output to read as follows,

{1, {(1, count), (2, count), ..., (10, count)}}
{1, {(11, count), (12, count), ..., (20, count)}}
...
correct?

I think you also have a syntax error... I'm pretty sure you can't do
FOREACH and GROUP in the same statement. You can try the following:

B = GROUP A BY season;
C = FOREACH B {
    sorted = ORDER A BY count DESC;
    quantiles = FOREACH sorted GENERATE BagSplit(10, sorted) AS (mybag, index);
    GENERATE group AS season, FLATTEN(quantiles.mybag);
};
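
If the nested FOREACH over "sorted" still fails, another thing to try is calling BagSplit once on the whole sorted bag, directly in GENERATE. This is an untested sketch, not a confirmed fix. The jar and input paths are placeholders. It also assumes, from my reading of the DataFu docs, that the first argument to BagSplit is the maximum size of each output bag rather than the number of splits, so 10 here would mean chunks of up to 10 rows, not deciles.

```pig
-- Placeholder path: point this at your actual DataFu jar.
REGISTER 'datafu.jar';
DEFINE BagSplit datafu.pig.bags.BagSplit();

-- Placeholder input; the schema matches the DESCRIBE A output in the question.
A = LOAD 'input' AS (id:int, season:int, count:long);

B = GROUP A BY season;
C = FOREACH B {
    -- Sort each season's rows by count, highest first.
    sorted = ORDER A BY count DESC;
    -- Assumption: BagSplit(n, bag) returns a bag of bags with at most n rows each.
    -- FLATTEN then emits one output row per chunk.
    GENERATE group AS season, FLATTEN(BagSplit(10, sorted));
};

DUMP C;
```

If that assumption holds, true deciles would need a chunk size derived from the number of rows in each season instead of a fixed 10.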



On Mon, Jul 15, 2013 at 12:31 PM, Lars Francke <[email protected]> wrote:

> Hi!
>
> I have a problem with the following Pig script:
>
> DESCRIBE A;
> A: {id: int, season: int, count: long}
>
> foo = FOREACH (GROUP A BY season) {
>         sorted = ORDER A BY count DESC;
>         quantiles = FOREACH sorted GENERATE BagSplit(10, sorted);
>         GENERATE <???>;
>       };
>
> DESCRIBE foo::quantiles;
> foo::quantiles: {datafu.pig.bags.bagsplit_sorted_14586: {(data: {(id:
> int, season: int, count: long)},index: int)}}
>
>
> What I'd like to do is order A by "count" and then use DataFu's
> BagSplit UDF to create equal splits (deciles). I'm very very new to
> Pig and I think this can all be attributed to the fact that I
> misunderstand bags and FOREACH - especially the nested variant.
>
> I'd like my output to be:
> {season, {(id, count), (id, count), ...}}
>
> GENERATE quantiles: Is accepted but leads to "ERROR 2015: Invalid
> physical operators in the physical plan" on execution.
>
> GENERATE quantiles.$0: Same as above. In fact I can stick as many
> ".$0" at the end as I want to and it is always accepted but generates
> an error when dumping the data.
>
> I'll reread the Pig Latin Basics tonight but if anyone has an idea
> what I'm doing wrong or how I can achieve my goal I'd be very
> grateful.
>
> Thanks,
> Lars
>
