You can't really get away from string comparisons:

B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;
dump D; -- all ordered pairs (v1, v2), i.e. the permutations
E = foreach D generate (v1<v2?v1:v2) as v1, (v1<v2?v2:v1) as v2; -- put the smaller value first
F = distinct E;
dump F; -- one row per unordered pair, i.e. the combinations
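
As a quick sanity check of the logic (this is not Pig, just a small Python
sketch that mirrors the steps above; the function name unordered_pairs is
made up for illustration), ordering each pair and then deduplicating leaves
exactly one row per unordered pair. Python compares tuples field by field, so
the same trick covers multi-field rows such as ('A1', 'A2'), although it still
relies on comparing the values:

```python
from itertools import product


def unordered_pairs(rows):
    """Cross rows with itself, order each pair, then deduplicate."""
    crossed = product(rows, rows)                           # D = CROSS B, C
    ordered = ((min(a, b), max(a, b)) for a, b in crossed)  # E: smaller value first
    return sorted(set(ordered))                             # F = DISTINCT E


print(unordered_pairs(["A", "B", "C"]))
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

print(unordered_pairs([("A1", "A2"), ("B1", "B2")]))
# [(('A1', 'A2'), ('A1', 'A2')), (('A1', 'A2'), ('B1', 'B2')), (('B1', 'B2'), ('B1', 'B2'))]
```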

However, I can see a problem with this as well. If the rows of A are not all
distinct, you might need to generate a unique id for each row:


B = FOREACH A GENERATE $0 AS v1, sequential() as extra1;
C = FOREACH A GENERATE $0 AS v2, sequential() as extra2;
D = CROSS B, C;
D = filter D by extra1 != extra2; -- keep only pairs built from two different rows
E = foreach D generate (v1<v2?v1:v2) as v1, (v1<v2?v2:v1) as v2;
F = distinct E;

This gives the correct results if you are solving the combinatorial problem of
how many combinations and permutations can be formed from 5 "A"s, 6 "B"s and
7 "C"s.
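
To illustrate the duplicate-row case, here is a small Python sketch (again
not Pig; multiset_pairs is a made-up name). It assumes the filter is meant to
drop pairs of a row with itself, and it uses the row position from enumerate()
as a stand-in for the unique id. With that assumption a pair like ('A', 'A')
only survives when A really occurs at least twice in the input:

```python
from itertools import product


def multiset_pairs(rows):
    """Unordered pairs of two different rows; duplicate values are allowed."""
    tagged = list(enumerate(rows))     # (id, value); the id stands in for a unique row id
    crossed = product(tagged, tagged)  # D = CROSS B, C
    # Assumption: the filter drops pairs made of one row and itself.
    different = ((v1, v2) for (i1, v1), (i2, v2) in crossed if i1 != i2)
    ordered = ((min(a, b), max(a, b)) for a, b in different)  # E: smaller value first
    return sorted(set(ordered))                               # F = DISTINCT E


print(multiset_pairs(["A", "A", "B"]))
# [('A', 'A'), ('A', 'B')]
```

There is no ('B', 'B') in the output because B occurs only once.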



On Sat, Jun 12, 2010 at 6:20 AM, Christian <[email protected]> wrote:

> Hello, this is my first contact with Pig and its community ;-)
>
> I need to generate all the possible permutations from a bag.
>
> Let me explain it with examples:
>
> A = LOAD 'data' AS f1:chararray;
>
> DUMP A;
> ('A')
> ('B')
> ('C')
>
> I can have all the possible combinations easily with CROSS:
>
> B = FOREACH A GENERATE $0 AS v1;
> C = FOREACH A GENERATE $0 AS v2;
>
> D = CROSS B, C;
> DUMP D;
> ('A', 'A')
> ('A', 'B')
> ('A', 'C')
> ('B', 'A')
> ('B', 'B')
> ('B', 'C')
> ('C', 'A')
> ('C', 'B')
> ('C', 'C')
>
> But what I need are the permutations. The result I want to obtain is
> something like:
>
> DUMP R;
> ('A', 'A')
> ('A', 'B')
> ('A', 'C')
> ('B', 'B')
> ('B', 'C')
> ('C', 'C')
>
> My first idea to solve that was to generate the CROSS and then FILTER like:
>
> R = FILTER D BY $0 <= $1;
>
> It works but I would like to know if there is a better way to do this
> without having to use string comparison and assume that only one field is
> used. For example a real scenario would look like:
>
> DUMP A;
> ('A1', 'A2')
> ('B1', 'B2')
>
> DUMP R;
> ('A1', 'A2', 'A1', 'A2')
> ('A1', 'A2', 'B1', 'B2')
> ('B1', 'B2', 'B1', 'B2')
>
> Thank you in advance.
> Christian
>
