Hi Pig users,

Is there an easy/efficient way to sample an inner bag? For example, with input in a relation like

  (id1,att1,{(a,0.01),(b,0.02),(x,0.999749968742)})
  (id1,att2,{(a,0.03),(b,0.04),(x,0.998749217772)})
  (id2,att1,{(b,0.05),(c,0.06),(x,0.996945334509)})

I’d like to sample 1/3 the elements of the bags, and get something like (ignoring the non-determinism)

  (id1,att1,{(x,0.999749968742)})
  (id1,att2,{(b,0.04)})
  (id2,att1,{(b,0.05)})

I have a circumlocution that seems to work using flatten + group but that looks ugly to me:

  tfidf1 = load '$tfidf' as (id: chararray, att: chararray, pairs: {pair: (word: chararray, value: double)});
  flat_tfidf = foreach tfidf1 generate id, att, FLATTEN(pairs);
  sample_flat_tfidf = sample flat_tfidf 0.33;
  tfidf2 = group sample_flat_tfidf by (id, att);
  tfidf = foreach tfidf2 {
      pairs = foreach sample_flat_tfidf generate pairs::word, pairs::value;
      generate group.id, group.att, pairs;
  };

Can someone suggest a better way to do this?

Many thanks!

William F Dowling
Senior Technologist
Thomson Reuters