Hi Sameer,

I think you can use the BagConcat() function from DataFu (http://datafu.incubator.apache.org/docs/datafu/guide/bag-operations.html) for your use case.
The idea is to generate a bag of bags for each id and concatenate the bag of bags into a single bag. Outlining an (untested) approach below:

    sample = LOAD 'sample.txt';
    baggd = foreach sample generate $0, {$1..};
    grpd = group baggd by $0;
    grpd = foreach grpd generate group, baggd.$1 as baggd;
    define BagConcat datafu.pig.bags.BagConcat();
    output = foreach grpd generate group, BagConcat (baggd);

Hope this helps.

Thanks!
Gufran Pathan

-----Original Message-----
From: Sameer Tilak [mailto:ssti...@live.com]
Sent: Thursday, November 13, 2014 12:51 PM
To: user@pig.apache.org
Subject: Group operator and variable schema (reformatted email)

Hi All,

I have the following question:

Snippet of my sample.txt. First column is id, however each row can have variable number of columns.

    id1 100 200 300 400 500
    id2 10 20 30
    id1 800 900 600
    id3 10 20 30 40 50 60 70 80 90 100
    id1 1 2 3 4 5 6 7 8 9
    id2 40 50 60 70 80 90
    id3 200

    sample = LOAD 'sample.txt' [how should I specify schema here]
    sample_grpd = GROUP sample by $0;
    sample_result = FOREACH sample_grpd generate group, FLATTEN(TOBAG([what should go here]))

group by id so that the result is:

    id1 100 200 300 400 500 800 900 600 1 2 3 4 5 6 7 8 9
    id2 10 20 30 40 50 60 70 80 90
    id3 10 20 30 40 50 60 70 80 90 100 200

Any help with this, will be greatly appreciated!
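To make the bag-of-bags idea from the reply concrete, here is a minimal plain-Python sketch of the same data flow, run against the sample rows from the question. It is not Pig and does not call DataFu; it only models the two steps (collect one "bag" per row under each id, then concatenate them). The names `SAMPLE_ROWS` and `concat_by_id` are made up for illustration, and the rows are assumed to be whitespace-separated.

```python
from collections import defaultdict

# Rows copied from the sample.txt snippet in the question
# (assumed to be whitespace-separated).
SAMPLE_ROWS = """\
id1 100 200 300 400 500
id2 10 20 30
id1 800 900 600
id3 10 20 30 40 50 60 70 80 90 100
id1 1 2 3 4 5 6 7 8 9
id2 40 50 60 70 80 90
id3 200
"""


def concat_by_id(text):
    """Group rows by their first column and concatenate the remaining values."""
    bags_by_id = defaultdict(list)  # id -> list of per-row "bags"
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row_id, values = fields[0], fields[1:]
        bags_by_id[row_id].append(values)  # the "bag of bags" step
    # The concatenation step: flatten each bag of bags into a single list.
    return {
        row_id: [value for bag in bags for value in bag]
        for row_id, bags in bags_by_id.items()
    }


if __name__ == "__main__":
    for row_id, values in concat_by_id(SAMPLE_ROWS).items():
        print(row_id, " ".join(values))
```

On this input the sketch reproduces the three result rows requested in the question. One difference worth keeping in mind: Python lists keep the file order, whereas Pig bags are unordered collections, so the order of values within each id may differ in the Pig version.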