Hi Sameer,

I think you can use the BagConcat() function from DataFu 
(http://datafu.incubator.apache.org/docs/datafu/guide/bag-operations.html) for 
your use case.

The idea is to build a bag of bags for each id (one inner bag per input row) 
and then concatenate those inner bags into a single bag. An (untested) 
approach is outlined below:


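-- Assumes the DataFu jar has already been REGISTERed.
define BagConcat datafu.pig.bags.BagConcat();

sample = LOAD 'sample.txt';
-- Wrap everything after the id in a bag (intended: one inner bag per row).
baggd = foreach sample generate $0, {$1..};
grpd = group baggd by $0;
-- Collect the inner bags for each id into a bag of bags.
bags = foreach grpd generate group, baggd.$1 as baggd;
-- Concatenate the inner bags into a single bag per id.
result = foreach bags generate group, BagConcat(baggd);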
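
To make the intended data flow concrete, here is a small Python model of the 
same steps (this is not Pig, just a sketch of the semantics). It assumes the 
fields are whitespace-separated and uses the rows from the sample.txt snippet 
in the question; the name concat_by_id is made up for illustration.

```python
from collections import defaultdict

# Rows copied from the sample.txt snippet in the quoted question.
SAMPLE = """\
id1 100 200 300 400 500
id2 10 20 30
id1 800 900 600
id3 10 20 30 40 50 60 70 80 90 100
id1 1 2 3 4 5 6 7 8 9
id2 40 50 60 70 80 90
id3 200
"""


def concat_by_id(text):
    """Group rows by their first field and concatenate the remaining fields."""
    bags = defaultdict(list)  # id -> list of per-row "bags"
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row_id, values = fields[0], fields[1:]
        bags[row_id].append(values)  # bag of bags: one inner bag per row
    # Flatten each bag of bags into a single list (the BagConcat step).
    return {
        row_id: [value for bag in inner for value in bag]
        for row_id, inner in bags.items()
    }


if __name__ == "__main__":
    for row_id, values in concat_by_id(SAMPLE).items():
        print(row_id, " ".join(values))
```

One difference to keep in mind: the Python lists preserve input order, whereas 
Pig bags are unordered, so the order of values within each id is not 
guaranteed after the GROUP.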


Hope this helps.

Thanks!
Gufran Pathan| +91 7760913355|

Correlation does not imply causation, but it does waggle its eyebrows 
suggestively and gesture furtively while mouthing "look over there." -Randall 
Munroe

-----Original Message-----
From: Sameer Tilak [mailto:ssti...@live.com]
Sent: Thursday, November 13, 2014 12:51 PM
To: user@pig.apache.org
Subject: Group operator and variable schema (reformatted email)

Hi All,
I have the following question:
Here is a snippet of my sample.txt. The first column is the id, but each row 
can have a variable number of columns.

id1 100 200 300 400 500
id2 10 20 30
id1 800 900 600
id3 10 20 30 40 50 60 70 80 90 100
id1 1 2 3 4 5 6 7 8 9
id2 40 50 60 70 80 90
id3 200

sample = LOAD 'sample.txt' [how should I specify schema here]
sample_grpd = GROUP sample by $0;
sample_result = FOREACH sample_grpd generate group, FLATTEN(TOBAG([what should go here]))

I want to group by id so that the result is:
id1 100 200 300 400 500 800 900 600 1 2 3 4 5 6 7 8 9
id2 10 20 30 40 50 60 70 80 90
id3 10 20 30 40 50 60 70 80 90 100 200


Any help with this will be greatly appreciated!


