Ah, apologies for the lack of precision in my original email. When I said 'akin' to a matrix transposition, I should have made clear that I don't need the ordering preserved. I just need the columns of a relation transformed into a new relation consisting of "bag rows", one bag per column of the original relation; the order of the values inside each bag doesn't matter. I was hoping for advice on Pig wizardry to do this in the most efficient manner (i.e., least CPU, memory, and lines of code). Thanx for the response!

> Date: Thu, 19 Jan 2012 23:35:27 -0800
> Subject: Re: Transpose a relation
> From: [email protected]
> To: [email protected]
>
> This is something tricky to propose. There are a couple of reasons why.
>
> First of all, Pig does not guarantee any specific ordering of a Bag. You
> can see how this is an issue, as it means that taking A transpose transpose
> might not yield A again.
>
> Secondly, bags are the only spillable data structure. Tuples are not. This
> means you're going to have a hard limit on how big your matrix can get.
>
> Altogether this means that Bags aren't a great data structure to represent
> matrices. You could do a Tuple of Tuples, but that will have serious memory
> issues. There are ways to make a bag work, but it'd be tricky... I suppose
> it depends on the problem you want to solve.
>
> 2012/1/19 David Langer <[email protected]>
>
> >
> > Greetings All!
> >
> > Hopefully this isn't too annoying of a newbie question.
> >
> > I'd like to transpose the columns in a relation into a relation consisting
> > of rows of bags (i.e., something akin to matrix transposition). As an
> > example:
> >
> > 1 A 1A
> > 2 B 2B
> > 3 C 3C
> >
> > Transposes to:
> >
> > {1, 2, 3}
> > {A, B, C}
> > {1A, 2B, 3C}
> >
> > The Pig code I came up with is along the lines of:
> >
> > Bag1 = FOREACH SomeData GENERATE Col1;
> > Bag1 = GROUP Bag1 ALL;
> >
> > Bag2 = FOREACH SomeData GENERATE Col2;
> > Bag2 = GROUP Bag2 ALL;
> >
> > Bag3 = FOREACH SomeData GENERATE Col3;
> > Bag3 = GROUP Bag3 ALL;
> >
> > Bags = UNION Bag1, Bag2, Bag3;
> >
> > The above Pig code works, just wondering if this is the best way without
> > using a UDF.
> >
> > Thanx,
> >
> > Dave
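
For comparison with the quoted script, here is one possible way to get the same unordered "bag rows" with a single GROUP ALL instead of three. This is an untested sketch, not a confirmed answer from the thread. It assumes SomeData has the fields Col1, Col2 and Col3 as in the quoted code, and a Pig version that ships the TOBAG builtin. The aliases Grouped and Bags are illustrative.

```pig
-- Sketch only: assumes SomeData has fields Col1, Col2, Col3 (as in the quoted script).
-- Collect every row of SomeData into one bag under the single group 'all'.
Grouped = GROUP SomeData ALL;

-- SomeData.ColN projects one column out of the grouped bag, giving a bag per column.
-- TOBAG wraps each of those column bags in its own tuple, and FLATTEN then emits
-- one output row per column.
Bags = FOREACH Grouped GENERATE
           FLATTEN(TOBAG(SomeData.Col1, SomeData.Col2, SomeData.Col3));
```

Each output row should hold a single bag of one-field tuples, for example {(1),(2),(3)}, with no guaranteed order inside the bag. Unlike the quoted version, the 'all' group key is not carried into the output. GROUP ALL still funnels the whole relation through one reducer, so this mainly saves statements and extra passes over the data and does not change the memory caveats raised in the reply above.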
