Ah, apologies for the lack of precision in my original email.
 
When I mentioned 'akin' to a matrix transposition, what I should have made 
clear is that I don't need the ordering preserved.
 
I just need the columns of a relation transformed into a new relation 
consisting of "bag rows", one bag per column of the original relation. 
Ordering within each bag isn't needed.
 
I was hoping for advice on Pig wizardry to do this in the most efficient manner 
(i.e., least CPU, Mem, and lines of code).
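
To make the goal concrete, here is a minimal, untested sketch of one possible 
single-GROUP variant. It assumes SomeData has fields Col1, Col2, and Col3 as in 
my original script, and that the Pig version in use provides the TOBAG builtin. 
The alias names Grouped, Cols, and Bags are just placeholders.

```pig
-- Collect the whole relation into a single group (one row, one big bag).
Grouped = GROUP SomeData ALL;

-- Project each column out of that bag: one row holding three bags.
Cols = FOREACH Grouped GENERATE
           SomeData.Col1 AS b1,
           SomeData.Col2 AS b2,
           SomeData.Col3 AS b3;

-- Wrap the three bags in an outer bag and flatten it,
-- giving three rows with one bag each (no ordering implied).
Bags = FOREACH Cols GENERATE FLATTEN(TOBAG(b1, b2, b3));
```

This groups once instead of three times and avoids the UNION, but everything 
still passes through a single GROUP ALL, so I don't know whether it is actually 
cheaper in CPU or memory.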
 
Thanx for the response!

 

> Date: Thu, 19 Jan 2012 23:35:27 -0800
> Subject: Re: Transpose a relation
> From: [email protected]
> To: [email protected]
> 
> This is something tricky to propose. There are a couple of reasons why.
> 
> First of all, Pig does not guarantee any specific ordering of a Bag. You
> can see how this is an issue, as it means that taking A transpose transpose
> might not yield A again.
> 
> Secondly, bags are the only spillable data structure. Tuples are not. This
> means you're going to have a hard limit on how big your matrix can get.
> 
> Altogether this means that Bags aren't a great data structure to represent
> matrices. You could do a Tuple of Tuples, but that will have serious memory
> issues. There are ways to make a bag work, but it'd be tricky... I suppose
> it depends on the problem you want to solve.
> 
> 2012/1/19 David Langer <[email protected]>
> 
> >
> > Greetings All!
> >
> > Hopefully this isn't too annoying of a newbie question.
> >
> > I'd like to transpose the columns in a relation into a relation consisting
> > of rows of bags (i.e., something akin to matrix transposition). As an
> > example:
> >
> > 1 A 1A
> > 2 B 2B
> > 3 C 3C
> >
> > Transposes to:
> >
> > {1, 2, 3}
> > {A, B, C}
> > {1A, 2B, 3C}
> >
> > The Pig code I came up with is along the lines of:
> >
> > Bag1 = FOREACH SomeData GENERATE Col1;
> > Bag1 = GROUP Bag1 ALL;
> >
> > Bag2 = FOREACH SomeData GENERATE Col2;
> > Bag2 = GROUP Bag2 ALL;
> >
> > Bag3 = FOREACH SomeData GENERATE Col3;
> > Bag3 = GROUP Bag3 ALL;
> >
> > Bags = UNION Bag1, Bag2, Bag3;
> >
> > The above Pig code works, just wondering if this is the best way without
> > using a UDF.
> >
> > Thanx,
> >
> > Dave
                                          
