I don't think my version of Pig supports the RANK function; I keep getting an Internal Error. I would update it, but I am not in control of the cluster.

On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman <[email protected]> wrote:

> John's answer about RANK sounds like it should solve your problem
>
>
> On Mar 25, 2014, at 1:13 PM, Christopher Surage <[email protected]> wrote:
> >
> > @ pradeep, I know what the cross product will do, but I have many lines in
> > many files. So the cross will take far too long to complete.
> >
> >
> > On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <[email protected]> wrote:
> >
> >> I don't understand what you're trying to do from your example.
> >>
> >> If you perform a cross on the data you have, the output will be the
> >> following:
> >>
> >> (1,2,3,4,5,10,11)
> >> (1,2,3,4,5,10,11)
> >> (1,2,3,4,5,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,5,7,8,9,10,11)
> >> (1,5,7,8,9,10,11)
> >> (1,5,7,8,9,10,11)
> >>
> >> On this, you'll have to do a distinct to get what you're looking for.
> >>
> >> Let's change the example a little bit so we get a more clear understanding
> >> of your problem. What would be the output if your two relations looked as
> >> follows:
> >>
> >> (1,2,3,4,5) (10,11)
> >> (1,2,4,5,7) (10,12)
> >> (1,5,7,8,9) (10,13)
> >>
> >>
> >> On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <[email protected]> wrote:
> >>
> >>> Have you tried iterating over the first relation and in the nested
> >>> *generate* clause, always appending the second relation? Your top level
> >>> looping is on first relation but in the nested block you are sort of
> >>> hardcoding appending of second relation.
> >>>
> >>> I am referring to the examples like in "Example: Nested Blocks" section
> >>> http://pig.apache.org/docs/r0.10.0/basic.html#foreach
> >>>
> >>>
> >>> On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <[email protected]> wrote:
> >>>
> >>>> I am trying to perform the following action, but the only solution I have
> >>>> been able to come up with is using a CROSS, but I don't want to use that
> >>>> statement as it is a very expensive process.
> >>>>
> >>>> (1,2,3,4,5) (10,11)
> >>>> (1,2,4,5,7) (10,11)
> >>>> (1,5,7,8,9) (10,11)
> >>>>
> >>>>
> >>>> I want to make it
> >>>> (1,2,3,4,5,10,11)
> >>>> (1,2,4,5,7,10,11)
> >>>> (1,5,7,8,9,10,11)
> >>>>
> >>>> any help would be much appreciated,
> >>>>
> >>>> Chris
