I don't think my version of Pig supports the RANK function; I keep getting an
Internal Error. I would update it, but I am not in control of the cluster.
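
For the narrow case in the original example, where every row of the second relation is the same (10,11), one possible workaround that needs neither CROSS nor RANK is Pig's relation-to-scalar projection. This is only a sketch: it assumes Pig 0.8 or later, the input paths and the comma delimiter are hypothetical, and it does not cover the case where the rows of the second relation differ.

```pig
-- Hypothetical input paths and delimiter; adjust to the real data.
A = LOAD 'left_input'  USING PigStorage(',');   -- rows such as (1,2,3,4,5)
B = LOAD 'right_input' USING PigStorage(',');   -- (10,11) repeated on every line

-- Collapse B to a single row. The scalar projection below fails at run time
-- if B1 holds more than one row.
B1 = DISTINCT B;

-- Append B1's two fields to every row of A without CROSS or RANK.
C = FOREACH A GENERATE *, B1.$0, B1.$1;
DUMP C;
```

If the second relation holds different values per row, as in (10,11), (10,12), (10,13), B1 would contain several rows and the scalar projection would raise an error, so this approach would not apply.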


On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman <
[email protected]> wrote:

> John's answer about RANK sounds like it should solve your problem
>
> > On Mar 25, 2014, at 1:13 PM, Christopher Surage <[email protected]>
> wrote:
> >
> > @pradeep, I know what the cross product will do, but I have many lines in
> > many files. So the cross will take far too long to complete.
> >
> >
> > On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <[email protected]
> >wrote:
> >
> >> I don't understand what you're trying to do from your example.
> >>
> >> If you perform a cross on the data you have, the output will be the
> >> following:
> >>
> >> (1,2,3,4,5,10,11)
> >> (1,2,3,4,5,10,11)
> >> (1,2,3,4,5,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,2,4,5,7,10,11)
> >> (1,5,7,8,9,10,11)
> >> (1,5,7,8,9,10,11)
> >> (1,5,7,8,9,10,11)
> >>
> >> On this, you'll have to do a distinct to get what you're looking for.
> >>
> >> Let's change the example a little bit so we get a clearer understanding
> >> of your problem. What would be the output if your two relations looked as
> >> follows:
> >>
> >> (1,2,3,4,5)          (10,11)
> >> (1,2,4,5,7)          (10,12)
> >> (1,5,7,8,9)          (10,13)
> >>
> >>
> >> On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <[email protected]
> >>> wrote:
> >>
> >>> Have you tried iterating over the first relation and, in the nested
> >>> *generate* clause, always appending the second relation? Your top-level
> >>> loop is over the first relation, but in the nested block you are in effect
> >>> hardcoding the append of the second relation.
> >>>
> >>> I am referring to examples like those in the "Example: Nested Blocks" section:
> >>> http://pig.apache.org/docs/r0.10.0/basic.html#foreach
> >>>
> >>>
> >>> On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <[email protected]
> >>>> wrote:
> >>>
> >>>> I am trying to perform the following action, but the only solution I have
> >>>> been able to come up with is using a CROSS, but I don't want to use that
> >>>> statement as it is a very expensive process.
> >>>>
> >>>> (1,2,3,4,5)          (10,11)
> >>>> (1,2,4,5,7)          (10,11)
> >>>> (1,5,7,8,9)          (10,11)
> >>>>
> >>>>
> >>>> I want to make it
> >>>> (1,2,3,4,5,10,11)
> >>>> (1,2,4,5,7,10,11)
> >>>> (1,5,7,8,9,10,11)
> >>>>
> >>>> any help would be much appreciated,
> >>>>
> >>>> Chris
> >>
>
