CROSS is by definition a very expensive operation: it pairs every row of one
relation with every row of the other. Regardless, CROSS is the wrong operator
for what you're trying to do.

As was suggested by others, you want to RANK the relations then do a JOIN
by the rank.
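
To make the difference concrete, here is a rough model of the two approaches in plain Python rather than Pig. It is only a sketch: enumerate() stands in for RANK and a dictionary lookup stands in for JOIN, both of which are my assumptions for illustration, and the rows are the sample data from the quoted message below.

```python
from itertools import product

# Sample rows, same as the files xxx and yyy in the thread.
a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]

# CROSS: every row of a paired with every row of b, so len(a) * len(b) rows.
crossed = [x + y for x, y in product(a, b)]

# RANK + JOIN: number the rows of each relation, then match equal numbers.
# enumerate() is a stand-in for RANK (assumption of this model).
ranked_a = {rank: row for rank, row in enumerate(a, start=1)}
ranked_b = {rank: row for rank, row in enumerate(b, start=1)}
joined = [ranked_a[r] + ranked_b[r]
          for r in sorted(ranked_a.keys() & ranked_b.keys())]

print(len(crossed))  # row count grows as the product of the input sizes
print(joined)        # one output row per matching rank
```

The cross grows with the product of the two input sizes, while the rank-based join yields one row per rank, which is the row-by-row pairing asked for. Note that this pairing depends on the order in which the rows are numbered.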


On Tue, Mar 25, 2014 at 1:27 PM, <william.dowl...@thomsonreuters.com> wrote:

> Here is how to use rank and join for this problem:
>
> sh cat xxx
> 1,2,3,4,5
> 1,2,4,5,7
> 1,5,7,8,9
>
> sh cat yyy
> 10,11
> 10,12
> 10,13
>
>
> a= load 'xxx' using PigStorage(',');
> b= load 'yyy' using PigStorage(',');
>
> a2 = rank a;
> b2 = rank b;
>
> c = join a2 by $0, b2 by $0;
> c2 = order c by $6;
> c3 = foreach c2 generate $1 .. $5, $7 ..;
>
> dump c3;
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> William F Dowling
> Senior Technologist
> Thomson Reuters
>
>
> -----Original Message-----
> From: Christopher Surage [mailto:csur...@gmail.com]
> Sent: Tuesday, March 25, 2014 4:03 PM
> To: user@pig.apache.org
> Subject: Re: Any way to join two aliases without using CROSS
>
> The output I would like to see is
>
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <pradeep...@gmail.com
> >wrote:
>
> > I don't understand what you're trying to do from your example.
> >
> > If you perform a cross on the data you have, the output will be the
> > following:
> >
> > (1,2,3,4,5,10,11)
> > (1,2,3,4,5,10,11)
> > (1,2,3,4,5,10,11)
> > (1,2,4,5,7,10,11)
> > (1,2,4,5,7,10,11)
> > (1,2,4,5,7,10,11)
> > (1,5,7,8,9,10,11)
> > (1,5,7,8,9,10,11)
> > (1,5,7,8,9,10,11)
> >
> > On this, you'll have to do a distinct to get what you're looking for.
> >
> > Let's change the example a little bit so we get a more clear understanding
> > of your problem. What would be the output if your two relations looked as
> > follows:
> >
> > (1,2,3,4,5)          (10,11)
> > (1,2,4,5,7)          (10,12)
> > (1,5,7,8,9)          (10,13)
> >
> >
> > On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <shahab.yu...@gmail.com
> > >wrote:
> >
> > > Have you tried iterating over the first relation and in the nested
> > > *generate* clause, always appending the second relation? Your top level
> > > looping is on first relation but in the nested block you are sort of
> > > hardcoding appending of second relation.
> > >
> > > I am referring to the examples like in the "Example: Nested Blocks" section:
> > > http://pig.apache.org/docs/r0.10.0/basic.html#foreach
> > >
> > >
> > > On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <csur...@gmail.com
> > > >wrote:
> > >
> > > > I am trying to perform the following action, but the only solution I have
> > > > been able to come up with is using a CROSS, but I don't want to use that
> > > > statement as it is a very expensive process.
> > > >
> > > > (1,2,3,4,5)          (10,11)
> > > > (1,2,4,5,7)          (10,11)
> > > > (1,5,7,8,9)          (10,11)
> > > >
> > > >
> > > > I want to make it
> > > > (1,2,3,4,5,10,11)
> > > > (1,2,4,5,7,10,11)
> > > > (1,5,7,8,9,10,11)
> > > >
> > > > any help would be much appreciated,
> > > >
> > > > Chris
> > > >
> > >
> >
>
