0.10.0-cdh4.1.2

On 5/28/13 11:07 AM, "Pradeep Gollakota" <pradeep...@gmail.com> wrote:
>Oh I see... I don't remember if I tried to do it your way or not. I'm using
>the CDH3 version (0.8.1) of Pig. I'm not sure if explicit literals in
>joins are supported in that version. I'll give it a shot and see, since it
>will simplify my script.
>What version of Pig are you using?
>
>
>On Tue, May 28, 2013 at 2:04 PM, Mehmet Tepedelenlioglu <mehmets...@yahoo.com> wrote:
>
>> So, the example I gave before: x = join a by 1, b by 1 using
>> 'replicated'; does a replicated cross, and it creates the synthetic keys
>> implicitly, which is great because the tuple it returns does not have the
>> synthetic keys in it. An explicit replicated cross would be good though,
>> since the implementation probably is pretty simple.
>>
>>
>> On 5/28/13 10:30 AM, "Pradeep Gollakota" <pradeep...@gmail.com> wrote:
>>
>> >I ran into a similar problem where I had a relation (A) which was massive
>> >and another relation (B) which had exactly 1 record. I needed to do a
>> >cross product of these two relations, and the default implementation was
>> >very slow. I worked around it by generating a synthetic key myself and
>> >then used a replicated join to cross the two relations. It looked
>> >something like the following:
>> >
>> >data1 = load 'data1'; -- billions of records
>> >data2 = load 'data2'; -- 1 record
>> >A = foreach data1 generate *, 1 as fake_key;
>> >B = foreach data2 generate *, 1 as fake_key;
>> >-- large relation first; the last relation is the one held in memory
>> >C = join A by fake_key, B by fake_key using 'replicated';
>> >
>> >I looked around to see if Pig supported this out of the box, but I didn't
>> >find anything.
>> >
>> >Perhaps a replicated cross operator would be helpful for these types of
>> >problems.
>> >From the O'Reilly book, this is what is said about the cross operator:
>> >"Pig does implement cross in a parallel fashion. It does this by
>> >generating a synthetic join key, replicating rows, and then doing the
>> >cross as a join."
>> >Since the cross product operator is already being performed as a join
>> >under the hood, I wonder how difficult it would be to support different
>> >join strategies for cross.
>> >
>> >
>> >On Fri, May 24, 2013 at 12:21 PM, Mehmet Tepedelenlioglu <mehmets...@yahoo.com> wrote:
>> >
>> >> Thanks, but is there a map-side cross? The usual cross seems to have a
>> >> bug. I sent an example of how to reproduce this bug.
>> >>
>> >> On 5/24/13 9:15 AM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>> >>
>> >> >You can do this, but Pig has a CROSS keyword that you can use.
>> >> >
>> >> >
>> >> >2013/5/23 Mehmet Tepedelenlioglu <mehmets...@yahoo.com>
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> I am using this:
>> >> >>
>> >> >> x = join a by 1, b by 1 using 'replicated';
>> >> >>
>> >> >> with the hope that it generates some synthetic key '1' on both a and
>> >> >> b and joins it on that key, thereby, in this case, doing a clean
>> >> >> map-side cross of a and b with no schema changes (exactly the way a
>> >> >> cross would work). It seems to be working, but since I just tried it
>> >> >> and it worked, I am not sure if there is anything in there I should
>> >> >> be aware of. Does anyone know?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Mehmet
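
For reference, the three approaches discussed in this thread can be put side by side. This is only a sketch: the paths 'data1' and 'data2' and the aliases big, small, big_k, small_k, c1, c2 and c3 are hypothetical, and it assumes the second relation is small enough to fit in memory. The constant-key form is reported above to work on 0.10.0-cdh4.1.2; the thread does not confirm whether it works on 0.8.1.

```pig
-- Hypothetical inputs: 'big' is assumed to be large,
-- 'small' is assumed to fit in memory on each map task.
big   = LOAD 'data1';
small = LOAD 'data2';

-- 1. Built-in cross product (not a map-side operation).
c1 = CROSS big, small;

-- 2. Replicated join on a constant expression, as in the original question.
--    The large relation is listed first; the relation listed last is the
--    one loaded into memory. No key column appears in the output.
c2 = JOIN big BY 1, small BY 1 USING 'replicated';

-- 3. Explicit synthetic key, for versions where joining on a literal
--    may not be accepted. The output carries both fake_key columns,
--    which would need to be projected away afterwards.
big_k   = FOREACH big   GENERATE *, 1 AS fake_key;
small_k = FOREACH small GENERATE *, 1 AS fake_key;
c3 = JOIN big_k BY fake_key, small_k BY fake_key USING 'replicated';
```

With USING 'replicated', Pig holds every relation after the first in memory, so the order of the relations in the JOIN statement matters: the massive relation goes first and the single-record relation goes last.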