Last time I checked, you couldn't join on a group. But that was about a year ago, so it may have changed since.
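If joining on the group tuple is what trips it up, one thing that might be worth trying is to flatten the composite group back into plain columns before the join, and then join on those columns. Below is a rough sketch of that idea, reusing the relation names from your script. I have not run it against your Pig version; the LOAD line (file name, delimiter, and schema) is made up purely so the snippet is complete, and I dropped the 'OUT'/'IN' constants for brevity.

```pig
-- Hypothetical input: path, delimiter and schema are assumptions for illustration only.
raw = LOAD 'flights.csv' USING PigStorage(',')
      AS (origin:chararray, destination:chararray, passengers:long, flydate:chararray);

outgoing = FOREACH raw GENERATE origin, passengers, flydate;
incoming = FOREACH raw GENERATE destination, passengers, flydate;

out_grpd = GROUP outgoing BY (origin, flydate);
in_grpd  = GROUP incoming BY (destination, flydate);

-- FLATTEN turns the composite group tuple back into top-level fields,
-- so the totals no longer carry a tuple-valued 'group' column.
out_totals = FOREACH out_grpd GENERATE FLATTEN(group) AS (origin, flydate),
                                       SUM(outgoing.passengers) AS passengers;
in_totals  = FOREACH in_grpd GENERATE FLATTEN(group) AS (destination, flydate),
                                      SUM(incoming.passengers) AS passengers;

-- Join on two scalar columns instead of on the group tuple.
joind = JOIN out_totals BY (origin, flydate) FULL OUTER,
             in_totals BY (destination, flydate);

-- Rows that exist only on the incoming side will have nulls in these fields.
lstd = FOREACH joind GENERATE out_totals::origin, out_totals::flydate,
                              out_totals::passengers;
DUMP lstd;
```

With the key split into separate fields, flydate should show up as its own column in the DESCRIBE and DUMP output, which would at least make it easier to see whether the join itself is the problem.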
On Tue, Jan 25, 2011 at 12:49 PM, Neil Kodner <nkod...@gmail.com> wrote:
> I've created a relation by grouping on a composite key. I then join a
> similar relation using the grouped key as the join key.
>
> outgoing = FOREACH raw GENERATE origin, 'OUT', passengers,flydate;
> incoming = FOREACH raw GENERATE destination, 'IN', passengers,flydate;
>
> out_grpd = GROUP outgoing BY (origin,flydate);
> in_grpd = GROUP incoming BY (destination,flydate);
>
> I then create some totals:
>
> out_totals = FOREACH out_grpd GENERATE group , SUM(outgoing.passengers) as passengers;
> in_totals = FOREACH in_grpd GENERATE group , SUM(incoming.passengers) as passengers;
>
> So far so good.
>
> I then full-outer join out_totals to in_totals on group.
>
> joind = JOIN out_totals BY group FULL OUTER, in_totals BY group;
>
> So far, everything looks as I expect it to:
>
> out_totals: {group: (origin: chararray,flydate: chararray),passengers: long}
> in_totals: {group: (destination: chararray,flydate: chararray),passengers: long}
> joind: {out_totals::group: (origin: chararray,flydate: chararray),out_totals::passengers: long,in_totals::group: (destination: chararray,flydate: chararray),in_totals::passengers: long}
>
> Here's where things get tricky. I'm trying to view the output expecting to
> see the composite group, but it seems that part of the group is being left
> out. I'm seeing the airport code, the sum of the passengers, but i'm not
> seeing flydate which is part of the group
>
> lstd = foreach joind GENERATE out_totals::group,out_totals::passengers;
> lstd: {out_totals::group: (origin: chararray,flydate: chararray),out_totals::passengers: long}
>
> When I dump the output of joind, I don't see the flydate, only the airport
> code and one of the sums. Since I joined on group, I'd expect to see the
> airport code and the group.
>
> (ACV,4,,)
> (ANC,159,,)
> (ANC,228,,)
> (AST,6,,)
> (ATL,87,,)
> (BDL,3086,,)
> (BDL,3216,,)
> (BDL,3417,,)
> (BDL,4278,,)
> (BDL,6027,,)
> (BDL,6061,,)
> (BDL,6695,,)
> (BDL,7390,,)
> (BDL,7576,,)
>
> When I try and dump lstd, I receive the error:
>
> 2011-01-25 15:41:40,601 [Thread-59] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-01-25 15:41:45,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0004 has failed! Stop running all dependent jobs
> 2011-01-25 15:41:45,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2011-01-25 15:41:45,130 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>
> I'm on Pig 0.0.8 r1043805 and I'm not entirely sure if joining on a group is
> permitted, or if I'm doing it incorrectly.