Last time I checked, I don't think you could join on groups, but that was
about a year ago.
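
If joining on the group tuple is what's tripping it up, one thing that might be worth trying is flattening the group into plain columns before the join and then joining on those columns. This is only a sketch using the relation names from your script below; I haven't run it against 0.8, and the AS aliases are my own choice:

```pig
-- Flatten the composite group key into ordinary fields (aliases are assumed names).
out_totals = FOREACH out_grpd GENERATE
    FLATTEN(group) AS (origin, flydate),
    SUM(outgoing.passengers) AS passengers;
in_totals = FOREACH in_grpd GENERATE
    FLATTEN(group) AS (destination, flydate),
    SUM(incoming.passengers) AS passengers;

-- Join on the flattened columns instead of on the group tuple.
joind = JOIN out_totals BY (origin, flydate) FULL OUTER,
             in_totals BY (destination, flydate);

-- flydate exists on both sides, so it has to be qualified with the relation name.
lstd = FOREACH joind GENERATE
    out_totals::origin, out_totals::flydate, out_totals::passengers;
```

That way the join keys are simple chararray fields rather than a nested tuple, so the projection in lstd no longer has to reach into a tuple.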

On Tue, Jan 25, 2011 at 12:49 PM, Neil Kodner <nkod...@gmail.com> wrote:

> I've created a relation by grouping on a composite key.  I then join a
> similar relation using the grouped key as the join key.
>
> outgoing = FOREACH raw GENERATE origin, 'OUT', passengers,flydate;
> incoming = FOREACH raw GENERATE destination, 'IN', passengers,flydate;
>
> out_grpd = GROUP outgoing BY (origin,flydate);
> in_grpd = GROUP incoming BY (destination,flydate);
>
> I then create some totals:
>
> out_totals = FOREACH out_grpd GENERATE group, SUM(outgoing.passengers) as passengers;
> in_totals = FOREACH in_grpd GENERATE group, SUM(incoming.passengers) as passengers;
>
> So far so good.
>
> I then full-outer join out_totals to in_totals on group.
>
> joind = JOIN out_totals BY group FULL OUTER, in_totals BY group;
>
> So far, everything looks as I expect it to:
>
> out_totals: {group: (origin: chararray,flydate: chararray),passengers: long}
> in_totals: {group: (destination: chararray,flydate: chararray),passengers: long}
> joind: {out_totals::group: (origin: chararray,flydate: chararray),out_totals::passengers: long,in_totals::group: (destination: chararray,flydate: chararray),in_totals::passengers: long}
>
> Here's where things get tricky.  I'm trying to view the output expecting to
> see the composite group, but it seems that part of the group is being left
> out.  I'm seeing the airport code and the sum of the passengers, but I'm not
> seeing flydate, which is part of the group.
>
> lstd = foreach joind GENERATE out_totals::group,out_totals::passengers;
> lstd: {out_totals::group: (origin: chararray,flydate: chararray),out_totals::passengers: long}
>
> When I dump the output of joind, I don't see the flydate, only the airport
> code and one of the sums.  Since I joined on group, I'd expect to see the
> airport code and the group.
>
> (ACV,4,,)
> (ANC,159,,)
> (ANC,228,,)
> (AST,6,,)
> (ATL,87,,)
> (BDL,3086,,)
> (BDL,3216,,)
> (BDL,3417,,)
> (BDL,4278,,)
> (BDL,6027,,)
> (BDL,6061,,)
> (BDL,6695,,)
> (BDL,7390,,)
> (BDL,7576,,)
>
> When I try to dump lstd, I receive the error:
>
> 2011-01-25 15:41:40,601 [Thread-59] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-01-25 15:41:45,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0004 has failed! Stop running all dependent jobs
> 2011-01-25 15:41:45,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2011-01-25 15:41:45,130 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>
> I'm on Pig 0.8.0 (r1043805), and I'm not sure whether joining on a group is
> permitted or whether I'm doing it incorrectly.
>
