Hi Lin,

This does not appear to be a known issue. Could you please open a new JIRA?
FYI, I get a java.lang.NullPointerException when I run query 1 with either the 0.7 or trunk version.

Thanks,
Thejas

On 10/12/10 3:38 PM, "Lin Guo" <[email protected]> wrote:

> Hi,
>
> Our data contain tuples one of whose fields is a tuple containing a
> bag field and we've seen the following exceptions when we access the
> bag field:
>
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:479)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:477)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:336)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
>     at
>
> We can reproduce the exceptions using the following scripts.
>
> 1.
> A = LOAD 'test_input' as (a:int, T:(list:{B:(key:int, value:int)}, world:chararray) );
> describe A;
> /*
> test_input contains:
> 12 ({(2,13),(4,5)}, 'hello')
> 24 ({(8,17),(9,11),(3,4)}, 'world')
>
> and got A's schema as:
> A: {a: int,T: (list: {B: (key: int,value: int)},world: chararray)}
> */
>
> B = FOREACH A GENERATE T.list, T.world;
> describe B;
> /*
> got:
> B: {list: {B: (key: int,value: int)},world: chararray}
> */
>
> dump B;
>
> 2.
> ......
>
> b = foreach a generate member_id, primary_email, year_born;
> c = group b by member_id;
> d = foreach c generate group as member_id, b;
> e = group d by member_id;
> f = foreach e generate group as member_id, d;
> g = foreach f generate member_id as A, flatten(d);
>
> h = foreach g generate $0 as A, $1 AS B, $2 AS C;
> describe h;
> /* get the following schema:
> h: {A: int,B: int,C: {member_id: int,primary_email: chararray,year_born: int}}
> */
>
> h = foreach h generate $0 as A, Swap($1, $2) AS T;
> describe h;
> /* We use Swap to generate a tuple out of the last two fields and got
> the following schema
> h: {A: int,T: (C: {member_id: int,primary_email: chararray,year_born: int},B: int)}
> */
> g = foreach h generate A, T.C;
> describe g;
>
> g = limit g 15;
> dump g;
>
> Is it a known issue?
>
> Best,
> Lin
