Does this mean that distinct and filter will be recomputed several times?

Olga
> -----Original Message-----
> From: Alan Gates [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 30, 2008 11:21 AM
> To: Shravan Narayanamurthy
> Cc: Santhosh Srinivasan; [email protected]
> Subject: Re: The plan generated for this nested plan is not as per we had discussed
>
> Analysis below.
>
> Shravan M Narayanamurthy wrote:
> > Hi Guys,
> > I think we need to find a proper set of rules for the project's
> > schema. The following script kinda of covers all the scenarios:
> > A = load 'a';
> > B = group A by $0;
> > C = foreach B {
> >     C1 = filter A by $0>5;
> >     C2 = distinct C1;
> >     C3 = distinct A;
> >     generate group, udf1(*), udf2(C2), udf3(C2.$1), udf4(C3), udf(C3.$1);
> > }
> >
> > I think, we had not thought about the projection in the inner plan of
> > filter. With this constraint, we need a new set of rules. Can you post
> > an algorithm that will work to set the return types of the projects?
> >
> > Thanks & Regards,
> > --Shravan
> >
> > <snip>
> In this case, the foreach should have the following plans:
>
> 0 - proj(0)
> 1 - proj( * ) -> udf1
> 2 - proj (1) -> filter -> distinct -> proj( * ) -> udf2
> 3 - proj (1) -> filter -> distinct -> proj(1) -> udf3
> 4 - proj(1) -> distinct -> proj( * ) -> udf4
> 5 - proj(1) -> distinct -> proj(1) -> udf5
>
> In plans 2 and 3, filter will have an inner plan of:
>
> proj(0) -> gt, const(5) -> gt
>
> In discussing the scenario, Santhosh and I saw one issue,
> which is that in plan 1, the proj( * ) will be incorrectly
> trying to accumulate a bag for udf1, when it should just pass
> the tuple. Santhosh is going to fix that by changing the
> project to determine whether it has a predecessor, and if so
> whether that predecessor is a relational operator, instead of
> looking at its input to see if it's a relational operator.
>
> I didn't follow your comment on the issue with the project in
> the filter plan. It looked fine to me.
>
> Alan.
>
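To make the recomputation question concrete, here is a small Python sketch (not Pig code) that writes the six plans from the quoted message as operator chains and counts which filter/distinct prefixes occur in more than one plan. The list encoding and the helper name shared_prefixes are assumptions made for illustration only. The sketch shows where operators repeat in the plans as written; it says nothing about whether the execution engine actually evaluates them more than once.

```python
from collections import Counter

# The six foreach plans as listed in the quoted message, written as
# operator chains (leftmost operator runs first). This encoding is an
# illustrative assumption, not Pig's internal plan representation.
plans = [
    ["proj(0)"],
    ["proj(*)", "udf1"],
    ["proj(1)", "filter", "distinct", "proj(*)", "udf2"],
    ["proj(1)", "filter", "distinct", "proj(1)", "udf3"],
    ["proj(1)", "distinct", "proj(*)", "udf4"],
    ["proj(1)", "distinct", "proj(1)", "udf5"],
]


def shared_prefixes(plans):
    """Return {prefix: count} for every operator prefix that ends in a
    filter or distinct and occurs in more than one plan."""
    counts = Counter()
    for plan in plans:
        for end in range(1, len(plan) + 1):
            prefix = tuple(plan[:end])
            if prefix[-1] in ("filter", "distinct"):
                counts[prefix] += 1
    return {p: n for p, n in counts.items() if n > 1}


# Print each repeated prefix and the number of plans that contain it.
for prefix, n in shared_prefixes(plans).items():
    print(" -> ".join(prefix), "appears in", n, "plans")
```

If each plan were run independently, every prefix reported here would be a candidate for repeated evaluation, which is the case the question above is asking about.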
