Yes. As previously discussed, inner plans are duplicated at this point,
rather than having splits inserted. One future optimization we need to
add is putting these splits in place. Determining when to put in the
splits is easy. But first we need to write an efficient split
implementation to handle this.
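
To make the cost concrete, here is a minimal sketch contrasting duplicated inner plans with a single split. It is illustrative only: the function names, the call counter, and the list-of-tuples bag are assumptions for this example, not Pig's operator classes.

```python
from collections import Counter

# Hypothetical call counter; shows how often each operator runs.
calls = Counter()

def filter_gt5(bag):
    """Keep tuples whose first field is > 5 (stands in for 'filter A by $0>5')."""
    calls["filter"] += 1
    return [t for t in bag if t[0] > 5]

def distinct(bag):
    """Drop duplicate tuples while preserving first-seen order."""
    calls["distinct"] += 1
    seen, out = set(), []
    for t in bag:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

def run_duplicated(bag):
    # Each consumer owns its own copy of filter -> distinct, so both run twice.
    for_udf2 = distinct(filter_gt5(bag))
    for_udf3 = [t[1] for t in distinct(filter_gt5(bag))]
    return for_udf2, for_udf3

def run_with_split(bag):
    # A split computes filter -> distinct once and feeds both consumers.
    shared = distinct(filter_gt5(bag))
    return shared, [t[1] for t in shared]
```

Both functions return the same values; only the number of filter and distinct invocations differs.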
Alan.
Olga Natkovich wrote:
Does this mean that distinct and filter will be recomputed several
times?
Olga
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 30, 2008 11:21 AM
To: Shravan Narayanamurthy
Cc: Santhosh Srinivasan; [email protected]
Subject: Re: The plan generated for this nested plan is not as per we had discussed
Analysis below.
Shravan M Narayanamurthy wrote:
Hi Guys,
I think we need to find a proper set of rules for the project's
schema. The following script kind of covers all the scenarios:
A = load 'a';
B = group A by $0;
C = foreach B {
    C1 = filter A by $0>5;
    C2 = distinct C1;
    C3 = distinct A;
    generate group, udf1(*), udf2(C2), udf3(C2.$1), udf4(C3),
        udf5(C3.$1);
}
I think we had not thought about the projection in the inner plan of
filter. With this constraint, we need a new set of rules. Can you post
an algorithm that will work to set the return types of the projects?
Thanks & Regards,
--Shravan
<snip>
In this case, the foreach should have the following plans:
0 - proj(0)
1 - proj(*) -> udf1
2 - proj(1) -> filter -> distinct -> proj(*) -> udf2
3 - proj(1) -> filter -> distinct -> proj(1) -> udf3
4 - proj(1) -> distinct -> proj(*) -> udf4
5 - proj(1) -> distinct -> proj(1) -> udf5
In plans 2 and 3, filter will have an inner plan of:
proj(0) -> gt, const(5) -> gt
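
The plans above can be written down as data to make the duplication visible. This is a rough sketch: the string encoding of operators, the RELATIONAL set, and both helper functions are assumptions made for illustration, not anything in the Pig code base. It groups plans by their leading chain of operators up to the last relational operator, which is where a shared split could later go.

```python
from collections import defaultdict

# Operator chains copied from the plan list above (string encoding is an assumption).
PLANS = {
    0: ["proj(0)"],
    1: ["proj(*)", "udf1"],
    2: ["proj(1)", "filter", "distinct", "proj(*)", "udf2"],
    3: ["proj(1)", "filter", "distinct", "proj(1)", "udf3"],
    4: ["proj(1)", "distinct", "proj(*)", "udf4"],
    5: ["proj(1)", "distinct", "proj(1)", "udf5"],
}

# Hypothetical classification of which operators count as relational.
RELATIONAL = {"filter", "distinct"}

def relational_prefix(plan):
    """Return the leading operators up to and including the last relational one."""
    last = max((i for i, op in enumerate(plan) if op in RELATIONAL), default=-1)
    return tuple(plan[: last + 1])

def duplicated_prefixes(plans):
    """Map each relational prefix used by more than one plan to the plan ids using it."""
    groups = defaultdict(list)
    for plan_id, ops in plans.items():
        prefix = relational_prefix(ops)
        if prefix:  # plans with no relational operator have nothing to share
            groups[prefix].append(plan_id)
    return {p: ids for p, ids in groups.items() if len(ids) > 1}
```

For the six plans here, this groups plans 2 and 3 under proj(1) -> filter -> distinct, and plans 4 and 5 under proj(1) -> distinct.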
In discussing the scenario, Santhosh and I saw one issue,
which is that in plan 1, the proj( * ) will be incorrectly
trying to accumulate a bag for udf1, when it should just pass
the tuple. Santhosh is going to fix that by changing the
project to determine whether it has a predecessor, and if so
whether that predecessor is a relational operator, instead of
looking at its input to see if it's a relational operator.
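
A minimal sketch of the predecessor check described above, assuming a simplified operator model. The Op class, its fields, and project_accumulates_bag are hypothetical names for illustration and do not reflect the real project operator.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:
    """Hypothetical operator node: a name, a relational flag, and one predecessor."""
    name: str
    relational: bool = False
    predecessor: Optional["Op"] = None

def project_accumulates_bag(project: Op) -> bool:
    """Accumulate a bag only when the project follows a relational operator.

    A project with no predecessor (such as proj(*) in plan 1) just passes
    the tuple through.
    """
    pred = project.predecessor
    return pred is not None and pred.relational

# Plan 1: proj(*) -> udf1; the project has no predecessor.
plan1_proj = Op("proj(*)")

# Plan 2: proj(1) -> filter -> distinct -> proj(*) -> udf2.
plan2_distinct = Op(
    "distinct",
    relational=True,
    predecessor=Op("filter", relational=True, predecessor=Op("proj(1)")),
)
plan2_proj = Op("proj(*)", predecessor=plan2_distinct)
```

Under this model, plan1_proj passes the tuple through, while plan2_proj accumulates a bag because distinct precedes it.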
I didn't follow your comment on the issue with the project in
the filter plan. It looked fine to me.
Alan.