Yes. As previously discussed, inner plans are duplicated at this point, rather than having splits inserted. One future optimization we need to add is putting these splits in place. Determining when to put in the splits is easy. But first we need to write an efficient split implementation to handle this.
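
A rough sketch of why finding the split points is the easy part, assuming each inner plan is written as a simple chain of operator names and that two operators with the same name and position do the same work (in particular, that the filters share one inner plan). The plan chains are copied from the analysis quoted further down; the function and variable names are made up for illustration and are not Pig code.

```python
from collections import defaultdict

# The six foreach plans from the analysis quoted below, as operator chains.
# The trailing UDFs are left off: only the part before them can be shared.
PLANS = {
    0: ["proj(0)"],
    1: ["proj(*)"],
    2: ["proj(1)", "filter", "distinct", "proj(*)"],
    3: ["proj(1)", "filter", "distinct", "proj(1)"],
    4: ["proj(1)", "distinct", "proj(*)"],
    5: ["proj(1)", "distinct", "proj(1)"],
}


def shared_prefixes(plans):
    """Map each longest shared operator prefix to the plans that use it."""
    users = defaultdict(list)
    for plan_id, ops in sorted(plans.items()):
        for end in range(1, len(ops) + 1):
            users[tuple(ops[:end])].append(plan_id)

    # Keep prefixes used by more than one plan, and for each group of
    # plans keep only the longest prefix they have in common.
    longest = {}
    for prefix, ids in users.items():
        if len(ids) < 2:
            continue
        key = tuple(ids)
        if key not in longest or len(prefix) > len(longest[key]):
            longest[key] = prefix
    return {prefix: list(ids) for ids, prefix in longest.items()}


splits = shared_prefixes(PLANS)
for prefix, ids in splits.items():
    print(" -> ".join(prefix), "shared by plans", ids)
```

Each reported prefix marks a place where a split could feed several plans from one computation instead of duplicating it. The sketch says nothing about the cost of the split operator itself, which is the part that still has to be written.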

Alan.

Olga Natkovich wrote:
Does this mean that distinct and filter will be recomputed several
times?

Olga
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]
Sent: Monday, June 30, 2008 11:21 AM
To: Shravan Narayanamurthy
Cc: Santhosh Srinivasan; [email protected]
Subject: Re: The plan generated for this nested plan is not as per we had discussed

Analysis below.

Shravan M Narayanamurthy wrote:
Hi Guys,
I think we need to find a proper set of rules for the project's schema. The following script kind of covers all the scenarios:
A = load 'a';
B = group A by $0;
C = foreach B {
    C1 = filter A by $0>5;
    C2 = distinct C1;
    C3 = distinct A;
    generate group, udf1(*), udf2(C2), udf3(C2.$1), udf4(C3), udf5(C3.$1);
}

I think we had not thought about the projection in the inner plan of filter. With this constraint, we need a new set of rules. Can you post an algorithm that will work to set the return types of the projects?

Thanks & Regards,
--Shravan

<snip>
In this case, the foreach should have the following plans:

0 - proj(0)

1 - proj( * ) -> udf1

2 - proj (1) -> filter -> distinct -> proj( * ) -> udf2

3 - proj (1) -> filter -> distinct -> proj(1) -> udf3

4 - proj(1) -> distinct -> proj( * ) -> udf4

5 - proj(1) -> distinct -> proj(1) -> udf5

In plans 2 and 3, filter will have an inner plan of:

proj(0) -> gt, const(5) -> gt
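
To make the filter's inner plan concrete, here is a minimal sketch that evaluates it one tuple at a time. The classes `Proj`, `Const`, and `Gt` and the helper `run_filter` are hypothetical stand-ins, not Pig's operator classes, and the sample bag is invented for the example.

```python
# Hypothetical expression operators for illustration only.
class Proj:
    """Project one column out of the current tuple."""
    def __init__(self, col):
        self.col = col

    def eval(self, tup):
        return tup[self.col]


class Const:
    """Return a constant regardless of the tuple."""
    def __init__(self, value):
        self.value = value

    def eval(self, tup):
        return self.value


class Gt:
    """Greater-than over two input expressions."""
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def eval(self, tup):
        return self.lhs.eval(tup) > self.rhs.eval(tup)


# proj(0) -> gt, const(5) -> gt: both leaves feed the same gt operator.
inner_plan = Gt(Proj(0), Const(5))


def run_filter(bag, predicate):
    """Keep the tuples for which the inner plan evaluates to true."""
    return [tup for tup in bag if predicate.eval(tup)]


bag = [(3, "a"), (7, "b"), (9, "c"), (5, "d")]  # invented sample data
kept = run_filter(bag, inner_plan)
print(kept)
```

Here the proj(0) inside the filter sees a single tuple of the bag, not the bag as a whole, so it only ever has to return one field.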

In discussing the scenario, Santhosh and I saw one issue: in plan 1, the proj( * ) will incorrectly try to accumulate a bag for udf1, when it should just pass the tuple. Santhosh is going to fix that by changing the project to check whether it has a predecessor and, if so, whether that predecessor is a relational operator, instead of looking at its input to see if it's a relational operator.
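
The proposed check could look roughly like the sketch below. It assumes each operator keeps a reference to its predecessor in the plan and that filter and distinct are the relational operators in play; the `Op` class and `project_accumulates_bag` are hypothetical names, not the actual change.

```python
# Assumption: the operator kinds treated as relational in this example.
RELATIONAL = {"filter", "distinct"}


class Op:
    """A plan operator that knows only its kind and its predecessor."""
    def __init__(self, kind, predecessor=None):
        self.kind = kind
        self.predecessor = predecessor


def project_accumulates_bag(project):
    """A project builds a bag only when a relational operator feeds it."""
    pred = project.predecessor
    return pred is not None and pred.kind in RELATIONAL


# Plan 1: proj(*) -> udf1. The project has no predecessor.
plan1_proj = Op("project")

# Plan 2: proj(1) -> filter -> distinct -> proj(*) -> udf2.
first_proj = Op("project")
flt = Op("filter", predecessor=first_proj)
dst = Op("distinct", predecessor=flt)
plan2_last_proj = Op("project", predecessor=dst)

print(project_accumulates_bag(plan1_proj))       # passes the tuple through
print(project_accumulates_bag(plan2_last_proj))  # collects distinct's output
```

The decision depends only on the shape of the plan, so it can be made once when the plan is built rather than by inspecting each input at run time.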

I didn't follow your comment on the issue with the project in the filter plan. It looked fine to me.

Alan.
