On 2016/10/25 18:58, Ashutosh Bapat wrote: You wrote:
13. The comment below is missing the main purpose i.e. creating a a unique alias, in case the relation gets converted into a subquery. Lowest or highest relid will create a unique alias at given level of join and that would be more future proof. If we decide to consider paths for every join order, following comment will no more be true. + /* + * Set the relation index. This is defined as the position of this + * joinrel in the join_rel_list list plus the length of the rtable list. + * Note that since this joinrel is at the end of the list when we are + * called, we can get the position by list_length. + */ + fpinfo->relation_index = + list_length(root->parse->rtable) + list_length(root->join_rel_list);
I don't agree on that point. As I said before, the approach using the lowest/highest relid would make a remote query having nested subqueries difficult to read. And even if we decided to consider paths for every join order, we could define the relation_index safely, by modifying postgresGetForeignJoinPaths so that it (1) sets the relation_index the proposed way at the first time it is called and (2) avoids setting the relation_index after the first call, by checking to see fpinfo->relation_index > 0.
OK. Although, the alias which contains a number, which apparently doesn't show any relation with the relation (no pun intended :)) being deparsed, won't make much sense in the deparsed query. But I am fine leaving this to the committer's judgement.
Fine with me.
May be you want to add an assert here making sure that relation_index is set only once. Something like Assert(fpinfo->relation_index == 0) just before setting it.
15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
Right. I think that not only avoids creating redundant data to find the alias of a given Var node but also would be able to find it in optimal cost.
Instead of that, we should probably gather all the alias information once and store it in context as an array indexed by relid. While deparsing a Var we can get the targetlist and alias for a given relation by using var->varno as index. And then search for given Var node in the targetlist to deparse the column name by its position in the targetlist. For the relations that are not converted into subqueries, this array will not hold any information and the Var will be deparsed as it's being done today.
Sorry, I don't understand this part. How does that work when creating a remote query having nested subqueries? Is that able to search for a given Var node efficiently?
For every relation that is deparsed as a subquery, we will create a separate context. Each such context will have an array described above. This array will contain the targetlist and aliases for all the relations, covered by that relation, that are required to be deparsed as subqueries. These arrays bubble up to topmost relation that is not required to be deparsed as subquery. When a relation is required to be deparsed as a subquery, its immediate upper relation sets the alias and targetlist chosen for that relation at all the indexes corresponding to all the base relations covered by that relation.
Thanks for the explanation!
For example, let's assume that a relation (1, 2, 3) is required to be deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5) (thus the other joining relation being (4,5)). While deparsing for relation (1,2,3,4,5), the context will contain a 5 element array, with positions 1, 2, 3 filled by same targetlist and alias whereas positions 4 and 5 will not be filled as those relations are not deparsed as subqueries.
Sorry, I don't understand this. In that case, the immediate upper relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases for *the join relation (1, 2, 3) somewhere*, not the targetlist and aliases for each of the component relations 1, 2, and 3, because the join relation is deparsed as a subquery. Maybe I'm missing something, though.
Let's assume in relation (1, 2, 3), (1, 3) in turn requires subquery but (2) does not. Thus the context created while deparsing (1, 2, 3) will have a 3 element array with positions 1 and 3 containing the same targetlist and alias, where as position 2 will be empty.
When deparsing a Var node with varno = N and varattno = m, if the nth position in the array in the context is empty, that Var node will be deparsed as rN.<column name>.
What happens when deparsing eg, a Var with varno = 2 at the topmost relation (1, 2, 3, 4, 5)? The second position of the array is empty, but the join relation (1, 2, 3) is deparsed as a subquery, so the Var should be deparsed as an alias of an output column of the subquery at the topmost relation, I think.
But if that position is has alias sZ, then we search for that Var node in the targetlist and if it's found at kth position in the targetlist, we will deparse it as sZ.ck. The search in the targetlist can be performed using tlist_member, and then fetching the position by TargetEntry::resno.
This does not require any recursion and thus saves stack space and some CPU cycles required for recursion.
Is that true?
some CPU cycles required for recursion. I guess, the arrays need to be computed only once for any relation when the query for that relation is deparsed the first time.
Does this algorithm extend to the case where we consider paths for every join order?
Thanks for considering other suggestions.
Your suggestions are really helpful to improve the patch. Thanks! Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers