Hi,
Thanks for your reply.
I found the problem in a distributed database based on Postgres (Greenplum).
In a distributed database there may be distributed tables: every single
node contains only a subset of the data, and combining the data from all
nodes gives the full table.
I think it may also be a problem for Postgres's parallel computing.
1. What does the Postgres planner do for a parallel scan of a table that is
then joined with a generate_series() function scan?
2. What does the Postgres planner do for a parallel scan of a table that is
then joined with a generate_series() function scan with a volatile filter?
When running the SQL in the above case, a generate_series function scan can
be taken as the same everywhere: the data is complete on every single node.
A generate_series joined with another generate_series also has this
property. This property is very helpful in a distributed join: a distributed
table joined with a generate_series function can simply be joined locally on
every node, and the results then gathered back to a single node.
But things are different when there are volatile functions: volatile
functions may appear in the where clause, the targetlist, and elsewhere.
That is why I came up with the above case and asked here.
To be honest, I do not care about the push down so much. It is not normal
usage to write volatile functions in a where clause.
I just found that it loses the property.
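
The property, and how a volatile filter loses it, can be sketched roughly in
Python. This is only an illustration under stated assumptions, not Greenplum
or Postgres code: the three-node layout, the shard contents, and all helper
names are hypothetical, and volatility is modelled with a per-node counter
(rather than something like random()) so that the result is reproducible.

```python
from itertools import count

# Hypothetical cluster: three nodes, each holding one shard of a distributed table.
SHARDS = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

def generate_series(start, stop):
    """Stand-in for generate_series(): yields identical rows on every node."""
    return list(range(start, stop + 1))

def volatile_filter(rows, state):
    """Stand-in for a volatile WHERE clause: the outcome depends on hidden
    per-call state (here a counter), not only on the row itself."""
    return [r for r in rows if next(state) % 2 == 0]

def join(table_rows, series_rows):
    """Equi-join of table rows with series rows on the value itself."""
    wanted = set(series_rows)
    return sorted((v, v) for v in table_rows if v in wanted)

def join_on_one_node(series_for_node):
    """Bring the whole table to one node and join it there."""
    full_table = [v for shard in SHARDS for v in shard]
    return join(full_table, series_for_node(0))

def join_locally_then_gather(series_for_node):
    """Join each shard with that node's own series copy, then gather."""
    gathered = []
    for node_id, shard in enumerate(SHARDS):
        gathered.extend(join(shard, series_for_node(node_id)))
    return sorted(gathered)

def plain(node_id):
    # No filter: every node sees the same, complete series.
    return generate_series(1, 5)

def filtered(node_id):
    # Each node evaluates the volatile filter independently (assumed model).
    return volatile_filter(generate_series(1, 5), count(node_id))

plain_local = join_locally_then_gather(plain)
plain_single = join_on_one_node(plain)
filtered_local = join_locally_then_gather(filtered)
filtered_single = join_on_one_node(filtered)

print(plain_local == plain_single)        # True: local joins give the full answer
print(filtered_local == filtered_single)  # False: the per-node copies differ
```

Without the filter, joining locally and gathering gives the same rows as a
join on a single node. With the volatile filter, each node ends up with a
different copy of the series, so the two plans no longer agree.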
Best,
Zhenghua Lyu
________________________________
From: Tom Lane <[email protected]>
Sent: Friday, July 10, 2020 10:10 PM
To: Zhenghua Lyu <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile
functions
Zhenghua Lyu <[email protected]> writes:
> The where clause is "pushed down to the x,y" because it only
> references these two relations.
Yeah. I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea. There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.
Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated. (Volatiles in SELECT lists
are a different story.)
This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?
regards, tom lane