On Fri, Sep 5, 2014 at 2:20 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian <br...@momjian.us> wrote:
> >> I am thinking eventually we will need to cache the foreign server
> >> statistics on the local server.
> >> Wouldn't that lead to issues where the statistics get outdated and we
> have to
> >> anyways query the foreign server before planning any joins? Or are you
> >> of dropping the foreign table statistics once the foreign join is
> > I am thinking we would eventually have to cache the statistics, then get
> > some kind of invalidation message from the foreign server. I am also
> > thinking that cache would have to be global across all backends, I guess
> > similar to our invalidation cache.
> Maybe ... but I think this isn't really related to the ostensible
> topic of this thread. We can do join pushdown just fine without the
> ability to do anything like this.
> I'm in full agreement that we should probably have a way to cache some
> kind of statistics locally, but making that work figures to be tricky,
> because (as I'm pretty sure Tom has pointed out before) there's no
> guarantee that the remote side's statistics look anything like
> PostgreSQL statistics, and so we might not be able to easily store
> them or make sense of them. But it would be nice to at least have the
> option to store such statistics if they do happen to be something we
> can store and interpret.
I agree that we need local statistics too (full agreement to Bruce's
proposal) but playing the Devil's advocate here and trying to figure how
will things like invalidation and as you mentioned, cross compatibility
> It's also coming to seem to me more and more that we need a way to
> designate several PostgreSQL machines as a cooperating cluster. This
> would mean they'd keep connections to each other open and notify each
> other about significant events, which could include "hey, I updated
> the statistics on this table, you might want to get the new ones" or
> "hey, i've replicated your definition for function X so it's safe to
> push it down now" as well as "hey, I have just been promoted to be the
> new master" or even automatic negotiation of which of a group of
> machines should become the master after a server failure.
Thats a brilliant idea, and shouldnt be too much of a problem. One race
condition that is possible is that multiple backend may try to globally
propagate different statistics of the same table, but I think that any
standard logical ordering algorithm should handle that. Also, the automatic
master promotion seems like a brilliant idea and is also great since we
have time tested standard algorithms for that.
One thing I would like to see is that assuming all the interacting nodes do
not have identical schemas, if we can somehow maintain cross node
statistics and use them for planning cross node joins. That would lead to
similar problems as the ones already noted for having local statistics for
foreign databases, but if we solve those anyways for storing local
statistics, we could potentially look at having cross node relation
statistics as well.
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company