Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Heikki Linnakangas

On 17.08.2011 00:17, Tom Lane wrote:

I'm looking into the idea I mentioned earlier:


All is not entirely lost, however: there's still some possible
performance benefit to be gained here, if we go to the scheme of
identifying victim catcache entries by hashvalue only.  Currently,
each heap_update in a cached catalog has to issue two sinval messages
(per cache!): one against the old TID and one against the new TID.
We'd be able to reduce that to one message in the common case where the
hashvalue remains the same because the cache key columns didn't change.


Removing the tuple ID from sinval messages turns out to have slightly
wider impact than I thought at first, because the current coding passes
those to callback functions registered with
CacheRegisterSyscacheCallback, and there are a lot of 'em.  However,
only one of them seems to be doing anything with the tuplePtr argument,
namely PlanCacheFuncCallback.  We can make it work with the hash value
instead, which will be about as good as what we're doing now.

Any objections to that plan?


A callback might be using the tuple ID in a way that fails if VACUUM 
FULL moves the tuple, so I think we *have* to change it. (as you did 
already)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Simon Riggs
On Tue, Aug 16, 2011 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Any objections to that plan?

None at all, but some questions.

This overhaul of the cache mechanism has been extensive, so you're now
very well placed to answer related questions.

As you know, I've been trying to reduce the lock strength of some DDL
operations. When that was last discussed there were two options. The
first was to re-write SnapshotNow, which in my opinion is necessary
but solves only part of the problem. I proposed explicit locking
around catalog access, which would affect the cache path/code. I don't
like that, but I don't see another way.

From where you are now, do have any insight about how to tackle the
locking problem? Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 A callback might be using the tuple ID in a way that fails if VACUUM 
 FULL moves the tuple, so I think we *have* to change it. (as you did 
 already)

Yeah, I thought about that too.  As things stand in 9.0 and 9.1, there's
at least a theoretical possibility of this:

1. Process A prepares a plan that includes an inline'd copy of a SQL
function.  It labels the plan with the function's pg_proc TID.

2. Process B executes VACUUM FULL pg_proc, moving the SQL function's
tuple to a different TID.

3. Process C modifies the SQL function via CREATE OR REPLACE FUNCTION,
and sends out an inval against the new TID.

4. Process A doesn't invalidate its cached plan because it thinks the
TID is for some other function; so it continues to use the obsolete
version of the function.

The only way I can see to fix that is to back-patch the last set of
changes I committed yesterday.  I think that's entirely unworkable for
9.0, because of the risk of breaking third-party code that registers
syscache callbacks.  Even in 9.1 it seems a bit late to be changing that
API, so I'm thinking we should leave it alone.  The odds of anyone
actually getting burnt in the field by the above scenario seem lower
than the odds of causing problems with a late API change.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Tom Lane
BTW, while we're thinking about this ...

The plpython patch Jan just submitted reminds me that several of the PLs
detect whether they have obsolete cached data by noting whether the
tuple's xmin *and* TID are the same as previously seen.

Unlike depending on TID alone, I think this is probably safe.  It can
obviously give a false positive (thinks tuple changed when it didn't)
after a catalog VACUUM FULL; but an error in that direction is safe.
What would be problematic is a false negative (failure to notice a
real change), and I think the inclusion of the xmin in the test protects
us against that.  An example scenario is:

1. We cache the data, saving xmin X1 and TID T1.

2. VACUUM FULL moves the tuple to TID T2.

3. Somebody else updates the tuple, by chance moving it right back to
T1.  But they will assign a new xmin X2, so we will know it changed.

Can anyone think of a situation this does not cover?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Robert Haas
On Wed, Aug 17, 2011 at 1:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, while we're thinking about this ...

 The plpython patch Jan just submitted reminds me that several of the PLs
 detect whether they have obsolete cached data by noting whether the
 tuple's xmin *and* TID are the same as previously seen.

 Unlike depending on TID alone, I think this is probably safe.  It can
 obviously give a false positive (thinks tuple changed when it didn't)
 after a catalog VACUUM FULL; but an error in that direction is safe.
 What would be problematic is a false negative (failure to notice a
 real change), and I think the inclusion of the xmin in the test protects
 us against that.  An example scenario is:

 1. We cache the data, saving xmin X1 and TID T1.

 2. VACUUM FULL moves the tuple to TID T2.

 3. Somebody else updates the tuple, by chance moving it right back to
 T1.  But they will assign a new xmin X2, so we will know it changed.

 Can anyone think of a situation this does not cover?

What about this:

1. We cache the data, saving xmin X1 and TID T1.

2. VACUUM FULL moves the tuple to TID T2 but stores some other tuple in TID T1.

3. If the tuple that is now at TID T1 happens to have xmin = X1, we're
in trouble.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A note about hash-based catcache invalidations

2011-08-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Aug 17, 2011 at 1:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The plpython patch Jan just submitted reminds me that several of the PLs
 detect whether they have obsolete cached data by noting whether the
 tuple's xmin *and* TID are the same as previously seen.
 Can anyone think of a situation this does not cover?

 What about this:

 1. We cache the data, saving xmin X1 and TID T1.

 2. VACUUM FULL moves the tuple to TID T2 but stores some other tuple in TID 
 T1.

 3. If the tuple that is now at TID T1 happens to have xmin = X1, we're
 in trouble.

No, because remember that we're also effectively demanding a match on
OID (because we fetch the tuple by OID to begin with) and that the tuple
be live (else we won't fetch it at all).  There should not be another
live tuple with the same OID that vacuum could move to T1 --- if there
is, we've got worse problems than a broken caching check.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A note about hash-based catcache invalidations

2011-08-16 Thread Tom Lane
I'm looking into the idea I mentioned earlier:

 All is not entirely lost, however: there's still some possible
 performance benefit to be gained here, if we go to the scheme of
 identifying victim catcache entries by hashvalue only.  Currently,
 each heap_update in a cached catalog has to issue two sinval messages
 (per cache!): one against the old TID and one against the new TID.
 We'd be able to reduce that to one message in the common case where the
 hashvalue remains the same because the cache key columns didn't change.

Removing the tuple ID from sinval messages turns out to have slightly
wider impact than I thought at first, because the current coding passes
those to callback functions registered with
CacheRegisterSyscacheCallback, and there are a lot of 'em.  However,
only one of them seems to be doing anything with the tuplePtr argument,
namely PlanCacheFuncCallback.  We can make it work with the hash value
instead, which will be about as good as what we're doing now.

Any objections to that plan?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers