Re: [HACKERS] MVCC catalog access

2013-07-06 Thread Robert Haas
On Fri, Jul 5, 2013 at 11:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 Hi Robert,

 On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
 I have a few ideas for getting rid of the remaining uses of
 SnapshotNow that I'd like to throw out there:

 Is your current plan to get rid of SnapshotNow entirely? I am wondering
 because changeset extraction needs to care, and what the proper fix
 for dealing with CatalogSnapshotData looks like depends on it...

I would like to do that, but I haven't quite figured out how to get
rid of the last few instances, per discussion upthread.  I do plan to
spend some more time on it, but likely not this week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-07-05 Thread Andres Freund
Hi Robert,

On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
 I have a few ideas for getting rid of the remaining uses of
 SnapshotNow that I'd like to throw out there:

Is your current plan to get rid of SnapshotNow entirely? I am wondering
because changeset extraction needs to care, and what the proper fix
for dealing with CatalogSnapshotData looks like depends on it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Andres Freund
On 2013-07-01 15:02:41 -0400, Robert Haas wrote:
* These are updated by GetSnapshotData.  We initialize them this way
  @@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
    else
        CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
 
  +    /* Don't allow catalog snapshot to be older than xact snapshot. */
  +    CatalogSnapshotStale = true;
  +
  Do we really need to invalidate snapshots in either situation? Isn't it
  implied that if it's still valid, according to a) no invalidation via local
  invalidation messages and b) no invalidations from other backends, there
  shouldn't be any possible differences when you only look at the catalog?

 I had the same thought, removed that code, and then put it back.  The
 problem is that if we revive an older snapshot from the dead, so to
 speak, our backend's advertised xmin might need to go backwards, and
 that seems unsafe - e.g. suppose another backend has updated a tuple
 but not yet committed.  We don't see any invalidation messages, so we
 decide to reuse our existing (old) snapshot and begin a scan.  After
 we've looked at the page containing the new tuple (and decided not to
 see it), vacuum nukes the old tuple (which we then also don't see).
 Bad things ensue.  It might be possible to avoid the majority of
 problems in this area via an appropriate set of grotty hacks, but I
 don't want to go there.

Yes, I thought about that and I think it's a problem that can be solved
without overly ugly hacks. But, as you say:

 Yeah, I think there's room for further fine-tuning there.  But I think
 it would make sense to push the patch at this point, and then if we
 find cases that can be further improved, or things that it breaks, we
 can fix them.  This area is complicated enough that I wouldn't be
 horribly surprised if we end up having to fix a few more problem cases
 or even revert the whole thing, but I think we've probably reached the
 point where further review has less value than getting the code out
 there in front of more people and seeing where (if anywhere) the
 wheels come off out in the wild.

I am pretty sure that we will have to fix more stuff, but luckily we're
in the beginning of the cycle. And while I'd prefer more eyes on the
patch before it gets applied, especially ones knowledgeable about the
implications this has, I don't really see that happening soon. So
applying is more likely to lead to more review than waiting.

So, from me: +1.

Some things that might be worth changing when committing:
* Could you add an Assert(!RelationHasSysCache(relid)) to
  RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
  missed by the next person adding a syscache, and that seems like it
  could have ugly and hard-to-detect consequences. (Sketch after this list.)
* maybe use bsearch(3) instead of open-coding the binary search? We
  already use it in the backend.
* possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
  GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
  consistency mechanisms in GetCatalogSnapshot() only work for those, so
  there doesn't seem to be a valid use case for non-catalog relations.
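
For illustration, a minimal sketch of what I have in mind for the first
point; the function shape and the catalog list here are made up, not
taken from the actual patch:

#include "postgres.h"
#include "utils/syscache.h"        /* RelationHasSysCache() */

/* assumes the catalog/pg_*.h headers that define these OID macros */
bool
RelationInvalidatesSnapshotsOnly(Oid relid)
{
    switch (relid)
    {
        case DbRoleSettingRelationId:
        case DependRelationId:
        case SharedDependRelationId:
            /*
             * These catalogs have no syscache.  If someone adds one later
             * without adjusting this list, invalidations get handled
             * inconsistently, so trip an assertion instead of failing
             * silently.
             */
            Assert(!RelationHasSysCache(relid));
            return true;
        default:
            break;
    }
    return false;
}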

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Robert Haas
On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote:
 Some things that might be worth changing when committing:
 * Could you add an Assert(!RelationHasSysCache(relid)) to
   RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
   missed by the next person adding a syscache, and that seems like it
   could have ugly and hard-to-detect consequences.

There's a cross-check in InitCatalogCache() for that very issue.

 * maybe use bsearch(3) instead of open-coding the binary search? We
   already use it in the backend.

I found comments elsewhere indicating that bsearch() was slower than
open-coding it, so I copied the logic used for ScanKeywordLookup().
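
For the record, the pattern looks roughly like this (the array contents
and names here are invented for illustration; the real code searches an
array set up elsewhere):

static const Oid SnapshotOnlyCatalogs[] = {1214, 2608, 2964};    /* sorted */

static bool
OidListContains(Oid relid)
{
    int         low = 0;
    int         high = lengthof(SnapshotOnlyCatalogs) - 1;

    while (low <= high)
    {
        int         middle = low + (high - low) / 2;
        Oid         probe = SnapshotOnlyCatalogs[middle];

        if (probe == relid)
            return true;
        if (probe < relid)
            low = middle + 1;
        else
            high = middle - 1;
    }
    return false;
}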

 * possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
   GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
   consistency mechanisms in GetCatalogSnapshot() only work for those, so
   there doesn't seem to be a valid use case for non-catalog relations.

It'll actually work fine as things stand; it'll just take a new
snapshot every time.

I have a few ideas for getting rid of the remaining uses of
SnapshotNow that I'd like to throw out there:

- In pgrowlocks and pgstattuple, I think it would be fine to use
SnapshotSelf instead of SnapshotNow.  The only difference is that it
includes changes made by the current command that wouldn't otherwise
be visible until CommandCounterIncrement().  That, however, is not
really a problem for their usage.
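
In both cases the change itself should just be the snapshot argument; a
from-memory sketch of the pgrowlocks case:

    /* in pgrowlocks(): previously passed SnapshotNow */
    scan = heap_beginscan(rel, SnapshotSelf, 0, NULL);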

- In genam.c and execUtils.c, we treat SnapshotNow as a kind of
default snapshot.  That seems like a crappy idea.  I propose that we
either set that pointer to NULL and let the server core dump if the
snapshot doesn't get set or (maybe better) add a new special snapshot
called SnapshotError which just errors out if you try to use it for
anything, and initialize to that.
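
One way SnapshotError could look (names invented here; nothing like this
exists in the tree yet):

static bool
HeapTupleSatisfiesError(HeapTuple htup, Snapshot snapshot, Buffer buffer)
{
    elog(ERROR, "attempt to test tuple visibility with an error snapshot");
    return false;               /* keep compiler quiet */
}

static SnapshotData SnapshotErrorData = {HeapTupleSatisfiesError};

#define SnapshotError (&SnapshotErrorData)

genam.c and execUtils.c would then initialize their snapshot fields to
SnapshotError instead of SnapshotNow.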

- I'm not quite sure what to do about get_actual_variable_range().
Taking a new MVCC snapshot there seems like it might be pricey on some
workloads.  However, I wonder if we could use SnapshotDirty.
Presumably, even uncommitted tuples affect the amount of
index-scanning we have to do, so that approach seems to have some
theoretical justification.  But I'm worried there could be unfortunate
consequences to looking at uncommitted data, either now or in the
future.  SnapshotSelf seems less likely to have that problem, but
feels wrong somehow.

- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid().  I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Andres Freund
On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
 On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote:
  Some things that might be worth changing when committing:
  * Could you add an Assert(!RelationHasSysCache(relid)) to
RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
missed by the next person adding a syscache, and that seems like it
could have ugly and hard-to-detect consequences.
 
 There's a cross-check in InitCatalogCache() for that very issue.

Great.

  * maybe use bsearch(3) instead of open-coding the binary search? We
already use it in the backend.
 
 I found comments elsewhere indicating that bsearch() was slower than
 open-coding it, so I copied the logic used for ScanKeywordLookup().

Hm. Ok.

  * possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.
 
 It'll actually work fine as things stand; it'll just take a new
 snapshot every time.

Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)

 - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
 default snapshot.  That seems like a crappy idea.  I propose that we
 either set that pointer to NULL and let the server core dump if the
 snapshot doesn't get set or (maybe better) add a new special snapshot
 called SnapshotError which just errors out if you try to use it for
 anything, and initialize to that.

I vote for SnapshotError.

 - I'm not quite sure what to do about get_actual_variable_range().
 Taking a new MVCC snapshot there seems like it might be pricey on some
 workloads.  However, I wonder if we could use SnapshotDirty.
 Presumably, even uncommitted tuples affect the amount of
 index-scanning we have to do, so that approach seems to have some
 theoretical justification.  But I'm worried there could be unfortunate
 consequences to looking at uncommitted data, either now or in the
 future.  SnapshotSelf seems less likely to have that problem, but
 feels wrong somehow.

I don't like using SnapshotDirty either. Can't we just use the currently
active snapshot? Unless I miss something, this will always be called
while we have one, since when we plan we've done an explicit
PushActiveSnapshot(), and if we need to replan stuff during execution
PortalRunSelect() will have pushed one.
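
Inside get_actual_variable_range() that would just be something like
this (abbreviated; variable names only loosely follow the real function):

    index_scan = index_beginscan(heapRel, indexRel,
                                 GetActiveSnapshot(), /* assumes one is pushed */
                                 1, 0);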

 - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
 argument to heap_get_latest_tid().  I don't know what these functions
 are supposed to be good for, but taking a new snapshot for every
 function call seems to guarantee that someone will find a way to use
 these functions as a poster child for how to brutalize PGXACT, so I
 don't particularly want to do that.

Heikki mentioned that at some point they were added for the ODBC
driver. I am not particularly inclined to worry about taking too many
snapshots here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Robert Haas
On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund and...@2ndquadrant.com wrote:
  * possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.

 It'll actually work fine as things stand; it'll just take a new
 snapshot every time.

 Ok. Doesn't really change my opinion that it's a crappy idea to use it
 otherwise ;)

I agree, but I don't see an easy way to write the assertion you want
using only the OID.

 - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
 default snapshot.  That seems like a crappy idea.  I propose that we
 either set that pointer to NULL and let the server core dump if the
 snapshot doesn't get set or (maybe better) add a new special snapshot
 called SnapshotError which just errors out if you try to use it for
 anything, and initialize to that.

 I vote for SnapshotError.

OK.

 - I'm not quite sure what to do about get_actual_variable_range().
 Taking a new MVCC snapshot there seems like it might be pricey on some
 workloads.  However, I wonder if we could use SnapshotDirty.
 Presumably, even uncommitted tuples affect the amount of
 index-scanning we have to do, so that approach seems to have some
 theoretical justification.  But I'm worried there could be unfortunate
 consequences to looking at uncommitted data, either now or in the
 future.  SnapshotSelf seems less likely to have that problem, but
 feels wrong somehow.

 I don't like using SnapshotDirty either. Can't we just use the currently
 active snapshot? Unless I miss something, this will always be called
 while we have one, since when we plan we've done an explicit
 PushActiveSnapshot(), and if we need to replan stuff during execution
 PortalRunSelect() will have pushed one.

We could certainly do that, but I'd be a bit reluctant to do so
without input from Tom.  I imagine there might be cases where it could
cause a regression.

 - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
 argument to heap_get_latest_tid().  I don't know what these functions
 are supposed to be good for, but taking a new snapshot for every
 function call seems to guarantee that someone will find a way to use
 these functions as a poster child for how to brutalize PGXACT, so I
 don't particularly want to do that.

 Heikki mentioned that at some point they were added for the ODBC
 driver. I am not particularly inclined to worry about taking too many
 snapshots here.

Well, if it uses it with any regularity I think it's a legitimate
concern.   We have plenty of customers who use ODBC and some of them
allow frightening numbers of concurrent server connections.  Now you
may say that's a bad idea, but so is 1000 backends doing
txid_current() in a loop.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Andres Freund
On 2013-07-02 10:38:17 -0400, Robert Haas wrote:
 On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund and...@2ndquadrant.com wrote:
   * possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
 GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
 consistency mechanisms in GetCatalogSnapshot() only work for those, so
 there doesn't seem to be a valid use case for non-catalog relations.
 
  It'll actually work find as things stand; it'll just take a new
  snapshot every time.
 
  Ok. Doesn't really change my opinion that it's a crappy idea to use it
  otherwise ;)
 
 I agree, but I don't see an easy way to write the assertion you want
 using only the OID.

Let's add

/*
 * IsSystemRelationId
 * True iff the relation is a system catalog relation.
 */
bool
IsSystemRelationId(Oid relid)
{
   return relid < FirstNormalObjectId;
}

and change IsSystemRelation() to use that instead of what it does now...
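
With that, the assertion from upthread becomes trivial (placement
hypothetical):

    /* e.g. at the top of GetCatalogSnapshot() / heap_beginscan_catalog() */
    Assert(IsSystemRelationId(relid));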

  - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
  argument to heap_get_latest_tid().  I don't know what these functions
  are supposed to be good for, but taking a new snapshot for every
  function call seems to guarantee that someone will find a way to use
  these functions as a poster child for how to brutalize PGXACT, so I
  don't particularly want to do that.
 
  Heikki mentioned that at some point they were added for the ODBC
  driver. I am not particularly inclined to worry about taking too many
  snapshots here.
 
 Well, if it uses it with any regularity I think it's a legitimate
 concern.   We have plenty of customers who use ODBC and some of them
 allow frightening numbers of concurrent server connections. 

I've quickly verified that it indeed uses them. I wish I hadn't. Brrr. I
can't even guess what that should do from the surrounding code/function
names. Except that it looks broken under concurrency as long as
SnapshotNow is used (because the query's snapshot won't be as new as
SnapshotNow, even in read committed mode).

Heikki, do you understand the code well enough to explain it without
investing time?

 Now you may say that's a bad idea, but so is 1000 backends doing
 txid_current() in a loop.

Hehe ;).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-07-02 Thread Heikki Linnakangas

On 02.07.2013 18:24, Andres Freund wrote:

I've quickly verified that it indeed uses them. I wish I hadn't. Brrr. I
can't even guess what that should do from the surrounding code/function
names. Except that it looks broken under concurrency as long as
SnapshotNow is used (because the query's snapshot won't be as new as
SnapshotNow, even in read committed mode).

Heikki, do you understand the code well enough to explain it without
investing time?


No, sorry. I think it has something to do with updatable cursors, but I
don't understand the details.


- Heikki




Re: [HACKERS] MVCC catalog access

2013-07-01 Thread Andres Freund
On 2013-06-28 23:14:23 -0400, Robert Haas wrote:
 Here's a further update of this patch.  In this version, I added some
 mechanism to send a new kind of sinval message that is sent when a
 catalog without catcaches is updated; it doesn't apply to all
 catalogs, just to whichever ones we want to have this treatment.  That
 means we don't need to retake snapshots for those catalogs on every
 access, so backend startup requires just one extra MVCC snapshot as
 compared with current master.  Assorted cleanup has been done, along
 with the removal of a few more SnapshotNow references.

This is really cool stuff.

 It's still possible to construct test cases that perform badly by
 pounding the server with 1000 clients running Andres's
 readonly-busy.sql.  Consider the following test case: use a DO block
 to create a schema with 10,000 functions in it and then DROP ..
 CASCADE.  When the server is unloaded, the extra MVCC overhead is
 pretty small.

 Well, now the create is 52% slower and the drop is a whopping 4.7x
 slower.  It's worth digging into the reasons just a bit.  I was able
 to speed up this case quite a bit - it was 30x slower a few hours ago
 - by adding a few new relations to the switch in
 RelationInvalidatesSnapshotsOnly().  But the code still takes one MVCC
 snapshot per object dropped, because deleteOneObject() calls
 CommandCounterIncrement() and that, as it must, invalidates our
 previous snapshot.

I have to say, if the thing that primarily suffers is pretty extreme DDL
in extreme situations I am not really worried. Anybody running anything
close to the territory of such concurrency won't perform that much DDL.

 We could, if we were inclined to spend the effort,
 probably work out that although we need to change curcid, the rest of
 the snapshot is still OK, but I'm not too convinced that it's worth
 adding an even-more-complicated mechanism for this.  We could probably
 also optimize the delete code to increment the command counter fewer
 times, but I'm not convinced that's worth doing either.

I am pretty convinced we shouldn't do either for now.

Something picked up when quickly scanning over the last version of the
patch:

 +/*
 + * Staleness detection for CatalogSnapshot.
 + */
 +static bool CatalogSnapshotStale = true;
  
  /*
   * These are updated by GetSnapshotData.  We initialize them this way
 @@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
   else
   CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
  
  + /* Don't allow catalog snapshot to be older than xact snapshot. */
 + CatalogSnapshotStale = true;
 +
   FirstSnapshotSet = true;
   return CurrentSnapshot;
   }
 @@ -184,6 +198,9 @@ GetTransactionSnapshot(void)
   if (IsolationUsesXactSnapshot())
   return CurrentSnapshot;
  
 + /* Don't allow catalog snapshot to be older than xact snapshot. */
 + CatalogSnapshotStale = true;
 +
   CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
  
   return CurrentSnapshot;
 @@ -207,6 +224,54 @@ GetLatestSnapshot(void)
  }

Do we really need to invalidate snapshots in either situation? Isn't it
implied that if it's still valid, according to a) no invalidation via local
invalidation messages and b) no invalidations from other backends, there
shouldn't be any possible differences when you only look at the catalog?

And if it needs to change, we could copy the newly generated snapshot
to the catalog snapshot if it's currently valid.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-07-01 Thread Robert Haas
On Mon, Jul 1, 2013 at 10:04 AM, Andres Freund and...@2ndquadrant.com wrote:
 This is really cool stuff.

Thanks.

 I have to say, if the thing that primarily suffers is pretty extreme DDL
 in extreme situations I am not really worried. Anybody running anything
 close to the territory of such concurrency won't perform that much DDL.

/me wipes brow.

 Something picked up when quickly scanning over the last version of the
 patch:

 +/*
 + * Staleness detection for CatalogSnapshot.
 + */
 +static bool CatalogSnapshotStale = true;

  /*
   * These are updated by GetSnapshotData.  We initialize them this way
 @@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
   else
   CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

  + /* Don't allow catalog snapshot to be older than xact snapshot. */
 + CatalogSnapshotStale = true;
 +
   FirstSnapshotSet = true;
   return CurrentSnapshot;
   }
 @@ -184,6 +198,9 @@ GetTransactionSnapshot(void)
   if (IsolationUsesXactSnapshot())
   return CurrentSnapshot;

 + /* Don't allow catalog snapshot to be older than xact snapshot. */
 + CatalogSnapshotStale = true;
 +
   CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

   return CurrentSnapshot;
 @@ -207,6 +224,54 @@ GetLatestSnapshot(void)
  }

 Do we really need to invalidate snapshots in either situation? Isn't it
 implied that if it's still valid, according to a) no invalidation via local
 invalidation messages and b) no invalidations from other backends, there
 shouldn't be any possible differences when you only look at the catalog?

I had the same thought, removed that code, and then put it back.  The
problem is that if we revive an older snapshot from the dead, so to
speak, our backend's advertised xmin might need to go backwards, and
that seems unsafe - e.g. suppose another backend has updated a tuple
but not yet committed.  We don't see any invalidation messages, so we
decide to reuse our existing (old) snapshot and begin a scan.  After
we've looked at the page containing the new tuple (and decided not to
see it), vacuum nukes the old tuple (which we then also don't see).
Bad things ensue.  It might be possible to avoid the majority of
problems in this area via an appropriate set of grotty hacks, but I
don't want to go there.
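
To make that concrete, the reuse logic is shaped roughly like this (a
from-memory sketch, eliding the per-relid handling and other details):

Snapshot
GetCatalogSnapshot(Oid relid)
{
    if (CatalogSnapshotStale)
    {
        /*
         * Take a fresh snapshot, which (re-)advertises a current xmin.
         * Reviving a stale snapshot instead could require moving our
         * advertised xmin backwards, which is the hazard described above.
         */
        CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
        CatalogSnapshotStale = false;
    }
    return CatalogSnapshot;
}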

 And if it needs to change, we could copy the newly generated snapshot
 to the catalog snapshot if it's currently valid.

Yeah, I think there's room for further fine-tuning there.  But I think
it would make sense to push the patch at this point, and then if we
find cases that can be further improved, or things that it breaks, we
can fix them.  This area is complicated enough that I wouldn't be
horribly surprised if we end up having to fix a few more problem cases
or even revert the whole thing, but I think we've probably reached the
point where further review has less value than getting the code out
there in front of more people and seeing where (if anywhere) the
wheels come off out in the wild.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-27 Thread Alvaro Herrera
Robert Haas wrote:

 All right, so here's a patch that does something along those lines.
 We have to always take a new snapshot when somebody scans a catalog
 that has no syscache, because there won't be any invalidation messages
 to work off of in that case.  The only catalog in that category that's
 accessed during backend startup (which is mostly what your awful test
 case is banging on) is pg_db_role_setting.  We could add a syscache
 for that catalog or somehow force invalidation messages to be sent
 despite the lack of a syscache, but what I chose to do instead is
 refactor things slightly so that we use the same snapshot for all four
 scans of pg_db_role_setting, instead of taking a new one each time.  I
 think that's unimpeachable on correctness grounds; it's no different
 than if we'd finished all four scans in the time it took us to finish
 the first one, and then gotten put to sleep by the OS scheduler for as
 long as it took us to scan the other three.  Point being that there's
 no interlock there.

That seems perfectly acceptable to me, yeah.

 The difference is 3-4%, which is quite a lot less than what you
 measured before, although on different hardware, so results may vary.

3-4% on that synthetic benchmark sounds pretty acceptable to me, as
well.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-20 Thread Robert Haas
On Mon, Jun 17, 2013 at 8:12 AM, Andres Freund and...@2ndquadrant.com wrote:
 So, the biggest issue with the patch seems to be performance worries. I
 tried to create a worst case scenario:
 postgres (patched and HEAD) running with:
 -c shared_buffers=4GB \
 -c max_connections=2000 \
 -c maintenance_work_mem=2GB \
 -c checkpoint_segments=300 \
 -c wal_buffers=64MB \
 -c synchronous_commit=off \
 -c autovacuum=off \
 -p 5440

 With one background pgbench running:
 pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
 readonly-busy.sql:
 BEGIN;
 SELECT txid_current();
 SELECT pg_sleep(0.0001);
 COMMIT;

 I measured the performance of one other pgbench:
 pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql -C
 simplequery.sql:
 SELECT * FROM af1, af2 WHERE af1.x = af2.x;
 tables:
 create table af1 (x) as select g from generate_series(1,4) g;
 create table af2 (x) as select g from generate_series(4,7) g;

 With that setup one can create quite a noticeable overhead for the mvcc
 patch (best of 5):

 master-optimize:
 tps = 1261.629474 (including connections establishing)
 tps = 15121.648834 (excluding connections establishing)

 dev-optimize:
 tps = 773.719637 (including connections establishing)
 tps = 2804.239979 (excluding connections establishing)

 Most of the time in both patched and unpatched is by far spent in
 GetSnapshotData. I think the reason this shows a far higher overhead
 than what you previously measured is that a) in your test the other
 backends were idle, in mine they actually modify PGXACT which causes
 noticeable cacheline bouncing, and b) I have a higher number of
 connections & #max_connections

 A quick test shows that even with max_connections=600, 400 background,
 and 100 foreground pgbenches there's noticeable overhead:
 master-optimize:
 tps = 2221.226711 (including connections establishing)
 tps = 31203.259472 (excluding connections establishing)
 dev-optimize:
 tps = 1629.734352 (including connections establishing)
 tps = 4754.449726 (excluding connections establishing)

 Now I grant that's a somewhat harsh test for postgres, but I don't
 think it's entirely unreasonable and the performance impact is quite
 stark.

It's not entirely unreasonable, but it *is* mostly unreasonable.  I
mean, nobody is going to run 1000 connections in the background that
do nothing but thrash PGXACT on a real system.  I just can't get
concerned about that.  What I am concerned about is that there may be
other, more realistic workloads that show similar regressions.  But I
don't know how to find out whether that's actually the case.  On the
IBM POWER box where I tested this, it's not even GetSnapshotData()
that kills you; it's the system CPU scheduler.

The thing about this particular test is that it's artificial -
normally, any operation that wants to modify PGXACT will spend a lot
more time fighting over WALInsertLock and maybe waiting for disk I/O
than is the case here.  Of course, with Heikki's WAL scaling patch and
perhaps other optimizations we expect that other overhead to go down,
which might make the problems here more visible; and some of Heikki's
existing testing has shown significant contention around ProcArrayLock
as things stand.  But I'm still on the fence about whether this is
really a valid test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-20 Thread Andres Freund
On 2013-06-20 09:45:26 -0400, Robert Haas wrote:
  With that setup one can create quite a noticeable overhead for the mvcc
  patch (best of 5):
 
  master-optimize:
  tps = 1261.629474 (including connections establishing)
  tps = 15121.648834 (excluding connections establishing)
 
  dev-optimize:
  tps = 773.719637 (including connections establishing)
  tps = 2804.239979 (excluding connections establishing)
 
  Most of the time in both patched and unpatched is by far spent in
  GetSnapshotData. I think the reason this shows a far higher overhead
  than what you previously measured is that a) in your test the other
  backends were idle, in mine they actually modify PGXACT which causes
  noticeable cacheline bouncing, and b) I have a higher number of
  connections & #max_connections
 
  A quick test shows that even with max_connections=600, 400 background,
  and 100 foreground pgbenches there's noticeable overhead:
  master-optimize:
  tps = 2221.226711 (including connections establishing)
  tps = 31203.259472 (excluding connections establishing)
  dev-optimize:
  tps = 1629.734352 (including connections establishing)
  tps = 4754.449726 (excluding connections establishing)
 
  Now I grant that's a somewhat harsh test for postgres, but I don't
  think it's entirely unreasonable and the performance impact is quite
  stark.
 
 It's not entirely unreasonable, but it *is* mostly unreasonable. 

Well, sure. So are the tests that you ran. But that's *completely*
fine. In contrast to evaluating whether a performance improvement is
worth its complexity, we're not trying to measure real-world
improvements. We're trying to test the worst cases we can think of, even
if they aren't really interesting, by stressing potential pain points. If
we can't find a relevant regression for those using something akin to
microbenchmarks, it's less likely that there are performance regressions.

The not entirely unreasonable point is just about making sure you're
not testing something entirely irrelevant. Say, performance of a 1TB
database when shared_buffers is set to 64k. Or testing DDL performance
while locking pg_class exclusively.

The test was specifically chosen to:
* do uncached syscache lookups (-C) to measure the impact of the added
  GetSnapshotData() calls
* make individual GetSnapshotData() calls slower (all processes have an
  xid)
* contend on ProcArrayLock but not much else (high number of clients in
  the background)

 I
 mean, nobody is going to run 1000 connections in the background that
 do nothing but thrash PGXACT on a real system.  I just can't get
 concerned about that.

In the original mail I did retry it with 400 and the regression is still
pretty big. And the background things could also be doing something
that's not that likely to be blocked by global locks. Say, operate on
temporary or unlogged tables. Or just acquire a single row level lock
and then continue to do readonly work in a read committed transaction.

I think we both can come up with scenarios where at least part of the
above scenario is present. But imo that doesn't really matter.

 What I am concerned about is that there may be
 other, more realistic workloads that show similar regressions.  But I
 don't know how to find out whether that's actually the case.

So, given the results from that test and the profile I got, where
GetSnapshotData was by far the most expensive thing, a more
representative test might be something like a readonly pgbench with a
moderately high number of short-lived connections. I wouldn't be
surprised if that still showed performance problems.

If that's not enough, something like:
BEGIN;
SELECT * FROM my_client WHERE client_id = :id FOR UPDATE;
SELECT * FROM key_table WHERE key = :random
...
SELECT * FROM key_table WHERE key = :random
COMMIT;

will surely still show the problem.

  On the
 IBM POWER box where I tested this, it's not even GetSnapshotData()
 that kills you; it's the system CPU scheduler.

I haven't tried yet, but I'd guess the above setup shows the difference
with less than 400 clients. Might make it more reasonable to run there.

 But I'm still on the fence about whether this is really a valid test.

I think it shows that we need to be careful and do further performance
evaluations and/or alleviate the pain by making things cheaper (say, a
DDL counter in shared memory, allowing us to cache snapshots for the
syscache). If that artificial test hadn't shown problems I'd have voted
for just going ahead and not worrying further.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-20 Thread Robert Haas
On Thu, Jun 20, 2013 at 10:35 AM, Andres Freund and...@2ndquadrant.com wrote:
 But I'm still on the fence about whether this is really a valid test.

 I think it shows that we need to be careful and do further performance
 evaluations and/or alleviate the pain by making things cheaper (say, a
 DDL counter in shared memory, allowing us to cache snapshots for the
 syscache). If that artificial test hadn't shown problems I'd have voted
 for just going ahead and not worrying further.

I tried a general snapshot counter that rolls over every time any
transaction commits, but that doesn't help much.  It's a small
improvement on general workloads, but it's not effective against this
kind of hammering.  A DDL counter would be a bit more expensive
because we'd have to insert an additional branch into
GetSnapshotData() while ProcArrayLock is held, but it might be
tolerable.  Do you have code for this (or some portion of it) already?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-20 Thread Andres Freund
On 2013-06-20 10:58:59 -0400, Robert Haas wrote:
 On Thu, Jun 20, 2013 at 10:35 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  But I'm still on the fence about whether this is really a valid test.
 
   I think it shows that we need to be careful and do further performance
   evaluations and/or alleviate the pain by making things cheaper (say, a
   DDL counter in shared memory, allowing us to cache snapshots for the
   syscache). If that artificial test hadn't shown problems I'd have voted
   for just going ahead and not worrying further.
 
 I tried a general snapshot counter that rolls over every time any
 transaction commits, but that doesn't help much.  It's a small
 improvement on general workloads, but it's not effective against this
 kind of hammering.  A DDL counter would be a bit more expensive
 because we'd have to insert an additional branch into
 GetSnapshotData() while ProcArrayLock is held, but it might be
 tolerable.

I actually wasn't thinking of adding it at that level. More like just
not fetching a new snapshot in systable_* if a) the global DDL counter
hasn't been increased and b) our transaction hasn't performed any
DDL. Instead, use the one we fetched the last time for that
purpose.

The global DDL counter could be increased every time we commit and the
commit has invalidations queued; the local one every time we execute
local cache invalidation messages (i.e. around CommandCounterIncrement).

I think something roughly along those lines should be doable without
adding new overhead to global paths, except maybe some check in
snapmgr.c for that new, longer-living snapshot.
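
In (entirely made-up) code, something roughly like:

static SnapshotData CachedSyscacheSnapshotData = {HeapTupleSatisfiesMVCC};
static Snapshot CachedSyscacheSnapshot = NULL;
static uint64 last_shared_ddl_count = 0;
static uint64 last_local_ddl_count = 0;

static Snapshot
GetSyscacheSnapshot(void)
{
    /* hypothetical counters, per the scheme described above */
    uint64      shared_count = SharedDDLCounter();  /* bumped by commits with
                                                     * queued invalidations */
    uint64      local_count = LocalDDLCounter();    /* bumped around CCI */

    if (CachedSyscacheSnapshot == NULL ||
        shared_count != last_shared_ddl_count ||
        local_count != last_local_ddl_count)
    {
        CachedSyscacheSnapshot = GetSnapshotData(&CachedSyscacheSnapshotData);
        last_shared_ddl_count = shared_count;
        last_local_ddl_count = local_count;
    }
    return CachedSyscacheSnapshot;
}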

 Do you have code for this (or some portion of it) already?

Nope, sorry.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-17 Thread Andres Freund
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
 On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  +1.
 
 Here's a more serious patch for MVCC catalog access.  This one
 involves more data copying than the last one, I think, because the
 previous version did not register the snapshots it took, which I think
 is not safe.  So this needs to be re-tested for performance, which I
 have so far made no attempt to do.

Ok, I am starting to take a bit more serious look.

Minor issues I noticed:
* index.c:index_constraint_create() - comments need to get updated
* index.c:IndexCheckExclusion() - why do we still use a SnapshotNow? I'd
  rather not use *Now if it isn't necessary.
* the *CONCURRENTLY infrastructure should be simplified once this has
  been applied, but I think it makes sense to keep that separate.
* index.c:reindex_index() - SnapshotNow comment should be updated

I still think that renaming SnapshotNow to something like
SnapshotPerTuple to force everyone to re-evaluate their usage would be
good.

So, the biggest issue with the patch seems to be performance worries. I
tried to create a worst case scenario:
postgres (patched and HEAD) running with:
-c shared_buffers=4GB \
-c max_connections=2000 \
-c maintenance_work_mem=2GB \
-c checkpoint_segments=300 \
-c wal_buffers=64MB \
-c synchronous_commit=off \
-c autovacuum=off \
-p 5440

With one background pgbench running:
pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
readonly-busy.sql:
BEGIN;
SELECT txid_current();
SELECT pg_sleep(0.0001);
COMMIT;

I measured the performance of one other pgbench:
pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql -C
simplequery.sql:
SELECT * FROM af1, af2 WHERE af1.x = af2.x;
tables:
create table af1 (x) as select g from generate_series(1,4) g;
create table af2 (x) as select g from generate_series(4,7) g;

With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):

master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)

dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)

Most of the time in both patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, in mine they actually modify PGXACT which causes
noticeable cacheline bouncing, and b) I have a higher number of
connections & #max_connections

A quick test shows that even with max_connections=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)

Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.

 It strikes me as rather unfortunate that the snapshot interface is
 designed in such a way as to require so much data copying.  It seems
 we always take a snapshot by copying from PGXACT/PGPROC into
 CurrentSnapshotData or SecondarySnapshotData, and then copying data a
 second time from there to someplace more permanent.  It would be nice
 to avoid that, at least in common cases.

Sounds doable. But let's do one thing at a time ;). That copy wasn't
visible in the rather extreme workload from above btw...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-09 Thread Andres Freund
On 2013-06-06 12:49:14 -0400, Robert Haas wrote:
 On Thu, Jun 6, 2013 at 5:30 AM, Andres Freund and...@2ndquadrant.com wrote:
  + * XXX: Now that we have MVCC catalog access, the reasoning above is no 
  longer
  + * true.  Are there other good reasons to hard-code this, or should we 
  revisit
  + * that decision?
 
  We could just implement the function by looking in the shared
  relmapper. Everything that can be mapped via it is shared.
 
 I suspect there are several possible sources for this information, but
 it's hard to beat a hard-coded list for efficiency, so I wasn't sure
 if we should tinker with this or not.

I can tell from experience that it makes adding a new shared relation
more of a pain than it already is, but we're hopefully not doing that
all that often.
I just don't think that the mvcc angle has much to do with the decision.

  --- a/src/backend/commands/cluster.c
  +++ b/src/backend/commands/cluster.c
  @@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid 
  indexOid, bool recheck, LOCKMOD
* against concurrent SnapshotNow scans of pg_index.  Therefore this is 
  unsafe
* to execute with less than full exclusive lock on the parent table;
* otherwise concurrent executions of RelationGetIndexList could miss 
  indexes.
  + *
  + * XXX: Now that we have MVCC catalog access, SnapshotNow scans of 
  pg_index
  + * shouldn't be common enough to worry about.  The above comment needs
  + * to be updated, and it may be possible to simplify the logic here in 
  other
  + * ways also.
*/
 
  You're right, the comment needs to be changed, but I don't think the
  effect can. A non-inplace update changes the xmin of the row, which is
  relevant for indcheckxmin.
 
 OK.
 
  (In fact, isn't this update possibly causing problems like delaying the
  use of such an index already)

 Well, maybe.  In general, the ephemeral snapshot taken for a catalog
 scan can't be any older than the primary snapshot already held.  But
 there could be some corner case where that's not true, if we use this
 technique somewhere that such a snapshot hasn't already been acquired.

I wasn't talking about catalog scans or this patch; I wonder whether the
update we do there won't cause the index not to be used for concurrent
(normal) scans, since the xmin is now newer while it might have been far
in the past before. I.e., we might need to unset indcheckxmin if the
xmin is far enough in the past.

  Hm. Looks like this should also change the description of SecondarySnapshot:
 
  /*
   * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
   * mode, and to the latest one taken in a read-committed transaction.
   * SecondarySnapshot is a snapshot that's always up-to-date as of the 
  current
   * instant, even in transaction-snapshot mode.  It should only be used for
   * special-purpose code (say, RI checking.)
   *
 
 I think that's still more or less true, though we could add catalog
 scans as another example.

I guess my feeling is that once catalog scans use it, it's not so much
special purpose anymore ;). But I admit that the frequency of usage
doesn't say much about its specificity...

  I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
of the callers seem to rely on its behaviour from a quick look and it
seems rather confusing to have both.
 
 I assume Tom had some reason for making GetLatestSnapshot() behave the
 way it does, so I refrained from doing that.  I might be wrong.

At least I don't see any on a quick look - which doesn't say very
much. I think I just dislike having *instant and *latest functions in
there; it seems confusing to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-07 Thread Robert Haas
On Thu, Jun 6, 2013 at 2:48 PM, Jim Nasby j...@nasby.net wrote:
 On 6/5/13 3:49 PM, Robert Haas wrote:

 Now, I did find a couple that I thought should probably stick with
 SnapshotNow, specifically pgrowlocks and pgstattuple.

 FWIW, I've often wished for a way to make all stat access transactional,
 across all the stats views. Perhaps that couldn't be done by default, but
 I'd love something like a function that would make a snapshot of all stats
 data as of one point. Even if that snapshot itself wasn't completely atomic,
 at least then you could query any stats views however you wanted and know
 that the info wasn't changing over time.

 The reason I don't think this would work so well if done in userspace is how
 long it would take. Presumably making a complete backend-local copy of
 pg_proc etc and the stats file would be orders of magnitude faster than a
 bunch of CREATE TEMP TABLE's.

Well, maybe.  But at any rate this is completely unrelated to the main
topic of this thread.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-06 Thread Andres Freund
On 2013-06-05 18:56:28 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  Now, I did find a couple that I thought should probably stick with
  SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
  gathering statistical information, so there's no harm in having the
  snapshot change part-way through the scan, and if the scan is long,
  the user might actually regard the results under SnapshotNow as more
  accurate.  Whether that's the case or not, holding back xmin for those
  kinds of scans does not seem wise.
 
 FWIW, I think if we're going to ditch SnapshotNow we should ditch
 SnapshotNow, full stop, even removing the tqual.c routines for it.
 Then we can require that *any* reference to SnapshotNow is replaced by
 an MVCC reference prior to execution, and throw an error if we actually
 try to test a tuple with that snapshot.

I suggest simply renaming it to something else. Every external project
that decides to follow the renaming either has a proper use case for it
or the amount of sympathy for them keeping the old behaviour is rather
limited.

 If they really want that sort of uncertain semantics they could use
 SnapshotDirty, no?

Not that happy with that. A scan behaving inconsistently over its
proceedings is something rather different from reading uncommitted
rows. I have the feeling that trouble is waiting that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-06 Thread Andres Freund
Hi Robert,

Took a quick look through the patch to understand what your current
revision is actually doing and to facilitate thinking about possible
pain points.

Here are the notes I made during my reading:

On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
 +++ b/src/backend/catalog/catalog.c
 @@ -232,6 +232,10 @@ IsReservedName(const char *name)
   * know if it's shared.  Fortunately, the set of shared relations is
   * fairly static, so a hand-maintained list of their OIDs isn't completely
   * impractical.
 + *
 + * XXX: Now that we have MVCC catalog access, the reasoning above is no 
 longer
 + * true.  Are there other good reasons to hard-code this, or should we 
 revisit
 + * that decision?
   */

We could just implement the function by looking in the shared
relmapper. Everything that can be mapped via it is shared.
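
A sketch, assuming the existing relmapper API (whether this covers
everything IsSharedRelation() handles today is exactly the question):

#include "utils/relmapper.h"

bool
IsSharedRelation(Oid relationId)
{
    /* a relation is shared iff the shared relation map knows about it */
    return RelationMapOidToFilenode(relationId, true) != InvalidOid;
}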

 --- a/src/backend/commands/cluster.c
 +++ b/src/backend/commands/cluster.c
 @@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid 
 indexOid, bool recheck, LOCKMOD
   * against concurrent SnapshotNow scans of pg_index.  Therefore this is 
 unsafe
   * to execute with less than full exclusive lock on the parent table;
   * otherwise concurrent executions of RelationGetIndexList could miss 
 indexes.
 + *
 + * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
 + * shouldn't be common enough to worry about.  The above comment needs
 + * to be updated, and it may be possible to simplify the logic here in other
 + * ways also.
   */

You're right, the comment needs to be changed, but I don't think the
effect can. A non-inplace update changes the xmin of the row, which is
relevant for indcheckxmin.
(In fact, isn't this update possibly causing problems like delaying the
use of such an index already)


 --- a/src/backend/commands/tablecmds.c
 +++ b/src/backend/commands/tablecmds.c
 @@ -2738,7 +2738,7 @@ AlterTableGetLockLevel(List *cmds)
* multiple DDL operations occur in a stream against frequently accessed
* tables.
*
 -  * 1. Catalog tables are read using SnapshotNow, which has a race bug 
 that
 +  * 1. Catalog tables were read using SnapshotNow, which has a race bug 
 that

Heh.

 --- a/src/backend/utils/time/snapmgr.c
 +++ b/src/backend/utils/time/snapmgr.c
 @@ -207,6 +207,19 @@ GetLatestSnapshot(void)
  /*
 + * GetInstantSnapshot
 + *   Get a snapshot that is up-to-date as of the current instant,
 + *   but don't set the transaction snapshot.
 + */
 +Snapshot
 +GetInstantSnapshot(void)
 +{
 + SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
 +
 + return SecondarySnapshot;
 +}

Hm. Looks like this should also change the description of SecondarySnapshot:

/*
 * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
 * mode, and to the latest one taken in a read-committed transaction.
 * SecondarySnapshot is a snapshot that's always up-to-date as of the current
 * instant, even in transaction-snapshot mode.  It should only be used for
 * special-purpose code (say, RI checking.)
 *
and
/*
 * Checking SecondarySnapshot is probably useless here, but it seems
 * better to be sure.
 */

Also looks like BuildEventTriggerCache() in evtcache.c should use
GetInstantSnapshot() now.

I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
of the callers seem to rely on its behaviour from a quick look and it
seems rather confusing to have both.

 --- a/src/bin/pg_dump/pg_dump.c
 +++ b/src/bin/pg_dump/pg_dump.c
 @@ -14,13 +14,13 @@
   *   Note that pg_dump runs in a transaction-snapshot mode transaction,
   *   so it sees a consistent snapshot of the database including system
   *   catalogs. However, it relies in part on various specialized backend
 - *   functions like pg_get_indexdef(), and those things tend to run on
 - *   SnapshotNow time, ie they look at the currently committed state.  So
 - *   it is possible to get 'cache lookup failed' error if someone
 - *   performs DDL changes while a dump is happening. The window for this
 - *   sort of thing is from the acquisition of the transaction snapshot to
 - *   getSchemaData() (when pg_dump acquires AccessShareLock on every
 - *   table it intends to dump). It isn't very large, but it can happen.
 + *   functions like pg_get_indexdef(), and those things tend to look at
 + *   the currently committed state.  So it is possible to get 'cache
 + *   lookup failed' error if someone performs DDL changes while a dump is
 + *   happening. The window for this sort of thing is from the acquisition
 + *   of the transaction snapshot to getSchemaData() (when pg_dump acquires
 + *   AccessShareLock on every table it intends to dump). It isn't very large,
 + *   but it can happen.

I think we need another term for what SnapshotNow used to express
here... Imo this description got less clear with this change.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] MVCC catalog access

2013-06-06 Thread Robert Haas
On Thu, Jun 6, 2013 at 5:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 + * XXX: Now that we have MVCC catalog access, the reasoning above is no 
 longer
 + * true.  Are there other good reasons to hard-code this, or should we 
 revisit
 + * that decision?

 We could just implement the function by looking in the shared
 relmapper. Everything that can be mapped via it is shared.

I suspect there are several possible sources for this information, but
it's hard to beat a hard-coded list for efficiency, so I wasn't sure
if we should tinker with this or not.

 --- a/src/backend/commands/cluster.c
 +++ b/src/backend/commands/cluster.c
 @@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid 
 indexOid, bool recheck, LOCKMOD
   * against concurrent SnapshotNow scans of pg_index.  Therefore this is 
 unsafe
   * to execute with less than full exclusive lock on the parent table;
   * otherwise concurrent executions of RelationGetIndexList could miss 
 indexes.
 + *
 + * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
 + * shouldn't be common enough to worry about.  The above comment needs
 + * to be updated, and it may be possible to simplify the logic here in other
 + * ways also.
   */

 You're right, the comment needs to be changed, but I don't think the
 effect can. A non-inplace update changes the xmin of the row, which is
 relevant for indcheckxmin.

OK.

 (In fact, isn't this update possibly causing problems like delaying the
 use of such an index already)

Well, maybe.  In general, the ephemeral snapshot taken for a catalog
scan can't be any older than the primary snapshot already held.  But
there could be some corner case where that's not true, if we use this
technique somewhere that such a snapshot hasn't already been acquired.

 Hm. Looks like this should also change the description of SecondarySnapshot:

 /*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
  * mode, and to the latest one taken in a read-committed transaction.
  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
  * instant, even in transaction-snapshot mode.  It should only be used for
  * special-purpose code (say, RI checking.)
  *

I think that's still more or less true, though we could add catalog
scans as another example.

 and
 /*
  * Checking SecondarySnapshot is probably useless here, but it seems
  * better to be sure.
  */

Yeah, that is not useless any more, for sure.

 Also looks like BuildEventTriggerCache() in evtcache.c should use
 GetInstantSnapshot() now.

OK.

 I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
 of the callers seem to rely on its behaviour from a quick look, and it
 seems rather confusing to have both.

I assume Tom had some reason for making GetLatestSnapshot() behave the
way it does, so I refrained from doing that.  I might be wrong.

 I think we need another term for what SnapshotNow used to express
 here... Imo this description got less clear with this change.

I thought it was OK but I'm open to suggestions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-06 Thread Jim Nasby

On 6/5/13 3:49 PM, Robert Haas wrote:

Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple.


FWIW, I've often wished for a way to make all stat access transactional, across all the 
stats views. Perhaps that couldn't be done by default, but I'd love something like a 
function that would make a snapshot of all stats data as of one point in time. Even
if that snapshot itself wasn't completely atomic, at least then you could query any stats 
views however you wanted and know that the info wasn't changing over time.

The reason I don't think this would work so well if done in userspace is how 
long it would take. Presumably making a complete backend-local copy of pg_proc 
etc. and the stats file would be orders of magnitude faster than a bunch of 
CREATE TEMP TABLEs.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Greg Stark
On Wed, May 22, 2013 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 We've had a number of discussions about the evils of SnapshotNow.  As
 far as I can tell, nobody likes it and everybody wants it gone, but
 there is concern about the performance impact.

I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have worked
around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.

I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones. If you clean those all up it will be probably be cleaner and
better but we don't know how many such sites will need to be modified.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks, but the implementation of
constraints and locks will still need to use SnapshotNow, surely?

-- 
greg




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Hannu Krosing
On 06/05/2013 04:28 PM, Greg Stark wrote:
 On Wed, May 22, 2013 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 We've had a number of discussions about the evils of SnapshotNow.  As
 far as I can tell, nobody likes it and everybody wants it gone, but
 there is concern about the performance impact.
 I was always under the impression that the problem was we weren't
 quite sure what changes would be needed to make mvcc-snapshots work
 for the catalog lookups. The semantics of SnapshotNow aren't terribly
 clear either but we have years of experience telling us they seem to
 basically work. Most of the problems we've run into we have worked
 around in the catalog accesses. Nobody really knows how many of
 the call sites will need different logic to behave properly with mvcc
 snapshots.

 I thought there were many call sites that were specifically depending
 on seeing dirty reads to avoid race conditions with other backends --
 which probably just narrowed the race condition or created different
 ones. If you clean those all up it will be probably be cleaner and
 better but we don't know how many such sites will need to be modified.
 I'm not even sure what "clean them up" means. You can replace checks
 with things like constraints and locks, but the implementation of
 constraints and locks will still need to use SnapshotNow, surely?
I guess that anything that does *not* write should be happier with an
MVCC catalogue, especially if there has been any DDL after its snapshot.

For writers, the ability to compare MVCC and SnapshotNow snapshots
would tell them whether they need to take extra steps.

But undoubtedly the whole thing would be lot of work.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Andres Freund
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
 On Wed, May 22, 2013 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
  We've had a number of discussions about the evils of SnapshotNow.  As
  far as I can tell, nobody likes it and everybody wants it gone, but
  there is concern about the performance impact.
 
 I thought there were many call sites that were specifically depending
 on seeing dirty reads to avoid race conditions with other backends --
 which probably just narrowed the race condition or created different
 ones.

But SnapshotNow doesn't allow you to do actual dirty reads? It only
gives you rows back that were actually visible when we checked. The
difference from SnapshotMVCC is that during a scan the picture of which
transactions are committed can change.

 I'm not even sure what "clean them up" means. You can replace checks
 with things like constraints and locks, but the implementation of
 constraints and locks will still need to use SnapshotNow, surely?

The places that require this should already use HeapTupleSatisfiesDirty
which is something different.
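
For clarity, the usual dirty-read idiom is along these lines (an
illustrative sketch, not code from the patch):

/*
 * Callers that need to see not-yet-committed rows use a dirty
 * snapshot, which is quite different from SnapshotNow.
 */
static bool
fetch_dirty(Relation relation, ItemPointer tid)
{
    SnapshotData snap;
    HeapTupleData tup;
    Buffer      buffer;

    InitDirtySnapshot(snap);
    tup.t_self = *tid;
    if (!heap_fetch(relation, &snap, &tup, &buffer, false, NULL))
        return false;       /* not visible even to a dirty read */

    /*
     * Visible: snap.xmin/snap.xmax now report any in-progress
     * inserter/deleter XIDs the caller could wait on, e.g. with
     * XactLockTableWait().
     */
    ReleaseBuffer(buffer);
    return true;
}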

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
 I thought there were many call sites that were specifically depending
 on seeing dirty reads to avoid race conditions with other backends --
 which probably just narrowed the race condition or created different
 ones.

 But SnapshotNow doesn't allow you to do actual dirty reads?

Yeah.  I believe the issue is that we can't simply do MVCC catalog reads
with a snapshot taken at transaction start time or statement start time,
as we would do if executing an MVCC scan for a user query.  Rather, the
snapshot has to be recent enough to ensure we see the current definition
of any table we've just acquired lock on, *even if that's newer than the
snapshot prevailing for the user's purposes*.  Otherwise we might be
using the wrong rowtype definition or failing to enforce a just-added
constraint.

The last time we talked about this, we were batting around ideas of
keeping a current snapshot for catalog purposes, which we'd update
or at least invalidate anytime we acquired a new lock.  (In principle,
if that isn't new enough, we have a race condition that we'd better fix
by adding some more locking.)  Robert's results seem to say that that
might be unnecessary optimization, and that it'd be sufficient to just
take a new snap each time we need to do a catalog scan.  TBH I'm not
sure I believe that; it seems to me that this approach is surely going
to create a great deal more contention from concurrent GetSnapshotData
calls.  But at the very least, this says we can experiment with the
behavioral aspects without bothering to build infrastructure for
tracking an appropriate catalog snapshot.
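
For concreteness, the sort of thing we were batting around might look
like this (names hypothetical, purely a sketch):

static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
static bool catalog_snapshot_valid = false;

Snapshot
GetCatalogSnapshot(void)
{
    /* Refresh lazily: take a new snap only after an invalidation. */
    if (!catalog_snapshot_valid)
    {
        GetSnapshotData(&CatalogSnapshotData);
        catalog_snapshot_valid = true;
    }
    return &CatalogSnapshotData;
}

/* To be called whenever we acquire a new lock. */
void
InvalidateCatalogSnapshot(void)
{
    catalog_snapshot_valid = false;
}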

regards, tom lane




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Andres Freund
On 2013-06-05 11:35:58 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
  I thought there were many call sites that were specifically depending
  on seeing dirty reads to avoid race conditions with other backends --
  which probably just narrowed the race condition or created different
  ones.
 
  But SnapshotNow doesn't allow you to do actual dirty reads?
 
 Yeah.  I believe the issue is that we can't simply do MVCC catalog reads
 with a snapshot taken at transaction start time or statement start time,
 as we would do if executing an MVCC scan for a user query.  Rather, the
 snapshot has to be recent enough to ensure we see the current definition
 of any table we've just acquired lock on, *even if that's newer than the
 snapshot prevailing for the user's purposes*.  Otherwise we might be
 using the wrong rowtype definition or failing to enforce a just-added
 constraint.

Oh, definitely. At least Robert's previous prototype tried to do that
(although I am not sure if it went far enough). And I'd be surprised the
current one wouldn't do so.

 The last time we talked about this, we were batting around ideas of
 keeping a current snapshot for catalog purposes, which we'd update
 or at least invalidate anytime we acquired a new lock.  (In principle,
 if that isn't new enough, we have a race condition that we'd better fix
 by adding some more locking.)  Robert's results seem to say that that
 might be unnecessary optimization, and that it'd be sufficient to just
 take a new snap each time we need to do a catalog scan.  TBH I'm not
 sure I believe that; it seems to me that this approach is surely going
 to create a great deal more contention from concurrent GetSnapshotData
 calls.

I still have a hard time believing those results as well, but I think we
might have underestimated the effectiveness of the syscache during
workloads which are sufficiently concurrent to make locking in
GetSnapshotData() a problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Robert Haas
On Wed, Jun 5, 2013 at 10:28 AM, Greg Stark st...@mit.edu wrote:
 On Wed, May 22, 2013 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 We've had a number of discussions about the evils of SnapshotNow.  As
 far as I can tell, nobody likes it and everybody wants it gone, but
 there is concern about the performance impact.

 I was always under the impression that the problem was we weren't
 quite sure what changes would be needed to make mvcc-snapshots work
 for the catalog lookups. The semantics of SnapshotNow aren't terribly
 clear either but we have years of experience telling us they seem to
 basically work. Most of the problems we've run into we have worked
 around in the catalog accesses. Nobody really knows how many of
 the call sites will need different logic to behave properly with mvcc
 snapshots.

With all respect, I think this is just plain wrong.  SnapshotNow is
just like an up-to-date MVCC snapshot.  The only difference is that an
MVCC snapshot, once established, stays fixed for the lifetime of the
scan.  On the other hand, the SnapshotNow view of the world changes
the instant another transaction commits, meaning that scans can see
multiple versions of a row, or no versions of a row, where any MVCC
scan would have seen just one.  There are very few places that want
that behavior.

Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate.  Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.

But in most other parts of the code, the changes-in-mid-scan behavior
of SnapshotNow is a huge liability.  The fact that it is fully
up-to-date *as of the time the scan starts* is critical for
correctness.  But the fact that it can then change during the scan is
in almost every case something that we do not want.  The patch
preserves the first property while ditching the second one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Now, I did find a couple that I thought should probably stick with
 SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
 gathering statistical information, so there's no harm in having the
 snapshot change part-way through the scan, and if the scan is long,
 the user might actually regard the results under SnapshotNow as more
 accurate.  Whether that's the case or not, holding back xmin for those
 kinds of scans does not seem wise.

FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot.  If we don't do it like that
I think we'll have errors of omission all over the place (I had really
no confidence in your original patch because of that worry).  The fact
that there are a couple of contrib modules for which there might be an
arguable advantage in doing it the old way isn't sufficient reason to
expose ourselves to bugs like that.  If they really want that sort of
uncertain semantics they could use SnapshotDirty, no?

regards, tom lane




Re: [HACKERS] MVCC catalog access

2013-06-05 Thread Robert Haas
On Wed, Jun 5, 2013 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Now, I did find a couple that I thought should probably stick with
 SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
 gathering statistical information, so there's no harm in having the
 snapshot change part-way through the scan, and if the scan is long,
 the user might actually regard the results under SnapshotNow as more
 accurate.  Whether that's the case or not, holding back xmin for those
 kinds of scans does not seem wise.

 FWIW, I think if we're going to ditch SnapshotNow we should ditch
 SnapshotNow, full stop, even removing the tqual.c routines for it.
 Then we can require that *any* reference to SnapshotNow is replaced by
 an MVCC reference prior to execution, and throw an error if we actually
 try to test a tuple with that snapshot.  If we don't do it like that
 I think we'll have errors of omission all over the place (I had really
 no confidence in your original patch because of that worry).  The fact
 that there are a couple of contrib modules for which there might be an
 arguable advantage in doing it the old way isn't sufficient reason to
 expose ourselves to bugs like that.  If they really want that sort of
 uncertain semantics they could use SnapshotDirty, no?

I had the same thought, initially.  I went through the exercise of
doing a grep for SnapshotNow and trying to eliminate as many
references as possible, but there were a few that I couldn't convince
myself to rip out.  However, if you'd like to apply the patch and grep
for SnapshotNow and suggest what to do about the remaining cases (or
hack the patch up yourself) I think that would be great.  I'd love to
see it completely gone if we can see our way clear to that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-06-03 Thread Michael Paquier
On Tue, Jun 4, 2013 at 3:57 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  +1.

 Here's a more serious patch for MVCC catalog access.  This one
 involves more data copying than the last one, I think, because the
 previous version did not register the snapshots it took, which I think
 is not safe.  So this needs to be re-tested for performance, which I
 have so far made no attempt to do.

 It strikes me as rather unfortunate that the snapshot interface is
 designed in such a way as to require so much data copying.  It seems
 we always take a snapshot by copying from PGXACT/PGPROC into
 CurrentSnapshotData or SecondarySnapshotData, and then copying data a
 second time from there to someplace more permanent.  It would be nice
 to avoid that, at least in common cases.

And here are more results comparing master branch with and without this
patch...

1) DDL CREATE/DROP test (times in ms):
1-1) master:
250 connections:
Create: 24846.060, Drop: 30391.713
Create: 23771.394, Drop: 29769.396
500 connections:
Create: 24339.449, Drop: 30084.741
Create: 24152.176, Drop: 30643.471
1000 connections:
Create: 26007.960, Drop: 31019.918
Create: 25937.592, Drop: 30600.341
2000 connections:
Create: 26900.324, Drop: 30741.989
Create: 26910.660, Drop: 31577.247

1-2) mvcc catalogs:
250 connections:
Create: 25371.342, Drop: 31458.952
Create: 25685.094, Drop: 31492.377
500 connections:
Create: 28557.882, Drop: 33673.266
Create: 27901.910, Drop: 33223.006
1000 connections:
Create: 31910.130, Drop: 36770.062
Create: 32210.093, Drop: 36754.888
2000 connections:
Create: 40374.754, Drop: 43442.528
Create: 39763.691, Drop: 43234.243

2) backend startup
2-1) master branch:
250 connections:
real    0m8.993s
user    0m0.128s
sys     0m0.380s
500 connections:
real    0m9.004s
user    0m0.212s
sys     0m0.340s
1000 connections:
real    0m9.072s
user    0m0.272s
sys     0m0.332s
2000 connections:
real    0m9.257s
user    0m0.204s
sys     0m0.392s

2-2) MVCC catalogs:
250 connections:
real    0m9.067s
user    0m0.108s
sys     0m0.396s
500 connections:
real    0m9.034s
user    0m0.112s
sys     0m0.376s
1000 connections:
real    0m9.303s
user    0m0.176s
sys     0m0.328s
2000 connections:
real    0m9.916s
user    0m0.160s
sys     0m0.428s

Except for the backend startup test at 500 connections, which looks to
have some noise, the performance degradation reaches 6% for 2000
connections and less than 1% for 250 connections. This is better than
last time.
For the CREATE/DROP case, the performance drop reaches 40% for 2000
connections (32% during the last tests). I also noticed a lower
performance drop for 250 connections now (3~4%) compared to the first
run (9%).

I compiled the main results on tables here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-take-2-2/
The results of last time are also available here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-2/

Regards,
-- 
Michael


Re: [HACKERS] MVCC catalog access

2013-05-29 Thread Michael Paquier
On Tue, May 28, 2013 at 10:39 PM, Robert Haas robertmh...@gmail.com wrote:

 IMHO, we should press forward with this approach.  Considering that
 these are pretty extreme test cases, I'm inclined to view the
 performance loss as acceptable.  We've never really viewed DDL as
 something that needs to be micro-optimized, and there is ample
 testimony to that fact in the existing code and in the treatment of
 prior patches in this area.  This is not to say that we want to go
 around willy-nilly making it slower, but I think there will be very
 few users for which the number of microseconds it takes to create or
 drop an SQL object is performance-critical, especially when you
 consider that (1) the effect will be quite a bit less when the objects
 are tables, since in that case the snapshot cost will tend to be
 drowned out by the filesystem cost and (2) people who don't habitually
 keep hundreds and hundreds of connections open - which hopefully most
 people don't - won't see the effect anyway.   Against that, this
 removes the single largest barrier to allowing more concurrent DDL, a
 feature that I suspect will make a whole lot of people *very* happy.

+1.
So, I imagine that the next step would be to add a new snapshot
validation level in tqual.h. Something like SnapshotMVCC? Then replace
SnapshotNow with SnapshotMVCC where required.

I am also seeing that SnapshotNow is used in places where we might not
want it changed. For example, the autovacuum code path that retrieves
the database or table list should not be changed, no?
-- 
Michael


Re: [HACKERS] MVCC catalog access

2013-05-28 Thread Robert Haas
On Sun, May 26, 2013 at 9:10 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Perhaps we see little difference in performance because PGPROC has been
 separated into PGPROC and PGXACT, reducing lock contention when getting
 snapshot data?

 By the way, I grabbed a 32-core machine and did some more performance tests
 with some open connections with XIDs assigned using pg_cxn v2 given by
 Robert in his previous mail to make sure that the snapshots get pretty
 large.

Thanks for checking this on another machine.  It's interesting that
you were able to measure a hit for relcache rebuild, whereas I was
not, but it doesn't look horrible.

IMHO, we should press forward with this approach.  Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable.  We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area.  This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway.   Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-05-26 Thread Michael Paquier
Perhaps we see little difference in performance because PGPROC has been
separated into PGPROC and PGXACT, reducing lock contention when getting
snapshot data?

By the way, I grabbed a 32-core machine and did some more performance tests
with some open connections with XIDs assigned using pg_cxn v2 given by
Robert in his previous mail to make sure that the snapshots get pretty
large.

First I ran the simple read test:
$ time for s in `seq 1 1000`
  do
    rm -f ~/bin/pgsql/master/global/pg_internal.init && \
      psql -c 'SELECT 2+2' > /dev/null;
  done
And then the create/drop test.
I have done those tests with 250, 500, 1000 and 2000 open connections:

1) 250 open connections
1-1) read test
Round 1:
mvcc_catalog_access off:
real    0m9.124s
user    0m0.200s
sys     0m0.392s
mvcc_catalog_access on:
real    0m9.297s
user    0m0.148s
sys     0m0.444s
Round 2:
mvcc_catalog_access off:
real    0m8.985s
user    0m0.160s
sys     0m0.372s
mvcc_catalog_access on:
real    0m9.244s
user    0m0.240s
sys     0m0.400s

1-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 24554.849, Drop: 29755.146
mvcc on: Create: 26904.755, Drop: 32891.556
mvcc off: Create: 23337.342, Drop: 29921.990
mvcc on: Create: 24533.708, Drop: 31670.840

2) 500 open connections
2-1) read test
Round 1:
mvcc_catalog_access off:
real    0m9.123s
user    0m0.200s
sys     0m0.396s
mvcc_catalog_access on:
real    0m9.627s
user    0m0.156s
sys     0m0.460s
Round 2:
mvcc_catalog_access off:
real    0m9.221s
user    0m0.316s
sys     0m0.392s
mvcc_catalog_access on:
real    0m9.592s
user    0m0.160s
sys     0m0.484s

2-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25872.886, Drop: 31723.921
mvcc on: Create: 27076.429, Drop: 33674.336
mvcc off: Create: 24039.456, Drop: 30434.019
mvcc on: Create: 29105.713, Drop: 33821.170

3) 1000 open connections
3-1) read test
Round 1:
mvcc_catalog_access off:
real    0m9.240s
user    0m0.192s
sys     0m0.396s
mvcc_catalog_access on:
real    0m9.674s
user    0m0.236s
sys     0m0.440s
Round 2:
mvcc_catalog_access off:
real    0m9.302s
user    0m0.308s
sys     0m0.392s
mvcc_catalog_access on:
real    0m9.746s
user    0m0.204s
sys     0m0.436s

3-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25563.705, Drop: 31747.451
mvcc on: Create: 33281.246, Drop: 36618.166
mvcc off: Create: 28178.210, Drop: 30550.166
mvcc on: Create: 31849.825, Drop: 36831.245

4) 2000 open connections
4-1) read test
Round 1:
mvcc_catalog_access off:
real    0m9.066s
user    0m0.128s
sys     0m0.420s
mvcc_catalog_access on:
real    0m9.978s
user    0m0.168s
sys     0m0.412s

Round 2:
mvcc_catalog_access off:
real    0m9.113s
user    0m0.152s
sys     0m0.444s
mvcc_catalog_access on:
real    0m9.974s
user    0m0.176s
sys     0m0.436s
More or less the same results as previously, with a 10% performance drop.

4-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 28708.095 ms, Drop: 32510.057 ms
mvcc on: Create: 39987.815 ms, Drop: 43157.006 ms
mvcc off: Create: 28409.853 ms, Drop: 31248.163 ms
mvcc on: Create: 41669.829 ms, Drop: 44645.109 ms

For the read tests, we can see a performance drop of up to 10% for 2000
connections.
For the write tests, we can see a performance drop of 9~10% for 250
connections, and up to a 30% drop with 2000 connections.

We rarely see users drop that many objects at the same time with so
many open transactions; they'll switch to a connection pooler before
opening that many connections to the server. I am not sure that such a
performance drop is acceptable as-is, but perhaps it is if we consider
the functionality we gain thanks to MVCC catalogs.
--
Michael


Re: [HACKERS] MVCC catalog access

2013-05-23 Thread Robert Haas
On Wed, May 22, 2013 at 11:11 PM, Andres Freund and...@2ndquadrant.com wrote:
 Make that actually having acquired an xid. We skip a large part of the
 work if a transaction doesn't yet have one AFAIR. I don't think the mere
 presence of 600 idle connections without an xid in contrast to just
 having max_connections at 600 should actually make a difference in the
 cost of acquiring a snapshot?

Attached is a slightly updated version of the patch I'm using for
testing, and an updated version of the pg_cxn source that I'm using to
open lotsa connections.  With this version, I can do this:

./pg_cxn -n 600 -c BEGIN -c 'SELECT txid_current()'

...which I think is sufficient to make sure all those transactions
have XIDs.  Then I reran the depend test case (create a schema with
100,000 functions and then drop the schema with CASCADE) that I
mentioned in my original posting.  Here are the results:

MVCC Off: Create 8685.662 ms, Drop 9973.233 ms
MVCC On: Create 7931.039 ms, Drop 10189.189 ms
MVCC Off: Create 7810.084 ms, Drop 9594.580 ms
MVCC On: Create 8854.577 ms, Drop 10240.024 ms

OK, let's try the rebuild-the-relcache test using the same pg_cxn
scenario (600 transactions that have started a transaction and
selected txid_current()).

[rhaas ~]$ time for s in `seq 1 1000`; do rm -f
pgdata/global/pg_internal.init && psql -c 'SELECT 2+2' > /dev/null;
done

MVCC catalog access on:
real    0m11.006s
user    0m2.746s
sys     0m2.664s

MVCC catalog access off:
real    0m10.583s
user    0m2.745s
sys     0m2.661s

MVCC catalog access on:
real    0m10.646s
user    0m2.750s
sys     0m2.661s

MVCC catalog access off:
real    0m10.823s
user    0m2.756s
sys     0m2.681s

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


mvcc-catalog-access-v2.patch
Description: Binary data
/*
 * pg_cxn.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

#include <libpq-fe.h>

struct cmd_list;
typedef struct cmd_list cmd_list;

struct cmd_list
{
    char       *cmd;
    cmd_list   *next;
};

static void pg_connect(const char *conninfo, cmd_list *);
static cmd_list *add_command(cmd_list *, char *);
static void usage(void);

int
main(int argc, char **argv)
{
    int         c;
    int         n = 1;
    int         optindex;
    int         i;
    const char *conninfo;
    cmd_list   *cmds = NULL;

    while ((c = getopt_long(argc, argv, "n:c:", NULL, &optindex)) != -1)
    {
        switch (c)
        {
            case 'n':
                n = atoi(optarg);
                break;
            case 'c':
                cmds = add_command(cmds, optarg);
                break;
            default:
                usage();
                break;
        }
    }
    argv += optind;
    argc -= optind;

    if (argc > 0)
        conninfo = argv[0];
    else
        conninfo = "";

    for (i = 0; i < n; ++i)
        pg_connect(conninfo, cmds);

    printf("Established %d connections.\n", n);

    /* Hold the connections open, idle, until killed. */
    while (1)
        sleep(3600);

    return 0;
}

/*
 * Append at the tail of the list, so that repeated -c options execute
 * in the order they were given on the command line.
 */
static cmd_list *
add_command(cmd_list *cmds, char *cmd)
{
    cmd_list   *newnode;
    cmd_list   *tail;

    newnode = malloc(sizeof(cmd_list));
    if (newnode == NULL)
    {
        perror("malloc");
        exit(1);
    }
    newnode->cmd = cmd;
    newnode->next = NULL;

    if (cmds == NULL)
        return newnode;
    for (tail = cmds; tail->next != NULL; tail = tail->next)
        ;
    tail->next = newnode;
    return cmds;
}

static void
pg_connect(const char *conninfo, cmd_list *cmds)
{
    PGconn     *conn;

    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit(1);
    }

    /* Execute the commands, if any, leaving any opened transaction open. */
    while (cmds != NULL)
    {
        PGresult   *res;

        res = PQexec(conn, cmds->cmd);
        if (PQresultStatus(res) == PGRES_FATAL_ERROR)
        {
            fprintf(stderr, "%s", PQresultErrorMessage(res));
            exit(1);
        }
        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "unexpected result status: %s\n",
                    PQresStatus(PQresultStatus(res)));
            exit(1);
        }
        PQclear(res);
        cmds = cmds->next;
    }
}

static void
usage(void)
{
    fprintf(stderr, "Usage: pg_cxn [OPTION] [CONNECTION-STRING]\n\n"
            "Options:\n"
            "  -n NUM        Number of connections to open.\n"
            "  -c SQL        SQL to execute on each connection.\n"
            "                (You can repeat this option more than once.)\n");
    exit(1);
}
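
(For reference: run as ./pg_cxn -n 600 -c 'BEGIN' -c 'SELECT txid_current()',
this leaves 600 sessions each idling in an open transaction that holds
an XID, which is the case Andres asked about.)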



Re: [HACKERS] MVCC catalog access

2013-05-22 Thread Robert Haas
On Tue, May 21, 2013 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote:
 [ MVCC catalog access seems to be pretty cheap ]

In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as SELECT 2+2 in the absence
of a relcache file.

I did this and can't measure any overhead as a result of MVCC catalog
access.  I tried it with no active connections.  I tried it with 600
idle active connections (to make taking MVCC snapshots more
expensive).  I couldn't quite believe it made no difference, so I
tried doing it in a tight loop under pgbench.  I still can't measure
any difference.  I haven't tested carefully enough to rule out the
possibility of an effect < 1/2% at 600 connections, but there certainly
isn't anything bigger than that and I don't even think there's that
much of a difference.

Andres Freund suggested creating a couple of simple tables and having
lots of short-lived backends select data from them.

rhaas=# create table af1 (x) as select g from generate_series(1,4) g;
SELECT 4
rhaas=# create table af2 (x) as select g from generate_series(4,7) g;
SELECT 4

Test query: SELECT * FROM af1, af2 WHERE af1.x = af2.x;
pgbench command: pgbench -T 10 -c 50 -j 50 -n -f f -C

With mvcc_catalog_access=off, I get ~1553 tps; with it on, I get ~1557
tps.  Hmm... that could be because of the two-line debugging hunk my
patch adds to HeapTupleSatisfiesNow().  After removing that, I get
maybe a 1% regression with mvcc_catalog_access=on on this test, but
it's not very consistent.  If I restart the database server a few
times, the overhead bounces around each time, and sometimes it's zero;
the highest I saw is 1.4%.  But it's not much, and this is a pretty
brutal workload for PostgreSQL, since starting up 1500 connections
per second is not a workload for which we're well-suited in the first
place.

All in all, I'm still feeling optimistic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-05-22 Thread Andres Freund
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
 In discussions today, Tom Lane suggested testing the time to start up
 a backend and run a simple query such as SELECT 2+2 in the absence
 of a relcache file.

 I did this and can't measure any overhead as a result of MVCC catalog
 access.  I tried it with no active connections.  I tried it with 600
 idle active connections (to make taking MVCC snapshots more
 expensive).

Did you try it with the 600 transactions actually being in a transaction
and having acquired a snapshot?


 All in all, I'm still feeling optimistic.

+1. I still feel like this has to be much harder since we made it out to
be hard for a long time ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MVCC catalog access

2013-05-22 Thread Robert Haas
On Wed, May 22, 2013 at 11:02 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
 In discussions today, Tom Lane suggested testing the time to start up
 a backend and run a simple query such as SELECT 2+2 in the absence
 of a relcache file.

 I did this and can't measure any overhead as a result of MVCC catalog
 access.  I tried it with no active connections.  I tried it with 600
 idle active connections (to make taking MVCC snapshots more
 expensive).

 Did you try it with the 600 transactions actually being in a transaction
 and having acquired a snapshot?

No... I can hack something up for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MVCC catalog access

2013-05-22 Thread Andres Freund
On 2013-05-22 23:05:40 -0400, Robert Haas wrote:
 On Wed, May 22, 2013 at 11:02 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
  In discussions today, Tom Lane suggested testing the time to start up
  a backend and run a simple query such as SELECT 2+2 in the absence
  of a relcache file.
 
  I did this and can't measure any overhead as a result of MVCC catalog
  access.  I tried it with no active connections.  I tried it with 600
  idle active connections (to make taking MVCC snapshots more
  expensive).
 
  Did you try it with the 600 transactions actually being in a transaction
  and having acquired a snapshot?
 
 No... I can hack something up for that.

Make that actually having acquired an xid. We skip a large part of the
work if a transaction doesn't yet have one AFAIR. I don't think the mere
presence of 600 idle connections without an xid in contrast to just
having max_connections at 600 should actually make a difference in the
cost of acquiring a snapshot?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] MVCC catalog access

2013-05-21 Thread Robert Haas
We've had a number of discussions about the evils of SnapshotNow.  As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.  I decided to do some
testing to measure the impact.  I was pleasantly surprised by the
results.

The attached patch is a quick hack to provide for MVCC catalog access.
 It adds a GUC called mvcc_catalog_access.  When this GUC is set to
true, and heap_beginscan() or index_beginscan() is called with
SnapshotNow, they call GetLatestSnapshot() and use the resulting
snapshot in lieu of SnapshotNow.  As a debugging double-check, I
modified HeapTupleSatisfiesNow to elog(FATAL) if called while
mvcc_catalog_access is true.  Of course, both of these are dirty
hacks.  If we were actually to implement MVCC catalog access, I think
we'd probably just go through and start replacing instances of
SnapshotNow with GetLatestSnapshot(), but I wanted to make it easy to
do perf testing.
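
In other words, the hack boils down to roughly this at the top of the
scan-start routines (a sketch of the approach, not the exact patch text):

/* In heap_beginscan() and index_beginscan(): */
if (mvcc_catalog_access && snapshot == SnapshotNow)
    snapshot = GetLatestSnapshot();   /* up-to-date MVCC snapshot */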

When I first made this change, I couldn't detect any real change;
indeed, it seemed that make check was running ever-so-slightly faster
than before, although that may well have been a testing artifact.  I
wrote a test case that created a schema with 100,000 functions in it
and then dropped the schema (I believe it was Tom who previously
suggested this test case as a worst-case scenario for MVCC catalog
access).  That didn't seem to be adversely affected either, even
though it must take ~700k additional MVCC snapshots with
mvcc_catalog_access = true.

MVCC Off: Create 8743.101 ms, Drop 9655.471 ms
MVCC On: Create 7462.882 ms, Drop 9515.537 ms
MVCC Off: Create 7519.160 ms, Drop 9380.905 ms
MVCC On: Create 7517.382 ms, Drop 9394.857 ms

The first "Create" seems to be artificially slow because of some kind
of backend startup overhead.  Not sure exactly what.

After wracking my brain for a few minutes, I realized that the lack of
any apparent performance regression was probably due to the lack of
any concurrent connections, making the scans of the PGXACT array very
cheap.  So I wrote a little program to open a bunch of extra
connections.  My MacBook Pro grumbled when I tried to open more than
about 600, so I had to settle for that number.  That was enough to
show up the cost of all those extra snapshots:

MVCC Off: Create 9065.887 ms, Drop 9599.494 ms
MVCC On: Create 8384.065 ms, Drop 10532.909 ms
MVCC Off: Create 7632.197 ms, Drop 9499.502 ms
MVCC On: Create 8215.443 ms, Drop 10033.499 ms

Now, I don't know about you, but I'm having a hard time getting
agitated about those numbers.  Most people are not going to drop
100,000 objects with a single cascaded drop.  And most people are not
going to have 600 connections open when they do.  (The snapshot
overhead should be roughly proportional to the product of the number
of drops and the number of open connections, and the number of cases
where the product is as high as 60 million has got to be pretty
small.)  But suppose that someone is in that situation.  Well, then
they will take a... 10% performance penalty?  That sounds plenty
tolerable to me, if it means we can start moving in the direction of
allowing some concurrent DDL.
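
For reference, the per-snapshot cost at issue is essentially the loop
in GetSnapshotData() that walks the shared PGXACT array while holding
ProcArrayLock.  Grossly simplified (declarations and subtransaction
handling omitted; not the verbatim source), it amounts to:

for (index = 0; index < arrayP->numProcs; index++)
{
    volatile PGXACT *pgxact = &allPgXact[arrayP->pgprocnos[index]];
    TransactionId xid = pgxact->xid;

    /* Remember each running XID below xmax as in-progress. */
    if (TransactionIdIsNormal(xid) && TransactionIdPrecedes(xid, xmax))
        snapshot->xip[count++] = xid;
}

That loop runs once per snapshot taken, which is why the overhead here
scales with the number of open connections.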

Am I missing an important test case here?  Are these results worse
than I think they are?  Did I boot this testing somehow?

[MVCC catalog access patch, test program to create lots of idle
connections, and pg_depend stress test case attached.]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


mvcc-catalog-access.patch
Description: Binary data
/*
 * pg_cxn.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

#include <libpq-fe.h>

static void pg_connect(const char *conninfo);

int
main(int argc, char **argv)
{
    int         c;
    int         n = 1;
    int         optindex;
    int         i;
    const char *conninfo;

    while ((c = getopt_long(argc, argv, "n:", NULL, &optindex)) != -1)
    {
        switch (c)
        {
            case 'n':
                n = atoi(optarg);
                break;
            default:
                fprintf(stderr, "Usage: pg_cxn [-n count] connstr\n");
                exit(1);
        }
    }
    argv += optind;
    argc -= optind;

    if (argc > 0)
        conninfo = argv[0];
    else
        conninfo = "";

    /* Open n idle connections ... */
    for (i = 0; i < n; ++i)
        pg_connect(conninfo);

    /* ... and hold them open until killed. */
    while (1)
        sleep(3600);

    return 0;
}

static void
pg_connect(const char *conninfo)
{
    PGconn     *conn;

    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit(1);
    }
}


depend
Description: Binary data
