Re: [HACKERS] pg_basebackup --xlog compatibility break

2012-06-04 Thread Peter Eisentraut
On Tue, 2012-05-29 at 22:31 +0200, Magnus Hagander wrote:
 Yeah, good arguments all around, I agree too :-) Next question is -
 suggestions for naming of said parameter?

--xlog-method=something?  And/or -Xsomething, which would automatically
enable -x?




[HACKERS] How to avoid base backup in automated failover

2012-06-04 Thread chinnaobi
Hi all,

Recently I was writing an application to implement automated failover in this
environment: two 2008 R2 servers, network storage, asynchronous replication,
and WAL archiving enabled on the primary.

Is there any way to avoid always starting the standby server from a base
backup during automated failover? The database is growing huge, and I can't
keep taking a base backup every day.

Please suggest a solution.

Regards,
Reddy




Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Robert Haas
On Fri, Jun 1, 2012 at 9:55 PM, Ants Aasma a...@cybertec.at wrote:
 On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Buffer pins aren't a cache: with a cache you are trying to mask a slow
 operation (like a disk i/o) with a faster one such that the number of slow
 operations is minimized.  Buffer pins however are very different in
 that we only care about contention on the reference count (the buffer
 itself is not locked!), which makes me suspicious that caching-type
 algorithms are the wrong place to be looking.  I think it comes down to
 picking between your relatively complex but general lock-displacement
 approach or a specific strategy based on known bottlenecks.

 I agree that pins aren't like a cache. I mentioned the caching
 algorithms because they work based on access frequency and highly
 contended locks are likely to be accessed frequently even from a
 single backend. However this only makes sense for the delayed
 unpinning method, and I also have come to the conclusion that it's not
 likely to work well. Besides delaying cleanup, the overhead for the
 common case of uncontended access is just too much.

 It seems to me that even the nailing approach will need a replacement
 algorithm. The local pins still need to be published globally, and
 because shared memory size is fixed, the maximum number of locally
 pinned nailed buffers needs to be limited as well.

 But anyway, I managed to completely misread the profile that Sergey
 gave. Somehow I missed that the time went into the retry TAS in slock
 instead of the inlined TAS. This shows that the issue isn't just
 cacheline ping-pong but cacheline stealing. This could be somewhat
 mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert
 posted earlier in another thread describes an approach for this. I
 have a WIP patch (attached) that makes the clock sweep lock-free in
 the common case. This patch gave a 40% performance increase for an
 extremely allocation-heavy load running with 64 clients on a 4-core,
 1-socket system, with lesser gains across the board. Pinning has a shorter
 lock duration (and a different lock type), so the gain might be less,
 or it might be a larger problem and post a higher gain. Either way, I
 think the nailing approach should be explored further: cacheline
 ping-pong could still be a problem with a higher number of processors,
 and losing the spinlock also loses the ability to detect contention.

Not sure about the rest of this patch, but this part is definitely bogus:

+#if !defined(pg_atomic_fetch_and_set)
+#define pg_atomic_fetch_and_set(dst, src, value) \
+   do { S_LOCK(dummy_spinlock); \
+   dst = src; \
+   src = value; \
+   S_UNLOCK(dummy_spinlock); } while (0)
+#endif

Locking a dummy backend-local spinlock doesn't provide atomicity
across multiple processes.
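
For comparison, here is a minimal sketch of what a genuinely atomic exchange
looks like, written against the GCC/Clang __atomic builtins (illustrative
only; the names are made up and this is not the patch's code). The point is
that the primitive must either be a hardware operation or take a lock that
lives in shared memory, visible to every backend:

#include <stdint.h>

/*
 * Illustrative sketch: atomically replace *ptr with newval and return the
 * previous value.  Because this compiles down to a hardware exchange, it
 * is atomic across every process that maps the memory holding *ptr.
 * A spinlock-based fallback would only be correct if the lock itself were
 * allocated in shared memory, so that all backends contend on the same word.
 */
static inline uint32_t
atomic_exchange_u32(volatile uint32_t *ptr, uint32_t newval)
{
	return __atomic_exchange_n(ptr, newval, __ATOMIC_SEQ_CST);
}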

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [GENERAL] pg_upgrade from 9.0.7 to 9.1.3: duplicate key pg_authid_oid_index

2012-06-04 Thread Bruce Momjian
On Sat, Jun 02, 2012 at 05:10:03PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Fri, Jun 01, 2012 at 09:52:59AM -0400, Tom Lane wrote:
  It seems that pg_upgrade needs a check to make sure that the bootstrap
  superuser is named the same in old and new clusters.
 
  The attached patch adds checks to verify the the old/new servers have
  the same install-user oid.
 
 That may or may not be a useful check to make, but it's got
 approximately nothing to do with what I was complaining about.
 
 In particular, supposing that the user has given you a username that
 isn't the bootstrap superuser in the new cluster, this patch is not
 going to stop the update script from failing.  Because the script is
 then going to try to replace the bootstrap superuser, and that is
 certainly going to give an error.
 
 I see the point of worrying about the install user as well as the
 bootstrap superuser, but wouldn't it be best to insist they be the same?
 Particularly in the new cluster, where if they aren't the same it means
 the user has manually created at least one role in the new cluster,
 which is likely to lead to OID conflicts or worse.
 
 Furthermore, if the bootstrap superusers aren't named the same, your
 patch fails to handle the original complaint.  In the case the
 OP mentioned, the old cluster had
   OID 10: ubuntu
   some user-defined OID: postgres
 and the new cluster had
   OID 10: postgres
 If the user tells pg_upgrade to use username postgres, your check will
 not fail AFAICS, but nonetheless things are going to be messed up after
 the upgrade, because some objects and privileges that used to belong to
 the bootstrap superuser will now belong to a non-default superuser,
 whereas what used to belong to the non-default superuser will now belong
 to the bootstrap superuser.  That cannot be thought desirable.  For one
 reason, in the old installation the postgres role could have been
 dropped (possibly after dropping a few non-builtin objects) whereas the
 ubuntu role was pinned.  In the new installation, postgres is pinned
 and ubuntu won't be.
 
 I think the checks that are actually needed here are (1) bootstrap
 superusers are named the same, and (2) there are no roles other than the
 bootstrap superuser in the new cluster.

You are right that it is more complex than I stated, but given the
limited feedback I got on the pg_upgrade/plpython issue, I figured people
didn't want to hear the details.  Here they are:

There are three failure modes for pg_upgrade:

1.  check failure
2.  schema restore failure
3.  silent failure/corruption

Of course, the later items are worse than the earlier ones.  The
reporter got a schema restore failure while still following the
pg_upgrade instructions.  My initial patch changed that #2 error to a #1
error.  Tom is right that creating users in the new cluster (against the
instructions) can still generate a #2 error if a new and old pg_authid.oid
match and they are not the install user, but since that is against the
instructions, I was going to leave it as a #2.

However, since Tom feels we should check that and make it a #1 failure,
I have added that test to the attached patch.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d226f00..9f3dcda
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** check_new_cluster(void)
*** 138,144 
  	 *	We don't restore our own user, so both clusters must match have
  	 *	matching install-user oids.
  	 */
! 	if (old_cluster.install_user_oid != new_cluster.install_user_oid)
  		pg_log(PG_FATAL,
  		"Old and new cluster install users have different values for pg_authid.oid.\n");
  
--- 138,144 
  	 *	We don't restore our own user, so both clusters must match have
  	 *	matching install-user oids.
  	 */
! 	if (old_cluster.install_role_oid != new_cluster.install_role_oid)
  		pg_log(PG_FATAL,
  		"Old and new cluster install users have different values for pg_authid.oid.\n");
  
*** check_new_cluster(void)
*** 147,153 
  	 *	defined users might match users defined in the old cluster and
  	 *	generate an error during pg_dump restore.
  	 */
! 	if (new_cluster.user_count != 1)
  		pg_log(PG_FATAL, "Only the install user can be defined in the new cluster.\n");
  
  	check_for_prepared_transactions(new_cluster);
--- 147,153 
  	 *	defined users might match users defined in the old cluster and
  	 *	generate an error during pg_dump restore.
  	 */
! 	if (new_cluster.role_count != 1)
  		pg_log(PG_FATAL, "Only the install user can be defined in the new cluster.\n");
  
  	check_for_prepared_transactions(new_cluster);
*** check_is_super_user(ClusterInfo *cluster
*** 618,624 
  		pg_log(PG_FATAL, database user 

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas robertmh...@gmail.com wrote:
 Not sure about the rest of this patch, but this part is definitely bogus:

 +#if !defined(pg_atomic_fetch_and_set)
 +#define pg_atomic_fetch_and_set(dst, src, value) \
 +       do { S_LOCK(dummy_spinlock); \
 +       dst = src; \
 +       src = value; \
 +       S_UNLOCK(dummy_spinlock); } while (0)
 +#endif

 Locking a dummy backend-local spinlock doesn't provide atomicity
 across multiple processes.

Right, of course. I don't know why I assumed that dummy_spinlock would
be global. In any case, this is very WIP and doesn't even aspire to be
portable yet. The main point was to see if there's any significant
performance to be gained by this method.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that it is not possible to use this for a standby
 server... probably that's why I get the error when I put a recovery.conf
 after pg_basebackup finished... maybe we can say that more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

I stumbled upon this again today. There's nothing in the docs that
would even hint that using -x shouldn't work to create a replica. Why
does it get confused and can we (easily) make it not get confused? At
the very least it needs a big fat warning in documentation for the -x
option that the resulting backup might not be usable as a standby.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Robert Haas
On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/6/1 Robert Haas robertmh...@gmail.com:
 On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 It may be an option to separate the cases into two: a situation where the
 given query is executed immediately, just after optimization, and never
 reused, and all others.

 Yes.  Simon suggested exactly this a while back, and I agree with him
 that we ought to have it.

 It is good for me, also.

 Then, if so, we will be able to push the stuff corresponding to
 RLSBYPASS into the query optimization, and it will work transparently
 for users.

 You're still going to need a way to make sure that the cluster can be
 dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.

 If something like "OR has_superuser_privilege()" is automatically
 appended to user-given clauses, it makes sure the whole database
 cluster is dumped.
 That also means any permission checks are delayed to the executor stage
 (except for the simple query protocol case I mentioned above),
 which simplifies the conditions for invalidating prepared statements.

 One problem I noticed last night with the RLSBYPASS approach is that
 it can cause a large number of invalidations whenever the current user-id
 is switched: not only on SET AUTHORIZATION, but also on SECURITY DEFINER
 function invocation. I'm not sure whether this invalidation
 storm is at a reasonable level or not.

 Could this type of implicit superuser check be handled with the
 existing EXECUTE statement?
 I tried running EXPLAIN on a similar case.

 postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2;
 PREPARE
 postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas'));
                        QUERY PLAN
 ---------------------------------------------------------
  Seq Scan on tbl  (cost=0.00..21.60 rows=1160 width=40)
 (1 row)

 However,

 postgres=# PREPARE p2(int) AS SELECT * FROM tbl
                    WHERE y > $1 OR current_user in ('kaigai','rhaas');
 PREPARE
 postgres=# EXPLAIN EXECUTE p2(10);
                                  QUERY PLAN
 ----------------------------------------------------------------------------
  Seq Scan on tbl  (cost=0.00..30.30 rows=394 width=40)
    Filter: ((y > 10) OR (current_user() = ANY ('{kaigai,rhaas}'::name[])))
 (2 rows)

 Please think of the second condition as something like a superuser
 check in addition to the main security policy.
 It suggests the idea of replacing any portion of a clause that
 consists only of constant values and stable / immutable functions
 with a shadow parameter to be calculated at executor stage, which makes
 it possible to wipe out the RLS policy for superusers. In addition, it
 requires no new invalidation mechanism for prepared statements.

I'm not sure how best to handle the invalidation issue... but I still
think that relying on the query planner to use theorem-proving to
simplify RLS conditions is going to be problematic.  You can certainly
try it and see how it comes out...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Merlin Moncure
On Mon, Jun 4, 2012 at 9:20 AM, Ants Aasma a...@cybertec.at wrote:
 On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas robertmh...@gmail.com wrote:
 Not sure about the rest of this patch, but this part is definitely bogus:

 +#if !defined(pg_atomic_fetch_and_set)
 +#define pg_atomic_fetch_and_set(dst, src, value) \
 +       do { S_LOCK(dummy_spinlock); \
 +       dst = src; \
 +       src = value; \
 +       S_UNLOCK(dummy_spinlock); } while (0)
 +#endif

 Locking a dummy backend-local spinlock doesn't provide atomicity
 across multiple processes.

 Right, of course. I don't know why I assumed that dummy_spinlock would
 be global. In any case, this is very WIP and doesn't even aspire to be
 portable yet. The main point was to see if there's any significant
 performance to be gained by this method.

yeah -- those are fallback routines in case the compiler primitives
don't exist.  I think I understand what Ants is doing here: he's
reducing the coverage of the free list lock to only cover actually
popping a buffer off the free list; it no longer covers the clock
sweep.  That's a massive win if it works.  In order to get away with
that, he had to decompose all manipulations, from the clock sweep to the
Strategy, into thread-safe atomic operations.
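
To make that concrete, here is a rough, self-contained sketch of a clock
sweep in that spirit (hypothetical structure and names, not the actual patch
and not PostgreSQL's buffer manager): the hand advances with an atomic
fetch-and-add and usage counts are decremented with compare-and-swap, so the
sweep itself holds no lock.

#include <stdbool.h>
#include <stdint.h>

#define NBUFFERS 1024                         /* hypothetical pool size */

typedef struct
{
	uint32_t	usage_count;                  /* decayed by the sweep */
	uint32_t	refcount;                     /* nonzero while pinned */
} ToyBufferDesc;

static ToyBufferDesc toy_buffers[NBUFFERS];
static uint64_t toy_clock_hand;               /* shared clock-sweep position */

/* Pick a victim buffer without holding any lock during the sweep itself. */
static int
toy_clock_sweep_victim(void)
{
	for (;;)
	{
		/* Atomically claim the next tick of the shared clock hand. */
		uint64_t	tick = __atomic_fetch_add(&toy_clock_hand, 1,
											  __ATOMIC_SEQ_CST);
		ToyBufferDesc *buf = &toy_buffers[tick % NBUFFERS];
		uint32_t	usage = __atomic_load_n(&buf->usage_count, __ATOMIC_SEQ_CST);

		if (__atomic_load_n(&buf->refcount, __ATOMIC_SEQ_CST) != 0)
			continue;                         /* pinned: not a candidate */

		if (usage == 0)
			return (int) (tick % NBUFFERS);   /* caller re-checks after pinning */

		/* Decrement the usage count; if we lose the race, just move on. */
		__atomic_compare_exchange_n(&buf->usage_count, &usage, usage - 1,
									false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	}
}

The real patch still has to deal with the free list, the buffer headers, and
pinning; the sketch only shows why the sweep itself no longer needs to hold
the freelist lock.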

What happens (in the very unlikely, but possible, case) if another
backend races to the buffer you've pointed at with 'victim'?  It looks
like multiple backends share the clock sweep now, but don't you need
an extra test to ensure it's still a candidate victim buffer?

merlin



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Fujii Masao
On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that it is not possible to use this for a standby
 server... probably that's why I get the error when I put a recovery.conf
 after pg_basebackup finished... maybe we can say that more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

Unless I'm missing something, you can use pg_basebackup -x for the
standby. If lots of WAL files are generated on the master after
pg_basebackup -x ends and before you start the standby instance,
you may get the following error. In this case, you need to consult the
archived WAL files even though you specified the -x option to pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

Though we have the above problem, pg_basebackup -x is usable for
the standby, I think.

Regards,

-- 
Fujii Masao



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Merlin Moncure
On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure mmonc...@gmail.com wrote:
 What happens (in the very unlikely, but possible, case) if another
 backend races to the buffer you've pointed at with 'victim'?  It looks
 like multiple backends share the clock sweep now, but don't you need
 an extra test to ensure it's still a candidate victim buffer?

Actually, I don't think you do: the existing check on refcount is
probably good enough.  Hm, why did you get rid of
BufferStrategyControl.lastFreeBuffer?

merlin



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Kohei KaiGai
2012/6/4 Robert Haas robertmh...@gmail.com:
 On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/6/1 Robert Haas robertmh...@gmail.com:
 On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 It may be an option to separate the cases into two: a situation where the
 given query is executed immediately, just after optimization, and never
 reused, and all others.

 Yes.  Simon suggested exactly this a while back, and I agree with him
 that we ought to have it.

 It is good for me, also.

 Then, if so, we will be able to push the stuff corresponding to
 RLSBYPASS into the query optimization, and it will work transparently
 for users.

 You're still going to need a way to make sure that the cluster can be
 dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.

 If something like "OR has_superuser_privilege()" is automatically
 appended to user-given clauses, it makes sure the whole database
 cluster is dumped.
 That also means any permission checks are delayed to the executor stage
 (except for the simple query protocol case I mentioned above),
 which simplifies the conditions for invalidating prepared statements.

 One problem I noticed last night with the RLSBYPASS approach is that
 it can cause a large number of invalidations whenever the current user-id
 is switched: not only on SET AUTHORIZATION, but also on SECURITY DEFINER
 function invocation. I'm not sure whether this invalidation
 storm is at a reasonable level or not.

 Could this type of implicit superuser check be handled with the
 existing EXECUTE statement?
 I tried running EXPLAIN on a similar case.

 postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2;
 PREPARE
 postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas'));
                        QUERY PLAN
 ---------------------------------------------------------
  Seq Scan on tbl  (cost=0.00..21.60 rows=1160 width=40)
 (1 row)

 However,

 postgres=# PREPARE p2(int) AS SELECT * FROM tbl
                    WHERE y > $1 OR current_user in ('kaigai','rhaas');
 PREPARE
 postgres=# EXPLAIN EXECUTE p2(10);
                                  QUERY PLAN
 ----------------------------------------------------------------------------
  Seq Scan on tbl  (cost=0.00..30.30 rows=394 width=40)
    Filter: ((y > 10) OR (current_user() = ANY ('{kaigai,rhaas}'::name[])))
 (2 rows)

 Please think of the second condition as something like a superuser
 check in addition to the main security policy.
 It suggests the idea of replacing any portion of a clause that
 consists only of constant values and stable / immutable functions
 with a shadow parameter to be calculated at executor stage, which makes
 it possible to wipe out the RLS policy for superusers. In addition, it
 requires no new invalidation mechanism for prepared statements.

 I'm not sure how best to handle the invalidation issue... but I still
 think that relying on the query planner to use theorem-proving to
 simplify RLS conditions is going to be problematic.  You can certainly
 try it and see how it comes out...

I think the planner enhancement can be handled independently from the
RLS policy stuff, and we should avoid packing too much complexity
into one feature.
At least, the idea of appending an implicit condition
(OR has_superuser_privilege()) will work correctly without any
invalidation mechanism, although its performance penalty is not
negligible without the planner enhancement.
What is your opinion?

I'm worried about future maintenance issues, once we have an
RLSBYPASS permission or something else user-visible...

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure mmonc...@gmail.com wrote:
 What happens (in the very unlikely, but possible, case) if another
 backend races to the buffer you've pointed at with 'victim'?  It looks
 like multiple backends share the clock sweep now, but don't you need
 an extra test to ensure it's still a candidate victim buffer?

 Actually, I don't think you do: the existing check on refcount is
 probably good enough.  Hm, why did you get rid of
 BufferStrategyControl.lastFreeBuffer?

It was dead code as far as I could tell. That change isn't actually
relevant for this patch because free-list management is still
protected by a lock (except the initial unlocked test, which is
double-checked under the lock) and so doesn't need any adjustment.
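
For readers who haven't seen the pattern, here is a generic sketch of that
unlocked-test-then-recheck idea (hypothetical names and a toy one-entry free
list, not the patch's code):

#include <pthread.h>

/* Toy free list: a single buffer id, or -1 when empty (hypothetical). */
static int toy_freelist_head = -1;
static pthread_mutex_t toy_freelist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return a buffer id from the free list, or -1 if it is empty. */
static int
toy_pop_freelist(void)
{
	int			buf = -1;

	/*
	 * Unlocked fast path: if the list looks empty, skip the lock entirely.
	 * (Real code would read this with an explicit atomic or volatile load.)
	 */
	if (toy_freelist_head == -1)
		return -1;

	pthread_mutex_lock(&toy_freelist_lock);
	/* Re-check under the lock; another process may have emptied it. */
	if (toy_freelist_head != -1)
	{
		buf = toy_freelist_head;
		toy_freelist_head = -1;
	}
	pthread_mutex_unlock(&toy_freelist_lock);

	return buf;
}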

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that it is not possible to use this for a standby
 server... probably that's why I get the error when I put a recovery.conf
 after pg_basebackup finished... maybe we can say that more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

 Unless I'm missing something, you can use pg_basebackup -x for the
 standby. If lots of WAL files are generated on the master after
 pg_basebackup -x ends and before you start the standby instance,
 you may get the following error. In this case, you need to consult the
 archived WAL files even though you specified the -x option to pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Though we have the above problem, pg_basebackup -x is usable for
 the standby, I think.

I assumed from Magnus's comment that this is a known problem. I wonder
what went wrong if it should have worked. In the case where this
turned up the missing file was an xlog file with the new timeline ID
but one segment before the timeline switch. I'll have to see if I can
create a reproducible case for this.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Magnus Hagander
On Mon, Jun 4, 2012 at 5:48 PM, Ants Aasma a...@cybertec.at wrote:
 On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that it is not possible to use this for a standby
 server... probably that's why I get the error when I put a recovery.conf
 after pg_basebackup finished... maybe we can say that more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

 Unless I'm missing something, you can use pg_basebackup -x for the
 standby. If lots of WAL files are generated on the master after
 pg_basebackup -x ends and before you start the standby instance,
 you may get the following error. In this case, you need to consult the
 archived WAL files even though you specified the -x option to pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Though we have the above problem, pg_basebackup -x is usable for
 the standby, I think.

 I assumed from Magnus's comment that this is a known problem. I wonder
 what went wrong if it should have worked. In the case where this
 turned up the missing file was an xlog file with the new timeline ID
 but one segment before the timeline switch. I'll have to see if I can
 create a reproducible case for this.

No, it's more a "there's no reason to do that". I don't think it
should necessarily be an actual problem.

In your case the missing piece of information is: why was there a
timeline switch? pg_basebackup shouldn't cause a timeline switch
whether you use it in -x mode or not.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Florian Pflug
On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
 I'm worried about future maintenance issues, once we have an
 RLSBYPASS permission or something else user-visible...

I fear that without a generic way to disable RLS regardless of which
RLS policy function is in effect, we're creating a huge maintenance
issue for DBAs. In a lot of shops, the DBA is responsible for a large
number of databases, each potentially using a completely different
approach to RLS and hence a completely different policy function.

Without something like RLSBYPASS, the DBA needs to have intimate
knowledge about the different RLS policies to e.g. guarantee that his
backups aren't missing crucial information, or that the replication
system indeed replicates all rows.

With RLSBYPASS, all he needs to do is grant one privilege to his
replication or backup user. The rest can be left to the development
or support team for a specific application.

best regards,
Florian Pflug




Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 6:53 PM, Magnus Hagander mag...@hagander.net wrote:
 No, it's more a "there's no reason to do that". I don't think it
 should necessarily be an actual problem.

Ok, good to know.

 In your case the missing piece of information is why was there a
 timeline switch? pg_basebackup shouldn't cause a timeline switch
 whether you use it in -x mode or not.

No mystery there. The timeline switch was because I had just promoted
the master from standby mode. There's a chance I might have
accidentally done something horribly wrong somewhere because I can't
immediately reproduce this. I'll let you know if I find out how I
managed to create this error.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



[HACKERS] Btree or not btree? That is the question

2012-06-04 Thread Greg Sabino Mullane
We have an 8.3.18 system (yes, the same one from the previous
thread, finally upgraded!) that gave us this error yesterday:

ERROR:  index "pg_class_oid_index" is not a btree

It seems like this happened from just a normal SELECT query, and it did
not occur again (and this is a very busy system). The pg_class table
does get vacuumed and reindexed often. All appeared fine when I
did some manual investigation.

Since this only happened this one time, can I chalk it up to 
some random rare race condition having to do with a reindex? 
Or is there anything else I should be looking for or can 
proactively do? No other system catalog problems have been seen 
before it or since.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Kohei KaiGai
2012/6/4 Florian Pflug f...@phlo.org:
 On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
 I'm worried about future maintenance issues, once we have an
 RLSBYPASS permission or something else user-visible...

 I fear that without a generic way to disable RLS regardless of which
 RLS policy function is in effect, we're creating a huge maintenance
 issue for DBAs. In a lot of shops, the DBA is responsible for a large
 number of databases, each potentially using a completely different
 approach to RLS and hence a completely different policy function.

There are two problems around RLSBYPASS. The first is that, right now,
we have no way to handle invalidation of prepared statements when the
current user is switched. The second is that we can describe the same
RLS policy in another way without an enhancement to PostgreSQL's own
permission mechanism...

 Without something like RLSBYPASS, the DBA needs to have intimate
 knowledge about the different RLS policies to e.g. guarantee that his
 backups aren't missing crucial information, or that the replication
 system indeed replicates all rows.

 With RLSBYPASS, all he needs to do is grant one privilege to his
 replication or backup user. The rest can be left to the development
 or support team for a specific application.

It seems to me you can define a function which implements a site-
specific security requirement (e.g. backups should not be prevented
by the RLS policy), then include it as a part of the RLS policy
(or have it implicitly added by extensions, like sepgsql tries to do).

These are the reasons why I hesitate to go ahead with an RLSBYPASS
permission.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-04 Thread Merlin Moncure
On Mon, Jun 4, 2012 at 10:42 AM, Ants Aasma a...@cybertec.at wrote:
 On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure mmonc...@gmail.com wrote:
 What happens (in the very unlikely, but possible, case) if another
 backend races to the buffer you've pointed at with 'victim'?  It looks
 like multiple backends share the clock sweep now, but don't you need
 an extra test to ensure it's still a candidate victim buffer?

 Actually, I don't think you do: the existing check on refcount is
 probably good enough.  Hm, why did you get rid of
 BufferStrategyControl.lastFreeBuffer?

 It was dead code as far as I could tell. That change isn't actually
 relevant for this patch because free-list management is still
 protected by a lock (except the initial unlocked test, which is
 double-checked under the lock) and so doesn't need any adjustment.

I have to admit -- this is pretty cool.  The changes look pretty clean
but I think it's going to need some benchmarking to prove it's
actually faster under all workloads and some more review though. If
there's heavy contention on getting victim buffers you're going to be
making a lot of atomic operations -- each one of which will lock down
the cache line.  A yielding lwlock is obviously going to have very
different behaviors under contention, so there needs to be a
performance test that really burns up the freelist.

If it's Well And Truly faster, you have to immediately start wondering
if the freelist lock can't be completely scrapped: buffer invalidation
only occurs in fairly rare events -- but when it does, ISTM it is
fairly counterproductive.  In trying to avoid the sweep you end up
pushing everyone towards a locky free list.  I don't know about you,
but I'd prefer to have more consistent behavior (as in, not "drop big
table, concurrency drops 50%").

Noted, this doesn't really address the OP's problem :-).  I bet it'd
supplement the nailing strategy nicely however.

merlin



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Tom Lane
Kohei KaiGai kai...@kaigai.gr.jp writes:
 There are two problems around RLSBYPASS. The first is that, right now,
 we have no way to handle invalidation of prepared statements when the
 current user is switched.

How is that specifically the fault of RLSBYPASS?  *Any* of the schemes
you're proposing for inlined RLS checks will have problems with userID
switching.

My guess is we'd have to treat the effective userID as part of the
plancache lookup key to make it safe to inline anything related to RLS.
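
As a sketch of that idea (hypothetical types; not PostgreSQL's actual
plancache structures): fold the effective role into the lookup key, so a plan
inlined for one user can never be reused by another.

#include <stdint.h>

typedef uint32_t Oid;                 /* stand-in for PostgreSQL's Oid */

/* Hypothetical cache key: the query identity plus the effective role. */
typedef struct PlanCacheKeySketch
{
	uint64_t	query_hash;           /* identifies the query text / parse tree */
	Oid			effective_userid;     /* role in effect when the plan was built */
} PlanCacheKeySketch;

/* Two entries match only if both the query and the role match. */
static inline int
plan_cache_key_equal(const PlanCacheKeySketch *a, const PlanCacheKeySketch *b)
{
	return a->query_hash == b->query_hash &&
		   a->effective_userid == b->effective_userid;
}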

regards, tom lane



Re: [HACKERS] Btree or not btree? That is the question

2012-06-04 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 We have an 8.3.18 system (yes, the same one from the previous
 thread, finally upgraded!) that gave us this error yesterday:

 ERROR:  index "pg_class_oid_index" is not a btree

That means you got bogus data while reading the metapage.
I'm beginning to wonder about the hardware on this server ...
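
For context, a simplified sketch of the kind of check that raises this error
(illustrative only; the real test reads the metapage out of a buffer in the
backend's btree code, and the real constants live in its headers): the
metapage carries a magic number, and a page whose bytes are bogus fails that
test.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_BTREE_MAGIC	0x053162   /* illustrative value only */

/* Simplified stand-in for the btree metapage header. */
typedef struct BTMetaPageSketch
{
	uint32_t	btm_magic;             /* must match the btree magic number */
	uint32_t	btm_version;           /* metapage format version */
} BTMetaPageSketch;

static void
check_is_btree(const BTMetaPageSketch *meta, const char *indexname)
{
	/* If the page bytes read back are garbage, this is the check that trips. */
	if (meta->btm_magic != SKETCH_BTREE_MAGIC)
	{
		fprintf(stderr, "ERROR:  index \"%s\" is not a btree\n", indexname);
		exit(1);
	}
}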

regards, tom lane



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Kohei KaiGai
2012/6/4 Tom Lane t...@sss.pgh.pa.us:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 There are two problems around RLSBYPASS. The first is that, right now,
 we have no way to handle invalidation of prepared statements when the
 current user is switched.

 How is that specifically the fault of RLSBYPASS?  *Any* of the schemes
 you're proposing for inlined RLS checks will have problems with userID
 switching.

Really? I can't find a scenario that causes a problem with user-id
switching when the RLS policy is *unconditionally* appended and then
evaluated at executor stage. I'd like to see such a scenario.

 My guess is we'd have to treat the effective userID as part of the
 plancache lookup key to make it safe to inline anything related to RLS.

That might be a solution, if we append the individual RLS policy at the
planner stage, depending on the user-id.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-06-04 Thread Christopher Browne
On Sat, Jun 2, 2012 at 12:01 PM, Michael Nolan htf...@gmail.com wrote:
 On 6/2/12, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On the other hand, if we simply say PostgreSQL computes the
 replication delay by subtracting the time at which the WAL was
 generated, as recorded on the master, from the time at which it is
 replayed by the slave then, hey, we still have a wart, but it's
 pretty clear what the wart is and how to fix it, and we can easily
 document that.  Again, if we could get rid of the failure modes and
 make this really water-tight, I think I'd be in favor of that, but it
 seems to me that we are in the process of expending a lot of energy
 and an even larger amount of calendar time to create a system that
 will misbehave in numerous subtle ways instead of one straightforward
 one.  I don't see that as a good trade.

 Well, okay, but let's document "if you use this feature, it's incumbent
 on you to make sure the master and slave clocks are synced.  We
 recommend running NTP." or words to that effect.

 What if the two servers are in different time zones?

NTP shouldn't have any problem; it uses UTC underneath.  As does
PostgreSQL, underneath.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] VIP: new format for psql - shell - simple using psql in shell

2012-06-04 Thread Jim Nasby

On 5/26/12 10:16 PM, Pavel Stehule wrote:

My idea is safe with respect to the separator, because the separator is just
space and newline, and these symbols are escaped.


ISTM it'd be a really good idea to support something other than space, since 
presumably that'd be trivial.

I'm not a fan of supporting the array construction. If you get to that level of 
complexity in bash it's a really good sign that you need to use a real language.

If we do add array support then it needs to be keyed to the actual shell in 
use, because it's inconsistent between them.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea

2012-06-04 Thread Jim Nasby

On 5/27/12 2:54 PM, Euler Taveira wrote:

On 27-05-2012 10:45, Fujii Masao wrote:

OK, let me propose another approach: add pg_size_pretty(int).
If we do this, all usability and performance problems will be solved.


I wouldn't like to add another function but if it solves both problems... +1.


FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is pretty 
contrived... when would you actually do something like that? ISTM that any time 
you're using pg_size_pretty you'd be coming off a real datatype.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Bug in new buffering GiST build code

2012-06-04 Thread Alexander Korotkov
On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:

 Alexander, do you still have the test environments and data lying around
 that you used for GiST buffering testing last summer? Could you rerun some
 of those tests with this patch?


 I think I can restore test environment and data. Will rerun tests soon.


I reran some of the tests. Here are the index build times in seconds for the
old way of refinding the parent and the new way:

                  old   new
usnoa2           2385  2452
usnoa2_shuffled  8131  8055
uniform          8327  8359

I think the difference can be attributed to rounding error.
The indexes seem to be exactly the same. That is expected, because changing
the algorithm for refinding the parent shouldn't change the result.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea

2012-06-04 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 On 5/27/12 2:54 PM, Euler Taveira wrote:
 On 27-05-2012 10:45, Fujii Masao wrote:
 OK, let me propose another approach: add pg_size_pretty(int).

 I wouldn't like to add another function but if it solves both problems... +1.

 FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is
 pretty contrived...

Yeah, possibly.  In any case, I don't think we're making either of these
changes in 9.2, because the time for forcing initdbs is past.  It would
only be realistic to think about changing pg_size_pretty() if we come
across some other, much more compelling reason to force a system catalog
contents change.

Assuming that's how 9.2 ships, we might as well wait to see if there
are any real complaints from the field before we decide whether any
changing is needed.

regards, tom lane



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Florian Pflug
On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
 2012/6/4 Florian Pflug f...@phlo.org:
 Without something like RLSBYPASS, the DBA needs to have intimate
 knowledge about the different RLS policies to e.g. guarantee that his
 backups aren't missing crucial information, or that the replication
 system indeed replicates all rows.
 
 With RLSBYPASS, all he needs to do is grant one privilege to his
 replication or backup user. The rest can be left to the development
 or support team for a specific application.

 It seems to me you can define a function which implements a site-
 specific security requirement (e.g. backups should not be prevented
 by the RLS policy), then include it as a part of the RLS policy
 (or have it implicitly added by extensions, like sepgsql tries to do).

Sure. But that requires each and every application which uses RLS
to provide support for special backup (or replication, or whatever)
privileges. And it requires the DBA to know how to assign these
privileges to certain roles for each and every application in question.
For shops which use a lot of different applications, all with their
own RLS policy, that can quickly get out of hand.

Plus, a bug in one of these RLS policies has the potential to render
backups incomplete.

best regards,
Florian Pflug




Re: [HACKERS] [RFC] Interface of Row Level Security

2012-06-04 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
 2012/6/4 Florian Pflug f...@phlo.org:
 Without something like RLSBYPASS, the DBA needs to have intimate
 knowledge about the different RLS policies to e.g. guarantee that his
 backups aren't missing crucial information, or that the replication
 system indeed replicates all rows.

 It seems to me you can define a function which implements a site-
 specific security requirement (e.g. backups should not be prevented
 by the RLS policy), then include it as a part of the RLS policy
 (or have it implicitly added by extensions, like sepgsql tries to do).

 Sure. But that requires each and every application which uses RLS
 to provide support for special backup (or replication, or whatever)
 privileges. And it requires the DBA to know how to assign these
 privileges to certain roles for each and every application in question.
 For shops which use a lot of different applications, all with their
 own RLS policy, that can quickly get out of hand.

 Plus, a bug in one of these RLS policies has the potential to render
 backups incomplete.

I agree with Florian here: if there is no way to disable RLS for
backups, the database will be un-administratable.  RLSBYPASS is not
necessarily the only or best way to provide such an override, but we
have to have something that is simple, foolproof, and *not* dependent
on the details of any one RLS policy.

I suspect that KaiGai-san's objection basically comes down to not
wanting to have what amounts to a backdoor in RLS policies.  However,
what Florian is saying is that you have to have a backdoor anyway,
unless you'd like to be at risk of losing data because it wasn't
backed up.  You can either have one well-understood, well-documented,
well-tested backdoor, or you can have an ad-hoc backdoor in each RLS
policy.  Nobody can think that the latter approach is preferable.

regards, tom lane



[HACKERS] 9.3: load path to mitigate load penalty for checksums

2012-06-04 Thread Jeff Davis
Introduction:
=
A lot of progress has been made on the checksums issue, with Simon's
excellent work beginning to gain consensus:

http://archives.postgresql.org/message-id/CA+u5nmkw_gbs6qq_y8-rjgl1v7mvw2hwbhartb8lojhnpfx...@mail.gmail.com

For the purposes of this proposal, I'll assume that's the general
direction we'll be taking for CRCs.

The major drawback of that proposal is that it increases the amount of
work to be done after a large data load by requiring more WAL.

Proposal:
=
I propose a special user-initiated loading mode at the table
granularity. During this time, readers must ignore PD_ALL_VISIBLE,
HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers
may set all of those bits before the writing transaction commits,
obviating the need to rewrite (and WAL) the data again later. Ideally,
there would be no work for VACUUM to do after the data load (unless a
transaction aborted).

This would also help the non-CRC case of course, but I expect CRCs to
make this significantly more important.

Goals:
=

  * Table granularity (doesn't affect other tables at all)
  * Allows concurrent loaders
  * Allows loading into existing tables with existing data
  * Online (allow reads to proceed, even if degraded)

Rough user interaction:
=

  INITIATE LOAD ON foo AS 'job name';

  -- run concurrent loading sessions

  FINALIZE LOAD 'job name';

High-level design:
=

By "hints" I mean the VM bit, PD_ALL_VISIBLE, and HEAP_XMIN_COMMITTED.

By "ignorehints" I mean a flag in pg_class indicating that readers
should ignore hints.

By "optimistichints" I mean a flag in pg_class indicating that writers
can optimistically set hints.

Obviously, readers and writers would need a mechanism to honor those
flags, but I haven't dug into the details yet (additional routines in
tqual.c?).

  States:
0: normal
   * ignorehints = false
   * optimistichints = false
1: trying to enter data load mode, waiting on existing
   lockers (who might be scanning) to finish
   * ignorehints = true
   * optimistichints = false
2: data load mode
   * ignorehints = true
   * optimistichints = true
3: trying to leave data load mode, waiting on old snapshots to
   be released and aborted transactions to be cleaned up
   * ignorehints = true
   * optimistichints = false

INITIATE LOAD would first transition from state 0 to 1 by acquiring a
ShareUpdateExclusiveLock on the table (to be sure no concurrent INITIATE
or FINALIZE LOAD is going on) and setting ignorehints = true.

Then it moves from state 1 to state 2 by waiting for all transactions
that already hold a lock on that table to complete. Any transactions that
don't already have a lock will see the new flag when they try to get it.
Now we're sure that all readers will see the ignorehints flag, so we can
set the optimistichints flag to indicate that writers can write hints
optimistically.

FINALIZE LOAD would first move from state 2 to state 3 by acquiring a
ShareUpdateExclusiveLock on the table and setting optimistichints = false.

Then, it would move from state 3 to state 0 by first waiting for all
transactions that currently hold a lock on the table, to ensure they see
the optimistichints=false flag. Then, it would remember the current xid
as max_loader_xid, and wait until the global xmin is greater than
max_loader_xid. This should ensure that all snapshots regard all loading
transactions as complete. Also, it would need to run a lazy VACUUM to
remove any tuples from aborted transactions.
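
To restate the state machine compactly, here is a hypothetical sketch of the
flag combinations (names invented; the proposal describes only the two
pg_class flags, not any code):

#include <stdbool.h>

/* Hypothetical encoding of the proposed per-table load states. */
typedef enum LoadStateSketch
{
	LOAD_NORMAL,         /* state 0 */
	LOAD_ENTERING,       /* state 1: waiting out existing lockers */
	LOAD_LOADING,        /* state 2: bulk load in progress */
	LOAD_LEAVING         /* state 3: waiting out old snapshots, then VACUUM */
} LoadStateSketch;

typedef struct LoadFlagsSketch
{
	bool		ignorehints;       /* readers ignore VM bit, PD_ALL_VISIBLE, ... */
	bool		optimistichints;   /* writers may set hints before commit */
} LoadFlagsSketch;

/* Flag values a reader or writer would observe in each state. */
static const LoadFlagsSketch load_flags[] =
{
	[LOAD_NORMAL]   = {false, false},
	[LOAD_ENTERING] = {true,  false},
	[LOAD_LOADING]  = {true,  true},
	[LOAD_LEAVING]  = {true,  false},
};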

Details and optimizations
=
 * We probably want a graceful way to handle multiple data loads
happening on the same table. Rather than erroring out, we could treat it
more like a reference count, and only do the work to move into data
load mode if not there already, and only move out of data load mode if
we're the last loading job on the table.
 * In general, there are some usability issues to sort out, to make sure
a table isn't put into data load mode and left that way. Right now, I'm
mostly concerned with getting a working design, but those will be
important, too.
 * We could optimize away the VACUUM going from state 3 to state 0 if we are sure no
writing transactions aborted.
 * INITIATE and FINALIZE probably need to use PreventTransactionChain()
and multiple transactions, to avoid holding the ShareUpdateExclusiveLock
for too long. Also, we want to keep people from using it in the same
transaction as the loading xact, because they might not realize that
they would get a concurrency of 1 that way (because of the
ShareUpdateExclusiveLock).

Thoughts?

Regards,
Jeff Davis

