Re: [HACKERS] TAP test breakage on MacOS X

2014-10-31 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Thu, Oct 30, 2014 at 10:49:33PM -0400, Tom Lane wrote:
 I think the installs as such aren't the only reason for the sucky
 performance.  We need to also reduce the number of initdb cycles incurred
 by the TAP tests.  It's useless for example that the pg_controldata test
 creates its very own $PGDATA rather than sharing one with other tests.

 One could memoize initdb within the suite.  Call it once per distinct command
 line, caching the resulting data directory.  Copy the cached data directory
 for each test desiring one.

At least on older/slower machines like prairiedog, even having to copy
$PGDATA for each test is unappealing.  Some numbers for reference:

make install        22 sec
initdb              76 sec
initdb -N           33 sec
cp -r $PGDATA /tmp  17 sec

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Andres Freund
On 2014-10-31 14:55:11 +0900, Michael Paquier wrote:
 On Tue, Oct 28, 2014 at 9:25 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 13 October 2014 10:05, Petr Jelinek p...@2ndquadrant.com wrote:
 
   I worked a bit on this patch to make it closer to a committable state.
 
   Here is an updated version that works with current HEAD for the October
   commitfest.
 
  I've reviewed this and it looks good to me. Clean, follows existing
  code neatly, documented and tested.
 
  Please could you document a few things
 
  * ExtendCommitTS() works only because commit_ts_enabled can only be
  set at server start.
  We need that documented so somebody doesn't make it more easily
  enabled and break something.
  (Could we make it enabled at next checkpoint or similar?)
 
  * The SLRU tracks timestamps of both xids and subxids. We need to
  document that it does this because Subtrans SLRU is not persistent. If
  we made Subtrans persistent we might need to store only the top level
  xid's commitTS, but that's very useful for typical use cases and
  wouldn't save much time at commit.
 
 
 Hm. What is the performance impact of this feature using the latest version
 of this patch?

I haven't measured it recently, but it was measurable, though not large.

 I imagine that the penalty of the additional operations this
 feature introduces is not zero, particularly for loads with lots of short
 transactions.

Which is why you can disable it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-31 Thread furuyao
  We seem to be going in circles. You suggested having two options,
  --feedback, and --fsync, which is almost exactly what Furuya posted
  originally. I objected to that, because I think that user interface is
  too complicated. Instead, I suggested having just a single option
  called --synchronous, or even better, have no option at all and have
  the server tell the client if it's participating in synchronous
  replication, and have pg_receivexlog automatically fsync when it is,
  and not otherwise [1]. That way you don't need to expose any new
  options to the user. What did you think of that idea?
 
 I think it's pretty weird to have the fsync behavior of the client be
 controlled by the server.
 
 I also don't think that fsync() on the client side is useless in
 asynchronous replication.  Yeah, it's true that there are no
 *guarantees* with asynchronous replication, but the bound on how long
 the data can take to get out to disk is a heck of a lot shorter if you
 fsync frequently than if you don't.  And on the flip side, that has a
 performance impact.
 
 So I actually think the design you proposed is not as good as what was
 proposed by Furuya and Simon.  But I don't feel incredibly strongly about
 it.

Thanks for all the comments!

I fixed the patch.
By default, it behaves like walreceiver: it fsyncs, and sends feedback
after each fsync.

When --no-fsync is specified, it fsyncs only when the WAL file is switched.

The feedback message interval is specified by --status-interval.
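
For reference, whether the server is treating a connected receiver as a
synchronous standby - the condition Heikki suggests auto-detecting - is
visible server-side through a standard view; pg_receivexlog shows up
under its application_name:

SELECT application_name, state, sync_state
FROM pg_stat_replication;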

Regards,

--
Furuya Osamu



pg_receivexlog-fsync-feedback-v7.patch
Description: pg_receivexlog-fsync-feedback-v7.patch



Re: [HACKERS] Lockless StrategyGetBuffer() clock sweep

2014-10-31 Thread Amit Kapila
On Thu, Oct 30, 2014 at 5:01 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-30 10:23:56 +0530, Amit Kapila wrote:
  I have a feeling that this might also have some regression at higher
  loads (like scale_factor = 5000, shared_buffers = 8GB,
  client_count = 128, 256) for reasons similar to the bgreclaimer patch:
  although both reduce contention around the spinlock, they move the
  contention somewhere else.  I have yet to take data before concluding
  anything (I am just waiting for your other patch (wait-free LW_SHARED)
  to be committed).

 I have a hard time seeing how this could be. In the uncontended case the
 number of cachelines touched and the number of atomic operations is
 exactly the same. In the contended case the new implementation does far
 fewer atomic ops - and doesn't do spinning.

 What's your theory?

I have observed that once we reduce the contention in one path, it
doesn't always lead to a performance/scalability gain; rather, the
contention shifts to another lock if one exists.  This is the reason
why we have to work on reducing contention around both the BufFreeList
and buffer mapping table locks together.  I have taken some performance
data, and it seems this patch exhibits behaviour similar to the
bgreclaimer patch; I believe resolving the contention around dynahash
can improve the situation (Robert's chash patch can be helpful).


Performance Data
--
Configuration and Db Details
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
max_connections = 300
shared_buffers = 8GB
checkpoint_segments=30
checkpoint_timeout=15min
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins
Test mode - pgbench readonly (-M prepared)

Data below is the median of 3 runs; for individual run data, check the
document attached to this mail.

Scale_Factor = 1000

  Patch_ver/Client_Count     128      256
  HEAD                    265502   201689
  Patch                   283448   224888

Scale_Factor = 5000

  Patch_ver/Client_Count     128      256
  HEAD                    190435   177477
  Patch                   171485   167794


The above data indicates that there is a performance gain at scale factor
1000; however, there is a regression at scale factor 5000.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


perf_lockless_strategy_getbuf.ods
Description: application/vnd.oasis.opendocument.spreadsheet



[HACKERS] Tweaking Foreign Keys for larger tables

2014-10-31 Thread Simon Riggs
Various ways of tweaking Foreign Keys are suggested that are helpful
for larger databases.

* Deferrable Enforcement Timing Clause

* NOT DEFERRABLE - immediate execution
* DEFERRABLE
  * INITIALLY IMMEDIATE - existing
  * INITIALLY DEFERRED - existing
  * INITIALLY NOT ENFORCED
FK created, but is not enforced during DML.
Will be/Must be marked NOT VALID when first created.
We can run a VALIDATE on the constraint at any time; if it passes the
check it is marked VALID and presumed to stay that way until the next
VALIDATE run. If it fails that check the FK would be marked as NOT
VALID, causing it to be no longer useful for optimization.
This allows FKs to be checked in bulk, rather than executing in the
front-end code path, yet still be there for optimization and
documentation (or visibility to tools, etc.).

There is no corresponding SET CONSTRAINTs call for the NOT ENFORCED
case, since that would require us to mark the constraint as not valid.

* Referenced Table actions

ON DELETE IGNORE
ON UPDATE IGNORE
If we allow this specification then the FK is one way - we check the
existence of a row in the referenced table, but there is no need for a
trigger on the referenced table to enforce an action on delete or
update, so no need to lock the referenced table when adding FKs.
This is very useful for very highly referenced tables, or for larger
tables where we aren't planning on deleting or updating the referenced
table without also deleting or updating the referencing table.
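
To make the proposal concrete, usage might look like the sketch below.
The NOT ENFORCED and IGNORE clauses are only the syntax proposed above,
not released syntax, and the table names are invented; the VALIDATE step
is existing syntax.

-- Hypothetical: the FK starts out unenforced and NOT VALID
ALTER TABLE order_line
  ADD CONSTRAINT order_line_part_fk
  FOREIGN KEY (part_id) REFERENCES part (part_id)
  DEFERRABLE INITIALLY NOT ENFORCED;

-- Existing syntax: check all rows in bulk, outside the DML path
ALTER TABLE order_line VALIDATE CONSTRAINT order_line_part_fk;

-- Hypothetical one-way FK: rows are checked on insert/update of the
-- referencing table, but no action trigger (and no strong lock) is
-- needed on the referenced table
ALTER TABLE order_line
  ADD CONSTRAINT order_line_part_fk2
  FOREIGN KEY (part_id) REFERENCES part (part_id)
  ON DELETE IGNORE ON UPDATE IGNORE;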

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Reducing Catalog Locking

2014-10-31 Thread Simon Riggs
Recent work on parallel query has opened my eyes to exactly how
frequently we request locks on various catalog tables. (Attached file
is a lock analysis on a representative Pg server).

Given these are catalog tables, we aren't doing much to them that
requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
touch those tables like that. When we do that, pretty much everything
else hangs, cos you can't get much done while fundamental tables are
locked.

So my proposal is that we invent a big catalog lock. The benefit of
this is that we greatly reduce lock request traffic, as well as being
able to control exactly when such requests occur. (Fine grained
locking isn't always helpful).

Currently, SearchCatCache() requests locks on individual catalog
tables. Alternatively, we could request an AccessShareLock on a big
catalog lock that must be accessed first before a strong
relation-specific lock is requested. We just need to change the lockid
used for each cache.

We can still CREATE, ALTER, DROP and VACUUM all catalog tables. This
idea would block VACUUM FULL, but that would have been blocked anyway
by general activity.

We reduce lock traffic by having SearchCatCache() use a new call
heap_catalog_open() which calls a new LockCatalogLock() which
specifically caches whether we have already locked the BigCatalogLock
or not. That cache can be cleared easily and cheaply at EOX. And it
can be set quickly and easily in parallel worker tasks.

We then add a special locking clause for VAC FULL on catalog tables.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


stat.out
Description: Binary data



Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 04:42, Amit Kapila amit.kapil...@gmail.com wrote:

 In fact it would be more sensible to lock the toast table earlier.


 It might make some sense to lock the toast table earlier for this
 particular case, but I don't think in general it will be feasible to lock
 all the tables (including catalog tables) which might get used in the
 overall operation before starting parallel operation.  I believe it will
 make doing parallel stuff difficult if we try to go via this route, as
 we need to always do this for any operation we want to make parallel.
 There will always be the risk that we miss something, and another
 concern is that it might not be feasible in all kinds of cases.

 If I understand correctly, you are suggesting that to get the first version
 of parallel operation out earlier, we should try to avoid all the stuff
 (like group locking, ...),

I advocate trying to avoid it, to see. The problems cited so far are
not so extreme that they cannot easily be worked around, if you have
the will to do so.

I've just posted about an idea to reduce catalog locking.

 have some restrictions on which kinds of cases
 Create Index can work with, do some hackery in the Create Index path to
 avoid any kind of locking problems, do the parallel operation for
 External Sort (which might avoid the need for a shared memory
 allocator), call it a day, and then do the things which we need for
 other parallel operations as and when they are required.

 I think this might be a good
 approach in itself if somebody wants to target something to release
 early; however, in the medium to short term, when we want to provide
 non-trivial parallel operations, we have to eventually solve those
 problems, and delaying will only make things worse.

My (by now) long experience on Postgres has shown me that developing
something now is a good route to take. By moving towards a solution at
a reasonable pace you learn things which may affect the longer term
viability of the project. New functionality in each release is useful.
Big bang developments that don't deliver until the end have a habit of
not delivering at the end either. MVP is a term frequently used by
VCs for a reason.

Lack of urgency in delivering a useful outcome from the project looks
strange. If all you develop in this release is core infrastructure for
parallelism and a custom scan API, it begins to look like there may
not be an open source version delivered at all. How people make their
income is up to them, but it's a concern that's been raised my way
before, so it seems reasonable to do so for others also. That is why
we have spent much time explaining clearly the goals and licencing of
BDR, for example. No need for big arguments; these are not allegations
or implications, etc.

 In short, I agree that there is merit in your idea w.r.t. getting the first
 version of parallel operation out early, however if we see from medium
 to long term, I feel solving group locking problem (or some other general
 infrastructure what Robert mentioned upthread) can yield better results,
 unless you feel that is not at all required for parallel operations.

Is it genuinely required for most parallel operations? I think it's
clear that none of us knows the answer. Sure, the general case needs
it, but is the general case the same thing as the reasonably common
case?

Certainly, having a working prototype for something useful would help
build the case for why these things are needed. It would certainly
help the cause to have something working in this release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Is it genuinely required for most parallel operations? I think it's
 clear that none of us knows the answer. Sure, the general case needs
 it, but is the general case the same thing as the reasonably common
 case?

Well, I think that the answer is pretty clear.  Most of the time,
perhaps in 99.9% of cases, group locking will make no difference as to
whether a parallel operation succeeds or fails.  Occasionally,
however, it will cause an undetected deadlock.  I don't hear anyone
arguing that that's OK.  Does anyone wish to make that argument?

If not, then we must prevent it.  The only design, other than what
I've proposed here, that seems like it will do that consistently in
all cases is to have the user backend lock every table that the child
backend might possibly want to lock and retain those locks throughout
the entire duration of the computation whether the child would
actually need those locks or not.  I think that could be made to work,
but there are two problems:

1. Turing's theorem being what it is, predicting what catalog tables
the child might lock is not necessarily simple.

2. It might end up taking many more locks than necessary and holding
them for much longer than necessary.  Right now, for example, a
syscache lookup locks the table only if we actually need to read from
it and releases the lock as soon as the actual read is complete.
Under this design, every syscache that the parallel worker might
conceivably consult needs to be locked for the entire duration of the
parallel computation.  I would expect this to provoke a violent
negative reaction from at least one prominent community member (and I
bet users wouldn't like it much, either).

So, I am still of the opinion that group locking makes sense.   While
I appreciate the urge to avoid solving difficult problems where it's
reasonably avoidable, I think we're in danger of spending more effort
trying to avoid solving this particular problem than it would take to
actually solve it.  Based on what I've done so far, I'm guessing that
a complete group locking patch will be between 1000 and 1500 lines of
code and that nearly all of the new logic will be skipped when it's
not in use (i.e. no parallelism).  That sounds to me like a hell of a
deal compared to trying to predict what locks the child might
conceivably take and preemptively acquire them all, which sounds
annoyingly tedious even for simple things (like equality operators)
and nearly impossible for anything more complicated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 6:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Recent work on parallel query has opened my eyes to exactly how
 frequently we request locks on various catalog tables. (Attached file
 is a lock analysis on a representative Pg server).

That analysis is interesting.

 Given these are catalog tables, we aren't doing much to them that
 requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
 touch those tables like that. When we do that, pretty much everything
 else hangs, cos you can't get much done while fundamental tables are
 locked.

True, although it's currently the case that catalog tables are only
locked for the minimum time necessary.  So, VF on pg_class will block
nearly any new query from starting up, but already-running queries may
be able to keep going, and idle transactions don't cause a problem.
If we held system catalogs until transaction commit, a VF on pg_class
would basically wait until every other transaction in the system
completed and preclude any other transaction from starting until it
finished.  That's significantly more problematic in my view.

I think that the fast-path locking mechanism prevents the overwhelming
majority of lock-related pain for these kinds of things.  Most system
catalog locks are "weak" within the meaning of fast-path locking, so
the main lock table is rarely touched at all, and manipulating our own
PGPROC - which generally nobody else is touching - just isn't that
expensive.
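
One way to see that from SQL, for what it's worth: weak locks taken via
the fast path show fastpath = true in pg_locks and never touched the
shared lock table.

SELECT locktype, relation::regclass, mode, fastpath
FROM pg_locks
WHERE pid = pg_backend_pid();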

On a related note, I've previously had the thought that it would be
nice to have a big DDL lock - that is, a lock that prevents
concurrent DDL without preventing anything else - so that pg_dump
could get just that one lock and then not worry about the state of the
world changing under it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Is it genuinely required for most parallel operations? I think it's
  clear that none of us knows the answer. Sure, the general case needs
  it, but is the general case the same thing as the reasonably common
  case?
 
 Well, I think that the answer is pretty clear.  Most of the time,
 perhaps in 99.9% of cases, group locking will make no difference as to
 whether a parallel operation succeeds or fails.  Occasionally,
 however, it will cause an undetected deadlock.  I don't hear anyone
 arguing that that's OK.  Does anyone wish to make that argument?

Maybe we can, as a first step, make those edges in the lock graph
visible to the deadlock detector? It's pretty clear that undetected
deadlocks aren't ok, but detectable deadlocks in a couple corner cases
might be acceptable.

 If not, then we must prevent it.  The only design, other than what
 I've proposed here, that seems like it will do that consistently in
 all cases is to have the user backend lock every table that the child
 backend might possibly want to lock and retain those locks throughout
 the entire duration of the computation whether the child would
 actually need those locks or not.  I think that could be made to work,
 but there are two problems:
 
 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

Well, there's really no need to be absolutely general here. We're only
going to support a subset of functionality as parallelizable. And in
that case we don't need anything complicated to acquire all locks.

It seems quite possible to combine a heuristic approach with improved
deadlock detection.

 2. It might end up taking many more locks than necessary and holding
 them for much longer than necessary.  Right now, for example, a
 syscache lookup locks the table only if we actually need to read from
 it and releases the lock as soon as the actual read is complete.
 Under this design, every syscache that the parallel worker might
 conceivably consult needs to be locked for the entire duration of the
 parallel computation.  I would expect this to provoke a violent
 negative reaction from at least one prominent community member (and I
 bet users wouldn't like it much, either).

I see little reason to do this for system level relations.

 So, I am still of the opinion that group locking makes sense.   While
 I appreciate the urge to avoid solving difficult problems where it's
 reasonably avoidable, I think we're in danger of spending more effort
 trying to avoid solving this particular problem than it would take to
 actually solve it.  Based on what I've done so far, I'm guessing that
 a complete group locking patch will be between 1000 and 1500 lines of
 code and that nearly all of the new logic will be skipped when it's
 not in use (i.e. no parallelism).  That sounds to me like a hell of a
 deal compared to trying to predict what locks the child might
 conceivably take and preemptively acquire them all, which sounds
 annoyingly tedious even for simple things (like equality operators)
 and nearly impossible for anything more complicated.

What I'm worried about is the performance impact of group locking when
it's not in use. The heavyweight locking code is already quite complex
and often a noticeable bottleneck...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-31 Thread Andrew Dunstan


On 10/30/2014 09:17 PM, Andres Freund wrote:
 On 2014-10-30 21:03:43 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2014-10-30 20:13:53 -0400, Tom Lane wrote:
    As I said upthread, that approach seems to me to be contrary to the
    project policy about how configure should behave.

   I don't think that holds much water. There's a fair amount of things
   that configure detects automatically. I don't think the comparison to
   plperl or such is meaningful - that's a runtime/install time
   difference. These tests are not.

  Meh.  Right now, it's easy to dismiss these tests as unimportant,
  figuring that they play little part in whether the completed build
  is reliable.  But that may not always be true.  If they do become
  a significant part of our test arsenal, silently omitting them will
  not be cool for configure to do.

 Well, I'm all for erroring out if somebody passed --enable-foo-tests and
 the prerequisites aren't there. What I *am* against is requiring an
 explicit flag to enable them because then they'll just not be run in
 enough environments. And that's what's much more likely to cause
 unnoticed bugs.


When this is properly sorted out I will enable this in the buildfarm 
default configuration. So I don't think that's going to be an issue in 
the long term.




cheers

andrew





Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Recent work on parallel query has opened my eyes to exactly how
 frequently we request locks on various catalog tables. (Attached file
 is a lock analysis on a representative Pg server).

 Given these are catalog tables, we aren't doing much to them that
 requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
 touch those tables like that. When we do that, pretty much everything
 else hangs, cos you can't get much done while fundamental tables are
 locked.

So don't do that --- I'm not aware that either operation is ever
considered recommended on catalogs.

 So my proposal is that we invent a big catalog lock. The benefit of
 this is that we greatly reduce lock request traffic, as well as being
 able to control exactly when such requests occur. (Fine grained
 locking isn't always helpful).

 Currently, SearchCatCache() requests locks on individual catalog
 tables. Alternatively, we could request an AccessShareLock on a big
 catalog lock that must be accessed first before a strong
 relation-specific lock is requested. We just need to change the lockid
 used for each cache.

I doubt that this can ever be safe, because it will effectively assume
that all operations on catalog tables are done by code that knows that it
is accessing a catalog.  What about manual DML, or even DDL, on a catalog?
Miss even one place that can modify a table, and you have a problem.
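
For example, nothing stops a superuser from running ordinary DML
against a catalog (table and column picked arbitrarily here), and that
path would never know to take the proposed big catalog lock first:

UPDATE pg_catalog.pg_class
SET reloptions = NULL
WHERE relname = 'some_table';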

More to the point, how would using a big lock not make the contention
situation *worse* rather than better?  At least if you decide you need
to cluster pg_statistic, you aren't blocking sessions that don't need
to touch pg_statistic --- and furthermore, they aren't blocking you.
I think the proposal would render it completely impossible to ever get a
strong lock on a catalog table in a busy system, not even a little-used
catalog.

In fact, since we can assume that a transaction trying to do CLUSTER
pg_class will have touched at least one syscache during startup, this
proposal would absolutely guarantee that it would fail (even in a completely
idle system) because it would already hold the BigLock, and that would
have to be seen as existing use of the table.

regards, tom lane




Re: [HACKERS] Missing FIN_CRC32 calls in logical replication code

2014-10-31 Thread Andres Freund
On 2014-10-27 09:30:33 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
  replication/slot.c and replication/logical/snapbuild.c use a CRC on the
  physical slot and snapshot files. It uses the same algorithm as used e.g.
  for the WAL. However, they are not doing the finalization step, FIN_CRC32()
  on the calculated checksums. Not that it matters much, but it's a bit weird
  and inconsistent, and was probably an oversight.
 
  Hm. Good catch - that's stupid. I wonder what to do about it. I'm
  tempted to just add a comment about it to 9.4 and fix it on master as
  changing it is essentially a catversion bump. Any objections to that?
 
 Yeah, I think you should get it right the first time.  It hardly seems
 likely that any 9.4 beta testers are depending on those files to be stable
 yet.

Since both state files have the version embedded it'd be trivial to just
do the FIN_CRC32() when loading a version 2 file. Does anybody object to
the relevant two lines of code + docs?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On a related note, I've previously had the thought that it would be
 nice to have a big DDL lock - that is, a lock that prevents
 concurrent DDL without preventing anything else - so that pg_dump
 could get just that one lock and then not worry about the state of the
 world changing under it.

Hm ... how would that work exactly?  Every DDL operation has to take
the BigDDLLock in shared mode, and then pg_dump takes it in exclusive
mode?  That would preclude two pg_dumps running in parallel, which
maybe isn't a mainstream usage but still there's never been such a
restriction before.  Parallel pg_dump might have an issue in particular.

But more to the point, this seems like optimizing pg_dump startup by
adding overhead everywhere else, which doesn't really sound like a
great tradeoff to me.

regards, tom lane




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Andres Freund
On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On a related note, I've previously had the thought that it would be
  nice to have a big DDL lock - that is, a lock that prevents
  concurrent DDL without preventing anything else - so that pg_dump
  could get just that one lock and then not worry about the state of the
  world changing under it.
 
 Hm ... how would that work exactly?  Every DDL operation has to take
 the BigDDLLock in shared mode, and then pg_dump takes it in exclusive
 mode?

 That would preclude two pg_dumps running in parallel, which
 maybe isn't a mainstream usage but still there's never been such a
 restriction before.  Parallel pg_dump might have an issue in particular.

It should probably be a heavyweight lock. Then every DDL operation can
take it in RowExclusiveLock mode and pg_dump can take ShareLock. As
RowExclusive is a fast-path eligible lock, that'll not even hit the
global lock table most of the time.

 But more to the point, this seems like optimizing pg_dump startup by
 adding overhead everywhere else, which doesn't really sound like a
 great tradeoff to me.

Well, it'd finally make pg_dump correct under concurrent DDL. That's
quite a worthwhile thing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 9:54 AM, Andres Freund and...@2ndquadrant.com wrote:
 But more to the point, this seems like optimizing pg_dump startup by
 adding overhead everywhere else, which doesn't really sound like a
 great tradeoff to me.

 Well, it'd finally make pg_dump correct under concurrent DDL. That's
 quite a worthwhile thing.

Yeah, exactly.  I agree with Tom that the overhead might be a concern.
But on the other hand, nobody has been more concerned about the
failure of pg_dump to handle this issue correctly than Tom.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Merlin Moncure
On Tue, Dec 10, 2013 at 2:48 PM, Robert Haas robertmh...@gmail.com wrote:
  Speaking of the functionality this does offer, it seems pretty limited. A
  commit timestamp is nice, but it isn't very interesting on its own. You
  really also want to know what the transaction did, who ran it, etc. ISTM
  some kind of an auditing or log-parsing system that could tell you all that
  would be much more useful, but this patch doesn't get us any closer to that.

 For what it's worth, I think that this has been requested numerous
 times over the years by numerous developers of replication solutions.
 My main question (apart from whether or not it may have bugs) is
 whether it makes a noticeable performance difference.  If it does,
 that sucks.  If it does not, maybe we ought to enable it by default.

+1

It's also requested now and then in the context of auditing and
forensic analysis of application problems.  But I also agree that the
tolerance for performance overhead has got to be quite low.  If a GUC
is introduced to manage the tradeoff, it should default to 'on'.

merlin




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
 But more to the point, this seems like optimizing pg_dump startup by
 adding overhead everywhere else, which doesn't really sound like a
 great tradeoff to me.

 Well, it'd finally make pg_dump correct under concurrent DDL. That's
  quite a worthwhile thing.

I lack adequate caffeine at the moment, so explain to me how this adds
any guarantees whatsoever?  It sounded like only a performance
optimization from here.

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 It's also requested now and then in the context of auditing and
 forensic analysis of application problems.  But I also agree that the
 tolerance for performance overhead has got to be quite low.  If a GUC
 is introduced to manage the tradeoff, it should default to 'on'.

Alvaro's original submission specified that the feature defaults to off.
Since there's no use-case for it unless you've installed some third-party
code (eg an external replication solution), I think that should stay true.
The feature's overhead might be small, but it is most certainly not zero,
and people shouldn't be paying for it unless they need it.
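
For illustration, opting in would look something like the sketch below.
The GUC name follows the one mentioned upthread (commit_ts_enabled); the
accessor function is purely hypothetical, since the patch's SQL-level
names were still in flux:

-- In postgresql.conf (settable only at server start, per the review
-- upthread):
--   commit_ts_enabled = on

-- Hypothetical accessor name, for illustration only:
SELECT pg_get_commit_timestamp('1234'::xid);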

regards, tom lane




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Is it genuinely required for most parallel operations? I think it's
  clear that none of us knows the answer. Sure, the general case needs
  it, but is the general case the same thing as the reasonably common
  case?

 Well, I think that the answer is pretty clear.  Most of the time,
 perhaps in 99.9% of cases, group locking will make no difference as to
 whether a parallel operation succeeds or fails.  Occasionally,
 however, it will cause an undetected deadlock.  I don't hear anyone
 arguing that that's OK.  Does anyone wish to make that argument?

 Maybe we can, as a first step, make those edges in the lock graph
 visible to the deadlock detector? It's pretty clear that undetected
 deadlocks aren't ok, but detectable deadlocks in a couple corner cases
 might be acceptable.

Interesting idea.  I agree that would be more acceptable.  It would be
kind of sad, though, if you got a deadlock from this:

BEGIN;
LOCK TABLE foo;
SELECT * FROM foo;
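-- (a parallel scan of foo would need AccessShareLock in the workers,
-- which waits behind the leader's AccessExclusiveLock; with the
-- lock-graph edges made visible, that becomes a detected deadlock
-- rather than an undetected one, but the query still fails)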

You can, of course, avoid that, but it's basically transferring the
burden from the lock manager to every bit of parallel code we ever
write.  If it turns out to be easier to do this than what I'm
currently proposing, I'm definitely amenable to going this route for
version 1, but I don't think it's going to be appealing as a permanent
solution.

 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

 Well, there's really no need to be absolutely general here. We're only
 going to support a subset of functionality as parallelizable. And in
 that case we don't need anything complicated to acquire all locks.

See, that's what I don't believe.  Even very simple things like
equality operators do a surprising amount of catalog locking.

 2. It might end up taking many more locks than necessary and holding
 them for much longer than necessary.  Right now, for example, a
 syscache lookup locks the table only if we actually need to read from
 it and releases the lock as soon as the actual read is complete.
 Under this design, every syscache that the parallel worker might
 conceivably consult needs to be locked for the entire duration of the
 parallel computation.  I would expect this to provoke a violent
 negative reaction from at least one prominent community member (and I
 bet users wouldn't like it much, either).

 I see little reason to do this for system level relations.

I'm not following you.

 So, I am still of the opinion that group locking makes sense.   While
 I appreciate the urge to avoid solving difficult problems where it's
 reasonably avoidable, I think we're in danger of spending more effort
 trying to avoid solving this particular problem than it would take to
 actually solve it.  Based on what I've done so far, I'm guessing that
 a complete group locking patch will be between 1000 and 1500 lines of
 code and that nearly all of the new logic will be skipped when it's
 not in use (i.e. no parallelism).  That sounds to me like a hell of a
 deal compared to trying to predict what locks the child might
 conceivably take and preemptively acquire them all, which sounds
 annoyingly tedious even for simple things (like equality operators)
 and nearly impossible for anything more complicated.

 What I'm worried about is the performance impact of group locking when
 it's not in use. The heavyweight locking code is already quite complex
 and often a noticeable bottleneck...

I certainly agree that it would be unacceptable for group locking to
significantly regress the normal case.  I'm pretty optimistic about
reducing the overhead to near-zero, though - a few extra branch
instructions on the non-fast-path case should be lost in the noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] CREATE INDEX CONCURRENTLY?

2014-10-31 Thread Mark Woodward
I have not kept up with PostgreSQL changes and have just been using it. A
co-worker recently told me that you need the word CONCURRENTLY in CREATE
INDEX to avoid table locking. I called BS on this because to my knowledge
PostgreSQL does not lock tables. I referenced this page in the
documentation:

http://www.postgresql.org/docs/9.3/static/locking-indexes.html

However, I do see this sentence in the indexing page that was not in the
docs prior to 8.0:

Creating an index can interfere with regular operation of a database.
Normally PostgreSQL locks the table to be indexed against writes and
performs the entire index build with a single scan of the table.

Is this true? When/why the change?

When we use concurrently, it seems to hang. I am looking into it.
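
A common cause: CREATE INDEX CONCURRENTLY must wait out transactions
that could still be using the table without the new index, so a single
long-lived open transaction can make it look hung. Something like the
following, using the standard activity view, shows what it may be
waiting behind:

SELECT pid, state, xact_start, query
FROM pg_stat_activity
ORDER BY xact_start;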


Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-31 Thread Robert Haas
On Thu, Oct 30, 2014 at 11:45 PM, Noah Misch n...@leadboat.com wrote:
 Given the lack of prior complaints about this
 loop, I'm not sure I see the need to work harder than that; corruption
 of this sort must be quite rare.

 Looks like _bt_getstackbuf() is always called with some buffer lock held, so
 CHECK_FOR_INTERRUPTS() alone would not help:

 http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us

That's the insert path.  What about the vacuum path?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Petr Jelinek

On 31/10/14 15:07, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  It's also requested now and then in the context of auditing and
  forensic analysis of application problems.  But I also agree that the
  tolerance for performance overhead has got to be quite low.  If a GUC
  is introduced to manage the tradeoff, it should default to 'on'.

 Alvaro's original submission specified that the feature defaults to off.
 Since there's no use-case for it unless you've installed some third-party
 code (eg an external replication solution), I think that should stay true.
 The feature's overhead might be small, but it is most certainly not zero,
 and people shouldn't be paying for it unless they need it.

Agreed, that's why it stayed 'off' in my version too.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Column Redaction

2014-10-31 Thread Simon Riggs
On 16 October 2014 01:29, Claudio Freire klaussfre...@gmail.com wrote:

 But in any case, if the deterrence isn't enough, and you get attacked,
 anything involving redaction as fleshed out in the OP is good for
 nothing. The damage has been done already. The feature doesn't
 meaningfully slow down extraction of data, so anything you do can only
 punish the attacker, not prevent further data theft or damaged
 reputation/business.

Deterrence is exactly the goal.

Only punishing the attacker is exactly what this is for. This is not
the same thing as preventative security.

Redaction is designed to prevent accidental misuse by authorized
users. Your business already trusts these people. You know their
names, their addresses, their bank account details and you'll have
already run security scans on them.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 10:08:59 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
  On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
   Is it genuinely required for most parallel operations? I think it's
   clear that none of us knows the answer. Sure, the general case needs
   it, but is the general case the same thing as the reasonably common
   case?
 
  Well, I think that the answer is pretty clear.  Most of the time,
  perhaps in 99.9% of cases, group locking will make no difference as to
  whether a parallel operation succeeds or fails.  Occasionally,
  however, it will cause an undetected deadlock.  I don't hear anyone
  arguing that that's OK.  Does anyone wish to make that argument?
 
  Maybe we can, as a first step, make those edges in the lock graph
  visible to the deadlock detector? It's pretty clear that undetected
  deadlocks aren't ok, but detectable deadlocks in a couple corner cases
  might be acceptable.
 
 Interesting idea.  I agree that would be more acceptable.  It would be
 kind of sad, though, if you got a deadlock from this:
 
 BEGIN;
 LOCK TABLE foo;
 SELECT * FROM foo;

I have serious doubts about the number of cases where it's correct to
access relations in a second backend that are exclusively locked. Not so
much when that happens for a user-issued LOCK statement, of course, but
when the system has done so. Personally, I think that to stay sane we'll
have to forbid access to anything that normal other backends can't
access either - otherwise things will get too hard to reason about.

So just refusing parallelism as soon as anything has taken an access
exclusive lock doesn't sound too bad to me.

 You can, of course, avoid that, but it's basically transferring the
 burden from the lock manager to every bit of parallel code we ever
 write.  If it turns out to be easier to do this than what I'm
 currently proposing, I'm definitely amenable to going this route for
 version 1, but I don't think it's going to be appealing as a permanent
 solution.

It's probably not the solution forever, I agree. But it might be the
simplest way forward till we have more actual users.

  1. Turing's theorem being what it is, predicting what catalog tables
  the child might lock is not necessarily simple.
 
  Well, there's really no need to be absolutely general here. We're only
  going to support a subset of functionality as parallelizable. And in
  that case we don't need anything complicated to acquire all locks.
 
 See, that's what I don't believe.  Even very simple things like
 equality operators do a surprising amount of catalog locking.

So what? I don't see catalog locking as something problematic. Maybe I'm
missing something, but in which cases would you expect deadlocks or
anything like that on catalog relations?

  So, I am still of the opinion that group locking makes sense.   While
  I appreciate the urge to avoid solving difficult problems where it's
  reasonably avoidable, I think we're in danger of spending more effort
  trying to avoid solving this particular problem than it would take to
  actually solve it.  Based on what I've done so far, I'm guessing that
  a complete group locking patch will be between 1000 and 1500 lines of
  code and that nearly all of the new logic will be skipped when it's
  not in use (i.e. no parallelism).  That sounds to me like a hell of a
  deal compared to trying to predict what locks the child might
  conceivably take and preemptively acquire them all, which sounds
  annoyingly tedious even for simple things (like equality operators)
  and nearly impossible for anything more complicated.
 
  What I'm worried about is the performance impact of group locking when
  it's not in use. The heavyweight locking code is already quite complex
  and often a noticeable bottleneck...
 
 I certainly agree that it would be unacceptable for group locking to
 significantly regress the normal case.  I'm pretty optimistic about
 reducing the overhead to near-zero, though - a few extra branch
 instructions on the non-fast-path case should be lost in the noise.

I'm less worried about the additional branches than about increasing the
size of the data structures. They're already pretty huge.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE INDEX CONCURRENTLY?

2014-10-31 Thread Andrew Dunstan


On 10/31/2014 10:28 AM, Mark Woodward wrote:
 I have not kept up with PostgreSQL changes and have just been using
 it. A co-worker recently told me that you need the word CONCURRENTLY
 in CREATE INDEX to avoid table locking. I called BS on this because
 to my knowledge PostgreSQL does not lock tables. I referenced this
 page in the documentation:

 http://www.postgresql.org/docs/9.3/static/locking-indexes.html

That page refers to using the indexes, not creating them.

 However, I do see this sentence in the indexing page that was not in
 the docs prior to 8.0:

 Creating an index can interfere with regular operation of a database.
 Normally PostgreSQL locks the table to be indexed against writes and
 performs the entire index build with a single scan of the table.

 Is this true? When/why the change?

 When we use concurrently, it seems to hang. I am looking into it.

Creating indexes always did lock tables. See for example
http://www.postgresql.org/docs/7.4/static/explicit-locking.html#LOCKING-TABLES
where CREATE INDEX is documented to take a SHARE lock on the table.

CONCURRENTLY was an additional feature to allow you to get around this,
at the possible cost of some extra processing.

So we haven't made things harder, we've made them easier, and your
understanding of old releases is incorrect.


cheers

andrew




Re: [HACKERS] CREATE INDEX CONCURRENTLY?

2014-10-31 Thread Greg Stark
On Fri, Oct 31, 2014 at 2:28 PM, Mark Woodward
mark.woodw...@actifio.com wrote:
 I have not kept up with PostgreSQL changes and have just been using it. A
 co-worker recently told me that you need the word CONCURRENTLY in CREATE
 INDEX to avoid table locking. I called BS on this because to my knowledge
 PostgreSQL does not lock tables. I referenced this page in the
 documentation:


You can read from tables while a normal index build is in progress but
you can't insert, update, or delete from them. CREATE INDEX
CONCURRENTLY allows you to insert, update, and delete data while the
index build is running at the expense of having the index build take
longer.
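
A minimal illustration of the two forms (standard syntax; the names are
just examples):

-- Plain build: takes a SHARE lock, so concurrent reads proceed but
-- writes block until the build finishes
CREATE INDEX accounts_balance_idx ON accounts (balance);

-- Concurrent build: writes keep flowing, at the cost of extra table
-- scans and a longer build; cannot be run inside a transaction block
CREATE INDEX CONCURRENTLY accounts_balance_idx2 ON accounts (balance);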

-- 
greg




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Simon Riggs
On 31 October 2014 13:39, Tom Lane t...@sss.pgh.pa.us wrote:

 I doubt that this can ever be safe, because it will effectively assume
 that all operations on catalog tables are done by code that knows that it
 is accessing a catalog.

 What about manual DML, or even DDL, on a catalog?

I've never really understood why you think it's a good idea to allow
such commands.

It's pretty easy to see that can screw things up a million ways.

It would be easy enough to make the superuser check acquire the
BigCatalogLock before it does anything else. That way only the
superuser code path would be affected by the special case required to
get around that problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Andres Freund
On 2014-10-31 10:02:28 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
  But more to the point, this seems like optimizing pg_dump startup by
  adding overhead everywhere else, which doesn't really sound like a
  great tradeoff to me.
 
  Well, it'd finally make pg_dump correct under concurrent DDL. That's
   quite a worthwhile thing.
 
 I lack adequate caffeine at the moment, so explain to me how this adds
 any guarantees whatsoever?  It sounded like only a performance
 optimization from here.

A performance optimization might be what Simon intended, but it isn't
primarily what I (and presumably Robert) thought it'd be useful for.

Consider the example in
http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de

If pg_dump were to take the 'ddl lock' *before* acquiring the snapshot
to lock all tables, that scenario couldn't happen anymore. As soon as
pg_dump has acquired the actual locks the ddl lock could be released
again.

Taking the ddl lock from SQL would probably require some 'backup' or
superuser permission, but luckily there seems to be movement around
that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Simon Riggs
On 31 October 2014 13:03, Robert Haas robertmh...@gmail.com wrote:

 Given these are catalog tables, we aren't doing much to them that
 requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
 touch those tables like that. When we do that, pretty much everything
 else hangs, cos you can't get much done while fundamental tables are
 locked.

 True, although it's currently the case that catalog tables are only
 locked for the minimum time necessary.  So, VF on pg_class will block
 nearly any new query from starting up, but already-running queries may
 be able to keep going, and idle transactions don't cause a problem.
 If we held system catalogs until transaction commit, a VF on pg_class
 would basically wait until every other transaction in the system
 completed and preclude any other transaction from starting until it
 finished.  That's significantly more problematic in my view.

No, not really. As soon as you put that VF in there, queries will
begin to block. It doesn't really matter at what point they block, so
it doesn't make the problem worse.

VFs on pg_class are very rare and not usually run while trying to make
a normal workload happen, so it's a strange thing to care about how
well that is optimized.

VACUUM FULL on pg_class only ever happens because of temp tables
anyway. I have been investigating that for other purposes, see new
thread soon.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Temp tables, pg_class_temp and AccessExclusiveLocks

2014-10-31 Thread Simon Riggs
While investigating how to reduce logging of AccessExclusiveLocks for
temp tables, I came up with the attached patch.

My feeling was that's an ugly patch, but it set me thinking that a
more specialised code path around temp tables could be useful in other
ways, and therefore worth pursuing further.

The patch creates a relation_open_temp(), which could then allow a
RelationIdGetTempRelation() which could have a path that calls
ScanPgRelation on a new catalog table called pg_class_temp, and other
_temp catalog table accesses.

So we could isolate all temp table access to specific tables. That would
* cause less bloat in the main catalog tables
* avoid logging AccessExclusiveLocks on temp tables
* less WAL traffic

Why less WAL traffic? Because we could then choose to make the *_temp catalog tables
1. use unlogged table mechanism
2. use local tables inside the local buffer manager

My preference would be (1) because all we need to do is change the
catalog table oid when searching; we don't need to do anything else.
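
For context, the catalog traffic in question comes from perfectly
ordinary usage; every round of this creates, and later deletes, rows in
pg_class, pg_attribute, pg_type and friends, which is where the bloat
and the logged locks come from:

BEGIN;
CREATE TEMP TABLE session_scratch (id int, payload text) ON COMMIT DROP;
-- ... use it ...
COMMIT;  -- table dropped; its catalog rows become dead tuples to vacuum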

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 12:54, Robert Haas robertmh...@gmail.com wrote:

 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

The Pareto principle offers ways to cope with the world's lack of simplicity.

You mentioned earlier that functions would need to be marked proisparallel etc.

What conditions will that be protecting against? If we aren't going to
support the general case where every single thing works, can we at
least discuss the list of cases that we will support?

I don't think we can argue that everything must be generic when we
already admit it won't be.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Temp tables, pg_class_temp and AccessExclusiveLocks

2014-10-31 Thread Simon Riggs
On 31 October 2014 14:53, Simon Riggs si...@2ndquadrant.com wrote:

 While investigating how to reduce logging of AccessExclusiveLocks for
 temp tables, I came up with the attached patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


temp_tables_skip_logging_locks.v1.patch
Description: Binary data



Re: [HACKERS] Reducing Catalog Locking

2014-10-31 Thread Simon Riggs
On 31 October 2014 14:49, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-31 10:02:28 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
  But more to the point, this seems like optimizing pg_dump startup by
  adding overhead everywhere else, which doesn't really sound like a
  great tradeoff to me.

  Well, it'd finally make pg_dump correct under concurrent DDL. That's
  quite a worthwhile thing.

 I lack adequate caffeine at the moment, so explain to me how this adds
 any guarantees whatsoever?  It sounded like only a performance
 optimization from here.

 A performance optimization might be what Simon intended, but it isn't
 primarily what I (and presumably Robert) thought it would be useful for.

 Consider the example in
 http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de

 If pg_dump were to take the 'ddl lock' *before* acquiring the snapshot
 to lock all tables, that scenario couldn't happen anymore. As soon as
 pg_dump has acquired the actual locks the ddl lock could be released
 again.

 Taking the ddl lock from SQL would probably require some 'backup' or
 superuser permission, but luckily there seems to be movement around
 that.

Good idea. But it is a different idea. I can do that as well...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-10-31 Thread Petr Jelinek

Hi,

Attached is the v2 of the patch with the review comments addressed (see 
below).


On 29/10/14 21:08, Petr Jelinek wrote:

On 29/10/14 20:27, Asif Naeem wrote:

1. It seems that the following log message needs to be more descriptive
about the reason for the shutdown, i.e.

+   if (recoveryShutdownAtTarget && reachedStopPoint)
+   {
+           ereport(LOG, (errmsg("shutting down")));



Agreed, I just wasn't sure what exactly to write; I originally had
"shutting down by user request" or some such there, but that's misleading.



I changed it to "shutting down at recovery target"; hope that's better.


2. As Simon suggesting following recovery settings are not clear i.e.

shutdown_at_recovery_target = true
pause_at_recovery_target = true


Hmm, I completely missed Simon's email, strange. Well, the other option
would be to throw an error if both are set to true - an error will have
to happen anyway if action_at_recovery_target is set to shutdown while
pause_at_recovery_target is true (I think we need to keep
pause_at_recovery_target for compatibility).

But considering all of you think something like
action_at_recovery_target is a better solution, I will do it that way then.


Done, there is now action_at_recovery_target, which can be set to either 
pause, continue or shutdown, defaulting to pause (which is the same as the 
old behavior of pause_at_recovery_target defaulting to true).
I also added a check that prohibits using both pause_at_recovery_target 
and action_at_recovery_target in the same config, mainly to avoid users 
shooting themselves in the foot.
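
So a recovery.conf that replays up to a given point and then stops the
server looks something like this (paths and timestamp are just for
illustration):

restore_command = 'cp /mnt/archive/%f "%p"'
recovery_target_time = '2014-10-30 20:00:00'
action_at_recovery_target = 'shutdown'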






3. As it doesn't rename recovery.conf, a subsequent attempt (without any
changes in recovery.conf) to start the server keeps showing the
following:

...
LOG:  redo starts at 0/1803620
DEBUG:  checkpointer updated shared memory configuration values
LOG:  consistent recovery state reached at 0/1803658
LOG:  restored log file 00010002 from archive
DEBUG:  got WAL segment from archive
LOG:  restored log file 00010003 from archive
DEBUG:  got WAL segment from archive
LOG:  restored log file 00010004 from archive
DEBUG:  got WAL segment from archive
LOG:  restored log file 00010005 from archive
DEBUG:  got WAL segment from archive
LOG:  restored log file 00010006 from archive
DEBUG:  got WAL segment from archive
…



Yes, it will still replay everything since the last checkpoint; that's by
design, since otherwise we would have to write a checkpoint on shutdown,
and that would mean the instance couldn't be used as a hot standby
anymore, and I consider that an important feature to have.



I added a note to the documentation that says this will happen.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..fe42394 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -289,12 +289,39 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </term>
   <listitem>
    <para>
-    Specifies whether recovery should pause when the recovery target
-    is reached. The default is true.
-    This is intended to allow queries to be executed against the
-    database to check if this recovery target is the most desirable
-    point for recovery. The paused state can be resumed by using
-    <function>pg_xlog_replay_resume()</> (See
+    Alias for action_at_recovery_target; <literal>true</> is the same as
+    action_at_recovery_target = <literal>pause</> and <literal>false</>
+    is the same as action_at_recovery_target = <literal>continue</>.
+   </para>
+   <para>
+    This setting has no effect if <xref linkend="guc-hot-standby"> is not
+    enabled, or if no recovery target is set.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <varlistentry id="action-at-recovery-target"
+   xreflabel="action_at_recovery_target">
+  <term><varname>action_at_recovery_target</varname> (<type>enum</type>)
+  <indexterm>
+    <primary><varname>action_at_recovery_target</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    Specifies what action PostgreSQL should take once the recovery target
+    is reached. The default is <literal>pause</>, which means recovery
+    will be paused. <literal>continue</> means there will be no special
+    action taken when the recovery target is reached and normal operation
+    will continue. Finally, <literal>shutdown</> will stop the PostgreSQL
+    instance at the recovery target.
+   </para>
+    The intended use of the <literal>pause</> setting is to allow queries to be
+    executed against the database to check if this recovery target is the
+    most desirable 

Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Petr Jelinek

Hi,

On 28/10/14 13:25, Simon Riggs wrote:

On 13 October 2014 10:05, Petr Jelinek p...@2ndquadrant.com wrote:


I worked bit on this patch to make it closer to committable state.



Here is updated version that works with current HEAD for the October
committfest.


I've reviewed this and it looks good to me. Clean, follows existing
code neatly, documented and tested.



Thanks for looking at this.


Please could you document a few things

* ExtendCommitTS() works only because commit_ts_enabled can only be
set at server start.
We need that documented so somebody doesn't make it more easily
enabled and break something.
(Could we make it enabled at next checkpoint or similar?)


Maybe we could, but that means some kind of two-step enabling facility, 
and I don't want to write that as part of the initial patch, since it 
will need separate discussion (i.e. do we want a new GUC flag for 
this, or a hack solution just for committs?). So maybe later in a 
follow-up patch.



* The SLRU tracks timestamps of both xids and subxids. We need to
document that it does this because Subtrans SLRU is not persistent. If
we made Subtrans persistent we might need to store only the top level
xid's commitTS, but that's very useful for typical use cases and
wouldn't save much time at commit.


Attached version with the above comments near the relevant code.
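
For the archives, the user-visible behavior is roughly the following;
note that the lookup function name is an assumption on my part (it may
be spelled differently in this version of the patch):

-- postgresql.conf: track_commit_timestamp = on   (requires a restart)
CREATE TABLE t (x int);
INSERT INTO t VALUES (1);
-- commit timestamp of the transaction that inserted the row
SELECT pg_xact_commit_timestamp(xmin) FROM t;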

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
index 3b8241b..f0a023f 100644
--- a/contrib/pg_upgrade/pg_upgrade.c
+++ b/contrib/pg_upgrade/pg_upgrade.c
@@ -423,8 +423,10 @@ copy_clog_xlog_xid(void)
 	/* set the next transaction id and epoch of the new cluster */
 	prep_status("Setting next transaction ID and epoch for new cluster");
 	exec_prog(UTILITY_LOG_FILE, NULL, true,
-			  "\"%s/pg_resetxlog\" -f -x %u \"%s\"",
-			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
+			  "\"%s/pg_resetxlog\" -f -x %u -c %u \"%s\"",
+			  new_cluster.bindir,
+			  old_cluster.controldata.chkpnt_nxtxid,
+			  old_cluster.controldata.chkpnt_nxtxid,
 			  new_cluster.pgdata);
 	exec_prog(UTILITY_LOG_FILE, NULL, true,
 			  "\"%s/pg_resetxlog\" -f -e %u \"%s\"",
diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c
index bfb3573..c0a0409 100644
--- a/contrib/pg_xlogdump/rmgrdesc.c
+++ b/contrib/pg_xlogdump/rmgrdesc.c
@@ -9,6 +9,7 @@
 #include "postgres.h"
 
 #include "access/clog.h"
+#include "access/committs.h"
 #include "access/gin.h"
 #include "access/gist_private.h"
 #include "access/hash.h"
diff --git a/contrib/test_committs/.gitignore b/contrib/test_committs/.gitignore
new file mode 100644
index 000..1f95503
--- /dev/null
+++ b/contrib/test_committs/.gitignore
@@ -0,0 +1,5 @@
+# Generated subdirectories
+/log/
+/isolation_output/
+/regression_output/
+/tmp_check/
diff --git a/contrib/test_committs/Makefile b/contrib/test_committs/Makefile
new file mode 100644
index 000..2240749
--- /dev/null
+++ b/contrib/test_committs/Makefile
@@ -0,0 +1,45 @@
+# Note: because we don't tell the Makefile there are any regression tests,
+# we have to clean those result files explicitly
+EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_committs
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# We can't support installcheck because normally installcheck users don't have
+# the required track_commit_timestamp on
+installcheck:;
+
+check: regresscheck
+
+submake-regress:
+	$(MAKE) -C $(top_builddir)/src/test/regress all
+
+submake-test_committs:
+	$(MAKE) -C $(top_builddir)/contrib/test_committs
+
+REGRESSCHECKS=committs_on
+
+regresscheck: all | submake-regress submake-test_committs
+	$(MKDIR_P) regression_output
+	$(pg_regress_check) \
+	--temp-config $(top_srcdir)/contrib/test_committs/committs.conf \
+	--temp-install=./tmp_check \
+	--extra-install=contrib/test_committs \
+	--outputdir=./regression_output \
+	$(REGRESSCHECKS)
+
+regresscheck-install-force: | submake-regress submake-test_committs
+	$(pg_regress_installcheck) \
+	--extra-install=contrib/test_committs \
+	$(REGRESSCHECKS)
+
+.PHONY: submake-test_committs submake-regress check \
+	regresscheck regresscheck-install-force
\ No newline at end of file
diff --git a/contrib/test_committs/committs.conf b/contrib/test_committs/committs.conf
new file mode 100644
index 000..d221a60
--- /dev/null
+++ b/contrib/test_committs/committs.conf
@@ -0,0 +1 @@
+track_commit_timestamp = on
\ No newline at end of file
diff --git a/contrib/test_committs/expected/committs_on.out b/contrib/test_committs/expected/committs_on.out
new file mode 100644
index 000..9920343
--- /dev/null
+++ b/contrib/test_committs/expected/committs_on.out

[HACKERS] why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

2014-10-31 Thread Chris Rogers
I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
2, but I can't figure out why.
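
A self-contained setup that produces the same plan shape (a sketch;
sizes are arbitrary):

CREATE TABLE a_big_table AS
    SELECT g AS id, md5(g::text) AS padding
    FROM generate_series(1, 5000000) g;
ANALYZE a_big_table;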

Limit 1:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 1

Result:

Limit  (cost=283512.19..283517.66 rows=1 width=2114) (actual time=0.019..0.019 rows=1 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..0.008 rows=1 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.003..0.003 rows=1 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.000..0.000 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..0.018 rows=1 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.011 rows=1 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.002..0.002 rows=1 loops=1)
              Filter: thing
Total runtime: 0.057 ms

Limit 2:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 2

Result:

Limit  (cost=283512.19..283523.14 rows=2 width=2114) (actual time=0.018..14162.283 rows=2 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..4443.359 rows=4714243 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.002..1421.622 rows=4714243 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.001..10214.684 rows=4714243 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..14162.280 rows=2 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        Rows Removed by Join Filter: 4714243
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.028 rows=2 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.009..6595.770 rows=2357122 loops=2)
              Filter: thing
Total runtime: 14247.374 ms


Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Simon Riggs
On 31 October 2014 15:46, Petr Jelinek p...@2ndquadrant.com wrote:

 Attached version with the above comments near the relevant code.

Looks cooked and ready to serve. Who's gonna commit this? Alvaro, or
do you want me to?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Re: why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

2014-10-31 Thread David G Johnston
Chris Rogers wrote
 I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
 rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
 2, but I can't figure out why.
 
 EXPLAIN ANALYZE WITH base AS (
   SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
 ), filter AS (
   SELECT rownum, true AS thing FROM base
 ) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
 LIMIT 1

The LIMIT 1 case has been optimized (special cased) while all others end up
using a normal plan.

Two things make your example query particularly unrealistic:

1. The presence of a ROW_NUMBER() window aggregate on an unsorted input
2. A LEFT JOIN whose WHERE clause requires a right-side column to be
non-NULL, which effectively turns it into an inner join

David J.







Re: [HACKERS] why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

2014-10-31 Thread Tom Lane
Chris Rogers teuk...@gmail.com writes:
 I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
 rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
 2, but I can't figure out why.

This is not -hackers material.

The first row pulled from the nestloop LEFT JOIN is created by joining
the first output row from base to the first output row from filter
(which is indirectly also the first row from base).  So, cheap; we've
only had to read one row from a_big_table.

The second attempt to pull a row from the nestloop LEFT JOIN requires
evaluating all the rest of the output of the filter CTE, to see if there
are any more filter rows matching the first base row.  (There are not,
since the rownum output is unique, but the nestloop doesn't know that.)
That in turn causes scanning all the rest of base and so all the rest
of a_big_table.  Only after that do we get to the second base row
at which another join output row is possible.

If the planner were about an order of magnitude cleverer than it is,
it might realize that this would happen and switch to another plan ...
although TBH the only way I can see to not have a large startup cost
would be to somehow realize that the outputs of both CTEs are sorted
by rownum and hence can be mergejoined without an explicit sort step.
That would require more understanding of the behavior of row_number()
than it's got or probably should have.
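
For what it's worth, in this particular example the join is unnecessary:
the filter CTE produces exactly one row per base row, so the flag can be
computed inline and the pathological join avoided entirely:

WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
)
SELECT *, true AS thing FROM base LIMIT 2;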

regards, tom lane




Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-31 Thread Adam Brightwell
All,

FWIW, I've cleanly applied v8 of this patch to master (252e652) and
check-world was successful.  I also successfully ran through a few manual
test cases.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-31 Thread Fabrízio de Royes Mello
On Fri, Oct 31, 2014 at 2:46 PM, Adam Brightwell 
adam.brightw...@crunchydatasolutions.com wrote:

 All,

 FWIW, I've cleanly applied v8 of this patch to master (252e652) and
check-world was successful.  I also successfully ran through a few manual
test cases.


Thanks for your review!

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] CREATE INDEX CONCURRENTLY?

2014-10-31 Thread Michael Banck
On Friday, 2014-10-31, at 14:43 +0000, Greg Stark wrote:
 On Fri, Oct 31, 2014 at 2:28 PM, Mark Woodward
 mark.woodw...@actifio.com wrote:
  I have not kept up with PostgreSQL changes and have just been using it. A
  co-worker recently told me that you need to word CONCURRENTLY in CREATE
  INDEX to avoid table locking. I called BS on this because to my knowledge
  PostgreSQL does not lock tables. I referenced this page in the
  documentation:
 
 You can read from tables while a normal index build is in progress but
 you can't insert, update, or delete from them. CREATE INDEX
 CONCURRENTLY allows you to insert, update, and delete data while the
 index build is running at the expense of having the index build take
 longer.

I believe there is one caveat: If there is an idle-in-transaction
backend from before the start of CREATE INDEX CONCURRENTLY, it can hold
up the index creation indefinitely as long as it doesn't commit.

src/backend/access/heap/README.HOT mentions this WRT CIC: "Then we wait
until every transaction that could have a snapshot older than the second
reference snapshot is finished.  This ensures that nobody is alive any
longer who could need to see any tuples that might be missing from the
index, as well as ensuring that no one can see any inconsistent rows in
a broken HOT chain (the first condition is stronger than the second)."

I have seen CIC stall at clients when there were (seemingly) unrelated
idle-in-transaction sessions open (their locks even touching only other
schemas). I believe it depends on the specific locks the other backend
acquired, but at least with a DECLARE CURSOR I can trivially reproduce
it:

first session:

postgres=# CREATE SCHEMA foo1;
CREATE SCHEMA
postgres=# CREATE TABLE foo1.foo1 (id int);
CREATE TABLE
postgres=# CREATE SCHEMA foo2;
CREATE SCHEMA
postgres=# CREATE TABLE foo2.foo2 (id int);
CREATE TABLE

second session:

postgres=# BEGIN; DECLARE c1 CURSOR FOR SELECT * FROM foo1.foo1;
BEGIN
DECLARE CURSOR

first session:

postgres=# CREATE INDEX CONCURRENTLY ixfoo2 ON foo2.foo2(id);
(hangs)
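
While it hangs, the culprit can be spotted with something like:

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
ORDER BY xact_start;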

I wonder whether that is pilot error (fair enough), or whether something
could be done about this?


Michael

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer





Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
 I have serious doubts about the number of cases where it's correct to
 access relations in a second backend that are exclusively locked. Not so
 much when that happens for a user issued LOCK statement of course, but
 when the system has done so. Personally I think to stay sane we'll have
 to forbid access to anything normal other backends can't access either -
 otherwise things will get too hard to reason about.

 So just refusing parallelism as soon as anything has taken an access
 exclusive lock doesn't sound too bad to me.

That restriction seems onerous to me; for example, it would prevent a
parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
Those seem like worthy targets for parallelism, and maybe even early
targets, since I expect a lot of the real complexity here to be in the
query planner, and you can avoid most of that complexity when planning
DDL.

I also think it's unnecessary.  It's wrong to think of two parallel
backends that both want AccessExclusiveLock on a given target relation
as conflicting with each other; if the process were not running in
parallel, those lock requests would both have come from the same
process and both requests would have been granted.  Parallelism is
generally not about making different things happen; it's about
spreading the stuff that would happen anyway across multiple backends,
and things that would succeed if done in a single backend shouldn't
fail when spread across 2 or more without some compelling
justification.  The cases where it's actually unsafe for a table to be
accessed even by some other code within the same backend are handled
not by the lock manager, but by CheckTableNotInUse().  That call
should probably fail categorically if issued while parallelism is
active.

  1. Turing's theorem being what it is, predicting what catalog tables
  the child might lock is not necessarily simple.
 
  Well, there's really no need to be absolutely general here. We're only
  going to support a subset of functionality as parallelizable. And in
  that case we don't need anything complicated to acquire all locks.

 See, that's what I don't believe.  Even very simple things like
 equality operators do a surprising amount of catalog locking.

 So what? I don't see catalog locking as something problematic? Maybe I'm
 missing something, but I don't see cases where you would expect deadlocks or
 anything like it on catalog relations?

Suppose somebody fires off a parallel sort on a text column, or a
parallel sequential scan involving a qual of the form textcol = 'zog'.
We launch a bunch of workers to do comparisons; they do lookups
against pg_collation.  After some but not all of them have loaded the
collation information they need into their local cache, the DBA types
CLUSTER pg_collation.  It queues up for an AccessExclusiveLock.  The
remaining parallel workers join the wait queue waiting for their
AccessShareLock. The system is now deadlocked; the only solution is to
move the parallel workers ahead of the AccessExclusiveLock request,
but the deadlock detector won't do that unless it knows about the
relationship among the parallel workers.

It's possible to construct more obscure cases as well.  For example,
suppose a user (for reasons known only to himself and God) does LOCK
TABLE pg_opclass and then begins a sort of a column full of enums.
Meanwhile, another user tries to CLUSTER pg_enum.  This will deadlock
if, and only if, some of the enum OIDs are odd.  I mention this
example not because I think it's a particularly useful case but
because it illustrates just how robust the current system is: it will
catch that case, and break the deadlock somehow, and you as a
PostgreSQL backend hacker do not need to think about it.  The guy who
added pg_enum did not need to think about it.  It just works.  If we
decide that parallelism is allowed to break that guarantee, then every
patch that does parallel anything has got to worry about what
undetected deadlocks might result, and then argue about which of them
are obscure enough that we shouldn't care and which are not.  That
doesn't sound like a fun way to spend my time.

But, really, the first case is the one I'm more worried about: a bunch
of parallel backends all grab the same locks, but slightly separated
in time.  A strong locker joins the queue midway through and we're
hosed.  Obviously any system cache lookups whatsoever are going to be
separated in time like this, just because the world isn't synchronous.
The pg_enum case is an example of how the lookups could be more widely
spaced: each backend will scan pg_enum when it first hits an odd OID,
and that could happen much later for some than others.  But even a
small window is enough to render undetected deadlock a possibility,
and you will not convince me that "hope the race condition is narrow
enough in practice that this never happens" is a remotely sane
strategy.

Down the road, it's 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 18:29, Robert Haas robertmh...@gmail.com wrote:

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 CLUSTER pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

It's an obscure case and it's not the only solution either.

I'm really surprised that having workers do their own locking doesn't
scare you. Personally, it scares me.

It's not like we're the first to do parallel query. Honestly, how many
parallel databases do you think do this?

I can't see this being a practical, workable solution for running a
parallel query across a cluster of many nodes. Shared, distributed
lock table?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You mentioned earlier that functions would need to be marked proisparallel 
 etc..

 What conditions will that be protecting against? If we aren't going to
 support the general case where every single thing works, can we at
 least discuss what the list of cases is that we will support.

In general, any operation that relies on backend-private state not
shared with a cooperating backend isn't parallel-safe.  For example,
consider:

SELECT setseed(1);
SELECT random() FROM generate_series(1,1000) g;

This sequence of queries can be relied upon to produce the same output
every time.  But if some of the random() calls are executed in another
backend, you'll get different results because random() works by using
backend-private memory to store its pseudo-random state.  It's
unlikely to be worth the effort to move the pseudo-random state to a
DSM for parallelism, so we probably just want to disallow parallel
query when random() is in use, or, better still, disallow
parallelizing only the particular query node that uses random().
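
One possible shape for such a marking, sketched here with hypothetical
syntax (no such clause exists today; Simon called it proisparallel
upthread):

-- PARALLEL UNSAFE below is invented syntax, for illustration only
CREATE FUNCTION noisy_random() RETURNS float8
    LANGUAGE sql
    PARALLEL UNSAFE   -- relies on backend-private PRNG state
    AS 'SELECT random()';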

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 14:29:57 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I have serious doubts about the number of cases where it's correct to
  access relations in a second backend that are exclusively locked. Not so
  much when that happens for a user issued LOCK statement of course, but
  when the system has done so. Personally I think to stay sane we'll have
  to forbid access to anything normal other backends can't access either -
  otherwise things will get too hard to reason about.
 
  So just refusing parallelism as soon as anything has taken an access
  exclusive lock doesn't sound too bad to me.

 That restriction seems onerous to me; for example, it would prevent a
 parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
  Those seem like worthy targets for parallelism, and maybe even early
 targets, since I expect a lot of the real complexity here to be in the
 query planner, and you can avoid most of that complexity when planning
 DDL.

Ugh. I think that's aiming too high. You'll suddenly need to share cache
invalidations between the backends. I think we should start by requiring
that there's no DDL done *at all* in the transaction that starts the
parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
the logic we have for CONCURRENTLY, thereby moving the parallelized part
into a transaction that hasn't done any DDL yet.

 I also think it's unnecessary.  It's wrong to think of two parallel
 backends that both want AccessExclusiveLock on a given target relation
 as conflicting with each other; if the process were not running in
 parallel, those lock requests would both have come from the same
 process and both requests would have been granted.

I don't think that's a realistic view. There's lots of backend private
state around. If we try to make all that coherent between backends we'll
fail.

 Parallelism is
 generally not about making different things happen; it's about
 spreading the stuff that would happen anyway across multiple backends,
 and things that would succeed if done in a single backend shouldn't
 fail when spread across 2 or more without some compelling
 justification.  The cases where it's actually unsafe for a table to be
 accessed even by some other code within the same backend are handled
 not by the lock manager, but by CheckTableNotInUse().  That call
 should probably fail categorically if issued while parallelism is
 active.

Which will e.g. prohibit CLUSTER and VACUUM FULL.

   1. Turing's theorem being what it is, predicting what catalog tables
   the child might lock is not necessarily simple.
  
   Well, there's really no need to be absolutely general here. We're only
   going to support a subset of functionality as parallelizable. And in
   that case we don't need anything complicated to acquire all locks.
 
  See, that's what I don't believe.  Even very simple things like
  equality operators do a surprising amount of catalog locking.
 
  So what? I don't see catalog locking as something problematic? Maybe I'm
  missing something, but I don't see cases where you would expect deadlocks or
  anything like it on catalog relations?

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 CLUSTER pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

I think you would need to make the case more complex - we only hold
locks on system objects for a very short time, and mostly not in a nested
fashion.

But generally I think it's absolutely perfectly alright to fail in such
cases. We need to fail reliably, but *none* of this is a new hazard. You
can have pretty similar issues today, without any parallelism.

 It's possible to construct more obscure cases as well.  For example,
 suppose a user (for reasons known only to himself and God) does LOCK
 TABLE pg_opclass and then begins a sort of a column full of enums.
 Meanwhile, another user tries to CLUSTER pg_enum.  This will deadlock
 if, and only if, some of the enum OIDs are odd.  I mention this
 example not because I think it's a particularly useful case but
 because it illustrates just how robust the current system is: it will
 catch that case, and break the deadlock somehow, and you as a
 PostgreSQL backend hacker do not need to think about it.  The guy who
 added pg_enum did not need to think about it.  It just works.  If we
 decide that 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 2:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 31 October 2014 18:29, Robert Haas robertmh...@gmail.com wrote:
 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 CLUSTER pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

 It's an obscure case and its not the only solution either.

I don't think that's an obscure situation at all.  Do you really think
a patch that could cause an attempt to VACUUM FULL a system catalog to
suffer an undetected deadlock meets this community's quality
standards?  Because that's what we're talking about.

You are right that it isn't the only solution.  I have said the same
thing myself, multiple times, on this thread.

 I'm really surprised that having workers do their own locking doesn't
 scare you. Personally, it scares me.

Frankly, the reverse decision scares me a heck of a lot more.  It
superficially seems like a good idea to have the master take locks on
behalf of the workers, but when you start to really think about how
many low-level parts of the code take locks, it quickly becomes
evident, at least to me, that trying to make the resulting system
robust will be a nightmare.

Don't forget that there are not only relation locks but also page
locks, tuple locks, relation extension locks, XID and VXID locks,
locks on arbitrary database or shared objects.  The deadlock detector
handles them all.  If avoiding deadlock outside the lock manager is so
easy, why do we have a deadlock detector in the first place?  Why not
just change all of our other code to avoid them instead?

 It's not like we're the first do parallel query. Honestly, how many
 parallel databases do you think do this?

All of them.

I might be wrong, of course, but I see no reason at all to believe
that Oracle or SQL Server ignore deadlock detection when
parallel query is in use.

 I can't see this being a practical, workable solution for running a
 parallel query across a cluster of many nodes. Shared, distributed
 lock table?

I am not proposing a distributed lock manager that can run across many
nodes.  The considerations for such a piece of software are largely
orthogonal to the problem that I'm actually trying to solve here, and
at least an order of magnitude more complex.  If you make it sound
like that's the project that I'm doing, then of course it's going to
sound frighteningly complicated.  But I'm not.

I'm sort of baffled by your concern on this point.  Sure, the lock
manager is a complex and performance-sensitive piece of code, and we
have to be careful not to break it or make it slower.  But we're not
talking about launching a rocket to Mars here; we're just talking
about making the lock manager interface look the same way to two
cooperating backends issuing lock requests in parallel that it looks
to a single backend making those requests serially.  I don't think
it's noticeably harder than getting Hot Standby conflict resolution
to work, or getting historical snapshots working for logical decoding, or
making the visibility map crash-safe - in fact, I'd bet on it being
substantially easier than the first two of those, because all the
logic is inside a single, relatively simple backend subsystem.  Why
we'd want to minimize complexity there at the cost of spreading it
across half the backend is mysterious to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
  So just refusing parallelism as soon as anything has taken an access
  exclusive lock doesn't sound too bad to me.

 That restriction seems onerous to me; for example, it would prevent a
 parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
  Those seem like worthy targets for parallelism, and maybe even early
 targets, since I expect a lot of the real complexity here to be in the
 query planner, and you can avoid most of that complexity when planning
 DDL.

  Ugh. I think that's aiming too high. You'll suddenly need to share cache
 invalidations between the backends. I think we should start by requiring
 that there's no DDL done *at all* in the transaction that starts the
 parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
 the logic we have for CONCURRENTLY, thereby moving the parallelized part
 into a transaction that hasn't done any DDL yet.

I don't think that's correct.  We only need to process local
invalidation messages after CommandCounterIncrement(), which I
anticipate prohibiting during parallel execution (short thought should
convince you that anything else is completely nuts).  So I think
there's no real problem with invalidation messages being generated
during parallel execution.  I wouldn't anticipate supporting that in
V1, because they'd have to be propagated back to the user backend
prior to commit, but in the longer term it seems pretty feasible.

 I also think it's unnecessary.  It's wrong to think of two parallel
 backends that both want AccessExclusiveLock on a given target relation
 as conflicting with each other; if the process were not running in
 parallel, those lock requests would both have come from the same
 process and both requests would have been granted.

 I don't think that's a realistic view. There's lots of backend private
 state around. If we try to make all that coherent between backends we'll
 fail.

I agree that there's a lot of backend private state, and that's the
principal problem with making this work at all.  The challenge here is
to figure out which parts of that backend-private state we absolutely
must make coherent between backends to enable parallelism to have a
useful, non-trivial scope, and which parts we can punt on.  I have
ideas about that - which are mostly documented at
https://wiki.postgresql.org/wiki/Parallel_Sort - but I'm trying to
keep an open mind on the details.

 Parallelism is
 generally not about making different things happen; it's about
 spreading the stuff that would happen anyway across multiple backends,
 and things that would succeed if done in a single backend shouldn't
 fail when spread across 2 or more without some compelling
 justification.  The cases where it's actually unsafe for a table to be
 accessed even by some other code within the same backend are handled
 not by the lock manager, but by CheckTableNotInUse().  That call
 should probably fail categorically if issued while parallelism is
 active.

 Which will e.g. prohibit CLUSTER and VACUUM FULL.

I don't think so.  I think it's safe to get to the point where we
check that the table is not in use, then do a parallel scan and sort,
then collect the results and write them out.  But I would not try to
make it safe, for example, to be scanning a table and have some
user-defined function invoked along the way to go try to do anything
that involves CheckTableNotInUse().  Many of those would fall over
anyway because of PreventTransactionChain(), but I've got no problem
nailing that hatch shut twice.  And conversely, I would expect we'd
disallow the sort operators invoked by CLUSTER from reaching any code
guarded by CheckTableNotInUse().

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 cluster pg_collate.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

 I think you would need to make the case more complex - we only hold
  locks on system objects for a very short time, and mostly not in a nested
 fashion.

Hmm.  You're right.  If the parallel backends release their locks
quickly, the CLUSTER could run and then the parallel job could finish.
But it sounds as if you get the idea.

 But generally I think it's absolutely perfectly alright to fail in such
  cases. We need to fail reliably, but *none* of this is a new hazard. You
 can have pretty similar issues today, 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 15:56:42 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
   So just refusing parallelism as soon as anything has taken an access
   exclusive lock doesn't sound too bad to me.
 
  That restriction seems onerous to me; for example, it would prevent a
  parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
   Those seem like worthy targets for parallelism, and maybe even early
  targets, since I expect a lot of the real complexity here to be in the
  query planner, and you can avoid most of that complexity when planning
  DDL.
 
  Ugh. I think that's aiming too high. You'll suddenly need to share cche
  invalidations between the backends. I think we should start by requiring
  that there's no DDL done *at all* in the transaction that starts the
  parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
  the logic we have for CONCURRENTLY, thereby moving the parallelized part
  into a transaction that hasn't done any DDL yet.
 
 I don't think that's correct.  We only need to process local
 invalidation messages after CommandCounterIncrement(), which I
 anticipate prohibiting during parallel execution (short thought should
 convince you that anything else is completely nuts).

It is more complex, even without CCI. As long as you're doing anything
inside a transaction that already has done DDL, you'll have to play
nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER
pg_class; and then starts a child backend. Which will get the same
snapshot, combocids, yada yada. But it *also* will have preexisting
cache entries. Those you need to invalidate before it can do anything
correctly.


  I'm sorry to be a bit pessimistic here. But my intuition says that
  starting to do group locking opens a can of worms that'll take us a long
  time to close again. Maybe I'm just imagining complexity that won't
  actually be there. But I have a hard time believing that.
 
 What's the distinction between "teach the deadlock detector to catch
 these kinds of cases" and "group locking"?  Because I think those are
 at least 90% the same thing.

By 'group locking' I understand that primary/secondary backends can
co-own a lock that normally is self-exclusive. That's a fair bit more
than adding an edge to the deadlock graph between primary/secondary
backends to make the deadlock detector recognize problems.

What I have serious doubts about is 'co-owning' locks. Especially if two
backends normally wouldn't be able to both get that lock.

  I wonder if we couldn't implement a 'halfway' by allowing parallel
  workers to jump the lockqueue if the parent process already has the
  lock. That'd only work for nonconflicting locks, but I think that's
  actually good.
 
 The patch takes precisely that approach; that part of the logic is
 already implemented.

Well, my point is that if the solution is just to jump the queue, there
really aren't any data structure changes needed. Secondary acquirers just
need to check whether a lock is already owned by the primary and then
acquire the lock in the absolutely normal way, with a completely
separate entry, only ignoring the queue.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] DISTINCT with btree skip scan

2014-10-31 Thread Thomas Munro
On 27 October 2014 20:24, David Rowley dgrowle...@gmail.com wrote:
 I've had a quick look at this and it seems like a great win! I'm quite
 surprised that we've not got this already. I think this technology could
 also really help performance of queries such as SELECT * from bigtable bt
 WHERE EXISTS(SELECT 1 FROM otherbigtable obt WHERE bt.somecol =
 obt.someIndexedButNonUniqueColumn); I know you're not proposing to improve
 that first off, but it could be done later once the benefits of this are
 more commonly realised.

 I think our shortfalls in this area have not gone unnoticed. I was reading
 this post
 https://www.periscope.io/blog/count-distinct-in-mysql-postgres-sql-server-and-oracle.html
 about comparing performance of COUNT(DISTINCT col). I think this would give
 a big win for query 3 in that post. I'm trying to find some time make some
 changes to transform queries to allow the group by to happen before the
 joins when possible, so between that and your patch we'd be 2 steps closer
 to making query 1 in the link above perform a little better on PostgreSQL.

 Do you think you'll manage to get time to look at this a bit more? I'd be
 keen to look into the costing side of it if you think that'll help you any?

Hi David,

Sorry for the silence, I have been busy moving countries.

I am definitely interested in collaborating on a series of patches to
implement various kinds of skip-based plans as seen in other RDBMSs if
others think it could be useful.  I see skip-based DISTINCT as a good
place to start.  (I suspect the semi-related skip scan plan (for the
case when your WHERE clause doesn't have a qual for the leading
column(s)) is much harder to plan and execute and I also suspect it's
less generally useful).

Here is a rebased version of that patch which fixes a crashing
off-by-one error, and handles backward scans properly, I think.  This
is still just a quick hack to play with the general idea at this
stage.

It works by adding a new index operation 'skip' which the executor
code can use during a scan to advance to the next value (for some
prefix of the index's columns).  That may be a terrible idea and
totally unnecessary... but let me explain my
reasoning:

1.  Perhaps some index implementations can do something better than a
search for the next key value from the root.  Is it possible or
desirable to use the current position as a starting point for a btree
traversal?  I don't know.

2.  It seemed that I'd need to create a new search ScanKey to use the
'rescan' interface for skipping to the next value, but I already had
an insertion ScanKey so I wanted a way to just reuse that.  But maybe
there is some other way to reuse existing index interfaces, or maybe
there is an easy way to make a new search ScanKey from the existing
insertion ScanKey?

Currently it uses the existing IndexOnlyScan plan node, adding an
extra member.  I briefly tried making a different plan called
DistinctIndexScan but it seemed I was going to need to duplicate or
refactor/share a lot of IndexOnlyScan code (this might be the right
thing to do but at this stage I'm just trying to demo the general idea
with minimal changes).

As for costing, I haven't looked at that part of PG at all yet.  If
you *can* do a distinct index scan, would you *always* want to?  How
well could we estimate the cost using existing statistics?  I suppose
the expected number of page fetches is proportional to the expected
number of distinct values (ie skip operations) times btree depth, and
the cost may be adjusted in some way for cache effects (?), but how
well can we estimate the number of distinct values (a), (a, b) or (a,
b, c) in some range for an index on (a, b, c, d)?
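
For reference, the numbers below came from data of roughly this shape
(a sketch; the exact table definition and sizes differ):

CREATE TABLE foo AS
    SELECT (random() * 9)::int AS a, g AS b
    FROM generate_series(1, 10000000) g;
CREATE INDEX ON foo (a, b);
VACUUM ANALYZE foo;   -- set the visibility map so index-only scans work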

As before, you still need the kludge of disabling hashagg to reach the new plan:

postgres=# set enable_hashagg = true;
SET
Time: 0.213 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 890.166 ms
postgres=# set enable_hashagg = false;
SET
Time: 0.271 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.573 ms

Best regards,
Thomas Munro
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 53cf96f..5f10d7f 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -29,6 +29,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance to the next key value in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -742,6 +743,37 @@ index_can_return(Relation indexRelation)
 	  PointerGetDatum(indexRelation)));
 }
 
+bool
+index_can_skip(Relation indexRelation)
+{
+	FmgrInfo   *procedure;
+
+	RELATION_CHECKS;
+
+	/* amcanskip is optional; assume FALSE if 

Re: [HACKERS] alter user/role CURRENT_USER

2014-10-31 Thread Adam Brightwell
All,


 I agree that we should probably seperate the concerns here.  Personally,
 I like the idea of being able to say CURRENT_USER in utility commands
 to refer to the current user where a role would normally be expected, as
 I could see it simplifying things for some applications, but that's a
 new feature and independent of making role-vs-user cases more
 consistent.


So, I've been doing a little digging and it would appear that the ALTER
ROLE/USER consistency was brought up earlier in the year.

http://www.postgresql.org/message-id/cadyruxmv-tvsbv7mttcs+qedny6xj1+krtzfowvuhdjc5mg...@mail.gmail.com

It was returned with feedback in Commitfest 2014-06 and apparently lost
steam:

https://commitfest.postgresql.org/action/patch_view?id=1408

Tom put forward a suggestion for how to fix it:

http://www.postgresql.org/message-id/21570.1408832...@sss.pgh.pa.us

I have taken that patch, updated the documentation (attached), and run it
through some cursory testing.

At any rate, this is probably a good starting point for those changes.
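
With the patch applied, commands along these lines should work (a sketch):

-- refer to the role running the command without spelling out its name
ALTER ROLE CURRENT_USER SET search_path = app, public;
ALTER ROLE CURRENT_USER RESET ALL;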

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml
new file mode 100644
index 58ae1da..ac05682
*** a/doc/src/sgml/ref/alter_user.sgml
--- b/doc/src/sgml/ref/alter_user.sgml
*** ALTER USER <replaceable class="PARAMETER"
*** 38,47 ****
  
  ALTER USER <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
  
! ALTER USER <replaceable class="PARAMETER">name</replaceable> SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
! ALTER USER <replaceable class="PARAMETER">name</replaceable> SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
! ALTER USER <replaceable class="PARAMETER">name</replaceable> RESET <replaceable>configuration_parameter</replaceable>
! ALTER USER <replaceable class="PARAMETER">name</replaceable> RESET ALL
  </synopsis>
   </refsynopsisdiv>
  
--- 38,47 ----
  
  ALTER USER <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
  
! ALTER USER <replaceable class="PARAMETER">name</replaceable> [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
! ALTER USER { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
! ALTER USER { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
! ALTER USER { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
  </synopsis>
   </refsynopsisdiv>
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
new file mode 100644
index 0de9584..d7c0586
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** static Node *makeRecursiveViewSelect(cha
*** 230,236 
  		AlterFdwStmt AlterForeignServerStmt AlterGroupStmt
  		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterSystemStmt AlterTableStmt
  		AlterTblSpcStmt AlterExtensionStmt AlterExtensionContentsStmt AlterForeignTableStmt
! 		AlterCompositeTypeStmt AlterUserStmt AlterUserMappingStmt AlterUserSetStmt
  		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt
  		AlterDefaultPrivilegesStmt DefACLAction
  		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
--- 230,236 
  		AlterFdwStmt AlterForeignServerStmt AlterGroupStmt
  		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterSystemStmt AlterTableStmt
  		AlterTblSpcStmt AlterExtensionStmt AlterExtensionContentsStmt AlterForeignTableStmt
! 		AlterCompositeTypeStmt AlterUserMappingStmt
  		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt
  		AlterDefaultPrivilegesStmt DefACLAction
  		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
*** static Node *makeRecursiveViewSelect(cha
*** 520,525 
--- 520,527 
  %type <str>		opt_existing_window_name
  %type <boolean> opt_if_not_exists
  
+ %type <str>		role_or_user
+ 
  /*
   * Non-keyword token types.  These are hard-wired into the flex lexer.
   * They must be listed first so that their numeric codes do not depend on
*** stmt :
*** 756,763 
  			| AlterTSConfigurationStmt
  			| AlterTSDictionaryStmt
  			| AlterUserMappingStmt
- 			| AlterUserSetStmt
- 			| AlterUserStmt
  			| AnalyzeStmt
  			| CheckPointStmt
  			| ClosePortalStmt
--- 758,763 
*** CreateUserStmt:
*** 1033,1042 
   *
   * Alter a postgresql DBMS role
   *
   */
  
  AlterRoleStmt:
! 			ALTER ROLE RoleId opt_with AlterOptRoleList
   {
  	AlterRoleStmt *n = 

[HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-10-31 Thread Josh Berkus
PostgreSQL has two bits of obsolete, incomplete functionality which
entrap and frustrate new users in large numbers.  Both of these features
share the following characteristics:

* added more than 10 years ago
* have the same names as useful features from other databases
* were never finished and lack critical functionality
* have not seen significant work in the last 4 releases

Every other day on IRC I run into a newbie who has used one of these
features under the mistaken impression that it is useful, and who then
has to be guided, at some length, through getting their data back out of
it.  There's no knowing how many users never asked for help and simply
chose a different database instead.

Of course, I'm talking about the MONEY type and hash indexes (not the
hash ops class, which is useful, just the index type).  It's time to put
both of these features out to pasture.  Certainly neither of theise
features would be accepted into PostgreSQL today given the shape they're in.

Having these broken features around is like leaving an armed bear-trap
in a public park.

Now, I know the first thing someone will do is jump up and claim that
they were just about to fix WAL-logging on hash indexes, or add casts to
the money type.  But if that hasn't happened in the last 5 years, it's
not going to happen.

We'd be doing our users a huge favor by just removing them in 9.5.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-10-31 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Of course, I'm talking about the MONEY type and hash indexes (not the
 hash ops class, which is useful, just the index type).  It's time to put
 both of these features out to pasture.  Certainly neither of these
 features would be accepted into PostgreSQL today given the shape they're in.

I don't care one way or the other about the money type, but I will defend
hash indexes, especially seeing that we've already added a pretty
in-your-face warning as of 9.5:

regression=# create table foo(f1 int);
CREATE TABLE
regression=# create index on foo using hash (f1);
WARNING:  hash indexes are not WAL-logged and their use is discouraged
CREATE INDEX

 Now, I know the first thing someone will do is jump up and claim that
 they were just about to fix WAL-logging on hash indexes,

I don't know if/when that will happen as such, but Simon was making noises
about writing code to treat hash indexes as unlogged automatically, which
would more or less fix the worst risks.  That wouldn't just be a special
case for hash indexes; it would cover any index AM that lacks WAL support,
as third-party AMs might well do.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] _mdfd_getseg can be expensive

2014-10-31 Thread Andres Freund
Hi,

On 2014-03-31 12:10:01 +0200, Andres Freund wrote:
 I recently have seen some perf profiles in which _mdfd_getseg() was in
 the top #3 when VACUUMing large (~200GB) relations. Called by mdread(),
 mdwrite(). Looking at its implementation, I am not surprised. It
 iterates over all segment entries a relation has; for every read or
 write. That's not painful for smaller relations, but at a couple of
 hundred GB it starts to be annoying. Especially if kernel readahead has
 already read in all data from disk.
 
 I don't have a good idea what to do about this yet, but it seems like
 something that should be fixed mid-term.
 
 The best I can come up with is caching the last mdvec used, but that's
 fairly ugly. Alternatively it might be a good idea to not store MdfdVec
 as a linked list, but as a densely allocated array.

I've seen this a couple more times since. On larger relations it gets
even more pronounced. When sequentially scanning a 2TB relation,
_mdfd_getseg() gets up to 80% proportionate CPU time towards the end of
the scan.

I wrote the attached patch that gets rid of that essentially quadratic
behaviour by replacing the mdfd chain/singly linked list with an
array. Since we seldom grow files by a whole segment, I can't see the
slightly bigger memory reallocations mattering significantly. In pretty
much every other case the array is bound to be a winner.

Does anybody have fundamental arguments against that idea?

With some additional work we could save a bit more memory by getting rid
of the mdfd_segno as it's essentially redundant - but that's not
entirely trivial and I'm unsure if it's worth it.

I've also attached a second patch that makes PageIsVerified() noticeably
faster when the page is new. That's helpful and related because it makes
it easier to test the correctness of the md.c rewrite by faking full 1GB
segments. It's also pretty simple, so there's imo little reason not to
do it.

Greetings,

Andres Freund

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From f7bf48137d5c3c4268e0fb6231ca6996d9fe Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 30 Oct 2014 13:24:20 +0100
Subject: [PATCH 1/2] Faster PageIsVerified() for the all zeroes case.

---
 src/backend/storage/page/bufpage.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6351a9b..c175ed0 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -81,7 +81,7 @@ bool
 PageIsVerified(Page page, BlockNumber blkno)
 {
 	PageHeader	p = (PageHeader) page;
-	char	   *pagebytes;
+	size_t	   *pagebytes;
 	int			i;
 	bool		checksum_failure = false;
 	bool		header_sane = false;
@@ -118,10 +118,17 @@ PageIsVerified(Page page, BlockNumber blkno)
 			return true;
 	}
 
-	/* Check all-zeroes case */
+	/*
+	 * Check all-zeroes case. Luckily BLCKSZ is guaranteed to always be a
+	 * multiple of sizeof(size_t) - and it's much faster to compare memory
+	 * using the processor's native word size.
+	 */
+	StaticAssertStmt(BLCKSZ == (BLCKSZ / sizeof(size_t)) * sizeof(size_t),
+					 "BLCKSZ has to be a multiple of sizeof(size_t)");
+
 	all_zeroes = true;
-	pagebytes = (char *) page;
-	for (i = 0; i < BLCKSZ; i++)
+	pagebytes = (size_t *) page;
+	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
 	{
 		if (pagebytes[i] != 0)
 		{
-- 
2.0.0.rc2.4.g1dc51c6.dirty

From 5bc40c11081a0d034f2857b45cb3d45c5b63fa58 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 30 Oct 2014 13:24:52 +0100
Subject: [PATCH 2/2] Much faster/O(1) mfdvec.

---
 src/backend/storage/smgr/md.c   | 307 +---
 src/backend/storage/smgr/smgr.c |   4 +-
 src/include/storage/smgr.h  |   4 +-
 3 files changed, 166 insertions(+), 149 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 167d61c..6a7afa8 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -92,27 +92,22 @@
  *	out to an unlinked old copy of a segment file that will eventually
  *	disappear.
  *
- *	The file descriptor pointer (md_fd field) stored in the SMgrRelation
- *	cache is, therefore, just the head of a list of MdfdVec objects, one
- *	per segment.  But note the md_fd pointer can be NULL, indicating
- *	relation not open.
+ *	File descriptors are stored in the md_seg_fds arrays inside
+ *	SMgrRelation. The length of these arrays is stored in md_seg_no.  Note
+ *	that md_seg_no having a specific value does not necessarily mean the
+ *	relation doesn't have additional segments; we may just not have opened
+ *	the next segment yet.  (We could not have all segments in the array as an
+ *	

Re: [HACKERS] tracking commit timestamps

2014-10-31 Thread Michael Paquier
On Sat, Nov 1, 2014 at 1:15 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 31 October 2014 15:46, Petr Jelinek p...@2ndquadrant.com wrote:

  Attached version with the above comments near the relevant code.

 Looks cooked and ready to serve. Who's gonna commit this? Alvaro, or
 do you want me to?

Could you hold on a bit? I'd like to have a look at it more deeply; from
a quick look at the code there are a couple of things that could be
improved.
Regards,
-- 
Michael


Re: [HACKERS] _mdfd_getseg can be expensive

2014-10-31 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I wrote the attached patch that gets rid of that essentially quadratic
 behaviour by replacing the mdfd chain/singly linked list with an
 array. Since we seldom grow files by a whole segment, I can't see the
 slightly bigger memory reallocations mattering significantly. In pretty
 much every other case the array is bound to be a winner.

 Does anybody have fundamental arguments against that idea?

While the basic idea is sound, this particular implementation seems
pretty bizarre.  What's with the md_seg_no stuff, and why is that
array typed size_t?  IOW, why didn't you *just* replace the linked
list with an array?  This patch seems to be making some other changes
that you've failed to explain.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] _mdfd_getseg can be expensive

2014-10-31 Thread Andres Freund
On 2014-10-31 18:48:45 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I wrote the attached patch that gets rid of that essentially quadratic
  behaviour by replacing the mdfd chain/singly linked list with an
  array. Since we seldom grow files by a whole segment, I can't see the
  slightly bigger memory reallocations mattering significantly. In pretty
  much every other case the array is bound to be a winner.
 
  Does anybody have fundamental arguments against that idea?
 
 While the basic idea is sound, this particular implementation seems
 pretty bizarre.  What's with the md_seg_no stuff, and why is that
 array typed size_t?
 IOW, why didn't you *just* replace the linked list with an array?

It stores the length of the array of _MdfdVec entries. To know whether
it's safe to access some element we first need to check whether we've
loaded that many entries. It's size_t[] because that seemed to be the
most appropriate type for the length of an array. It's an array because
md.c had already chosen to represent relation forks via an array indexed
by the fork.

So
   size_t   md_seg_no[MAX_FORKNUM + 1];
contains the length of the _MdfdVec array for each fork. These arrays
are stored in:
   struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
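
Put together, the common-case lookup becomes a bounds check plus an array
index, independent of relation size. A minimal sketch of the idea (not the
actual patch; the open-on-demand path is elided):

static struct _MdfdVec *
_mdfd_getseg_sketch(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
{
	/* already opened? then it's a plain O(1) array access */
	if (segno < reln->md_seg_no[forknum])
		return &reln->md_seg_fds[forknum][segno];

	/*
	 * Not opened yet: open segments up to segno, repalloc()ing
	 * md_seg_fds[forknum] and bumping md_seg_no[forknum] (omitted here).
	 */
	return NULL;
}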

 This patch seems to be making some other changes that you've failed to
 explain.

I'm not aware of any that aren't just a consequence of not iterating
through the linked list anymore. What change are you thinking of?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-31 Thread Jim Nasby

On 10/24/14, 6:17 PM, Jim Nasby wrote:

- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?


Between execute_sql_string() and tcop/exec_simple_query()? Is there stuff in
exec_simple that's not safe for a bgworker? I'm not seeing why we can't use
exec_simple. :/


There are some differences if you compare them closely.


Without doing a deep dive, I'm guessing that most of the extra stuff would be 
safe to re-use; it just wouldn't affect execute_sql_string. Obviously we could 
add a boolean to exec_simple_query for the case when it's being used by a 
bgworker. Though it seems like the biggest differences have to do with logging.

Here's the differences I see:

- Disallowing transaction commands
- Logging
- What memory context is used (could we just use that differently in a 
pg_background backend?)
- Portal output format
- What to do with the output of intermediate commands (surely there are other 
places we need to handle that? plpgsql maybe?)

I think all of those except logging could be handled by a function serving both 
exec_simple_query and execute_sql_string that accepts a few booleans (or maybe 
just one to indicate the caller) and some if's. At least I don't think it'd be 
too bad, without actually writing it.
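
To make that concrete, a minimal sketch of what such a shared entry point
could look like; the name matches the patch below, but the parameters are
illustrative assumptions, not taken from it:

/* hypothetical signature, for illustration only */
extern void exec_query_string(const char *query_string,
							  DestReceiver *dest,
							  bool allow_xact_commands,	/* false for pg_background? */
							  bool do_statement_logging);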

I'm not sure what to do about logging. If the original backend has logging 
turned on, is it that horrible to do the same logging as exec_simple_query 
would do?

Another option would be factoring out parts of exec_simple_query; the for loop 
over the parse tree might be a good candidate. But I suspect that would be 
uglier than a separate support function.

I do feel uncomfortable with the amount of duplication there is right now 
though...


I took a stab at this by refactoring the guts of exec_simple_query (patch 
attached) into a new function called exec_query_string (also attached in raw 
form). As indicated it needs a bit more work. In particular, how it's dealing 
with excluding transactional commands is rather ugly. Why do we need to do that 
in pg_background?

Andres was concerned about the performance impact of doing this. I tested this 
by doing

for i in {1..99}; do echo 'SELECT 1;' >> test.sql; done

and then

time psql -f test.sql > /dev/null

It appears there may be a ~1% performance hit, but my laptop isn't repeatable 
enough to be sure. I did try manually in-lining exec_query_string to see if it 
was the function call causing the issue; it didn't seem to make a difference.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Author: Jim Nasby jim.na...@bluetreble.com
Date: Thu, 30 Oct 2014 20:25:34 -0500

Move the bulk of exec_simple_query into exec_query_string() so that
pg_background can also make use of it.

I’m not thrilled about the name, and I’m not sure tcopprot.h is the
right place to expose this. Also note the XXX comments.
---
 contrib/pg_background/pg_background.c | 141 ++
 src/backend/tcop/postgres.c   |  83 
 src/include/tcop/tcopprot.h   |   7 ++
 3 files changed, 80 insertions(+), 151 deletions(-)

diff --git a/contrib/pg_background/pg_background.c 
b/contrib/pg_background/pg_background.c
index a566ffb..075ecd8 100644
--- a/contrib/pg_background/pg_background.c
+++ b/contrib/pg_background/pg_background.c
@@ -817,10 +817,6 @@ pg_background_worker_main(Datum main_arg)
 static void
 execute_sql_string(const char *sql)
 {
-   List   *raw_parsetree_list;
-   ListCell   *lc1;
-   bool		isTopLevel;
-   int commands_remaining;
MemoryContext   parsecontext;
MemoryContext   oldcontext;
 
@@ -839,139 +835,16 @@ execute_sql_string(const char *sql)
 										 ALLOCSET_DEFAULT_MINSIZE,
 										 ALLOCSET_DEFAULT_INITSIZE,
 										 ALLOCSET_DEFAULT_MAXSIZE);
-   oldcontext = MemoryContextSwitchTo(parsecontext);
-   raw_parsetree_list = pg_parse_query(sql);
-   commands_remaining = list_length(raw_parsetree_list);
-   isTopLevel = commands_remaining == 1;
-   MemoryContextSwitchTo(oldcontext);
 
/*
-* Do parse analysis, rule rewrite, planning, and execution for each raw
-* parsetree.  We must fully execute each query before beginning parse
-* analysis on the next one, since there may be interdependencies.
+* Do the real work
 */
-   foreach(lc1, raw_parsetree_list)
-   {
-   Node   *parsetree = (Node *) lfirst(lc1);
-   const char *commandTag;
-   char		completionTag[COMPLETION_TAG_BUFSIZE];
-   List   *querytree_list,
-  *plantree_list;
-   bool

Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-31 Thread Noah Misch
On Fri, Oct 31, 2014 at 10:29:53AM -0400, Robert Haas wrote:
 On Thu, Oct 30, 2014 at 11:45 PM, Noah Misch n...@leadboat.com wrote:
  Given the lack of prior complaints about this
  loop, I'm not sure I see the need to work harder than that; corruption
  of this sort must be quite rare.
 
  Looks like _bt_getstackbuf() is always called with some buffer lock held, so
  CHECK_FOR_INTERRUPTS() alone would not help:
 
  http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us
 
 That's the insert path.  What about the vacuum path?

I am not aware of an occasion where the vacuum path will call
_bt_getstackbuf() without already holding some buffer lock.
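
For context, CHECK_FOR_INTERRUPTS() is inert while interrupts are held
off, which is why it can't help under a buffer lock. Roughly, paraphrasing
miscadmin.h and postgres.c of this era:

#define CHECK_FOR_INTERRUPTS() \
do { \
	if (InterruptPending) \
		ProcessInterrupts(); \
} while (0)

and ProcessInterrupts() starts with

	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
		return;

while LWLockAcquire(), and hence any buffer content lock, bumps
InterruptHoldoffCount.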


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 4:10 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think that's correct.  We only need to process local
 invalidation messages after CommandCounterIncrement(), which I
 anticipate prohibiting during parallel execution (short thought should
 convince you that anything else is completely nuts).

 It is more complex, even without CCI. As long as you're doing anything
 inside a transaction that already has done DDL, you'll have to play
 nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER
 pg_class; and then then starts a child backend. Which will get the same
 snapshot, combocids, yada yada. But it *also* will have preexisting
 cache entries. Those you need to invalidate before it can do anything
 correct.

Where will those preexisting cache entries come from, exactly?  The
postmaster is forking the parallel worker, not the user backend.

  I'm sorry to be a bit pessimistic here. But my intuition says that
  starting to do group locking opens a can of worms that'll take us a long
  time to close again. Maybe I'm just imagining complexity that won't
  actually be there. But I have a hard time believing that.

 What's the distinction between teach the deadlock detector to catch
 these kind of cases and group locking?  Because I think those are
 at least 90% the same thing.

 I understand 'group locking' to mean that primary/secondary backends can
 co-own a lock that normally is self-exclusive. That's a fair bit more
 than adding an edge to the deadlock graph between primary/secondary
 backends to make the deadlock detector recognize problems.

I guess.  It seems like a pretty minor extension to me.  Getting the
deadlock detector to do the right thing is the hard part.

 What I have serious doubts about is 'co-owning' locks. Especially if two
 backends normally wouldn't be able to both get that lock.

Perhaps we should discuss that more.  To me it seems pretty obvious
that's both safe and important.

  I wonder if we couldn't implement a 'halfway' by allowing parallel
  workers to jump the lockqueue if the parent process already has the
  lock. That'd only work for nonconflicting locks, but I think that's
  actually good.

 The patch takes precisely that approach; that part of the logic is
 already implemented.

 Well, my point is that if the solution is just to jump the queue, there
 really aren't any data structure changes needed. Secondary acquirers just
 need to check whether a lock is already owned by the primary and then
 acquire the lock in the absolutely normal way - with a completely
 separate entry. Only that they ignore the queue.

Well, they need to be able to identify who is in their group.  You
could possibly do that without any changes at all to the lock manager
data structures, but it seems a bit complex.  I took the approach of
storing the group leader's PGPROC * in each PROCLOCK, which makes it
trivial (and is enough information for the deadlock detector to work
correctly, too).
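
A minimal sketch of that representation (field placement as described
above; the real PROCLOCK in storage/lock.h has more members, and the patch
may differ):

typedef struct PROCLOCK
{
	PROCLOCKTAG	tag;			/* identifies the (object, holder) pair */
	PGPROC	   *groupLeader;	/* lock group leader, or my own PGPROC */
	LOCKMASK	holdMask;		/* lock types currently held */
	/* ... queue links etc. omitted ... */
} PROCLOCK;

Two backends are then in the same lock group iff their PROCLOCKs point at
the same leader PGPROC, which is exactly what the deadlock detector needs
to know.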

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-10-31 Thread Eric Ridge
On Fri, Oct 31, 2014 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I don't know if/when that will happen as such, but Simon was making noises
 about writing code to treat hash indexes as unlogged automatically, which
 would more or less fix the worst risks.  That's not just a special case
 for hash indexes, but any index AM that lacks WAL support, as third-party
 AMs might well do.


As someone writing a 3rd-party AM, literally right this moment, do you have
a link to that thread?  While I follow this list fairly closely I don't
remember seeing this.  I'd love to understand the thoughts around handling
extension-based AMs.

eric


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 19:36, Robert Haas robertmh...@gmail.com wrote:

 It's an obscure case and it's not the only solution either.

 I don't think that's an obscure situation at all.  Do you really think
 a patch that could cause an attempt to VACUUM FULL a system catalog to
 suffer an undetected deadlock meets this community's quality
 standards?  Because that's what we're talking about.

Nobody has said that allowing undetected deadlocks is acceptable, so
your other comments are void.

I've suggested *stricter* locking, which would obviously allow
deadlock detection. You recognised that by claiming that the locking I
had proposed was actually too strict, which is where the above example
came from.

Yes, I have proposed stricter locking, but as explained, the only
things this would slow down are catalog VAC FULLs, which are already a
problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 18:47, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You mentioned earlier that functions would need to be marked proisparallel 
 etc.

 What conditions will that be protecting against? If we aren't going to
 support the general case where every single thing works, can we at
 least discuss what the list of cases is that we will support?

 In general, any operation that relies on backend-private state not
 shared with a cooperating backend isn't parallel-safe.  For example,

Yes, the principle is easy to understand, but that was not the question.

You are saying that placing restrictions on which functions can
execute is necessary, yet at the same time saying that we must have
generalised locking for workers because restrictions on locking are
not acceptable. I don't point that out because I want an argument or
to prove a point, but because there are important things on the table
here.

What are the specific restrictions you are suggesting we place on
parallel workers? We need that definition clear so we can decide how
to mark the functions. If we don't know, which is easily
understandable, then the best way to find that out is to build a
restricted solution and to expand slowly outwards to find problems.

An obvious milestone there is whether a function contains SQL, which
is the point chosen by some other databases. I personally hope to
expand upon that, but it would be useful to reach it first and then
continue from there.

I was already thinking of implementing CONTAINS NO SQL as a SQL
Standard function marking since it can be used to optimize COPY
further.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers