Re: [HACKERS] group locking: incomplete patch, just for discussion

2015-07-28 Thread Robert Haas
On Wed, Nov 5, 2014 at 9:26 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The procgloballist stuff should be the subject of a separate patch
 which I agree with.

 Yes, I think that's probably a net improvement in robustness quite
 apart from what we decide to do about any of the rest of this.  I've
 attached it here as revise-procglobal-tracking.patch and will commit
 that bit if nobody objects.

In reviewing this thread I realized that I never got around to
committing this bit.  And it still seems like a good idea, so I've
done that now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-12-25 Thread Noah Misch
On Thu, Nov 20, 2014 at 09:15:51PM +0100, Andres Freund wrote:
 On 2014-11-20 13:57:25 -0500, Robert Haas wrote:
  On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  If the
  deadlock detector would kill the processes anyway, it doesn't matter,
  because CheckTableNotInUse() should do it first, so that we get a
  better error and without waiting for deadlock_timeout.  So any
  scenario that's predicated on the assumption that CheckTableNotInUse()
  will succeed in a parallel context is 100% unconvincing to me as an
  argument for anything.
 
 Meh, there's plenty of cases where CheckTableNotInUse() isn't used where we
 still rely on locks actually working. And it's not just postgres code,
 this *will* break user code.

Today, holding a relation AccessExclusiveLock ensures no other process is
using the table.  That is one of two prerequisites for making an alteration
that threatens concurrent users of the table.  The other prerequisite is
verifying that no other subsystem of the current process is using the table,
hence CheckTableNotInUse().  Robert's designs do soften AccessExclusiveLock
for relations; holding it will not preclude use of the table in workers of the
same parallel group.  Pessimizing CheckTableNotInUse() compensates.  In every
scenario where AccessExclusiveLock is softer than before, CheckTableNotInUse()
will fail unconditionally.  Code that, today, performs the correct checks
before making a threatening table alteration will remain correct.
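To make the two prerequisites concrete, here is a toy Python model (my own sketch, not PostgreSQL's C code; the names mirror the discussion, and the unconditional failure in a parallel group is the pessimization described above):

```python
def check_table_not_in_use(table, in_parallel_group, local_refcount):
    # Prerequisite 2: no other subsystem of the *current* process may be
    # using the table.  Under the softened AccessExclusiveLock, this check
    # fails unconditionally whenever a parallel group is active.
    if in_parallel_group:
        raise RuntimeError('cannot alter "%s" during a parallel operation' % table)
    if local_refcount > 1:
        raise RuntimeError('"%s" is being used by this backend' % table)

def safe_to_alter(table, holds_access_exclusive, in_parallel_group,
                  local_refcount):
    # Prerequisite 1: AccessExclusiveLock keeps other *processes* out.
    if not holds_access_exclusive:
        return False
    try:
        check_table_not_in_use(table, in_parallel_group, local_refcount)
    except RuntimeError:
        return False
    return True

print(safe_to_alter("r", True, False, 1))   # -> True
print(safe_to_alter("r", True, True, 1))    # -> False: parallel group active
```

Code that already performs both checks before a threatening alteration therefore stays correct under parallelism, exactly as argued above.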

ANALYZE and lazy VACUUM are the only core operations that take self-exclusive
relation locks and skip CheckTableNotInUse().  PreventTransactionChain() will
block naive concurrent VACUUM.  (If that weren't the case, VACUUM would need
CheckTableNotInUse() even without parallelism in the picture.  I wouldn't
oppose adding CheckTableNotInUse() to VACUUM as defense in depth.)  ANALYZE is
special.  It's compatible with most concurrent use of the table, so
CheckTableNotInUse() is unnecessary.  Blocking ANALYZE during parallelism,
even if we someday allow other database writes, is a wart I can live with.

Given a choice between no parallelism and parallelism when the initiating
transaction holds no self-exclusive locks, I'd pick the latter.  But you have
overstated the significance of holding a lock and the degree to which Robert's
proposals change the same.

 Your argument essentially is that we don't actually need to lock objects
 if the other side only reads. Yes, you add a condition or two on top
 (e.g. CheckTableNotInUse() fails), but that's not that much. If we want
 to do this we really need to forbid *any* modification to catalog/user
 tables while parallel operations are ongoing. In the primary and worker
 backends.  I think that's going to end up being much more
 problematic/restrictive than not allowing non-cooperative parallelism
 after >= ShareUpdateExclusive. If we ever want to parallelize writing
 operations we'll regret this quite a bit.

https://wiki.postgresql.org/wiki/Parallel_Sort already contemplates blocking
every kind of database write, due to challenges around combo CIDs (affects
UPDATE and DELETE) and invalidation messages (mostly affects DDL).  I doubt
we'll ever care to allow a worker to start an independent DDL operation.  We
might want to allow INSERT, UPDATE, and DELETE.  I don't see that allowing
multiple workers to hold AccessExclusiveLock will make that more difficult.
Difficulties around XactLockTableWait() are more noteworthy, and they stand
regardless of what we do with heavyweight locking.

 The 'lets grant all the current locks at start of parallel query'
 approach seems quite a bit safer than always doing that during parallel
 query, don't get me wrong.

Agreed.

 Btw, what are your thoughts about SERIALIZABLE and parallel query?
 Afaics that'd be pretty hard to support?

It didn't look obviously awful, because most of the relevant data structures
live in shared memory.  We already have the concept of one backend adjusting
the predicate locks pertaining to another backend's transaction.  (That's not
to say the number of SERIALIZABLE-specific changes will be zero, either.)

Thanks,
nm




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-12-09 Thread Peter Geoghegan
On Sun, Nov 23, 2014 at 3:59 PM, Andres Freund and...@2ndquadrant.com wrote:
  The heavyweight locking issue is really killing me, though.
 
  I don't really understand why you're not content with just detecting
  deadlocks for now. Everything else seems like bells and whistles for
  later.

 I don't think it's OK for a parallel query to just randomly deadlock.

 I think that should be the end goal, yes.

 It's OK for parallel query to excuse itself politely and retire from
 the field, but I don't think it's OK for it to say, "oh, sure, I can
 parallelize that for you," and then fail by deadlocking with itself.

 But I also think that right now it doesn't necessarily need to be the
 focus immediately.

I am in strong agreement with Robert that unprincipled deadlocks are
not okay. Now, what that actually means in practice might be kind of
hard to delineate.

Take my ON CONFLICT UPDATE patch. The locking there is optimistic, and
so in theory there are risks of lock starvation [1]. There is no
axiomatic reason why a single backend cannot be starved of a lock in
the event of a perfect storm of conflicts. Similarly, the Lehman and
Yao paper makes what can only be called a plausibility argument around
how livelocks are not a concern in practice [2] when an indefinite
number of move-right operations are required after a page split;
having written a bunch of theorems around the correctness of other
aspects of their algorithm, the argument here is more or less: "Come
on, that'll never happen!". Concurrency is hard.

If you want another example of this kind of thing, then there's how
Oracle deals with READ COMMITTED row conflicts -- the statement is
rolled back and restarted. Who is to say with certainty that it'll
make any progress the second time? Or the third or fourth?

Note that the optimistic locking stuff was actually my solution to the
unprincipled deadlocks problem, a solution that, as I say, also
implies the theoretical risk of lock starvation (but not livelock). I
think I would also have found it acceptable if there was a solution
that reduced the original risk, the risk of *deadlocks* to merely a
theoretical one, as opposed to a very real one (that could be shown
with a simple testcase).

 This is a topic that I think will be much easier to
 approach if some demonstration of actual parallel users would be
 available. Doesn't have to be committable or such, but I think it'll
 help to shape how this has to look.  I think unrecognized deadlocks will
 make things too annoying to use, but I don't think it needs to be
 guaranteed deadlock free or such.

I couldn't agree more. The greater point about livelock, which I
believe applies here as well, is that there are different standards by
which we might consider that unprincipled deadlocks won't happen.
Basically, it seems inevitable that a practical standard sometimes
must be applied. If a well informed person, say myself or Andres,
cannot make any given practical implementation deadlock (or
alternatively be starved for locks, if we're talking about an
optimistic locking work-around to unprincipled deadlocks that
introduces that risk), then I guess that means that there are no
unprincipled deadlocks/livelocks, and I for one will probably find
that acceptable given enough testing.

I realize that that might sound terribly loose (I probably forgot a
few caveats), but what it boils down to is that we *don't* have the
tools to analyze certain types of problems like this, or at least I
don't think we do, because no one does. The closest thing we have is
some well written custom stress testing, and lots of scrutiny. Of
course I'd prefer a solution that is provably correct from first
principles (in other words, I'd prefer to use algorithms that are in
some formal sense non-blocking [3] or something along those lines),
but that will often be cost-prohibitive or otherwise elusive when
designing concurrent algorithms.

I think I agree with you both.

[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards
[2] http://www.cs.cornell.edu/courses/cs4411/2009sp/blink.pdf ,
Section 6.4 Livelock
[3] https://en.wikipedia.org/wiki/Non-blocking_algorithm
-- 
Peter Geoghegan




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-24 Thread Robert Haas
On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund and...@2ndquadrant.com wrote:
 How about a frontend process having created a relation that then starts
 a parallel query. Then the frontend process ERRORs out and, in the
 course of that, does smgrDoPendingDeletes(). Which will delete the new
 relation. Boom. The background process might just be accessing it. If
 you think that's harmless, think e.g. what'd happen with heap cleanup
 records generated in the background process. They'd not replay.

I spent some time thinking about this case.  I don't think this is
really principally a locking issue; I think it's really a timing
issue around transaction abort.  I believe that, if we wanted to be
able to write any tuples at all from within a parallel worker, there
are certain phases of transaction abort processing that would need to
happen only once we're sure that all of the backends involved have
done their local abort processing.  smgrDoPendingDeletes() is one, but
I think we have pretty much the same problem with
RecordTransactionAbort() and ProcArrayEndTransaction().  Without some
synchronization, it would be possible for a parallel backend to stamp
a tuple with an XID after the abort record is already written, or
after the transaction has already been removed from the ProcArray.
Are we prepared to view that sort of thing as harmless?
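The synchronization described here amounts to a barrier: the shared phases of abort must not run until every group member has finished its local abort processing. A rough sketch with Python threads (illustrative only; the phase names are borrowed from the discussion above):

```python
import threading

N_WORKERS = 3
# Leader plus workers must all finish local abort processing before the
# shared phases (smgrDoPendingDeletes, RecordTransactionAbort,
# ProcArrayEndTransaction) may run.
abort_barrier = threading.Barrier(N_WORKERS + 1)
log, log_lock = [], threading.Lock()

def worker(i):
    with log_lock:
        log.append("worker %d: local abort done" % i)
    abort_barrier.wait()   # past this point the worker stamps no more XIDs

def leader():
    with log_lock:
        log.append("leader: local abort done")
    abort_barrier.wait()   # wait until every worker has aborted locally
    with log_lock:
        log.append("leader: shared abort phases")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
leader()
for t in threads:
    t.join()
print(log[-1])   # -> "leader: shared abort phases", always last
```

Without the barrier, a worker could append its line (i.e., stamp a tuple with an XID) after the leader had already written the abort record, which is exactly the hazard raised here.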

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-23 Thread Andres Freund
On 2014-11-21 08:31:01 -0500, Robert Haas wrote:
 On Thu, Nov 20, 2014 at 8:31 PM, Andres Freund and...@2ndquadrant.com wrote:
  I can't follow that logic.  Why do you think self-exclusive locks are
  the only ones that matter?
 
  All the others can be acquired by jumping in front of the lock's
  waitqueue?
 
 That is true, as a practical matter, in many cases.  But from the
 point of view of the lock manager, it's really not true.  It is almost
 true if you assume that the lock level acquired by the parallel
 backend will be weaker than the lock level held by the user backend,
 but not quite, because RowExclusiveLock conflicts with ShareLock.

The assumption that a parallel backend will acquire a lock level <= the
worker's lock imo is a pretty sane one - we obviously can't predict
things if not. And I don't think any of the other presented approaches
can do that.

I'm not really bothered by RowExclusiveLock vs. ShareLock. Unless I miss
something, that can only be problematic if the primary acquires a
ShareLock and the worker tries a RowExclusive. That just can't be sane.

 The assumption that the parallel backend's lock level will always be
 weaker seems shaky.  It's probably true for relation locks, but if we
 ever want to support parallel workers that can write any data
 whatsoever, they're going to need to be able to take relation
 extension locks.

I don't think extension locks are a good argument here. As you noted
upthread, they really need to be used consistently across processes. And
they shouldn't/wouldn't be granted by your suggested mechanism either,
right?

 If one backend in a parallel group attempts to
 acquire the relation extension lock while another worker holds it, the
 rule that all group members should be regarded as waiting for each
 other will result in instantaneous deadlock.  That's pretty
 undesirable, and suggests to me that the whole model is defective in
 some way.

Shouldn't the deadlock checker actually check that all involved
processes are actually waiting for a lock? It seems a bit pointless to
deadlock when one node is actually progressing. That seems like it'd be
an absolute PITA for anything involving system tables and such.

   I don't buy the plan time/execution time argument. We'll need to be able
   to adapt plans to the availability of bgworker slots *anyway*. Otherwise
   we either need to set the number of bgworkers to MaxConnections *
   MaxParallelism or live with errors as soon as too many processes use
   parallelism. The former is clearly unrealistic given PG's per-backend
   overhead and the latter will be a *much* bigger PITA than all the
   deadlock scenarios talked about here. It'll constantly fail, even if
   everything is ok.
 
  I certainly think that any parallel node needs to be able to cope with
  getting fewer workers than it would ideally like to have.  But that's
  different from saying that when our own backend takes new locks, we
  have to invalidate plans.  Those seem like largely unrelated problems.
 
  No need to invalidate them if you have the infrastructure to run with no
  parallel workers - just use the path that's used if there's no workers
  available.
 
 Well, it's OK for a query to run with less parallelism than the
 planner thought optimal.  It's not OK for it to attempt to run with
 the requested degree of parallelism and deadlock.  I think if the
 deadlock detector gets involved it's quite unlikely that you're going
 to be able to make things degrade smoothly to non-parallel operation.

That was in response to your argument about plan invalidation - which I
don't buy because it'd only reuse the graceful degradation capabilities
we need anyway.

  The heavyweight locking issue is really killing me, though.
 
  I don't really understand why you're not content with just detecting
  deadlocks for now. Everything else seems like bells and whistles for
  later.
 
 I don't think it's OK for a parallel query to just randomly deadlock.

I think that should be the end goal, yes.

 It's OK for parallel query to excuse itself politely and retire from
 the field, but I don't think it's OK for it to say, "oh, sure, I can
 parallelize that for you," and then fail by deadlocking with itself.

But I also think that right now it doesn't necessarily need to be the
focus immediately. This is a topic that I think will be much easier to
approach if some demonstration of actual parallel users would be
available. Doesn't have to be committable or such, but I think it'll
help to shape how this has to look.  I think unrecognized deadlocks will
make things too annoying to use, but I don't think it needs to be
guaranteed deadlock free or such.

Greetings,

Andres Freund




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-21 Thread Robert Haas
On Thu, Nov 20, 2014 at 8:31 PM, Andres Freund and...@2ndquadrant.com wrote:
 I can't follow that logic.  Why do you think self-exclusive locks are
 the only ones that matter?

 All the others can be acquired by jumping in front of the lock's
 waitqueue?

That is true, as a practical matter, in many cases.  But from the
point of view of the lock manager, it's really not true.  It is almost
true if you assume that the lock level acquired by the parallel
backend will be weaker than the lock level held by the user backend,
but not quite, because RowExclusiveLock conflicts with ShareLock.
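The RowExclusiveLock/ShareLock exception is easy to verify against the relation-level conflict table in the PostgreSQL documentation. A small Python transcription of that matrix (the encoding is mine) shows why ordering modes by strength does not make conflicts monotone:

```python
# Relation-level lock modes, weakest to strongest.
MODES = ["AccessShare", "RowShare", "RowExclusive", "ShareUpdateExclusive",
         "Share", "ShareRowExclusive", "Exclusive", "AccessExclusive"]

CONFLICTS = {
    "AccessShare":          {"AccessExclusive"},
    "RowShare":             {"Exclusive", "AccessExclusive"},
    "RowExclusive":         {"Share", "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "ShareUpdateExclusive": {"ShareUpdateExclusive", "Share",
                             "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "Share":                {"RowExclusive", "ShareUpdateExclusive",
                             "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "ShareRowExclusive":    {"RowExclusive", "ShareUpdateExclusive", "Share",
                             "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "Exclusive":            set(MODES) - {"AccessShare"},
    "AccessExclusive":      set(MODES),
}

def conflicts(a, b):
    return b in CONFLICTS[a]

# A worker taking the weaker RowExclusiveLock still conflicts with a
# leader-held ShareLock, which is the wrinkle noted above.
assert MODES.index("RowExclusive") < MODES.index("Share")
print(conflicts("RowExclusive", "Share"))   # -> True
```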

The assumption that the parallel backend's lock level will always be
weaker seems shaky.  It's probably true for relation locks, but if we
ever want to support parallel workers that can write any data
whatsoever, they're going to need to be able to take relation
extension locks.  If one backend in a parallel group attempts to
acquire the relation extension lock while another worker holds it, the
rule that all group members should be regarded as waiting for each
other will result in instantaneous deadlock.  That's pretty
undesirable, and suggests to me that the whole model is defective in
some way.

  I don't buy the plan time/execution time argument. We'll need to be able
  to adapt plans to the availability of bgworker slots *anyway*. Otherwise
 we either need to set the number of bgworkers to MaxConnections *
  MaxParallelism or live with errors as soon as too many processes use
  parallelism. The former is clearly unrealistic given PG's per-backend
  overhead and the latter will be a *much* bigger PITA than all the
  deadlock scenarios talked about here. It'll constantly fail, even if
  everything is ok.

 I certainly think that any parallel node needs to be able to cope with
 getting fewer workers than it would ideally like to have.  But that's
 different from saying that when our own backend takes new locks, we
 have to invalidate plans.  Those seem like largely unrelated problems.

 No need to invalidate them if you have the infrastructure to run with no
 parallel workers - just use the path that's used if there's no workers
 available.

Well, it's OK for a query to run with less parallelism than the
planner thought optimal.  It's not OK for it to attempt to run with
the requested degree of parallelism and deadlock.  I think if the
deadlock detector gets involved it's quite unlikely that you're going
to be able to make things degrade smoothly to non-parallel operation.

  I
 seriously doubt that can ever be made safe, but I'd choose to prohibit
 it using something other than the heavyweight lock manager.

 Right. It's perfectly fine to forbid it imo. But there's lots of places
 that have assumptions about the locking behaviour baked in, and the
 relevant bugs will be hard to find.

That's pretty vague.  I think we can rule out the great bulk of the
problem cases by prohibiting backends from starting a DDL command
while parallel mode is in effect.  (We may be parallelizing a DDL
command or just a regular query, but once we've initiated parallelism,
no NEW DDL commands can be started.)

We should focus on trying to foresee what can go wrong apart from
that.  That's why I think your example about smgrDoPendingDeletes() is
a pretty helpful one to think about, but this one seems like something
that has no chance of slipping through the cracks.

 The heavyweight locking issue is really killing me, though.

 I don't really understand why you're not content with just detecting
 deadlocks for now. Everything else seems like bells and whistles for
 later.

I don't think it's OK for a parallel query to just randomly deadlock.
It's OK for parallel query to excuse itself politely and retire from
the field, but I don't think it's OK for it to say, "oh, sure, I can
parallelize that for you," and then fail by deadlocking with itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Jeff Davis
On Wed, 2014-11-19 at 11:03 -0500, Robert Haas wrote:
 But your proposal does not solve this problem:
 
 1. Backend A-1 acquires AccessExclusiveLock on relation R.
 2. Backend A-2 waits for AccessShareLock on relation R.
 
 The good news is that the deadlock detector should realize that since
 A-1 and A-2 are in the same group, this is a deadlock.  And it can
 abort either A-1 or A-2, which will eventually abort them both.

Right. It can even give a nice error hint to tell the user how to avoid
the problem.

   The
 bad news, to borrow a phrase from Peter Geoghegan, is that it's an
 unprincipled deadlock; a user confronted with the news that her
 parallel scan has self-deadlocked will be justifiably dismayed.

You seem to be raising this as a show-stopping problem, and I'm not
convinced that it is.

(a) There are no parallel operators yet, so this is speculative.
(b) Out of the parallel operators you imagine (e.g. Scan), we don't
expect anything other than AccessShare locks in the normal case.
(c) The failure mode is not so bad; it's just an error and the user can
probably work around it.

You could argue that we know we're headed for this problem, and
therefore we should solve it now. I disagree. You are assuming that
sharing exclusive heavyweight locks among a group will be a fundamental
part of everything postgres does with parallelism; but not every design
requires it.

Regards,
Jeff Davis






Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Robert Haas
On Thu, Nov 20, 2014 at 4:21 AM, Jeff Davis pg...@j-davis.com wrote:
   The
 bad news, to borrow a phrase from Peter Geoghegan, is that it's an
 unprincipled deadlock; a user confronted with the news that her
 parallel scan has self-deadlocked will be justifiably dismayed.

 You seem to be raising this as a show-stopping problem, and I'm not
 convinced that it is.

Well, what I'm saying is that at the very minimum we have to be able to
detect deadlocks, and we have two plausible designs for avoiding that:

1. Modify the deadlock detector to know about lock groups.

2. Propagate pre-existing locks from the user backend to all the workers.

I initially proposed #1, but now I think #2 solves more of the
problems for less code.
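As a toy illustration of design #2 (my own Python sketch, not the actual lock manager; all lock modes are collapsed into self-exclusive ones for brevity):

```python
class Backend:
    def __init__(self, name, held=None):
        self.name = name
        self.held = set(held or ())   # lock tags already granted to us

def start_parallel_worker(leader, name):
    # Design #2: grant the worker a copy of every lock the leader already
    # holds, so a pre-existing leader lock can never block its own worker.
    return Backend(name, held=leader.held)

def would_block(requestor, tag, backends):
    # Toy rule: every lock is self-exclusive; we block iff some other
    # backend holds the tag and we were not granted it ourselves.
    if tag in requestor.held:
        return False
    return any(tag in b.held for b in backends if b is not requestor)

leader = Backend("leader", held={"rel R / AccessExclusive"})
worker = start_parallel_worker(leader, "worker 1")
other = Backend("B")

print(would_block(worker, "rel R / AccessExclusive", [leader, worker, other]))  # -> False
print(would_block(other, "rel R / AccessExclusive", [leader, worker, other]))   # -> True
```

Under design #1 the worker would instead request the lock normally and rely on a group-aware deadlock detector to notice that it is blocked behind its own leader.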

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Jeff Davis
On Thu, 2014-11-20 at 11:22 -0500, Robert Haas wrote:
 2. Propagate pre-existing locks from the user backend to all the workers.
 
 I initially proposed #1, but now I think #2 solves more of the
 problems for less code.

OK. The primary concern with that is unintended consequences. But it's
reasonable for you to ask for something more concrete. I will think on
this more.

A few things I'm thinking about now:

 * What do you mean by pre-existing? Locks existing prior to what
event? (I don't think that's exactly what you meant.)
 * What's the conceptual difference between granting locks that would
otherwise conflict with another process in the group (which is what this
proposal is about) and having exactly the same set of locks? Is there
any?
 * Let's say you have processes A1 and A2 in one group, and B. A1 and B
both have an AccessShare lock, and A2 tries to acquire an exclusive
lock. B is waiting on A2. That's still a deadlock, right?
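For what it's worth, the third scenario can be checked mechanically: collapse each lock group to a single node in the wait-for graph and look for a cycle. A rough Python sketch (my own modeling, not the real deadlock detector) confirms it is still a deadlock:

```python
from collections import defaultdict

def find_deadlock(waits_for, group_of):
    """Collapse each lock group to one node, then report whether the
    wait-for graph contains a cycle (a cycle means deadlock)."""
    g = defaultdict(set)
    for waiter, holder in waits_for:
        a, b = group_of.get(waiter, waiter), group_of.get(holder, holder)
        if a != b:   # waits inside one group are resolved by lock sharing
            g[a].add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(n):   # depth-first search; a grey node reached again is a cycle
        color[n] = GREY
        for m in g[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in list(g))

# The scenario above: A1 and A2 share a group; B is independent.
group_of = {"A1": "A", "A2": "A"}
edges = [("A2", "B"),   # A2 wants exclusive, blocked by B's AccessShare
         ("B", "A2")]   # B in turn waits on the A group
print(find_deadlock(edges, group_of))   # -> True: still a deadlock
```

So yes: group locking removes waits between group members, but cycles that pass through an outside process like B remain real deadlocks.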

Regards,
Jeff Davis







Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Andres Freund
On 2014-11-20 11:22:37 -0500, Robert Haas wrote:
 On Thu, Nov 20, 2014 at 4:21 AM, Jeff Davis pg...@j-davis.com wrote:
The
  bad news, to borrow a phrase from Peter Geoghegan, is that it's an
  unprincipled deadlock; a user confronted with the news that her
  parallel scan has self-deadlocked will be justifiably dismayed.
 
  You seem to be raising this as a show-stopping problem, and I'm not
  convinced that it is.
 
 Well, what I'm saying is that at the very minimum we have to be able to
 detect deadlocks, and we have two plausible designs for avoiding that:
 
 1. Modify the deadlock detector to know about lock groups.
 
 2. Propagate pre-existing locks from the user backend to all the workers.
 
 I initially proposed #1, but now I think #2 solves more of the
 problems for less code.

Except that it opens us up for all kinds of concurrency bugs. I'm pretty
strictly set against granting any self exclusive locks en-masse. If we
do this by default for all granted locks when starting a worker backend
it'll get *so* much harder to reason about correctness. Suddenly locks
don't guarantee what they used to anymore. We'll e.g. not be able to
rely that a CheckTableNotInUse() + AEL makes it safe to
drop/rewrite/whatever a relation - even if that happens in the main
backend.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Robert Haas
On Thu, Nov 20, 2014 at 12:12 PM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2014-11-20 at 11:22 -0500, Robert Haas wrote:
 2. Propagate pre-existing locks from the user backend to all the workers.

 I initially proposed #1, but now I think #2 solves more of the
 problems for less code.

 OK. The primary concern with that is unintended consequences. But it's
 reasonable for you to ask for something more concrete. I will think on
 this more.

 A few things I'm thinking about now:

  * What do you mean by pre-existing? Locks existing prior to what
 event? (I don't think that's exactly what you meant.)
  * What's the conceptual difference between granting locks that would
 otherwise conflict with another process in the group (which is what this
 proposal is about) and having exactly the same set of locks? Is there
 any?
  * Let's say you have processes A1 and A2 in one group, and B. A1 and B
 both have an AccessShare lock, and A2 tries to acquire an exclusive
 lock. B is waiting on A2. That's still a deadlock, right?

I think I discussed all of these issues on the other thread already.
Am I wrong?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Robert Haas
On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com wrote:
 Except that it opens us up for all kinds of concurrency bugs. I'm pretty
 strictly set against granting any self exclusive locks en-masse. If we
 do this by default for all granted locks when starting a worker backend
 it'll get *so* much harder to reason about correctness. Suddenly locks
 don't guarantee what they used to anymore. We'll e.g. not be able to
 rely that a CheckTableNotInUse() + AEL makes it safe to
 drop/rewrite/whatever a relation - even if that happens in the main
 backend.

Haven't I responded to this a few times already?

I see no way, even theoretically, that it can be sane for
CheckTableNotInUse() to succeed in a parallel context.  Ever.  If the
deadlock detector would kill the processes anyway, it doesn't matter,
because CheckTableNotInUse() should do it first, so that we get a
better error and without waiting for deadlock_timeout.  So any
scenario that's predicated on the assumption that CheckTableNotInUse()
will succeed in a parallel context is 100% unconvincing to me as an
argument for anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Andres Freund
On 2014-11-20 13:57:25 -0500, Robert Haas wrote:
 On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Except that it opens us up for all kinds of concurrency bugs. I'm pretty
  strictly set against granting any self exclusive locks en-masse. If we
  do this by default for all granted locks when starting a worker backend
  it'll get *so* much harder to reason about correctness. Suddenly locks
  don't guarantee what they used to anymore. We'll e.g. not be able to
  rely that a CheckTableNotInUse() + AEL makes it safe to
  drop/rewrite/whatever a relation - even if that happens in the main
  backend.
 
 Haven't I responded to this a few times already?

Not in a particularly convincing way at least.

 I see no way, even theoretically, that it can be sane for
 CheckTableNotInUse() to succeed in a parallel context.  Ever.

It's not exactly inconceivable that you'd want a parallel
CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
backends, true, but it's still a pretty sane case.

You haven't really made your case why you want to allow access to AEL
locked relations in a parallel context in the first place.

 If the
 deadlock detector would kill the processes anyway, it doesn't matter,
 because CheckTableNotInUse() should do it first, so that we get a
 better error and without waiting for deadlock_timeout.  So any
 scenario that's predicated on the assumption that CheckTableNotInUse()
 will succeed in a parallel context is 100% unconvincing to me as an
 argument for anything.

Meh, there's plenty of cases where CheckTableNotInUse() isn't used where we
still rely on locks actually working. And it's not just postgres code,
this *will* break user code.

Your argument essentially is that we don't actually need to lock objects
if the other side only reads. Yes, you add a condition or two on top
(e.g. CheckTableNotInUse() fails), but that's not that much. If we want
to do this we really need to forbid *any* modification to catalog/user
tables while parallel operations are ongoing. In the primary and worker
backends.  I think that's going to end up being much more
problematic/restrictive than not allowing non-cooperative parallelism
after >= ShareUpdateExclusive. If we ever want to parallelize writing
operations we'll regret this quite a bit.

Breaking the locking system - and that's what you're doing by removing
the usual guarantees - seems just fundamentally wrong.  Yes, I can't
name all the dangers that I think are out there.

The 'lets grant all the current locks at start of parallel query'
approach seems quite a bit safer than always doing that during parallel
query, don't get me wrong.

Btw, what are your thoughts about SERIALIZABLE and parallel query?
Afaics that'd be pretty hard to support?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Robert Haas
On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 Haven't I responded to this a few times already?
 Not in a particularly convincing way at least.

That's not a very helpful response.

 I see no way, even theoretically, that it can be sane for
 CheckTableNotInUse() to succeed in a parallel context.  Ever.

 It's not exactly inconceivable that you'd want a parallel
 CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
 backends, true, but it's still a pretty sane case.

 You haven't really made your case why you want to allow access to AEL
 locked relations in a parallel context in the first place.

Well, what the heck was I doing here?

http://www.postgresql.org/message-id/ca+tgmoa_+u-qccfz9zavzfk+tgxbqqb1+0ktpzrggpj5pvr...@mail.gmail.com

I've *repeatedly* pointed out that if you *don't* allow this, stuff
breaks.  And you haven't answered how you'd like to fix that.  The
only answer you've given so far, that I can see, is that maybe we can
foresee all the locks that the child might take and not initiate
parallelism if any of them conflict with locks we already hold.  But
that's not a reasonable approach; we have no reasonable way of
predicting what locks the child will need, and even if we did, things
can change between plan time and execution time.

 If the
 deadlock detector would kill the processes anyway, it doesn't matter,
 because CheckTableNotInUse() should do it first, so that we get a
 better error and without waiting for deadlock_timeout.  So any
 scenario that's predicated on the assumption that CheckTableNotInUse()
 will succeed in a parallel context is 100% unconvincing to me as an
 argument for anything.

 Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
 still rely on locks actually working. And it's not just postgres code,
 this *will* break user code.

It would help if you'd provide some specific examples of the problems
you foresee.  In the absence of specificity, it's hard to tell come to
any conclusions about whether your concerns are well-founded, or
propose any way to overcome them.

 Your argument essentially is that we don't actually need to lock objects
 if the other side only reads. Yes, you add a condition or two on top
 (e.g. CheckTableNotInUse() fails), but that's not that much.

I don't think that's my argument at all.  What I'm arguing is that
there's more than one reason for taking locks.  Broadly, I think we
take some locks to preserve transaction isolation guarantees and
others to preserve data integrity.  If you're adding or dropping a
column, nobody else had better be looking at that relation, because
they might get confused and crash the server or something.  Backends,
even cooperating parallel backends, won't be able to cope with the
tuple descriptor changing under them.  But once that operation is
complete, there's no reason why the *next* statement in that
transaction can't see the relation you just modified, even though it's
still exclusively locked.  As long as we make sure that new
parallel backends use our snapshot, they'll see the right metadata for
that relation and can safely access it.  We still can't let *other*
people see that relation because the ALTER TABLE is an uncommitted
change, but that's no argument against letting our own parallel
workers look at it.

 If we want
 to do this we really need to forbid *any* modification to catalog/user
 tables while parallel operations are ongoing. In the primary and worker
 backends.  I think that's going to end up being much more
 problematic/restrictive than not allowing non-cooperative parallelism
 after >= ShareUpdateExclusive. If we ever want to parallelize writing
 operations we'll regret this quite a bit.

I can't really follow how this follows from anything I said.

 Breaking the locking system - and that's what you're doing by removing
 the usual guarantees - seems just fundamentally wrong.  Yes, I can't
 name all the dangers that I think are out there.

 The 'lets grant all the current locks at start of parallel query'
 approach seems quite a bit safer than always doing that during parallel
 query, don't get me wrong.

"Let's grant all the current locks at start of parallel query" is the
proposal I'm focused on right now.  Based on your arguments and those
of others, I have long-since abandoned the original proposal of having
locks within a group never conflict.  It seemed like a good idea at
the time, but it would put way too much burden on the parallel workers
to coordinate their activity with each other, so it's dead.  What I
want to establish at this time is what types of problems, if any, you
or others can foresee with "Let's grant all the current locks at start
of parallel query" - because I sense considerable skepticism, but
cannot put my finger on anything concretely wrong with it.

 Btw, what are your thoughts about SERIALIZABLE and parallel query?
 Afaics that'd be pretty hard to support?

Hmm, I haven't thought about that.  

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Andres Freund
On 2014-11-20 15:47:39 -0500, Robert Haas wrote:
 On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund and...@2ndquadrant.com wrote:
  Haven't I responded to this a few times already?
  Not in a particularly convincing way at least.
 
 That's not a very helpful response.

Well...

  I see no way, even theoretically, that it can be sane for
  CheckTableNotInUse() to succeed in a parallel context.  Ever.
 
  It's not exactly inconceivable that you'd want a parallel
  CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
  backends, true, but it's still a pretty sane case.
 
  You haven't really made your case why you want to allow access to AEL
  locked relations in a parallel context in the first place.
 
 Well, what the heck was I doing here?
 
 http://www.postgresql.org/message-id/ca+tgmoa_+u-qccfz9zavzfk+tgxbqqb1+0ktpzrggpj5pvr...@mail.gmail.com

I don't see anything about actual use cases there.

 I've *repeatedly* pointed out that if you *don't* allow this, stuff
 breaks.

You've pointed out some strange corner cases that might break yes.

 And you haven't answered how you'd like to fix that.  The
 only answer you've given so far, that I can see, is that maybe we can
 foresee all the locks that the child might take and not initiate
 parallelism if any of them conflict with locks we already hold. But
 that's not a reasonable approach; we have no reasonable way of
 predicting what locks the child will need, and even if we did, things
 can change between plan time and execution time.

We don't need to predict all future locks. We only need to check which
self-exclusive locks we are already holding.
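For illustration, the check described here can be sketched as follows (Python used purely as pseudocode, not PostgreSQL code; the mode names follow lock.h, but the conflict table is an abbreviated, assumed subset of the real one):

```python
# Sketch: before initiating parallelism, refuse if this backend already
# holds any self-exclusive lock, i.e. a mode that conflicts with itself.
# The conflict table below is an abbreviated illustration only.

CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "AccessExclusiveLock": {
        "AccessShareLock", "RowShareLock", "RowExclusiveLock",
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
}

def is_self_exclusive(mode):
    # A mode is self-exclusive when it appears in its own conflict set.
    return mode in CONFLICTS.get(mode, set())

def may_initiate_parallelism(held_locks):
    # held_locks: iterable of (relation, mode) pairs this backend holds.
    return not any(is_self_exclusive(mode) for _, mode in held_locks)
```

The point is that no prediction of future lock requests is needed: the test looks only at locks already granted to the would-be leader.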

I don't buy the plan time/execution time argument. We'll need to be able
to adapt plans to the availability of bgworker slots *anyway*. Otherwise
we either need to set the number of bgworkers to MaxConnections *
MaxParallelism or live with errors as soon as too many processes use
parallelism. The former is clearly unrealistic given PG's per-backend
overhead and the latter will be a *much* bigger PITA than all the
deadlock scenarios talked about here. It'll constantly fail, even if
everything is ok.

  If the
  deadlock detector would kill the processes anyway, it doesn't matter,
  because CheckTableNotInUse() should do it first, so that we get a
  better error and without waiting for deadlock_timeout.  So any
  scenario that's predicated on the assumption that CheckTableNotInUse()
  will succeed in a parallel context is 100% unconvincing to me as an
  argument for anything.
 
  Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
  still rely on locks actually working. And it's not just postgres code,
  this *will* break user code.
 
 It would help if you'd provide some specific examples of the problems
you foresee.  In the absence of specificity, it's hard to come to
 any conclusions about whether your concerns are well-founded, or
 propose any way to overcome them.

How about a frontend process having created a relation that then starts
a parallel query. Then the frontend process ERRORs out and, in the
course of that, does smgrDoPendingDeletes(). Which will delete the new
relation. Boom. The background process might just be accessing it. If
you think that's harmless, think e.g. what'd happen with heap cleanup
records generated in the background process. They'd not replay.

Another one is that the frontend process does, from some function or so,
CREATE POLICY. Triggering a relcache flush. And RelationClearRelation()
essentially assumes that a referenced relcache entry can't change (which
is the reason the keep_* stuff is in RelationClearRelation()).

I'm pretty sure there's at least a dozen other such hazards around.

  Your argument essentially is that we don't actually need to lock objects
  if the other side only reads. Yes, you add a condition or two on top
  (e.g. CheckTableNotInUse() fails), but that's not that much.
 
 I don't think that's my argument at all.  What I'm arguing is that
 there's more than one reason for taking locks.  Broadly, I think we
 take some locks to preserve transaction isolation guarantees and
 others to preserve data integrity.  If you're adding or dropping a
 column, nobody else had better be looking at that relation, because
 they might get confused and crash the server or something.  Backends,
 even cooperating parallel backends, won't be able to cope with the
 tuple descriptor changing under them.  But once that operation is
 complete, there's no reason why the *next* statement in that
 transaction can't see the relation you just modified, even though it's
 still exclusively locked.  As long as we make sure that new
 parallel backends use our snapshot, they'll see the right metadata for
 that relation and can safely access it.  We still can't let *other*
 people see that relation because the ALTER TABLE is an uncommitted
 change, but that's no argument against letting our own parallel
 workers look at it.

I think the transaction 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Robert Haas
On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund and...@2ndquadrant.com wrote:
 And you haven't answered how you'd like to fix that.  The
 only answer you've given so far, that I can see, is that maybe we can
 foresee all the locks that the child might take and not initiate
 parallelism if any of them conflict with locks we already hold. But
 that's not a reasonable approach; we have no reasonable way of
 predicting what locks the child will need, and even if we did, things
 can change between plan time and execution time.

 We don't need to predict all future locks. We only need to check which
 self-exclusive locks we are already holding.

I can't follow that logic.  Why do you think self-exclusive locks are
the only ones that matter?

 I don't buy the plan time/execution time argument. We'll need to be able
 to adapt plans to the availability of bgworker slots *anyway*. Otherwise
 we either need to set the number of bgworkers to MaxConnections *
 MaxParallelism or live with errors as soon as too many processes use
 parallelism. The former is clearly unrealistic given PG's per-backend
 overhead and the latter will be a *much* bigger PITA than all the
 deadlock scenarios talked about here. It'll constantly fail, even if
 everything is ok.

I certainly think that any parallel node needs to be able to cope with
getting fewer workers than it would ideally like to have.  But that's
different from saying that when our own backend takes new locks, we
have to invalidate plans.  Those seem like largely unrelated problems.

  If the
  deadlock detector would kill the processes anyway, it doesn't matter,
  because CheckTableNotInUse() should do it first, so that we get a
  better error and without waiting for deadlock_timeout.  So any
  scenario that's predicated on the assumption that CheckTableNotInUse()
  will succeed in a parallel context is 100% unconvincing to me as an
  argument for anything.
 
  Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
  still rely on locks actually working. And it's not just postgres code,
  this *will* break user code.

 It would help if you'd provide some specific examples of the problems
 you foresee.  In the absence of specificity, it's hard to come to
 any conclusions about whether your concerns are well-founded, or
 propose any way to overcome them.

 How about a frontend process having created a relation that then starts
 a parallel query. Then the frontend process ERRORs out and, in the
 course of that, does smgrDoPendingDeletes(). Which will delete the new
 relation. Boom. The background process might just be accessing it. If
 you think that's harmless, think e.g. what'd happen with heap cleanup
 records generated in the background process. They'd not replay.

OK, that's an excellent example.  Thanks.  I'll think about that.

 Another one is that the frontend process does, from some function or so,
 CREATE POLICY. Triggering a relcache flush. And RelationClearRelation()
 essentially assumes that a referenced relcache entry can't change (which
 is the reason the keep_* stuff is in RelationClearRelation()).

I don't quite follow that one.  Are you saying that the frontend would
do a CREATE POLICY while there are backend workers running?  I
seriously doubt that can ever be made safe, but I'd choose to prohibit
it using something other than the heavyweight lock manager.

 I think there's some local state as well, so that's probably not so
 easy. I guess this needs input from Kevin.

Yeah.  I'd be OK to have v1 disable parallelism at serializable, and
fix that in v2, but of course if we can avoid that it's even better.
The heavyweight locking issue is really killing me, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-20 Thread Andres Freund
On 2014-11-20 20:21:07 -0500, Robert Haas wrote:
 On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund and...@2ndquadrant.com wrote:
  And you haven't answered how you'd like to fix that.  The
  only answer you've given so far, that I can see, is that maybe we can
  foresee all the locks that the child might take and not initiate
  parallelism if any of them conflict with locks we already hold. But
  that's not a reasonable approach; we have no reasonable way of
  predicting what locks the child will need, and even if we did, things
  can change between plan time and execution time.
 
  We don't need to predict all future locks. We only need to check which
  self-exclusive locks we are already holding.
 
 I can't follow that logic.  Why do you think self-exclusive locks are
 the only ones that matter?

All the others can be acquired by jumping in front of the lock's
waitqueue?

  I don't buy the plan time/execution time argument. We'll need to be able
  to adapt plans to the availability of bgworker slots *anyway*. Otherwise
  we either need to set the number of bgworkers to MaxConnections *
  MaxParallelism or live with errors as soon as too many processes use
  parallelism. The former is clearly unrealistic given PG's per-backend
  overhead and the latter will be a *much* bigger PITA than all the
  deadlock scenarios talked about here. It'll constantly fail, even if
  everything is ok.
 
 I certainly think that any parallel node needs to be able to cope with
 getting fewer workers than it would ideally like to have.  But that's
 different from saying that when our own backend takes new locks, we
 have to invalidate plans.  Those seem like largely unrelated problems.

No need to invalidate them if you have the infrastructure to run with no
parallel workers - just use the path that's used if there's no workers
available.

  Another one is that the frontend process does, from some function or so,
  CREATE POLICY. Triggering a relcache flush. And RelationClearRelation()
  essentially assumes that a referenced relcache entry can't change (which
  is the reason the keep_* stuff is in RelationClearRelation()).
 
 I don't quite follow that one.  Are you saying that the frontend would
 do a CREATE POLICY while there are backend workers running?

Yes.

  I
 seriously doubt that can ever be made safe, but I'd choose to prohibit
 it using something other than the heavyweight lock manager.

Right. It's perfectly fine to forbid it imo. But there's lots of places
that have assumptions about the locking behaviour baked in, and the
relevant bugs will be hard to find.

  I think there's some local state as well, so that's probably not so
  easy. I guess this needs input from Kevin.
 
 Yeah.  I'd be OK to have v1 disable parallelism at serializable, and
 fix that in v2, but of course if we can avoid that it's even better.

I don't have a problem with that either.

 The heavyweight locking issue is really killing me, though.

I don't really understand why you're not content with just detecting
deadlocks for now. Everything else seems like bells and whistles for
later.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-19 Thread Robert Haas
On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis pg...@j-davis.com wrote:
 To reiterate the basic problem here, if we do nothing at all about the
 lock manager, a parallel backend can stall trying to grab an
AccessExclusiveLock that the user backend already holds, either because
 the user backend holds an AccessExclusiveLock as well, or because some
 other process is waiting for one, we'll deadlock and not notice.

 My feeling is that we should keep the concept of a group and group
 leader from your patch, and improve the deadlock detector to make use of
 that information (looking at the code, it looks doable but not trivial).
 But unless I am missing something, we should separate out the lock
 sharing, and defer that until later.

Doing as you propose solves this problem:

1. Backend A-1 acquires AccessShareLock on relation R.
2. Backend B waits for AccessExclusiveLock on relation R.
3. Backend A-2 waits for AccessShareLock on relation R.

Knowing that A-1 and A-2 are related, the deadlock detector can move
A-2 ahead of B and grant the lock.  All is well.
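The reordering described in that example can be sketched like this (illustrative Python, not the actual deadlock-detector code; the group bookkeeping and the two-mode conflict table are simplifying assumptions):

```python
# Sketch: advance a waiter past earlier waiters when (a) a member of its
# lock group already holds the lock and (b) its requested mode is
# compatible with everything currently granted.

COMPATIBLE = {
    "AccessShareLock": {"AccessShareLock"},  # shares with other readers
    "AccessExclusiveLock": set(),            # compatible with nothing
}

def reorder_queue(granted, queue, group_of):
    """granted: {backend: mode} currently holding the lock.
    queue: ordered list of (backend, mode) waiters.
    group_of: {backend: lock group id}.
    Grants eligible waiters in place and returns those moved ahead."""
    moved = []
    for backend, mode in list(queue):
        in_holders_group = any(group_of[h] == group_of[backend]
                               for h in granted)
        compatible = all(g in COMPATIBLE[mode] for g in granted.values())
        if in_holders_group and compatible:
            granted[backend] = mode
            queue.remove((backend, mode))
            moved.append(backend)
    return moved
```

With A-1 holding AccessShareLock, B queued for AccessExclusiveLock, and A-2 (in A-1's group) queued for AccessShareLock, this sketch grants A-2 and leaves B waiting. It does nothing for the second scenario, where A-1's AccessExclusiveLock itself blocks A-2's request.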

But your proposal does not solve this problem:

1. Backend A-1 acquires AccessExclusiveLock on relation R.
2. Backend A-2 waits for AccessShareLock on relation R.

The good news is that the deadlock detector should realize that since
A-1 and A-2 are in the same group, this is a deadlock.  And it can
abort either A-1 or A-2, which will eventually abort them both.  The
bad news, to borrow a phrase from Peter Geoghegan, is that it's an
unprincipled deadlock; a user confronted with the news that her
parallel scan has self-deadlocked will be justifiably dismayed.  It's
been proposed by Andres, and maybe a few other people, that we can
detect this situation and just not use parallelism in these cases, but
that's actually quite hard to do.

Of course, it's pretty easy for a backend A-1 contemplating parallel
scan of relation R to notice that it holds a lock that conflicts with
the AccessShareLock it expects a prospective parallel backend A-2 to
attempt to acquire.  I don't think there's a lock manager method for
that today, but we could easily add one.  However, A-1 might
incidentally hold other locks (from earlier statements in the
transaction) that conflict with locks that will be sought by A-2, and
as discussed *extensively* upthread, it is not easy to predict all of
the locks a parallel backend might try to take: even seemingly-trivial
code like the equality operators for built-in data types can sometimes
take relation locks, and we have no infrastructure for predicting
which ones.
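The easy-but-insufficient check mentioned above, which A-1 could run before launching workers, might look like this (illustrative Python; the function name and lock representation are assumptions, not an existing lock-manager API):

```python
# Sketch: before launching a parallel worker expected to take
# AccessShareLock on relation rel, ask whether the leader already holds
# a conflicting lock on rel. This covers only the planned scan lock, not
# the unpredictable incidental locks a worker may later need.

MODES_CONFLICTING_WITH_ACCESS_SHARE = {"AccessExclusiveLock"}

def worker_scan_would_self_deadlock(held_locks, rel):
    # held_locks: iterable of (relation, mode) pairs held by the leader.
    return any(
        r == rel and mode in MODES_CONFLICTING_WITH_ACCESS_SHARE
        for r, mode in held_locks
    )
```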

Even if you could somehow solve that problem, there's also the issue
of plan caching.  If we make a parallel plan for an SQL statement
inside a PL/pgsql procedure because our backend holds no strong locks,
and then we acquire a strong lock on a relevant relation, we have to
invalidate that plan.  I think that will add complexity inside the
plan cache machinery.  It seems to me that it would be better to
handle these issues inside the lock manager rather than letting them
creep into other parts of the system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-19 Thread Amit Kapila
On Wed, Nov 19, 2014 at 9:33 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis pg...@j-davis.com wrote:
  To reiterate the basic problem here, if we do nothing at all about the
  lock manager, a parallel backend can stall trying to grab an
  AccessExclusiveLock that the user backend already holds, either because
  the user backend holds an AccessExclusiveLock as well, or because some
  other process is waiting for one, we'll deadlock and not notice.
 
  My feeling is that we should keep the concept of a group and group
  leader from your patch, and improve the deadlock detector to make use of
  that information (looking at the code, it looks doable but not trivial).
  But unless I am missing something, we should separate out the lock
  sharing, and defer that until later.

 Doing as you propose solves this problem:

 1. Backend A-1 acquires AccessShareLock on relation R.
 2. Backend B waits for AccessExclusiveLock on relation R.
 3. Backend A-2 waits for AccessShareLock on relation R.

 Knowing that A-1 and A-2 are related, the deadlock detector can move
 A-2 ahead of B and grant the lock.  All is well.

 But your proposal does not solve this problem:

 1. Backend A-1 acquires AccessExclusiveLock on relation R.
 2. Backend A-2 waits for AccessShareLock on relation R.


I think that could be doable under the above proposal by making the lock
conflict code (LockCheckConflicts) aware of the parallel group.  But
it might lead to a problem in case it allows a parallel worker to acquire
a lock it should not, as in the relation-extension case you mentioned on
another thread.  However, I think we should block any such operations and
allow only read-only operations in parallel workers.  It seems to me that
we have a lot more to do to build the capability for parallel workers to
perform write operations.


 Even if you could somehow solve that problem, there's also the issue
 of plan caching.  If we make a parallel plan for an SQL statement
 inside a PL/pgsql procedure because our backend holds no strong locks,
 and then we acquire a strong lock on a relevant relation, we have to
 invalidate that plan.  I think that will add complexity inside the
 plan cache machinery.  It seems to me that it would be better to
 handle these issues inside the lock manager rather than letting them
 creep into other parts of the system.


How would changes in the lock manager guarantee that it won't be a problem
in any case?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-18 Thread Amit Kapila
On Tue, Nov 18, 2014 at 3:10 PM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2014-11-17 at 14:32 -0500, Robert Haas wrote:

  To reiterate the basic problem here, if we do nothing at all about the
  lock manager, a parallel backend can stall trying to grab an
  AccessExclusiveLock that the user backend already holds, either because
  the user backend holds an AccessExclusiveLock as well, or because some
  other process is waiting for one, we'll deadlock and not notice.

 My feeling is that we should keep the concept of a group and group
 leader from your patch, and improve the deadlock detector to make use of
 that information (looking at the code, it looks doable but not trivial).
 But unless I am missing something, we should separate out the lock
 sharing, and defer that until later.


+1 to initially having something like you describe, and maybe an additional
mechanism to grant the non-conflicting locks already held by the master
backend to the child backends.  I understand that allowing additional
non-conflicting locks could lead to problems if write operations are
allowed via child backends, as described as point 1 in Robert's other
mail [1].  However, since initially we want to allow only read-only
operations via child backends, this should be okay; later on, if we want
to support writes via child backends, we might want to consider an
additional property on the lock which would allow the lock manager or
deadlock detector to consider those locks separately, so that they won't
be granted to child backends if they conflict with the parent's.


[1]:
http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-17 Thread Robert Haas
On Thu, Nov 13, 2014 at 4:57 PM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, Nov 13, 2014 at 11:26 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com wrote:
  If two backends both have an exclusive lock on the relation for a join
  operation, that implies that they need to do their own synchronization,
  because obviously the lock manager is not doing it for them.

 This doesn't make sense to me.  Why would they need to synchronize
 access to a relation in order to join it?

 Unfortunate typo: that was supposed to be "joint operation", just meaning
 that they are working together for something (e.g. CLUSTER, VACUUM FULL as
 you suggest). Sorry for the confusion.

Ah, well, that's different.

 I meant that the backends need to divide up the work somehow. And if each
 operator needs to divide up the work before operating, that means we need to
 change every operator.

I think I see your point, which it just so happens Amit articulated to
me in different words while we were chatting about this problem this
morning.  We want to avoid waiving the mutual exclusion provided by
the lock manager only to end up re-implementing something very much
like the current lock manager to arbitrate resource contention among
backends within a parallel group.  However, we also don't want the
mutual exclusion that the lock manager provides to serve as a barrier
to implementing useful parallelism; that is, if the parallel backends
want to manage conflicts themselves instead of letting the lock
manager do it, that should be possible.

In 
http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com
I propose a new approach.  The basic idea is that any heavyweight lock
that we hold at the start of a parallel phase can't represent a bona
fide conflict between the activities of two different backends, so we
arrange to waive conflicts over those locks.  The problem (as Andres
points out) is that those locks could be subsequently re-taken by
operations that are not parallel-safe, and the new locks would be
granted locally regardless of the shared lock table because the
current backend would already hold them.  However, I think it's fine
to make it the job of the parallelism infrastructure to plug that
hole, rather than trying to enforce it in the lock manager.  There are
LOTS of things that need to be prohibited in parallel mode and only
allowed once they've been made sufficiently safe, and in the first
version, that's certainly going to include - among other things - any
operation that writes data.  I think it's OK to the job of the people
who want to relax those prohibitions to deal with making it safe to do
so.
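As a sketch of the proposed conflict test (illustrative Python, not the real LockCheckConflicts(); the fields and their handling are assumptions about the proposal, not committed behavior):

```python
# Sketch: a requested lock that would normally conflict is nonetheless
# granted when every conflicting holder is in the requester's lock group
# and already held the lock when the parallel phase began. Abbreviated,
# two-mode conflict table for illustration.

CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}

def lock_conflicts(req_group, req_mode, holders):
    """holders: list of dicts with keys 'group', 'mode', and
    'held_at_phase_start'. Returns True if the request must wait."""
    for h in holders:
        if h["mode"] not in CONFLICTS[req_mode]:
            continue  # no conflict with this holder at all
        waived = (h["group"] == req_group) and h["held_at_phase_start"]
        if not waived:
            return True
    return False
```

Under this sketch, a worker in the leader's group gets AccessShareLock on a relation the leader AccessExclusiveLock'ed before the parallel phase, while a lock taken mid-phase, or held by another group, still conflicts; plugging the hole for locks re-taken by parallel-unsafe operations is left, as argued above, to the parallelism infrastructure.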

To reiterate the basic problem here, if we do nothing at all about the
lock manager, a parallel backend can stall trying to grab an
AccessExclusiveLock that the user backend already holds, either because
the user backend holds an AccessExclusiveLock as well, or because some
other process is waiting for one, we'll deadlock and not notice.  We
have to do *something* about that.  I'd like to arrive at some
consensus on how to move forward here as soon as possible, because the
clock is ticking here.
http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com
articulates what I currently believe to be the best plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Jeff Davis
Great discussion.

Robert, I think you make a lot of very valid points, but philosophically
I am in pretty strong agreement with Andres here.

On Fri, 2014-10-31 at 14:29 -0400, Robert Haas wrote:
  So just refusing parallelism as soon as anything has taken an access
  exclusive lock doesn't sound too bad to me.
 
 That restriction seems onerous to me; for example, it would prevent a
 parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
 Those seems like worthy targets for parallelism, and maybe even early
 targets, since I expect a lot of the real complexity here to be in the
 query planner, and you can avoid most of that complexity when planning
 DDL.

If two backends both have an exclusive lock on the relation for a join
operation, that implies that they need to do their own synchronization,
because obviously the lock manager is not doing it for them.

In turn, that implies that we need parallel versions of these
operations; going to each one and parallelizing it. I don't think that's
a good long-term strategy... it changes our simple executor (tuples in,
tuples out) to something quite different (each node needs to know a lot
about whatever is underneath it in order to divide the data up).

 process and both requests would have been granted.  Parallelism is
 generally not about making different things happen; it's about
 spreading the stuff that would happen anyway across multiple backends,

This point would be much stronger for declarative queries. When the user
issues LOCK TABLE, they are getting their hands dirty with the
implementation to some degree. If they really want to serialize, they
can serialize around the lock of an object that they don't want to also
read (in parallel).

You're right that this would be surprising to users, but hopefully we
can find a way to disable parallelism in that case, or at least issue a
WARNING that the user is probably doing the wrong thing.

 cluster pg_collate... the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

As discussed later in this thread, I also consider failure a
"solution" ;-) So we don't need to move anything.

If we have to tell our users that exclusive-locking a catalog can cause
deadlocks for parallel query, I think they will find that reasonable.
The same admin could turn parallelism down to one for non-superusers
before they start doing that, and restore it later.


 patch that does parallel anything has got to worry about what
 undetected deadlocks might result, and then argue about which of them
 are obscure enough that we shouldn't care and which are not.  That
 doesn't sound like a fun way to spend my time.

We're in agreement that deadlocks can't go undetected.

 But, really, the first case is the one I'm more worried about: a bunch
 of parallel backends all grab the same locks, but slightly separated
 in time.  A strong locker joins the queue midway through and we're
 hosed.

I agree the situation sounds bad in the abstract, but none of the
examples seem very compelling. Who locks random catalogs exclusively?
And if they do, why is it the lock manager's job to try to reconcile the
resulting mess without canceling someone? (OK, don't answer that. It
obviously is the lock manager's job to some degree, but we can only
demand so much.)

 Even if you think that all of these particular issues are somehow
 avoidable or that they don't matter, I think it would be unwise to bet
 on them being the only issues.  The lock manager looks like it's old,
 ill-maintained code, but that's in no small part because it solved the
 problem it aims at so thoroughly that few people have felt the need to
 modify it very much in the last decade.  I think that's part of why
 there's so much skepticism on this thread - we don't think about the
 problems that the lock manager solves as being important problems not
 because they truly aren't important, but because they are so
 thoroughly solved.  Then every once in a while we complain about the
 performance.

Guilty as charged.

I won't forever oppose changes to the lock manager, but I resist making
them a precondition of parallelism (aside from making deadlock detection
work, of course). Users taking out exclusive locks recklessly can result
in deadlocks today, and will result in more when we do parallelism
(though perhaps in more surprising situations).

I remember when doing Exclusion Constraints, I went to a lot of effort
to make deadlocks impossible, and was quite proud. When Tom saw it, he
told me not to bother, and to do it the simple way instead, because
deadlocks can happen even with UNIQUE constraints (which I didn't even
know).

We should use the same strategy here. When we see deadlocks becoming a
problem for any reasonable workload, we make a series of tweaks (perhaps
some to the lock manager itself) to reduce them.

Regards,
Jeff 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Robert Haas
On Thu, Nov 13, 2014 at 2:26 AM, Jeff Davis pg...@j-davis.com wrote:
 The current deadlock detector would see a dependency graph like:

A2 -> B -> A1

 But with lock groups, it would see:

(A1 A2) -> B -> (A1 A2)

 which is a cycle, and can be detected regardless of the synchronization
 method used between A1 and A2. There are some details to work out to
 avoid false positives, of course.

 Is that about right?

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Robert Haas
On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com wrote:
 If two backends both have an exclusive lock on the relation for a join
 operation, that implies that they need to do their own synchronization,
 because obviously the lock manager is not doing it for them.

This doesn't make sense to me.  Why would they need to synchronize
access to a relation in order to join it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com wrote:
 If two backends both have an exclusive lock on the relation for a join
 operation, that implies that they need to do their own synchronization,
 because obviously the lock manager is not doing it for them.

 This doesn't make sense to me.  Why would they need to synchronize
 access to a relation in order to join it?

What's more to the point: why would you take an exclusive lock just to
do a join?

regards, tom lane




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Andres Freund
On November 13, 2014 8:50:18 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com
wrote:
 If two backends both have an exclusive lock on the relation for a
join
 operation, that implies that they need to do their own
synchronization,
 because obviously the lock manager is not doing it for them.

 This doesn't make sense to me.  Why would they need to synchronize
 access to a relation in order to join it?

What's more to the point: why would you take an exclusive lock just to
do a join?

Robert's case basically rests on the premise that it's useful & correct when 
normally conflicting locks don't conflict when a normal process and its workers 
acquire them. I have serious doubts about the safety of that and the complexity 
it requires.

Obviously a join won't need an exclusive lock itself - but with Robert's design 
a join inside a transaction that locked a relation could still be parallelized, 
even if that rel is involved.

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-13 Thread Jeff Davis
On Thu, Nov 13, 2014 at 11:26 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com wrote:
  If two backends both have an exclusive lock on the relation for a join
  operation, that implies that they need to do their own synchronization,
  because obviously the lock manager is not doing it for them.

 This doesn't make sense to me.  Why would they need to synchronize
 access to a relation in order to join it?


Unfortunate typo: that was supposed to be "joint operation", just meaning
that they are working together for something (e.g. CLUSTER, VACUUM FULL as
you suggest). Sorry for the confusion.

I meant that the backends need to divide up the work somehow. And if each
operator needs to divide up the work before operating, that means we need
to change every operator.

Regards,
 Jeff Davis


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-12 Thread Robert Haas
On Wed, Nov 12, 2014 at 2:57 AM, Jeff Davis pg...@j-davis.com wrote:
 Trying to catch up on this thread, please excuse me if these questions
 are already covered.

Welcome to the party.  The more, the merrier.  :-)

 You mention the possibility of undetected deadlocks, which is surely
 unacceptable. But why not improve the deadlock detector? How bad are
 _detected_ deadlocks? A lot of the concern was around catalog accesses,
 but those lock requests would rarely wait anyway.

Detected deadlocks are fine.  Improving the deadlock detector is the
heart of what needs to be done here.  As you say, the lock requests
we're talking about will rarely wait, so deadlocks won't be frequent.
The issue is making sure that, if they do happen, we get a better
behavior than "your parallel query hangs forever; good luck figuring
out why."

 I also wonder if group locking is generally the wrong approach to
 parallelism. Parallel scan/sort should work by assigning workers to
 chunks of data, and then merging the results. In principle the workers
 don't need to touch the same data at all, so why are we trying so hard
 to get them to all take the same locks?

 The reason, I assume, is that a full table is the lockable object, but
 we are imagining the segment files as the chunks. But maybe there's a
 way to address this more directly with an extra field in the lock tag,
 and perhaps some catalog knowledge?

In the common case, we're only taking AccessShareLock on some relation
to prevent it from being concurrently dropped or rewritten.  Locking
only part of the relation wouldn't lead to any improvement in
concurrency, because the full-relation lock isn't conflicting with
anything that the partial-relation lock wouldn't also need to conflict
with.

More generally, I think there's some misunderstanding about the
overall goal of the parallelism infrastructure that I'm trying to
create.  Many people have proposed interesting strategies for how we
could do various things (locking, planning, execution) better.  Some
of those ideas are, doubtless, good ones.  But my goal is in some ways
the opposite: I'm trying to make it possible to run as much existing
PostgreSQL backend code as possible inside a parallel worker without
any modification.  To do that, I'm trying to modify the subsystems
that are widely used throughout the backend - such as locking - in
such a way that the people using that infrastructure need not put
conditional logic into their code to make it do one thing when in
parallel mode and another thing when not in parallel mode.  And I'm
also trying to avoid fundamentally changing the way major subsystems
work today as a precondition of parallelism.

So I'm taking a very surgical approach to the patches I propose.  I'm
not proposing patches that do anything fundamentally new or different;
instead, I'm proposing patches that let the stuff we already do
continue to work.  In a single backend executing a query, we have lots
of code that assumes you can allocate memory, look data up in a
syscache, throw an error, lock a buffer, examine the value of a GUC,
test a tuple against the active snapshot, and so on.
For parallel mode to be useful, those same things need to work in a
parallel worker.  We of course do not need every crazy thing somebody
may want to do in a parallel worker to work, nor will it.  But anything that's
done in thousands of places throughout the code had better work, or
parallelism will be so restricted as to be useless.  So commit
2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, for example, is about
letting ereport work in a parallel worker, and this patch is about
making things like SearchSysCache1 and PG_GETARG_TEXT_PP work
*reliably* in a parallel worker.

Now, the time to do new and different things is coming, and is maybe
even now not so very far away.  I'm not an expert on parallel
execution, and I'm sure there are a number of people far more skilled
than I at figuring out how to do a parallel scan, join, sort, or
whatever quickly.  What I'm trying to do is get the infrastructure to
a point where those people don't have to worry about whether the basic
facilities that we rely on throughout the backend are gonna work.

By way of comparison, think about the periodic discussions about
making the backend multi-threaded.  Since the backend relies on global
variables in an enormous number of places, it isn't thread-safe.  If
you spawn a thread inside a PostgreSQL backend, it can't safely
ereport(), palloc(), LWLockAcquire(), or just about anything else,
which means that, although anybody can write code that calls
pthread_create(), it's not useful to do so because there's practically
no existing backend code that you could safely call from the new
thread.  Using background worker processes eliminates a lot of those
problems - palloc and lwlocks just work, for example - but not all of
them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-12 Thread Jeff Davis
On Wed, 2014-11-12 at 14:16 -0500, Robert Haas wrote:
 Detected deadlocks are fine.  Improving the deadlock detector is the
 heart of what needs to be done here.

OK, great.

 As you say, the lock requests
 we're talking about will rarely wait, so deadlocks won't be frequent.
 The issue is making sure that, if they do happen, we get a better
 behavior than "your parallel query hangs forever; good luck figuring
 out why."

Right. We can still use this patch's notion of a lock group in the
deadlock detector, but we don't need it to actually affect the way a
lock is granted. That should eliminate concerns about subtle bugs.

Later, after we understand how this is actually used, and if we see
deadlock problems, we can look for ways to solve/mitigate them.

This seems to be what Andres was saying, here:

http://www.postgresql.org/message-id/20141031130727.gf13...@awork2.anarazel.de

So I'll follow up in that thread, because it's an interesting
discussion.

 More generally, I think there's some misunderstanding about the
 overall goal of the parallelism infrastructure that I'm trying to
 create.  ...  But my goal is in some ways
 the opposite: I'm trying to make it possible to run as much existing
 PostgreSQL backend code as possible inside a parallel worker without
 any modification.

Thank you for clarifying, I think this is a good approach.

Back to the patch:

If I understand correctly, the _primary_ goal of this patch is to make
it safe to take out heavyweight locks in worker processes, even if the
deadlock involves LWLocks/latches synchronizing among the processes
within a lock group.

For example, say processes A1 and A2 are in the same lock group, and B
is in a different lock group. A1 is holding heavyweight lock H1 and
waiting on a LW lock L1; A2 is holding L1 and waiting on heavyweight
lock H2; and B is holding H2 and waiting on H1.

The current deadlock detector would see a dependency graph like:

   A2 -> B -> A1

But with lock groups, it would see:

   (A1 A2) -> B -> (A1 A2)

which is a cycle, and can be detected regardless of the synchronization
method used between A1 and A2. There are some details to work out to
avoid false positives, of course.
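[Editor's note: the collapsed-graph check described above can be sketched as a toy model. This is illustrative Python, not PostgreSQL's actual deadlock-detector code; all names are invented.]

```python
def collapse(edges, group_of):
    """Project process-level wait-for edges onto lock groups; edges
    inside a single group disappear, since members are presumed to
    wait for each other anyway."""
    g = {}
    for waiter, holder in edges:
        a, b = group_of[waiter], group_of[holder]
        if a != b:
            g.setdefault(a, set()).add(b)
    return g

def has_cycle(g):
    """Ordinary DFS cycle detection on the collapsed graph."""
    nodes = set(g) | {m for vs in g.values() for m in vs}
    color = dict.fromkeys(nodes, 0)   # 0=unvisited, 1=in-stack, 2=done
    def visit(n):
        color[n] = 1
        for m in g.get(n, ()):
            if color[m] == 1 or (color[m] == 0 and visit(m)):
                return True
        color[n] = 2
        return False
    return any(color[n] == 0 and visit(n) for n in nodes)

# The example above: A1 holds H1 and waits on LWLock L1; A2 holds L1
# and waits on H2; B holds H2 and waits on H1.  The heavyweight-lock
# graph alone shows only A2 -> B -> A1 (no cycle), but with A1 and A2
# collapsed into one node it becomes (A1 A2) -> B -> (A1 A2), a cycle.
group_of = {"A1": "A", "A2": "A", "B": "B"}
print(has_cycle(collapse([("A2", "B"), ("B", "A1")], group_of)))  # True
```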

Is that about right?

Regards,
Jeff Davis






Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-11 Thread Jeff Davis
On Wed, 2014-10-15 at 00:03 -0400, Robert Haas wrote:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

Trying to catch up on this thread, please excuse me if these questions
are already covered.

You mention the possibility of undetected deadlocks, which is surely
unacceptable. But why not improve the deadlock detector? How bad are
_detected_ deadlocks? A lot of the concern was around catalog accesses,
but those lock requests would rarely wait anyway.

I also wonder if group locking is generally the wrong approach to
parallelism. Parallel scan/sort should work by assigning workers to
chunks of data, and then merging the results. In principle the workers
don't need to touch the same data at all, so why are we trying so hard
to get them to all take the same locks?

The reason, I assume, is that a full table is the lockable object, but
we are imagining the segment files as the chunks. But maybe there's a
way to address this more directly with an extra field in the lock tag,
and perhaps some catalog knowledge?

Regards,
Jeff Davis






Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-10 Thread Amit Kapila
On Fri, Nov 7, 2014 at 3:55 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com
wrote:
  Maybe we can, as a first step, make those edges in the lock graph
  visible to the deadlock detector? It's pretty clear that undetected
  deadlocks aren't ok, but detectable deadlocks in a couple corner cases
  might be acceptable.

 An interesting point came to mind this morning on this topic, while
 discussing this issue with Amit.  Suppose process A1 holds a lock on
 some object and process A2, a member of the same locking group, waits
 for a lock on that same object.  It follows that there must exist a
 process B which either *holds* or *awaits* a lock which conflicts with
 the one requested by A2; otherwise, A2 would have been granted the
 lock at once.  If B *holds* the conflicting lock, it may eventually
 release it; but if B *awaits* the conflicting lock, we have a soft
 deadlock, because A2 waits for B, which waits for A1, which we presume
 to wait for A2.[1]

 The logical consequence of this is that, if a process seeks to acquire
 a lock which is already held (in any mode) by a fellow group-member,
 it had better jump over the entire wait queue and acquire the lock
 immediately if no conflicting locks have already been granted.  If it
 fails to do this, it immediately creates a soft deadlock.  What if
 there *are* conflicting locks already granted?  In that case, things
 are more complicated.  We could, for example, have this situation: A1
 holds AccessShareLock, B holds ExclusiveLock, C1 awaits ShareLock, and
 then (following it in the wait queue) A2 awaits RowExclusiveLock.
 This is not a deadlock, because B can finish, then C can get the lock,
 then C1 can finish, then A2 can get the lock.  But if C2 now waits for
 some member of group A, then we have a soft deadlock between group A
 and group C, which can be resolved by moving A2's request ahead of C1.
 (This example is relatively realistic, too: a lock upgrade from
 AccessShareLock to RowExclusiveLock is probably the most common type
 of lock upgrade, for reasons that are hopefully obvious.)

 In practice, I suspect it's best to take a more aggressive approach
 and have A2 jump straight to the head of the line right out of the
 chute.  Letting some parallel workers make progress while holding up
 others out of a notion of locking fairness is probably stupid, because
 they're probably dependent on each other in some way that makes it
 useless for one to make progress while the others don't.  For the same
 reason, I'd argue that we ought to wait until all pending lock
 requests from a given group can be satisfied before granting any of
 them.

I think this logic can sometimes block processes from acquiring a lock
which no one is holding.  Assume Group-1 (G-1) is waiting on two locks
(lock L-1 on table T-1 and lock L-2 on table T-2) which are held by
unrelated processes P-2 and P-3 respectively; now P-3 releases the
lock on table T-2, but it cannot be granted to G-1, which is waiting
on it, because lock L-1 still cannot be granted to G-1.  At this point
no one holds the lock on T-2 and G-1 waits for it, yet no new process
can acquire a lock on T-2 either, because there is already a waiter
for it in the wait queue.  OTOH if we had granted the lock on T-2 to
G-1, it might have finished its work and released the lock
(considering it is a short-duration lock on a catalog relation).
If such a situation is possible, then even though it is beneficial in
certain cases to view several pending requests from the same group as
a single request, it can at the same time be harmful to some unrelated
processes as well.
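[Editor's note: the scenario can be made concrete with a toy model of the all-or-nothing policy. Illustrative Python only; the names G-1, P-2, P-3, L-1, L-2 follow the example above.]

```python
def group_grantable(waits, held):
    """All-or-nothing policy: grant nothing to a group until every
    lock it is waiting on is currently free."""
    return all(lock not in held for lock in waits)

held = {"L1": "P-2", "L2": "P-3"}     # unrelated processes hold both
g1_waits = ("L1", "L2")

del held["L2"]                        # P-3 releases the lock on T-2
print(group_grantable(g1_waits, held))   # False: L-1 is still held

# L-2 now has no holder, but G-1 remains queued for it, so a newcomer
# requesting L-2 must wait behind G-1: the lock sits idle yet
# unavailable, which is the objection above.
wait_queue = {"L2": ["G-1"]}
newcomer_gets_l2 = "L2" not in held and not wait_queue["L2"]
print(newcomer_gets_l2)                  # False
```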


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-10 Thread Robert Haas
On Mon, Nov 10, 2014 at 6:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I think this logic can sometimes block processes from acquiring a lock
 which no one is holding.  Assume Group-1 (G-1) is waiting on two locks
 (Lock L1 on Table T1 and Lock L2 on Table T2) which are held by
 unrelated processes P-2 and P-3 respectively; now P-3 releases the
 lock on table T-2 and cannot grant it to G-1 who is waiting on this lock
 because still L-1 lock cannot be granted to G-1.

That's not what I meant.  I meant we shouldn't grant a lock *on a
particular object* to any process on the group until all lock requests
*for that object* can be granted.  Waiting until every awaited lock in
the system can be granted in all requested modes would be really hard
to implement and probably have all kinds of problems (as you point out
here).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-06 Thread Robert Haas
On Wed, Nov 5, 2014 at 9:26 PM, Robert Haas robertmh...@gmail.com wrote:
 Yes, I think that's probably a net improvement in robustness quite
 apart from what we decide to do about any of the rest of this.  I've
 attached it here as revise-procglobal-tracking.patch and will commit
 that bit if nobody objects.  The remainder is reattached without
 change as group-locking-v0.1.patch.

 Per your other comment, I've developed the beginnings of a testing
 framework which I attached here as test_group_locking-v0.patch.

Urk.  I attached a version with some stupid bugs.  Here's a slightly better one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/test_group_locking/Makefile b/contrib/test_group_locking/Makefile
new file mode 100644
index 000..2d09341
--- /dev/null
+++ b/contrib/test_group_locking/Makefile
@@ -0,0 +1,21 @@
+# contrib/test_group_locking/Makefile
+
+MODULE_big = test_group_locking
+OBJS = test_group_locking.o $(WIN32RES)
+PGFILEDESC = "test_group_locking - test harness for group locking"
+
+EXTENSION = test_group_locking
+DATA = test_group_locking--1.0.sql
+
+REGRESS = test_group_locking
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_group_locking
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_group_locking/test_group_locking--1.0.sql b/contrib/test_group_locking/test_group_locking--1.0.sql
new file mode 100644
index 000..adb2be5
--- /dev/null
+++ b/contrib/test_group_locking/test_group_locking--1.0.sql
@@ -0,0 +1,8 @@
+/* contrib/test_group_locking/test_group_locking--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_group_locking" to load this file. \quit
+
+CREATE FUNCTION test_group_locking(spec pg_catalog.text)
+RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/test_group_locking/test_group_locking.c b/contrib/test_group_locking/test_group_locking.c
new file mode 100644
index 000..904beff
--- /dev/null
+++ b/contrib/test_group_locking/test_group_locking.c
@@ -0,0 +1,1071 @@
+/*--
+ *
+ * test_group_locking.c
+ *		Test harness code for group locking.
+ *
+ * Copyright (C) 2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/test_group_locking/test_group_locking.c
+ *
+ * -
+ */
+
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "catalog/namespace.h"
+#include "commands/dbcommands.h"
+#include "fmgr.h"
+#include "lib/ilist.h"
+#include "lib/stringinfo.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "parser/scansup.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(test_group_locking);
+
+void test_group_locking_worker_main(Datum main_arg);
+
+/* Names of lock modes, for debug printouts */
+static const char *const lock_mode_names[] =
+{
+	"INVALID",
+	"AccessShareLock",
+	"RowShareLock",
+	"RowExclusiveLock",
+	"ShareUpdateExclusiveLock",
+	"ShareLock",
+	"ShareRowExclusiveLock",
+	"ExclusiveLock",
+	"AccessExclusiveLock"
+};
+
+typedef enum
+{
+	TGL_START,
+	TGL_STOP,
+	TGL_LOCK,
+	TGL_UNLOCK
+} TestGroupLockOp;
+
+typedef struct
+{
+	TestGroupLockOp op;
+	LOCKMODE lockmode;
+	Oid	relid;
+} TestGroupLockCommand;
+
+typedef struct
+{
+	dlist_node node;
+	bool verified;
+	int	group_id;
+	int task_id;
+	TestGroupLockCommand command;
+} TestGroupLockStep;
+
+typedef struct
+{
+	int group_id;
+	int task_id;
+} worker_key;
+
+typedef struct
+{
+	worker_key	key;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *requesth;
+	shm_mq_handle *responseh;
+	bool awaiting_response;
+} worker_info;
+
+typedef struct
+{
+	int	group_id;
+	int	leader_task_id;
+	bool has_followers;
+} leader_info;
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct worker_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	NameData	database;
+	NameData	authenticated_user;
+	bool		use_group_locking;
+	pid_t		leader_pid;
+} worker_fixed_data;
+
+#define SHM_QUEUE_SIZE	32768
+#define TEST_GROUP_LOCKING_MAGIC		0x4c616e65
+
+static void check_for_messages(HTAB *worker_hash);
+static void determine_leader_info(dlist_head *plan, HTAB *leader_hash);
+static void handle_sigterm(SIGNAL_ARGS);
+static void process_message(HTAB *worker_hash, worker_info *info,
+char *message, Size message_bytes);
+static void rationalize_steps(dlist_head 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-06 Thread Jim Nasby

On 11/5/14, 8:26 PM, Robert Haas wrote:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select 
test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo');
NOTICE:  starting worker 1.0
NOTICE:  starting worker 2.0
NOTICE:  instructing worker 1.0 to acquire AccessExclusiveLock on
relation with OID 16387
NOTICE:  instructing worker 2.0 to acquire AccessExclusiveLock on
relation with OID 16387


Damn that's cool to see! :)

FWIW, some creative use of a composite type, plpgsql and string_to_array() 
would have saved a bunch of parsing code... it's surprisingly easy to parse 
stuff like this using string_to_array().
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-06 Thread Robert Haas
On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 Maybe we can, as a first step, make those edges in the lock graph
 visible to the deadlock detector? It's pretty clear that undetected
 deadlocks aren't ok, but detectable deadlocks in a couple corner cases
 might be acceptable.

An interesting point came to mind this morning on this topic, while
discussing this issue with Amit.  Suppose process A1 holds a lock on
some object and process A2, a member of the same locking group, waits
for a lock on that same object.  It follows that there must exist a
process B which either *holds* or *awaits* a lock which conflicts with
the one requested by A2; otherwise, A2 would have been granted the
lock at once.  If B *holds* the conflicting lock, it may eventually
release it; but if B *awaits* the conflicting lock, we have a soft
deadlock, because A2 waits for B, which waits for A1, which we presume
to wait for A2.[1]

The logical consequence of this is that, if a process seeks to acquire
a lock which is already held (in any mode) by a fellow group-member,
it had better jump over the entire wait queue and acquire the lock
immediately if no conflicting locks have already been granted.  If it
fails to do this, it immediately creates a soft deadlock.  What if
there *are* conflicting locks already granted?  In that case, things
are more complicated.  We could, for example, have this situation: A1
holds AccessShareLock, B holds ExclusiveLock, C1 awaits ShareLock, and
then (following it in the wait queue) A2 awaits RowExclusiveLock.
This is not a deadlock, because B can finish, then C can get the lock,
then C1 can finish, then A2 can get the lock.  But if C2 now waits for
some member of group A, then we have a soft deadlock between group A
and group C, which can be resolved by moving A2's request ahead of C1.
(This example is relatively realistic, too: a lock upgrade from
AccessShareLock to RowExclusiveLock is probably the most common type
of lock upgrade, for reasons that are hopefully obvious.)

In practice, I suspect it's best to take a more aggressive approach
and have A2 jump straight to the head of the line right out of the
chute.  Letting some parallel workers make progress while holding up
others out of a notion of locking fairness is probably stupid, because
they're probably dependent on each other in some way that makes it
useless for one to make progress while the others don't.  For the same
reason, I'd argue that we ought to wait until all pending lock
requests from a given group can be satisfied before granting any of
them.  In fact, I suspect it's best in general for the locking
machinery and the deadlock detector to view several simultaneous,
pending lock requests as if they were a single request conflicting
with everything that would conflict with any of the requested modes.
Consider the situation where there are 10 lock waiters, 5 in each of
two parallel groups.  The deadlock detector could spend a good deal of
time and energy trying various reorderings of the lock queue, but in
practice it seems highly unlikely that it's sensible to do anything
other than put all the locks from one group first and all of the locks
from the other group afterwards, regardless of whether the processes
want the same mode or different modes.  Most of the other orderings
will either be equivalent to one of those orderings (because commuting
two non-conflicting locks in the wait queue has no effect) or soft
deadlocks (because they interleave a conflicting lock request between
two requests from the same group).  And even if you can find some
corner case where neither of those things is the case, it's probably
still not useful to let half the lock group run and leave the other
half blocked.  What will probably happen is that some internal queue
will pretty quickly either fill up or drain completely and the process
filling it, or draining it, will wait.  Now we're no better off than
if we'd waited to grant the lock in the first place, and we're now
holding more locks that can block other things or cause deadlocks.
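A minimal sketch of the idea (Python, not PostgreSQL code; the much-simplified four-mode conflict table and the function names are assumptions made for illustration) in which all of a group's pending requests are granted only together, as if they were one combined request:

```python
# Simplified, symmetric subset of the heavyweight-lock conflict table;
# the real lock manager has eight table lock modes.
CONFLICTS = {
    "AccessShareLock":     {"AccessExclusiveLock"},
    "RowExclusiveLock":    {"ShareLock", "AccessExclusiveLock"},
    "ShareLock":           {"RowExclusiveLock", "AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "RowExclusiveLock",
                            "ShareLock", "AccessExclusiveLock"},
}

def group_grantable(group, requested_modes, held):
    """Decide whether ALL pending requests from one parallel group can
    be granted at once.  held is a list of (holder_group, mode) pairs.
    Locks held by members of the same group never count as conflicts;
    every requested mode must be compatible with every lock held by
    any other group."""
    return all(hmode not in CONFLICTS[mode]
               for mode in requested_modes
               for holder, hmode in held
               if holder != group)
```

Either the whole batch is compatible with everything held outside the group and all requests are granted, or none are; there is no state in which half the group runs while the other half waits.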

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]  Here I'm assuming, as in previous discussion, that we regard all
members of a parallel group as mutually waiting for each other, on the
theory that the parallel computation won't complete until all workers
complete and, therefore, the members of the group are either already
waiting for each other or eventually will.  While this might not be
true in every case, I believe it's a good (and generally valid)
simplifying assumption.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-05 Thread Robert Haas
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The procgloballist stuff should be the subject of a separate patch
 which I agree with.

Yes, I think that's probably a net improvement in robustness quite
apart from what we decide to do about any of the rest of this.  I've
attached it here as revise-procglobal-tracking.patch and will commit
that bit if nobody objects.  The remainder is reattached without
change as group-locking-v0.1.patch.

Per your other comment, I've developed the beginnings of a testing
framework which I attached here as test_group_locking-v0.patch.  That
doesn't look to have much hope of evolving into something we'd want
even in contrib, but I think it'll be rather useful for debugging.  It
works like this:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select 
test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo');
NOTICE:  starting worker 1.0
NOTICE:  starting worker 2.0
NOTICE:  instructing worker 1.0 to acquire AccessExclusiveLock on
relation with OID 16387
NOTICE:  instructing worker 2.0 to acquire AccessExclusiveLock on
relation with OID 16387
ERROR:  could not obtain AccessExclusiveLock on relation with OID 16387
CONTEXT:  background worker, group 2, task 0

The syntax is a little arcane, I guess, but it's documented in the
comments within.  In this case I asked it to start up two background
workers and have them both try to take AccessExclusiveLock on table
foo.  As expected, the second one fails.  The idea is that workers are
identified by a pair of numbers X.Y; two workers with the same X-value
are in the same locking group.  So if I call the second worker 1.1
rather than 2.0, it'll join the same locking group as worker 1.0 and
... then it does the wrong thing, and then it crashes the server,
because my completely-untested code is unsurprisingly riddled with
bugs.
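The spec syntax can be sketched as a small parser.  This is an illustrative reconstruction from the example above, not the patch's actual parsing code, and the tuple layout is an assumption:

```python
def parse_spec(spec):
    """Parse a test_group_locking command string such as
    '1.0:start,1.1:lock:AccessExclusiveLock:foo' into
    (group, task, command, args) tuples.  Workers sharing a group
    number (the X in X.Y) belong to the same locking group."""
    commands = []
    for item in spec.split(","):
        worker, command, *args = item.split(":")
        group, task = (int(n) for n in worker.split("."))
        commands.append((group, task, command, args))
    return commands
```

So "1.0" and "1.1" parse to the same group number and would join one locking group, while "2.0" is unrelated.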

Eventually, this needs to be generalized a bit so that we can use it
to test deadlock detection.  That's tricky, because what you really
want to do is tell worker A to wait for some lock and then, once
you're sure it's on the wait queue, tell worker B to go take some
other lock and check that you see the resulting deadlock.  There
doesn't seem to be a good API for the user backend to find out whether
some background worker is waiting for some particular lock, so I may
have to resort to the hacky expedient of having the driver process
wait for a few seconds and assume that's long enough that the
background worker will be on the wait queue by then.  Or maybe I can
drum up some solution, but anyway it's not done yet.

The value of this test code is that we can easily reproduce locking
scenarios which would be hard to reproduce in a real workload - e.g.
because they're timing-dependent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 723051e..85997f6 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -210,6 +210,10 @@ static bool FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode);
 static bool FastPathTransferRelationLocks(LockMethod lockMethodTable,
 			  const LOCKTAG *locktag, uint32 hashcode);
 static PROCLOCK *FastPathGetRelationLockEntry(LOCALLOCK *locallock);
+static bool GroupLockShouldJumpQueue(LockMethod lockMethodTable,
+		 LOCKMODE lockmode,
+		 LOCK *lock,
+		 PROCLOCK *proclock);
 
 /*
  * To make the fast-path lock mechanism work, we must have some way of
@@ -339,7 +343,8 @@ PROCLOCK_PRINT(const char *where, const PROCLOCK *proclockP)
 static uint32 proclock_hash(const void *key, Size keysize);
 static void RemoveLocalLock(LOCALLOCK *locallock);
 static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
- const LOCKTAG *locktag, uint32 hashcode, LOCKMODE lockmode);
+ PGPROC *leader, const LOCKTAG *locktag, uint32 hashcode,
+ LOCKMODE lockmode);
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
@@ -894,8 +899,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * away anytime.  So we have to use SetupLockInTable() to recompute the
 	 * lock and proclock pointers, even if they're already set.
 	 */
-	proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
-hashcode, lockmode);
+	proclock = SetupLockInTable(lockMethodTable, MyProc, LockGroupLeader,
+locktag, hashcode, lockmode);
 	if (!proclock)
 	{
 		AbortStrongLockAcquire();
@@ -914,18 +919,27 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	/*
 	 * If lock requested conflicts with locks requested by waiters, must join
-	 * wait queue.  Otherwise, check for conflict with already-held locks.
-	 * (That's last because most complex check.)
+	 * wait queue (except for certain cases involving 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Greg Stark
On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas robertmh...@gmail.com wrote:
 1. Any non-trivial piece of PostgreSQL code is likely to contain
 syscache lookups.
 2. Syscache lookups had better work in parallel workers, or they'll be
 all but useless.


I've been using parallel sorts and index builds in my mental model of
how this will be used. I note that sorts go out of their way to look
up all the syscache entries in advance precisely so that tuplesort
doesn't start doing catalog lookups in the middle of the sort. In
general I think what people are imagining is that the parallel workers
will be running low-level code like tuplesort that has all the
databasey stuff like catalog lookups done in advance and just operates
on C data structures like function pointers. And I think that's a
valuable coding discipline to enforce; it avoids having low-level
infrastructure calling up to higher-level abstractions, which quickly
becomes hard to reason about.

However in practice I think you're actually right -- but not for the
reasons you've been saying. I think the parallel workers *should* be
written as low level infrastructure and not be directly doing syscache
lookups or tuple locking etc. However there are a million ways in
which Postgres is extensible which causes loops in the call graph that
aren't apparent in the direct code structure. For instance, what
happens if the index you're building is an expression index or partial
index? Worse, what happens if those expressions have a plpython
function that does queries using SPI?

But those kinds of user code exploiting extensibility are the
situations where we need a deadlock detector and where you might need
this infrastructure. We wouldn't and shouldn't need a deadlock
detector for our own core server code. Ideally we would have some sort
of compromise: enforce careful locking rules in the core code (all
locks acquired in advance, parallel workers prohibited from obtaining
locks) while still allowing users a free-for-all and detecting
deadlocks at runtime for them. But I'm
not sure there's any real middle ground here.

-- 
greg




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas robertmh...@gmail.com wrote:
 2. Syscache lookups had better work in parallel workers, or they'll be
 all but useless.

 I've been using parallel sorts and index builds in my mental model of
 how this will be used. I note that sorts go out of their way to look
 up all the syscache entries in advance precisely so that tuplesort
 doesn't start doing catalog lookups in the middle of the sort.

Well, they try to avoid doing the lookups more than once, but that does
not by any means imply that no lookups happen after the sort starts.
In particular, if you're sorting a container type (array, record)
there will be lookups during the first call of the sort type's comparison
function.  Enums could cause catalog lookups much later than the first
call, too.

I'm with Robert: if you cannot do catalog lookups in a worker process,
the feature will be so crippled as to be essentially useless.

regards, tom lane




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Mon, Nov 3, 2014 at 10:18 AM, Greg Stark st...@mit.edu wrote:
 On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas robertmh...@gmail.com wrote:
 1. Any non-trivial piece of PostgreSQL code is likely to contain
 syscache lookups.
 2. Syscache lookups had better work in parallel workers, or they'll be
 all but useless.

 I've been using parallel sorts and index builds in my mental model of
 how this will be used. I note that sorts go out of their way to look
 up all the syscache entries in advance precisely so that tuplesort
 doesn't start doing catalog lookups in the middle of the sort.

This isn't really true.  Routines like varstr_cmp() do syscache
lookups and de-TOASTing.  We've recently tried to reduce that with the
sortsupport stuff, but it hasn't eliminated it completely, and there
are other cases.  This is really the key point that I think everybody
needs to understand: a lot of code that you think doesn't do anything
complicated actually reads from the database - generally either in the
form of syscache lookups, or in the form of de-TOASTing.  Most of that
code doesn't perform those operations *frequently*, but that's because
there are backend-private caches that minimize the number of times we
actually hit the database.  But restructuring all of that code to do
no database access whatsoever would be a very difficult proposition
and would involve painful sacrifices, which is, I imagine, why it is
the way it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The main question is still why any of this is necessary? If we are
 only acquiring fast path locks in workers, what need is there for any
 of this? The leader and workers never conflict with each other, so why
 would they ever wait for each other (except at a point where they are
 actively transferring data), which is the necessary pre-condition for
 a deadlock?

 Yes, running a command with a conflicting lock would disrupt the start
 of a parallel operation, but as I said above, the deadlock detector
 will eventually see that and re-arrange us so there is no deadlock.

 The only need I can see is if one of the workers/leader requests a
 Strong lock on an object, someone else requests a conflicting lock on
 that object and then we perform a parallel operation ourselves. We
 could easily solve that by saying that you can't go parallel if you
 hold a strong lock AND that workers can't take anything higher than
 RowShareLock, like HS backends. That's good enough for now.

That would be enough to keep the parallel group from deadlocking with
itself, but it wouldn't be enough to prevent deadlock with other
processes, which is the scenario that actually worries me a lot more.
In particular, I'm worried about deadlocks where some other process
gets sandwiched between two processes from the same parallel group.
That is, let A1 be a parallel group leader and A2 be a process from
the same parallel group, and let B be an unrelated process.  I'm
worried about the situation where A2 waits for B which waits for A1
which (outside the view of the deadlock detector) waits for A2.  All
that can happen even if the parallel backends hold no strong locks,
because B can hold strong locks.
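The sandwich scenario is easy to model as a wait-for graph: with A1 and A2 as separate nodes (and A1's latch wait on A2 invisible to the deadlock detector), there is no cycle, but collapsing group members into a single node exposes it.  A hypothetical sketch, with the names A1, A2, B taken from the example above and the graph representation an assumption:

```python
from collections import defaultdict

def collapse(edges, group_of):
    """Merge group members into one node of the wait-for graph.
    edges is a set of (waiter, holder) pairs; group_of maps a process
    to its group id (ungrouped processes map to themselves).
    Intra-group edges are dropped."""
    g = lambda p: group_of.get(p, p)
    return {(g(w), g(h)) for w, h in edges if g(w) != g(h)}

def has_cycle(edges):
    """Depth-first search for a back edge in the wait-for graph."""
    adj = defaultdict(set)
    for waiter, holder in edges:
        adj[waiter].add(holder)
    done = set()                       # nodes fully explored

    def dfs(node, path):
        if node in path:
            return True                # back edge: deadlock
        if node in done:
            return False
        path.add(node)
        found = any(dfs(nxt, path) for nxt in adj[node])
        path.discard(node)
        done.add(node)
        return found

    return any(dfs(n, set()) for n in list(adj))
```

With only the visible edges A2 waits-for B and B waits-for A1 there is no cycle; once A1 and A2 are treated as one node, the cycle A-B-A appears even though A1's wait on A2 never enters the lock manager.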

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Simon Riggs
On 3 November 2014 17:00, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The main question is still why any of this is necessary? If we are
 only acquiring fast path locks in workers, what need is there for any
 of this? The leader and workers never conflict with each other, so why
 would they ever wait for each other (except at a point where they are
 actively transferring data), which is the necessary pre-condition for
 a deadlock?

 Yes, running a command with a conflicting lock would disrupt the start
 of a parallel operation, but as I said above, the deadlock detector
 will eventually see that and re-arrange us so there is no deadlock.

 The only need I can see is if one of the workers/leader requests a
 Strong lock on an object, someone else requests a conflicting lock on
 that object and then we perform a parallel operation ourselves. We
 could easily solve that by saying that you can't go parallel if you
 hold a strong lock AND that workers can't take anything higher than
 RowShareLock, like HS backends. That's good enough for now.

 That would be enough to keep the parallel group from deadlocking with
 itself, but it wouldn't be enough to prevent deadlock with other
 processes, which is the scenario that actually worries me a lot more.
 In particular, I'm worried about deadlocks where some other process
 gets sandwiched between two processes from the same parallel group.
 That is, let A1 be a parallel group leader and A2 be a process from
 the same parallel group, and let B be an unrelated process.  I'm
 worried about the situation where A2 waits for B which waits for A1
 which (outside the view of the deadlock detector) waits for A2.  All
 that can happen even if the parallel backends hold no strong locks,
 because B can hold strong locks.

OK, I think I'm happy with this as a general point.

How will we automatically test this code?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Mon, Nov 3, 2014 at 1:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
 OK, I think I'm happy with this as a general point.

Cool!

 How will we automatically test this code?

Good question.  I can see two possible approaches:

1. We write a test_group_locking harness along the lines of
test_shm_mq and test_decoding and put that in contrib.

2. We wait until higher-level facilities built on top of this are
available and get regression test coverage of this code via those
higher-level modules.

Personally, I can't imagine debugging this code without writing some
kind of test harness that only does locking; I don't want my locking
bugs to be mixed with my planner and executor bugs.  But I could go
either way on actually putting that code in contrib.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-02 Thread Simon Riggs
On 15 October 2014 05:03, Robert Haas robertmh...@gmail.com wrote:

 At least to me, that simple scenario is clear-cut[1], but what do we
 do in more complicated situations?  For example, suppose backends A
 and B are members of the same locking group.  A locks a relation with
 AccessShareLock, an unrelated process X queues up waiting for an
 AccessExclusiveLock, and then B also requests AccessShareLock.  The
 normal behavior here is that B should wait for X to go first, but here
 that's a problem.  If A is the user backend and B is a worker backend,
 A will eventually wait for B, which is waiting for X, which is waiting
 for A: deadlock.

Yes, deadlock.

My understanding would be that the lead process would wait on a latch,
not a heavyweight lock. So it would never perform a deadlock
detection. Which leaves only X and B to perform the deadlock check.

Are you aware that the deadlock detector will reorder the lock queue,
if that presents a possible solution to the deadlock?

Would the above example not be resolved simply with the existing code?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-02 Thread Simon Riggs
On 2 November 2014 10:39, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 05:03, Robert Haas robertmh...@gmail.com wrote:

 At least to me, that simple scenario is clear-cut[1], but what do we
 do in more complicated situations?  For example, suppose backends A
 and B are members of the same locking group.  A locks a relation with
 AccessShareLock, an unrelated process X queues up waiting for an
 AccessExclusiveLock, and then B also requests AccessShareLock.  The
 normal behavior here is that B should wait for X to go first, but here
 that's a problem.  If A is the user backend and B is a worker backend,
 A will eventually wait for B, which is waiting for X, which is waiting
 for A: deadlock.

 Yes, deadlock.

 My understanding would be that the lead process would wait on a latch,
 not a heavyweight lock. So it would never perform a deadlock
 detection. Which leaves only X and B to perform the deadlock check.

 Are you aware that the deadlock detector will reorder the lock queue,
 if that presents a possible solution to the deadlock?

 Would the above example not be resolved simply with the existing code?

Self Answer: Only if the lock request would not be self-conflicting.

Reading the patch (yay!) you say

*   ...However, if a conflicting lock is present that is
* not held by another member of our lock group, then we can't do this.
* In that case we'll have to wait despite the deadlock risk and hope for
* the best.

I wasn't sure what that meant.

Actually, I like the rest of the patch, assuming I understand it. Not
sure that changes anything I've said about wanting to see something
actually working, even as a prototype. For example, how will you test
the code in this patch actually works? The best way is to make
parallel query work in a restricted form and then drop that in
isolation tester, otherwise you'll have to waste time on a test
harness.

The procgloballist stuff should be the subject of a separate patch
which I agree with.

The main question is still why any of this is necessary? If we are
only acquiring fast path locks in workers, what need is there for any
of this? The leader and workers never conflict with each other, so why
would they ever wait for each other (except at a point where they are
actively transferring data), which is the necessary pre-condition for
a deadlock?

Yes, running a command with a conflicting lock would disrupt the start
of a parallel operation, but as I said above, the deadlock detector
will eventually see that and re-arrange us so there is no deadlock.

The only need I can see is if one of the workers/leader requests a
Strong lock on an object, someone else requests a conflicting lock on
that object and then we perform a parallel operation ourselves. We
could easily solve that by saying that you can't go parallel if you
hold a strong lock AND that workers can't take anything higher than
RowShareLock, like HS backends. That's good enough for now.
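The restriction proposed here can be sketched as a pair of guards.  The mode ordering below follows the usual ranking of PostgreSQL's table lock modes, and treating modes above ShareUpdateExclusiveLock as "strong" mirrors the fast-path terminology in lock.c; both are assumptions of this sketch, and the function names are hypothetical:

```python
# Table lock modes in ascending order of strength.
LOCK_MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]
STRENGTH = {mode: i for i, mode in enumerate(LOCK_MODES)}

def may_go_parallel(held_modes):
    # Refuse to launch workers while holding any "strong" lock
    # (anything above ShareUpdateExclusiveLock in this ranking).
    return all(STRENGTH[m] <= STRENGTH["ShareUpdateExclusiveLock"]
               for m in held_modes)

def worker_may_request(mode):
    # Workers are capped at RowShareLock, like hot-standby backends.
    return STRENGTH[mode] <= STRENGTH["RowShareLock"]
```

Under these two guards a worker can never hold a self-conflicting lock, which is what makes the scheme "good enough for now" at the cost of ruling out parallelism for stronger operations.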

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Robert Haas
On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 Where will those preexisting cache entries come from, exactly?  The
 postmaster is forking the parallel worker, not the user backend.

 Several things:
 1) The relcache init files load a fair bit
 2) There's cache entries made just during startup while we look up user
information and such
 3) There's lots of places you can hook into where it's perfectly legal
and expected that you access system caches. I don't think you can
reasonably define those away.
 4) I'm pretty sure that one of the early next steps will be to reuse a
bgworker for multiple tasks. So there'll definitely be preexisting
cache entries.

 I'm pretty sure that it's impossible to define the problem away. We
 *might* be able to say that we'll just do a InvalidateSystemCaches() at
 start of the parallelized task.

I see.  I haven't thought deeply about those things, and there could
indeed be problems there.

  What I have serious doubts about is 'coowning' locks. Especially if two
  backends normally wouldn't be able to both get that lock.

 Perhaps we should discuss that more.  To me it seems pretty obvious
 that's both safe and important.

 I completely fail to see why you'd generally think it's safe for two
 backends to hold the same self conflicting lock. Yes, within carefully
 restricted sections of code that might be doable. But I don't think you
 can fully enforce that. People *will* mark their own functions as
 parallel safe. And do stupid stuff. That shouldn't cause
 segfaults/exploits/low level data corruption/.. as long as it's done
 from !C functions.

I agree that people will mark their own functions as parallel-safe,
and I also agree that shouldn't cause segfaults, exploits, or
low-level data corruption.  Clearly, we'll need to impose a bunch of
restrictions to achieve that.  But if you put a LOCK TABLE statement
in an SQL function, it won't conflict with other locking request made
by the same or another function elsewhere in the same transaction;
surely you don't want that behavior to be *different* in parallel
mode.  That would be a blatant, user-visible behavior difference
justified by ... nothing at all.  There's no hazard there.  Where you
start getting into crash/exploit/data corruption territory is when you
are talking about DDL operations that change the physical structure of
the table.  That's why we have stuff like CheckTableNotInUse() to
verify that, for example, there are no old cursors around that are
still expecting the old relfilenode and tuple descriptor to be valid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Robert Haas
On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 I doubt it. There's a whole bunch of problems that just exist by virtue
 of having a group leader. What if the leader dies but the worker backend
 isn't in a section of code that does a CHECK_FOR_INTERRUPTS()?

In between all of the heat about whether I'm writing the correct
patch, or whether I'm overall making progress fast enough, somebody
could, I don't know, read the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Andres Freund
On 2014-11-01 17:00:59 -0400, Robert Haas wrote:
 On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
  I doubt it. There's a whole bunch of problems that just exist by virtue
  of having a group leader. What if the leader dies but the worker backend
  isn't in a section of code that does a CHECK_FOR_INTERRUPTS()?
 
 In between all of the heat about whether I'm writing the correct
 patch, or whether I'm overall making progress fast enough, somebody
 could, I don't know, read the patch.

I plan to, but that takes a fair amount of time. And there's at least
one patch of yours ahead in the queue... And I do think that most of the
concerns are more general than a specific implementation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Robert Haas
On Sat, Nov 1, 2014 at 5:06 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-11-01 17:00:59 -0400, Robert Haas wrote:
 On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
  I doubt it. There's a whole bunch of problems that just exist by virtue
  of having a group leader. What if the leader dies but the worker backend
  isn't in a section of code that does a CHECK_FOR_INTERRUPTS()?

 In between all of the heat about whether I'm writing the correct
 patch, or whether I'm overall making progress fast enough, somebody
 could, I don't know, read the patch.

 I plan to, but that takes a fair amount of time. And there's at least
 one patch of yours ahead in the queue... And I do think that most of the
 concerns are more general than a specific implementation.

Fair enough.  Well, the way I solved that specific problem is to allow
the group leader to terminate but not to return its PGPROC to the free
list until the last group member exits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Robert Haas
On Sat, Nov 1, 2014 at 1:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
 What are the specific restrictions you are suggesting we place on
 parallel workers? We need that definition clear so we can decide how
 to mark the functions. If we don't know, which is easily
 understandable, then the best way to find that out is to build a
 restricted solution and to expand slowly outwards to find problems.

 An obvious milestone there is whether a function contains SQL, which
 is the point chosen by some other databases. I personally hope to
 expand upon that, but it would be useful to reach it first and then
 continue from there.

 I was already thinking of implementing CONTAINS NO SQL as a SQL
 Standard function marking since it can be used to optimize COPY
 further.

I've written some fairly extensive thoughts on this topic at
http://wiki.postgresql.org/wiki/Parallel_Sort

Personally, I tend to focus more on what needs to be allowed in
parallel workers than what needs to be prohibited.  The motivation for
this patch is simple:

1. Any non-trivial piece of PostgreSQL code is likely to contain
syscache lookups.
2. Syscache lookups had better work in parallel workers, or they'll be
all but useless.
3. Some of those syscache lookups will be satisfied from cache, but
sometimes we'll need to actually read the system catalogs.
4. That requires locks.
5. Gosh, we'd better not deadlock.

There's a lot of infrastructure that I think will work fine in
parallel mode without any changes.  Buffer locking, buffer pins,
buffer lookups, and lwlocks, for example, I expect to all just work
without any real changes.  But heavyweight locks and invalidation messages
seem like they'll need a bit of surgery to work correctly in this
environment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Andres Freund
On 2014-11-01 16:59:34 -0400, Robert Haas wrote:
  I completely fail to see why you'd generally think it's safe for two
  backends to hold the same self conflicting lock. Yes, within carefully
  restricted sections of code that might be doable. But I don't think you
  can fully enforce that. People *will* mark their own functions as
  parallel safe. And do stupid stuff. That shouldn't cause
  segfaults/exploits/low level data corruption/.. as long as it's done
  from !C functions.
 
 I agree that people will mark their own functions as parallel-safe,
 and I also agree that shouldn't cause segfaults, exploits, or
 low-level data corruption.  Clearly, we'll need to impose a bunch of
 restrictions to achieve that.

 But if you put a LOCK TABLE statement
 in an SQL function, it won't conflict with other locking request made
 by the same or another function elsewhere in the same transaction;
 surely you don't want that behavior to be *different* in parallel
 mode.  That would be a blatant, user-visible behavior difference
 justified by ... nothing at all.

That's just a mislabeled function. It's obviously not parallel safe at
all. I see absolutely no problem with erroring out.

 There's no hazard there.  Where you
 start getting into crash/exploit/data corruption territory is when you
 are talking about DDL operations that change the physical structure of
 the table.  That's why we have stuff like CheckTableNotInUse() to
 verify that, for example, there are no old cursors around that are
 still expecting the old relfilenode and tuple descriptor to be valid.

It's not just fully structural changes although they are a concern.
It's also that we've amassed a number of hacks to deal with local state
that just won't be nicely transported. What's with stuff like
RelationSetIndexList() (which is infrequently enough used to not be a
big problem in practice...)? If we only allow parallel access while
independent backends could acquire the relevant locks, we can rely on us
already having taken care of the concurrency hazards. Otherwise not.

And before you say that'd prohibit a number of useful things - yes, it
might. But e.g. CREATE INDEX can still be parallelized. The primary
backend takes an Exclusive/ShareUpdateExclusive lock (depending on
CONCURRENTLY), and the workers just an AccessShare.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-01 Thread Robert Haas
On Sat, Nov 1, 2014 at 5:38 PM, Andres Freund and...@2ndquadrant.com wrote:
 That's just a mislabeled function. It's obviously not parallel safe at
 all. I see absolutely no problem with erroring out.

I disagree.  It's entirely parallel-safe, as long as you don't
arbitrarily decide to have the lock manager break it.

 There's no hazard there.  Where you
 start getting into crash/exploit/data corruption territory is when you
 are talking about DDL operations that change the physical structure of
 the table.  That's why we have stuff like CheckTableNotInUse() to
 verify that, for example, there are no old cursors around that are
 still expecting the old relfilenode and tuple descriptor to be valid.

 It's not just fully structural changes although they are a concern.
 It's also that we've amassed a number of hacks to deal with local state
 that just won't be nicely transported. What's with stuff like
 RelationSetIndexList() (which is infrequently enough used to not be a
 big problem in practice...)? If we only allow parallel access while
 independent backends could acquire the relevant locks, we can rely on us
 already having taken care of the concurrency hazards. Otherwise not.

RelationSetIndexList() is only used inside REINDEX, which I think
illustrates my point that it's mostly DDL we need to be worried about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 04:42, Amit Kapila amit.kapil...@gmail.com wrote:

 In fact it would be more sensible to lock the toast table earlier.


 It might make some sense to lock the toast table earlier for this
 particular case, but I don't think in general it will be feasible to lock
 all the tables (including catalog tables) which might get used in the
 overall operation before starting parallel operation.  I believe it will
 make doing parallel stuff difficult if we try to go via this route, as
 we need to always do this for any operation we want to make parallel.
 It will always have the risk that we might miss something and another
 thing is it might not be feasible in all kind of cases.

 If I understand correctly, you are suggesting that to get the first version
 of parallel operation out earlier, we should try to avoid all the stuff
 (like group locking, ...),

I advocate trying to avoid it, to see. The problems cited so far are
not so extreme they cannot be easily got around, if you have a will to
do so.

I've just posted about an idea to reduce catalog locking.

 have some restrictions on which kind of cases
 Create Index can work, do some hackery in Create Index path to avoid
 any kind of locking problems and do the parallel operation for External
 Sort (which might avoid need for shared memory allocator) and then
 call it a day and then do the things which we need for other parallel
 operations as and when they are required.

 I think this might be a good
 approach in itself if somebody wants to target something to release
 early, however in medium to short term when we want to provide
 non-trivial parallel operations we have to eventually solve those problems
 and delaying will only make the things worse.

My (by now) long experience on Postgres has shown me that developing
something now is a good route to take. By moving towards a solution at
a reasonable pace you learn things which may affect the longer term
viability of the project. New functionality in each release is useful.
Big bang developments that don't deliver until the end have a habit of
not delivering at the end either. MVP is a term frequently used by
VCs for a reason.

Lack of urgency in delivering a useful outcome from the project looks
strange. If all you develop in this release is core infrastructure for
parallelism and a custom scan API, it begins to look like there may
not be an open source version delivered at all. How people make their
income is up to them, but it's a concern that's been raised my way
before, so it seems reasonable to do so for others also. That is why
we have spent much time explaining clearly the goals and licencing of
BDR, for example. No need for big arguments; these are not allegations
or implications etc..

 In short, I agree that there is a merit in your idea w.r.t getting the first
 version of parallel operation out early, however if we see from medium
 to long term, I feel solving group locking problem (or some other general
 infrastructure what Robert mentioned upthread) can yield better results,
 unless you feel that is not at all required for parallel operations.

Is it genuinely required for most parallel operations? I think it's
clear that none of us knows the answer. Sure, the general case needs
it, but is the general case the same thing as the reasonably common
case?

Certainly, having a working prototype for something useful would help
build the case for why these things are needed. It would certainly
help the cause to have something working in this release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Is it genuinely required for most parallel operations? I think it's
 clear that none of us knows the answer. Sure, the general case needs
 it, but is the general case the same thing as the reasonably common
 case?

Well, I think that the answer is pretty clear.  Most of the time,
perhaps in 99.9% of cases, group locking will make no difference as to
whether a parallel operation succeeds or fails.  Occasionally,
however, it will cause an undetected deadlock.  I don't hear anyone
arguing that that's OK.  Does anyone wish to make that argument?

If not, then we must prevent it.  The only design, other than what
I've proposed here, that seems like it will do that consistently in
all cases is to have the user backend lock every table that the child
backend might possibly want to lock and retain those locks throughout
the entire duration of the computation whether the child would
actually need those locks or not.  I think that could be made to work,
but there are two problems:

1. Turing's theorem being what it is, predicting what catalog tables
the child might lock is not necessarily simple.

2. It might end up taking many more locks than necessary and holding
them for much longer than necessary.  Right now, for example, a
syscache lookup locks the table only if we actually need to read from
it and releases the lock as soon as the actual read is complete.
Under this design, every syscache that the parallel worker might
conceivably consult needs to be locked for the entire duration of the
parallel computation.  I would expect this to provoke a violent
negative reaction from at least one prominent community member (and I
bet users wouldn't like it much, either).

So, I am still of the opinion that group locking makes sense.   While
I appreciate the urge to avoid solving difficult problems where it's
reasonably avoidable, I think we're in danger of spending more effort
trying to avoid solving this particular problem than it would take to
actually solve it.  Based on what I've done so far, I'm guessing that
a complete group locking patch will be between 1000 and 1500 lines of
code and that nearly all of the new logic will be skipped when it's
not in use (i.e. no parallelism).  That sounds to me like a hell of a
deal compared to trying to predict what locks the child might
conceivably take and preemptively acquire them all, which sounds
annoyingly tedious even for simple things (like equality operators)
and nearly impossible for anything more complicated.
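
The core idea can be sketched in miniature (a toy model with invented names; the real patch works against the shared lock table and the full set of PostgreSQL lock modes): members of the same lock group are simply never treated as conflicting with one another.

```python
# Toy model of group locking (hypothetical names; real lock modes and
# the shared lock table are far more involved).
ACCESS_SHARE = "AccessShare"
ACCESS_EXCLUSIVE = "AccessExclusive"

# Which mode pairs conflict in this reduced two-mode world.
CONFLICTS = {
    (ACCESS_SHARE, ACCESS_SHARE): False,
    (ACCESS_SHARE, ACCESS_EXCLUSIVE): True,
    (ACCESS_EXCLUSIVE, ACCESS_SHARE): True,
    (ACCESS_EXCLUSIVE, ACCESS_EXCLUSIVE): True,
}

def conflicts(req_mode, req_group, held_mode, held_group):
    """Would a request in req_mode block on a lock held in held_mode?"""
    # Members of the same lock group never block one another: had the
    # query not been parallelized, both requests would have come from
    # a single backend, and both would have been granted.
    if req_group is not None and req_group == held_group:
        return False
    return CONFLICTS[(req_mode, held_mode)]
```

In this sketch the new logic is a single extra comparison on the conflict-check path, which is the basis for hoping the overhead is near zero when no parallelism is in use.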

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Is it genuinely required for most parallel operations? I think it's
  clear that none of us knows the answer. Sure, the general case needs
  it, but is the general case the same thing as the reasonably common
  case?
 
 Well, I think that the answer is pretty clear.  Most of the time,
 perhaps in 99.9% of cases, group locking will make no difference as to
 whether a parallel operation succeeds or fails.  Occasionally,
 however, it will cause an undetected deadlock.  I don't hear anyone
 arguing that that's OK.  Does anyone wish to make that argument?

Maybe we can, as a first step, make those edges in the lock graph
visible to the deadlock detector? It's pretty clear that undetected
deadlocks aren't ok, but detectable deadlocks in a couple corner cases
might be acceptable.

 If not, then we must prevent it.  The only design, other than what
 I've proposed here, that seems like it will do that consistently in
 all cases is to have the user backend lock every table that the child
 backend might possibly want to lock and retain those locks throughout
 the entire duration of the computation whether the child would
 actually need those locks or not.  I think that could be made to work,
 but there are two problems:
 
 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

Well, there's really no need to be absolutely general here. We're only
going to support a subset of functionality as parallelizable. And in
that case we don't need anything complicated to acquire all locks.

It seems quite possible to combine a heuristic approach with improved
deadlock detection.

 2. It might end up taking many more locks than necessary and holding
 them for much longer than necessary.  Right now, for example, a
 syscache lookup locks the table only if we actually need to read from
 it and releases the lock as soon as the actual read is complete.
 Under this design, every syscache that the parallel worker might
 conceivably consult needs to be locked for the entire duration of the
 parallel computation.  I would expect this to provoke a violent
 negative reaction from at least one prominent community member (and I
 bet users wouldn't like it much, either).

I see little reason to do this for system level relations.

 So, I am still of the opinion that group locking makes sense.   While
 I appreciate the urge to avoid solving difficult problems where it's
 reasonably avoidable, I think we're in danger of spending more effort
 trying to avoid solving this particular problem than it would take to
 actually solve it.  Based on what I've done so far, I'm guessing that
 a complete group locking patch will be between 1000 and 1500 lines of
 code and that nearly all of the new logic will be skipped when it's
 not in use (i.e. no parallelism).  That sounds to me like a hell of a
 deal compared to trying to predict what locks the child might
 conceivably take and preemptively acquire them all, which sounds
 annoyingly tedious even for simple things (like equality operators)
 and nearly impossible for anything more complicated.

What I'm worried about is the performance impact of group locking when
it's not in use. The heavyweight locking code is already quite complex
and often a noticeable bottleneck...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Is it genuinely required for most parallel operations? I think it's
  clear that none of us knows the answer. Sure, the general case needs
  it, but is the general case the same thing as the reasonably common
  case?

 Well, I think that the answer is pretty clear.  Most of the time,
 perhaps in 99.9% of cases, group locking will make no difference as to
 whether a parallel operation succeeds or fails.  Occasionally,
 however, it will cause an undetected deadlock.  I don't hear anyone
 arguing that that's OK.  Does anyone wish to make that argument?

 Maybe we can, as a first step, make those edges in the lock graph
 visible to the deadlock detector? It's pretty clear that undetected
 deadlocks aren't ok, but detectable deadlocks in a couple corner cases
 might be acceptable.

Interesting idea.  I agree that would be more acceptable.  It would be
kind of sad, though, if you got a deadlock from this:

BEGIN;
LOCK TABLE foo;
SELECT * FROM foo;

You can, of course, avoid that, but it's basically transferring the
burden from the lock manager to every bit of parallel code we ever
write.  If it turns out to be easier to do this than what I'm
currently proposing, I'm definitely amenable to going this route for
version 1, but I don't think it's going to be appealing as a permanent
solution.

 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

 Well, there's really no need to be absolutely general here. We're only
 going to support a subset of functionality as parallelizable. And in
 that case we don't need anything complicated to acquire all locks.

See, that's what I don't believe.  Even very simple things like
equality operators do a surprising amount of catalog locking.

 2. It might end up taking many more locks than necessary and holding
 them for much longer than necessary.  Right now, for example, a
 syscache lookup locks the table only if we actually need to read from
 it and releases the lock as soon as the actual read is complete.
 Under this design, every syscache that the parallel worker might
 conceivably consult needs to be locked for the entire duration of the
 parallel computation.  I would expect this to provoke a violent
 negative reaction from at least one prominent community member (and I
 bet users wouldn't like it much, either).

 I see little reason to do this for system level relations.

I'm not following you.

 So, I am still of the opinion that group locking makes sense.   While
 I appreciate the urge to avoid solving difficult problems where it's
 reasonably avoidable, I think we're in danger of spending more effort
 trying to avoid solving this particular problem than it would take to
 actually solve it.  Based on what I've done so far, I'm guessing that
 a complete group locking patch will be between 1000 and 1500 lines of
 code and that nearly all of the new logic will be skipped when it's
 not in use (i.e. no parallelism).  That sounds to me like a hell of a
 deal compared to trying to predict what locks the child might
 conceivably take and preemptively acquire them all, which sounds
 annoyingly tedious even for simple things (like equality operators)
 and nearly impossible for anything more complicated.

 What I'm worried about is the performance impact of group locking when
 it's not in use. The heavyweight locking code is already quite complex
 and often a noticeable bottleneck...

I certainly agree that it would be unacceptable for group locking to
significantly regress the normal case.  I'm pretty optimistic about
reducing the overhead to near-zero, though - a few extra branch
instructions on the non-fast-path case should be lost in the noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 10:08:59 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-10-31 08:54:54 -0400, Robert Haas wrote:
  On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
   Is it genuinely required for most parallel operations? I think it's
   clear that none of us knows the answer. Sure, the general case needs
   it, but is the general case the same thing as the reasonably common
   case?
 
  Well, I think that the answer is pretty clear.  Most of the time,
  perhaps in 99.9% of cases, group locking will make no difference as to
  whether a parallel operation succeeds or fails.  Occasionally,
  however, it will cause an undetected deadlock.  I don't hear anyone
  arguing that that's OK.  Does anyone wish to make that argument?
 
  Maybe we can, as a first step, make those edges in the lock graph
  visible to the deadlock detector? It's pretty clear that undetected
  deadlocks aren't ok, but detectable deadlocks in a couple corner cases
  might be acceptable.
 
 Interesting idea.  I agree that would be more acceptable.  It would be
 kind of sad, though, if you got a deadlock from this:
 
 BEGIN;
 LOCK TABLE foo;
 SELECT * FROM foo;

I have serious doubts about the number of cases where it's correct to
access relations in a second backend that are exclusively locked. Not so
much when that happens for a user issued LOCK statement of course, but
when the system has done so. Personally I think to stay sane we'll have
to forbid access to anything normal other backends can't access either -
otherwise things will get too hard to reason about.

So just refusing parallelism as soon as anything has taken an access
exclusive lock doesn't sound too bad to me.
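
As a sketch (hypothetical helper, not an actual PostgreSQL function), this conservative rule amounts to a pre-flight check over the locks the would-be leader already holds:

```python
# Hypothetical pre-flight check: refuse to launch parallel workers if
# the leader's transaction already holds an AccessExclusiveLock, since
# independent workers could not have acquired their own locks anyway.
def parallelism_allowed(held_locks):
    """held_locks: iterable of (relation, mode) pairs held by the leader."""
    return not any(mode == "AccessExclusive" for _, mode in held_locks)
```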

 You can, of course, avoid that, but it's basically transferring the
 burden from the lock manager to every bit of parallel code we ever
 write.  If it turns out to be easier to do this than what I'm
 currently proposing, I'm definitely amenable to going this route for
 version 1, but I don't think it's going to be appealing as a permanent
 solution.

It's probably not the solution forever, I agree. But it might be the
simplest way forward till we have more actual users.

  1. Turing's theorem being what it is, predicting what catalog tables
  the child might lock is not necessarily simple.
 
  Well, there's really no need to be absolutely general here. We're only
  going to support a subset of functionality as parallelizable. And in
  that case we don't need anything complicated to acquire all locks.
 
 See, that's what I don't believe.  Even very simple things like
 equality operators do a surprising amount of catalog locking.

So what? I don't see catalog locking as something problematic? Maybe I'm
missing something, but I don't see cases where you would expect deadlocks or
anything like it on catalog relations?

  So, I am still of the opinion that group locking makes sense.   While
  I appreciate the urge to avoid solving difficult problems where it's
  reasonably avoidable, I think we're in danger of spending more effort
  trying to avoid solving this particular problem than it would take to
  actually solve it.  Based on what I've done so far, I'm guessing that
  a complete group locking patch will be between 1000 and 1500 lines of
  code and that nearly all of the new logic will be skipped when it's
  not in use (i.e. no parallelism).  That sounds to me like a hell of a
  deal compared to trying to predict what locks the child might
  conceivably take and preemptively acquire them all, which sounds
  annoyingly tedious even for simple things (like equality operators)
  and nearly impossible for anything more complicated.
 
  What I'm worried about is the performance impact of group locking when
  it's not in use. The heavyweight locking code is already quite complex
  and often a noticeable bottleneck...
 
 I certainly agree that it would be unacceptable for group locking to
 significantly regress the normal case.  I'm pretty optimistic about
 reducing the overhead to near-zero, though - a few extra branch
 instructions on the non-fast-path case should be lost in the noise.

I'm less worried about the additional branches than about increasing the
size of the datastructures. They're already pretty huge.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 12:54, Robert Haas robertmh...@gmail.com wrote:

 1. Turing's theorem being what it is, predicting what catalog tables
 the child might lock is not necessarily simple.

The Pareto principle offers ways to cope with the world's lack of simplicity.

You mentioned earlier that functions would need to be marked proisparallel, etc.

What conditions will that be protecting against? If we aren't going to
support the general case where every single thing works, can we at
least discuss what the list of cases is that we will support.

I don't think we can argue that everything must be generic when we
already admit it won't be.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
 I have serious doubts about the number of cases where it's correct to
 access relations in a second backend that are exclusively locked. Not so
 much when that happens for a user issued LOCK statement of course, but
 when the system has done so. Personally I think to stay sane we'll have
 to forbid access to anything normal other backends can't access either -
 otherwise things will get too hard to reason about.

 So just refusing parallelism as soon as anything has taken an access
 exclusive lock doesn't sound too bad to me.

That restriction seems onerous to me; for example, it would prevent a
parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
Those seem like worthy targets for parallelism, and maybe even early
targets, since I expect a lot of the real complexity here to be in the
query planner, and you can avoid most of that complexity when planning
DDL.

I also think it's unnecessary.  It's wrong to think of two parallel
backends that both want AccessExclusiveLock on a given target relation
as conflicting with each other; if the process were not running in
parallel, those lock requests would both have come from the same
process and both requests would have been granted.  Parallelism is
generally not about making different things happen; it's about
spreading the stuff that would happen anyway across multiple backends,
and things that would succeed if done in a single backend shouldn't
fail when spread across 2 or more without some compelling
justification.  The cases where it's actually unsafe for a table to be
accessed even by some other code within the same backend are handled
not by the lock manager, but by CheckTableNotInUse().  That call
should probably fail categorically if issued while parallelism is
active.
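
A minimal sketch of that rule (invented signature; the real CheckTableNotInUse() inspects reference counts on the relation) might look like:

```python
# Sketch of the proposed behavior (hypothetical signature): the check
# guarding physical-structure changes fails if the relation is pinned
# locally, and fails categorically while parallelism is active.
def check_table_not_in_use(local_pin_count, parallelism_active):
    if parallelism_active:
        raise RuntimeError("cannot perform this operation during parallel execution")
    if local_pin_count > 1:  # more references than our own
        raise RuntimeError("table is being used by active queries in this session")
```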

  1. Turing's theorem being what it is, predicting what catalog tables
  the child might lock is not necessarily simple.
 
  Well, there's really no need to be absolutely general here. We're only
  going to support a subset of functionality as parallelizable. And in
  that case we don't need anything complicated to acquire all locks.

 See, that's what I don't believe.  Even very simple things like
 equality operators do a surprising amount of catalog locking.

 So what? I don't see catalog locking as something problematic? Maybe I'm
 missing something, but I don't see cases would you expect deadlocks or
 anything like it on catalog relations?

Suppose somebody fires off a parallel sort on a text column, or a
parallel sequential scan involving a qual of the form textcol = 'zog'.
We launch a bunch of workers to do comparisons; they do lookups
against pg_collation.  After some but not all of them have loaded the
collation information they need into their local cache, the DBA types
cluster pg_collation.  It queues up for an AccessExclusiveLock.  The
remaining parallel workers join the wait queue waiting for their
AccessShareLock. The system is now deadlocked; the only solution is to
move the parallel workers ahead of the AccessExclusiveLock request,
but the deadlock detector won't do that unless it knows about the
relationship among the parallel workers.

It's possible to construct more obscure cases as well.  For example,
suppose a user (for reasons known only to himself and God) does LOCK
TABLE pg_opclass and then begins a sort of a column full of enums.
Meanwhile, another user tries to CLUSTER pg_enum.  This will deadlock
if, and only if, some of the enum OIDs are odd.  I mention this
example not because I think it's a particularly useful case but
because it illustrates just how robust the current system is: it will
catch that case, and break the deadlock somehow, and you as a
PostgreSQL backend hacker do not need to think about it.  The guy who
added pg_enum did not need to think about it.  It just works.  If we
decide that parallelism is allowed to break that guarantee, then every
patch that does parallel anything has got to worry about what
undetected deadlocks might result, and then argue about which of them
are obscure enough that we shouldn't care and which are not.  That
doesn't sound like a fun way to spend my time.

But, really, the first case is the one I'm more worried about: a bunch
of parallel backends all grab the same locks, but slightly separated
in time.  A strong locker joins the queue midway through and we're
hosed.  Obviously any system cache lookups whatsoever are going to be
separated in time like this, just because the world isn't synchronous.
The pg_enum case is an example of how the lookups could be more widely
spaced: each backend will scan pg_enum when it first hits an odd OID,
and that could happen much later for some than others.  But even a
small window is enough to render undetected deadlock a possibility,
and you will not convince me that "hope the race condition is narrow
enough in practice that this never happens" is a remotely sane
strategy.
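
The structural point can be shown with a toy wait-for graph (invented process names, heavily simplified semantics): the dependency between one group member's completion and another's progress exists only inside the parallel group, so a detector that ignores groups sees no cycle.

```python
# Toy wait-for graph checker (simplified model, hypothetical names).
def has_cycle(wait_for):
    """wait_for maps each process to the set of processes it waits on."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in wait_for.get(node, ())):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(wait_for))

def collapse_groups(wait_for, group_of):
    """Treat each lock group as one node: members implicitly wait on
    one another's progress, which the plain graph never shows."""
    merged = {}
    for waiter, targets in wait_for.items():
        w = group_of.get(waiter, waiter)
        edges = merged.setdefault(w, set())
        for t in targets:
            g = group_of.get(t, t)
            if g != w:
                edges.add(g)
    return merged

# The pg_collation scenario in miniature: worker W1 holds AccessShare,
# the DBA's AccessExclusive request waits on W1, and worker W2 queues
# behind the DBA.  Without group knowledge there is no cycle...
wait_for = {"DBA": {"W1"}, "W2": {"DBA"}}
# ...but collapsing {W1, W2} into one group node exposes the deadlock.
merged = collapse_groups(wait_for, {"W1": "G", "W2": "G"})
```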

Down the road, it's 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 18:29, Robert Haas robertmh...@gmail.com wrote:

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 cluster pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

It's an obscure case and its not the only solution either.

I'm really surprised that having workers do their own locking doesn't
scare you. Personally, it scares me.

It's not like we're the first to do parallel query. Honestly, how many
parallel databases do you think do this?

I can't see this being a practical, workable solution for running a
parallel query across a cluster of many nodes. Shared, distributed
lock table?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You mentioned earlier that functions would need to be marked proisparallel 
 etc..

 What conditions will that be protecting against? If we aren't going to
 support the general case where every single thing works, can we at
 least discuss what the list of cases is that we will support.

In general, any operation that relies on backend-private state not
shared with a cooperating backend isn't parallel-safe.  For example,
consider:

SELECT setseed(1);
SELECT random() FROM generate_series(1,1000) g;

This sequence of queries can be relied upon to produce the same output
every time.  But if some of the random() calls are executed in another
backend, you'll get different results because random() works by using
backend-private memory to store its pseudo-random state.  It's
unlikely to be worth the effort to move the pseudo-random state to a
DSM for parallelism, so we probably just want to disallow parallel
query when random() is in use, or, better still, disallow
parallelizing only the particular query node that uses random().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 14:29:57 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I have serious doubts about the number of cases where it's correct to
  access relations in a second backend that are exclusively locked. Not so
  much when that happens for a user issued LOCK statement of course, but
  when the system has done so. Personally I think to stay sane we'll have
  to forbid access to anything normal other backends can't access either -
  otherwise things will get too hard to reason about.
 
  So just refusing parallelism as soon as anything has taken an access
  exclusive lock doesn't sound too bad to me.

 That restriction seems onerous to me; for example, it would prevent a
 parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
 Those seem like worthy targets for parallelism, and maybe even early
 targets, since I expect a lot of the real complexity here to be in the
 query planner, and you can avoid most of that complexity when planning
 DDL.

Ugh. I think that's aiming too high. You'll suddenly need to share cache
invalidations between the backends. I think we should start by requiring
that there's no DDL done *at all* in the transaction that starts the
parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
the logic we have for CONCURRENTLY, thereby moving the parallelized part
into a transaction that hasn't done any DDL yet.

 I also think it's unnecessary.  It's wrong to think of two parallel
 backends that both want AccessExclusiveLock on a given target relation
 as conflicting with each other; if the process were not running in
 parallel, those lock requests would both have come from the same
 process and both requests would have been granted.

I don't think that's a realistic view. There's lots of backend private
state around. If we try to make all that coherent between backends we'll
fail.

 Parallelism is
 generally not about making different things happen; it's about
 spreading the stuff that would happen anyway across multiple backends,
 and things that would succeed if done in a single backend shouldn't
 fail when spread across 2 or more without some compelling
 justification.  The cases where it's actually unsafe for a table to be
 accessed even by some other code within the same backend are handled
 not by the lock manager, but by CheckTableNotInUse().  That call
 should probably fail categorically if issued while parallelism is
 active.

Which will e.g. prohibit CLUSTER and VACUUM FULL.

   1. Turing's theorem being what it is, predicting what catalog tables
   the child might lock is not necessarily simple.
  
   Well, there's really no need to be absolutely general here. We're only
   going to support a subset of functionality as parallelizable. And in
   that case we don't need anything complicated to acquire all locks.
 
  See, that's what I don't believe.  Even very simple things like
  equality operators do a surprising amount of catalog locking.
 
  So what? I don't see catalog locking as something problematic? Maybe I'm
  missing something, but I don't see cases where you would expect deadlocks or
  anything like that on catalog relations?

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 cluster pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

I think you would need to make the case more complex - we only hold
locks on system objects for a very short time, and mostly not in a nested
fashion.

But generally I think it's absolutely perfectly alright to fail in such
case. We need to fail reliably, but *none* of this is a new hazard. You
can have pretty similar issues today, without any parallelism.

 It's possible to construct more obscure cases as well.  For example,
 suppose a user (for reasons known only to himself and God) does LOCK
 TABLE pg_opclass and then begins a sort of a column full of enums.
 Meanwhile, another user tries to CLUSTER pg_enum.  This will deadlock
 if, and only if, some of the enum OIDs are odd.  I mention this
 example not because I think it's a particularly useful case but
 because it illustrates just how robust the current system is: it will
 catch that case, and break the deadlock somehow, and you as a
 PostgreSQL backend hacker do not need to think about it.  The guy who
 added pg_enum did not need to think about it.  It just works.  If we
 decide that 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 2:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 31 October 2014 18:29, Robert Haas robertmh...@gmail.com wrote:
 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
 cluster pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

 It's an obscure case and it's not the only solution either.

I don't think that's an obscure situation at all.  Do you really think
a patch that could cause an attempt to VACUUM FULL a system catalog to
suffer an undetected deadlock meets this community's quality
standards?  Because that's what we're talking about.

You are right that it isn't the only solution.  I have said the same
thing myself, multiple times, on this thread.

 I'm really surprised that having workers do their own locking doesn't
 scare you. Personally, it scares me.

Frankly, the reverse decision scares me a heck of a lot more.  It
superficially seems like a good idea to have the master take locks on
behalf of the workers, but when you start to really think about how
many low-level parts of the code take locks, it quickly becomes
evident, at least to me, that trying to make the resulting system
robust will be a nightmare.

Don't forget that there are not only relation locks but also page
locks, tuple locks, relation extension locks, XID and VXID locks,
locks on arbitrary database or shared objects.  The deadlock detector
handles them all.  If avoiding deadlock outside the lock manager is so
easy, why do we have a deadlock detector in the first place?  Why not
just change all of our other code to avoid them instead?
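
A toy cycle check over a wait-for graph (process names hypothetical; this is only a sketch, not the backend's actual algorithm) shows why the detector must know about the group: with lock waits alone, the pg_collation scenario has no cycle, but adding the implicit "group members finish together" edge exposes it.

```python
def has_deadlock(waits_for):
    """Detect a cycle in a wait-for graph given as {waiter: [waited_on...]}."""
    def visit(node, on_path):
        if node in on_path:
            return True
        return any(visit(n, on_path | {node}) for n in waits_for.get(node, ()))
    return any(visit(p, frozenset()) for p in waits_for)

# The pg_collation scenario from upthread: the CLUSTER backend waits for
# worker1's AccessShareLock, and worker2 queues behind CLUSTER's
# AccessExclusiveLock request.  Lock waits alone show no cycle:
graph = {"cluster": ["worker1"], "worker2": ["cluster"]}
assert not has_deadlock(graph)   # undetected, yet the system is stuck

# worker1 cannot release its lock until the whole parallel group finishes,
# which requires worker2 to proceed -- an edge only group-awareness adds:
graph["worker1"] = ["worker2"]
assert has_deadlock(graph)
```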

 It's not like we're the first to do parallel query. Honestly, how many
 parallel databases do you think do this?

All of them.

I might be wrong, of course, but I see no reason at all to believe
that Oracle or SQL Server have ignored deadlock detection when
parallel query is in use.

 I can't see this being a practical, workable solution for running a
 parallel query across a cluster of many nodes. Shared, distributed
 lock table?

I am not proposing a distributed lock manager that can run across many
nodes.  The considerations for such a piece of software are largely
orthogonal to the problem that I'm actually trying to solve here, and
at least an order of magnitude more complex.  If you make it sound
like that's the project that I'm doing, then of course it's going to
sound frighteningly complicated.  But I'm not.

I'm sort of baffled by your concern on this point.  Sure, the lock
manager is a complex and performance-sensitive piece of code, and we
have to be careful not to break it or make it slower.  But we're not
talking about launching a rocket to Mars here; we're just talking
about making the lock manager interface look the same way to two
cooperating backends issuing lock requests in parallel that it looks
to a single backend making those requests serially.  I don't think
it's noticeably harder than getting Hot Standby conflict resolution to
work, or getting historical snapshots working for logical decoding, or
making the visibility map crash-safe - in fact, I'd bet on it being
substantially easier than the first two of those, because all the
logic is inside a single, relatively simple backend subsystem.  Why
we'd want to minimize complexity there at the cost of spreading it
across half the backend is mysterious to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
  So just refusing parallelism as soon as anything has taken an access
  exclusive lock doesn't sound too bad to me.

 That restriction seems onerous to me; for example, it would prevent a
 parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
  Those seem like worthy targets for parallelism, and maybe even early
 targets, since I expect a lot of the real complexity here to be in the
 query planner, and you can avoid most of that complexity when planning
 DDL.

 Ugh. I think that's aiming too high. You'll suddenly need to share cache
 invalidations between the backends. I think we should start by requiring
 that there's no DDL done *at all* in the transaction that starts the
 parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
 the logic we have for CONCURRENTLY, thereby moving the parallelized part
 into a transaction that hasn't done any DDL yet.

I don't think that's correct.  We only need to process local
invalidation messages after CommandCounterIncrement(), which I
anticipate prohibiting during parallel execution (short thought should
convince you that anything else is completely nuts).  So I think
there's no real problem with invalidation messages being generated
during parallel execution.  I wouldn't anticipate supporting that in
V1, because they'd have to be propagated back to the user backend
prior to commit, but in the longer term it seems pretty feasible.

 I also think it's unnecessary.  It's wrong to think of two parallel
 backends that both want AccessExclusiveLock on a given target relation
 as conflicting with each other; if the process were not running in
 parallel, those lock requests would both have come from the same
 process and both requests would have been granted.

 I don't think that's a realistic view. There's lots of backend private
 state around. If we try to make all that coherent between backends we'll
 fail.

I agree that there's a lot of backend private state, and that's the
principal problem with making this work at all.  The challenge here is
to figure out which parts of that backend-private state we absolutely
must make coherent between backends to enable parallelism to have a
useful, non-trivial scope, and which parts we can punt on.  I have
ideas about that - which are mostly documented at
https://wiki.postgresql.org/wiki/Parallel_Sort - but I'm trying to
keep an open mind on the details.

 Parallelism is
 generally not about making different things happen; it's about
 spreading the stuff that would happen anyway across multiple backends,
 and things that would succeed if done in a single backend shouldn't
 fail when spread across 2 or more without some compelling
 justification.  The cases where it's actually unsafe for a table to be
 accessed even by some other code within the same backend are handled
 not by the lock manager, but by CheckTableNotInUse().  That call
 should probably fail categorically if issued while parallelism is
 active.

 Which will e.g. prohibit CLUSTER and VACUUM FULL.

I don't think so.  I think it's safe to get to the point where we
check that the table is not in use, then do a parallel scan and sort,
then collect the results and write them out.  But I would not try to
make it safe, for example, to be scanning a table and have some
user-defined function invoked along the way to go try to do anything
that involves CheckTableNotInUse().  Many of those would fall over
anyway because of PreventTransactionChain(), but I've got no problem
nailing that hatch shut twice.  And conversely, I would expect we'd
disallow the sort operators invoked by CLUSTER from reaching any code
guarded by CheckTableNotInUse().

 Suppose somebody fires off a parallel sort on a text column, or a
 parallel sequential scan involving a qual of the form textcol = 'zog'.
 We launch a bunch of workers to do comparisons; they do lookups
 against pg_collation.  After some but not all of them have loaded the
 collation information they need into their local cache, the DBA types
  cluster pg_collation.  It queues up for an AccessExclusiveLock.  The
 remaining parallel workers join the wait queue waiting for their
 AccessShareLock. The system is now deadlocked; the only solution is to
 move the parallel workers ahead of the AccessExclusiveLock request,
 but the deadlock detector won't do that unless it knows about the
 relationship among the parallel workers.

 I think you would need to make the case more complex - we only hold
 locks on system objects for a very short time, and mostly not in a nested
 fashion.

Hmm.  You're right.  If the parallel backends release their locks
quickly, the CLUSTER could run and then the parallel job could finish.
But it sounds as if you get the idea.

 But generally I think it's absolutely perfectly alright to fail in such
 case. We need to fail reliably, but *none* of this is a new hazard. You
 can have pretty similar issues today, 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Andres Freund
On 2014-10-31 15:56:42 -0400, Robert Haas wrote:
 On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
   So just refusing parallelism as soon as anything has taken an access
   exclusive lock doesn't sound too bad to me.
 
  That restriction seems onerous to me; for example, it would prevent a
  parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
  Those seem like worthy targets for parallelism, and maybe even early
  targets, since I expect a lot of the real complexity here to be in the
  query planner, and you can avoid most of that complexity when planning
  DDL.
 
  Ugh. I think that's aiming too high. You'll suddenly need to share cache
  invalidations between the backends. I think we should start by requiring
  that there's no DDL done *at all* in the transaction that starts the
  parallel activity. For CREATE INDEX PARALLEL that can be done by reusing
  the logic we have for CONCURRENTLY, thereby moving the parallelized part
  into a transaction that hasn't done any DDL yet.
 
 I don't think that's correct.  We only need to process local
 invalidation messages after CommandCounterIncrement(), which I
 anticipate prohibiting during parallel execution (short thought should
 convince you that anything else is completely nuts).

It is more complex, even without CCI. As long as you're doing anything
inside a transaction that already has done DDL, you'll have to play
nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER
pg_class; and then starts a child backend, which will get the same
snapshot, combocids, yada yada. But it *also* will have preexisting
cache entries. Those you need to invalidate before it can do anything
correct.


  I'm sorry to be a bit pessimistic here. But my intuition says that
  starting to do group locking opens a can of worms that'll take us a long
  time to close again. Maybe I'm just imagining complexity that won't
  actually be there. But I have a hard time believing that.
 
 What's the distinction between "teach the deadlock detector to catch
 these kinds of cases" and "group locking"?  Because I think those are
 at least 90% the same thing.

I understand 'group locking' to mean that primary/secondary backends can
co-own a lock that normally is self-exclusive. That's a fair bit more
than adding an edge to the deadlock graph between primary/secondary
backends to make the deadlock detector recognize problems.

What I have serious doubts about is 'co-owning' locks, especially if two
backends normally wouldn't be able to both get that lock.

  I wonder if we couldn't implement a 'halfway' by allowing parallel
  workers to jump the lockqueue if the parent process already has the
  lock. That'd only work for nonconflicting locks, but I think that's
  actually good.
 
 The patch takes precisely that approach; that part of the logic is
 already implemented.

Well, my point is that if the solution is just to jump the queue, there
really isn't any data structure changes needed. Secondary acquirers just
need to check whether a lock is already owned by the primary and then
acquire the lock in the absolutely normal way - with a completely
separate entry; the only difference is that they ignore the queue.
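
A minimal sketch of that queue-jump rule (all names and the two-mode conflict table are hypothetical, not the real lock manager structures): a requester whose group leader already holds the lock may be granted immediately despite a non-empty wait queue, provided its mode doesn't conflict with outside holders.

```python
def can_grant_immediately(req_mode, req_leader, holders, wait_queue, conflicts):
    """holders maps group leader -> mode held; wait_queue lists pending
    requests.  Normally a new request queues behind existing waiters; a
    parallel worker whose group leader already holds the lock may jump
    the queue if its mode doesn't conflict with current holders."""
    if any(conflicts(req_mode, mode)
           for leader, mode in holders.items() if leader != req_leader):
        return False                 # conflicts with an outside holder
    if wait_queue and req_leader not in holders:
        return False                 # fairness: wait behind earlier requests
    return True

# Toy two-mode conflict rule: Exclusive conflicts with everything.
conflicts = lambda a, b: "Exclusive" in (a, b)

# The leader holds AccessShare; CLUSTER is queued for AccessExclusive.
# A worker in the leader's group still gets AccessShare immediately,
# instead of queueing behind CLUSTER and deadlocking.
holders, queue = {"leader": "Share"}, ["cluster:Exclusive"]
assert can_grant_immediately("Share", "leader", holders, queue, conflicts)
assert not can_grant_immediately("Share", "unrelated", holders, queue, conflicts)
```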

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Robert Haas
On Fri, Oct 31, 2014 at 4:10 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think that's correct.  We only need to process local
 invalidation messages after CommandCounterIncrement(), which I
 anticipate prohibiting during parallel execution (short thought should
 convince you that anything else is completely nuts).

 It is more complex, even without CCI. As long as you're doing anything
 inside a transaction that already has done DDL, you'll have to play
 nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER
 pg_class; and then starts a child backend, which will get the same
 snapshot, combocids, yada yada. But it *also* will have preexisting
 cache entries. Those you need to invalidate before it can do anything
 correct.

Where will those preexisting cache entries come from, exactly?  The
postmaster is forking the parallel worker, not the user backend.

  I'm sorry to be a bit pessimistic here. But my intuition says that
  starting to do group locking opens a can of worms that'll take us a long
  time to close again. Maybe I'm just imagining complexity that won't
  actually be there. But I have a hard time believing that.

 What's the distinction between "teach the deadlock detector to catch
 these kinds of cases" and "group locking"?  Because I think those are
 at least 90% the same thing.

 I understand 'group locking' to mean that primary/secondary backends can
 co-own a lock that normally is self-exclusive. That's a fair bit more
 than adding an edge to the deadlock graph between primary/secondary
 backends to make the deadlock detector recognize problems.

I guess.  It seems like a pretty minor extension to me.  Getting the
deadlock detector to do the right thing is the hard part.

 What I have serious doubts about is 'co-owning' locks, especially if two
 backends normally wouldn't be able to both get that lock.

Perhaps we should discuss that more.  To me it seems pretty obvious
that's both safe and important.

  I wonder if we couldn't implement a 'halfway' by allowing parallel
  workers to jump the lockqueue if the parent process already has the
  lock. That'd only work for nonconflicting locks, but I think that's
  actually good.

 The patch takes precisely that approach; that part of the logic is
 already implemented.

 Well, my point is that if the solution is just to jump the queue, there
 really isn't any data structure changes needed. Secondary acquirers just
 need to check whether a lock is already owned by the primary and then
 acquire the lock in the absolutely normal way - with a completely
 separate entry; the only difference is that they ignore the queue.

Well, they need to be able to identify who is in their group.  You
could possibly do that without any changes at all to the lock manager
data structures, but it seems a bit complex.  I took the approach of
storing the group leader's PGPROC * in each PROCLOCK, which makes it
trivial (and is enough information for the deadlock detector to work
correctly, too).
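
The effect of that design can be sketched as follows (names and the reduced two-mode conflict table are hypothetical, not the real lock manager API): tagging each holder with its group leader lets the conflict check treat same-group requests as non-conflicting.

```python
# Each lock holder is tagged with its group leader, mirroring the idea of
# storing the leader's PGPROC pointer in each PROCLOCK.
ACCESS_EXCLUSIVE = "AccessExclusiveLock"
ACCESS_SHARE = "AccessShareLock"
CONFLICTS = {
    (ACCESS_EXCLUSIVE, ACCESS_EXCLUSIVE),
    (ACCESS_EXCLUSIVE, ACCESS_SHARE),
    (ACCESS_SHARE, ACCESS_EXCLUSIVE),
}

def conflicts(req_mode, req_leader, held):
    """held: list of (mode, group_leader) pairs for current holders.
    Requests from the same lock group never conflict with each other."""
    return any((req_mode, mode) in CONFLICTS and leader != req_leader
               for mode, leader in held)

# Two backends in the same parallel group both want AccessExclusiveLock:
held = [(ACCESS_EXCLUSIVE, "leader_pgproc")]
assert not conflicts(ACCESS_EXCLUSIVE, "leader_pgproc", held)  # same group
assert conflicts(ACCESS_SHARE, "other_backend", held)          # outsider blocks
```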

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 19:36, Robert Haas robertmh...@gmail.com wrote:

  It's an obscure case and it's not the only solution either.

 I don't think that's an obscure situation at all.  Do you really think
 a patch that could cause an attempt to VACUUM FULL a system catalog to
 suffer an undetected deadlock meets this community's quality
 standards?  Because that's what we're talking about.

Nobody has said that allowing undetected deadlocks is acceptable, so
your other comments are void.

I've suggested *stricter* locking, which would obviously allow
deadlock detection. You recognised that by claiming that the locking I
had proposed was actually too strict, which is where the above example
came from.

Yes, I have proposed stricter locking, but as explained, the only
things this would slow down are catalog VAC FULLs, which are already a
problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-31 Thread Simon Riggs
On 31 October 2014 18:47, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You mentioned earlier that functions would need to be marked proisparallel,
 etc.

 What conditions will that be protecting against? If we aren't going to
 support the general case where every single thing works, can we at
 least discuss what the list of cases is that we will support.

 In general, any operation that relies on backend-private state not
 shared with a cooperating backend isn't parallel-safe.  For example,

Yes, the principle is easy to understand, but that was not the question.

You are saying that placing restrictions on which functions can
execute is necessary, yet at the same time saying that we must have
generalised locking for workers because restrictions on locking are
not acceptable. I don't point that out because I want an argument or
to prove a point, but because there are important things on the table
here.

What are the specific restrictions you are suggesting we place on
parallel workers? We need that definition clear so we can decide how
to mark the functions. If we don't know, which is easily
understandable, then the best way to find that out is to build a
restricted solution and to expand slowly outwards to find problems.

An obvious milestone there is whether a function contains SQL, which
is the point chosen by some other databases. I personally hope to
expand upon that, but it would be useful to reach it first and then
continue from there.

I was already thinking of implementing CONTAINS NO SQL as a SQL
Standard function marking since it can be used to optimize COPY
further.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-30 Thread Simon Riggs
On 30 October 2014 04:24, Amit Kapila amit.kapil...@gmail.com wrote:

 Locking the toast table of any main tables we access seems easily
 done. Though perhaps we should make weak locking of the toast table
 presumed. Do we have cases where the toast table can be accessed when
 the main table is not also strong locked first?

 I think it is possible to have a strong lock on toast table before
 main table.
 Cluster pg_toast.toast_table_name using toast_index;
 Vacuum Full pg_toast.toast_table_name;
 Reindex Index pg_toast.toast_index_name;
 ..

 Now if we take the lock on the toast table in the main task, it will block some of
 the operations before actually they need to be blocked.

Very strange.

Those commands are not common operations? I doubt many people even
know they exist.

All of those commands would block SELECTs on the main table, so there
is no significant benefit in that behaviour.

In fact it would be more sensible to lock the toast table earlier.


 As for locking the enums table or collation table, that's simple stuff
 also. They're just AccessShareLocks.

 Yeah, taking AccessShareLocks is not a problem, however the main
 problem is that they block any other operation on those tables which requires
 AccessExclusiveLock or any other strong lock, before it is
 actually necessary to block any such operation.

What operations are those? How frequently do they occur?

I can't see any need for major maintenance operations on small catalog tables.


 I can't imagine we'll be able to presume that all functions of every
 kind are parallel-safe.

 Yeah, that's right, so we need to either prevent such functions from being
 executed in a parallel worker or make arrangements so that they can be safely
 executed; I think for the first version blocking might be better.

I think that also. That way we won't need complex locking.

 Now, I think we can find ways to avoid all such things by hacking code such
 that conflicting operations can be banned/blocked, however eventually we
 need to have some kind of group locking to avoid deadlocks which can arise
due to operations in parallel workers.  So it seems to me it is not a bad
 idea
 to have a strong infrastructure first for the things (like group locking)
 which
 we think are required for any non-trivial useful parallel operation without
 putting too much restrictions on users for using the feature.

We are all agreed we need to block some functions from executing in
parallel mode.

Why don't we start by making a list of the functions that might cause
problems in parallel workers, and why. That way we might find out
easier ways of doing things.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-30 Thread Amit Kapila
On Thu, Oct 30, 2014 at 2:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 30 October 2014 04:24, Amit Kapila amit.kapil...@gmail.com wrote:

  Locking the toast table of any main tables we access seems easily
  done. Though perhaps we should make weak locking of the toast table
  presumed. Do we have cases where the toast table can be accessed when
  the main table is not also strong locked first?
 
  I think it is possible to have a strong lock on toast table before
  main table.
  Cluster pg_toast.toast_table_name using toast_index;
  Vacuum Full pg_toast.toast_table_name;
  Reindex Index pg_toast.toast_index_name;
  ..
 
  Now if we take the lock on the toast table in the main task, it will block some of
  the operations before actually they need to be blocked.

 Very strange.

 Those commands are not common operations? I doubt many people even
 know they exist.

 All of those commands would block SELECTs on the main table, so there
 is no significant benefit in that behaviour.

 In fact it would be more sensible to lock the toast table earlier.


It might make some sense to lock the toast table earlier for this
particular case, but I don't think in general it will be feasible to lock
all the tables (including catalog tables) which might get used in the
overall operation before starting parallel operation.  I believe it will
make doing parallel stuff difficult if we try to go via this route, as
we need to always do this for any operation we want to make parallel.
It will always carry the risk that we might miss something, and it
might not be feasible in all kinds of cases.

If I understand correctly, you are suggesting that to get the first version
of parallel operation out earlier, we should try to avoid all the stuff
(like group locking, ...), have some restrictions on which kind of cases
Create Index can work, do some hackery in Create Index path to avoid
any kind of locking problems and do the parallel operation for External
Sort (which might avoid the need for a shared memory allocator), then
call it a day, and then do the things which we need for other parallel
operations as and when they are required.  I think this might be a good
approach in itself if somebody wants to target something to release
early, however in medium to short term when we want to provide
non-trivial parallel operations we have to eventually solve those problems
and delaying will only make the things worse.

In short, I agree that there is merit in your idea w.r.t. getting the first
version of parallel operation out early, however if we see from medium
to long term, I feel solving the group locking problem (or some other general
infrastructure like what Robert mentioned upthread) can yield better results,
unless you feel that is not at all required for parallel operations.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Simon Riggs
On 28 October 2014 23:24, Robert Haas robertmh...@gmail.com wrote:

 You asked for my help, but I'd like to see some concrete steps towards
 an interim feature so I can see some benefit in a clear direction.

 Can we please have the first step we discussed? Parallel CREATE INDEX?
 (Note the please)

 What I've been thinking about trying to work towards is parallel
 sequential scan.  I think that it would actually be pretty easy to
 code up a mostly-working version using the existing infrastructure,
 but the patch would be rejected with a bazooka, because the
 non-working parts would include things like:

 1. The cooperating backends might not all be using the same snapshot,
 because that requires sharing the snapshot, combo CID hash, and
 transaction state.
 2. The quals that got pushed down to the workers might not return the
 same answers that they would have produced with a single backend,
 because we have no mechanism for assessing pushdown-safety.
 3. Deadlock detection would be to some greater or lesser degree
 broken, the details depending on the implementation choices you made.

 There is a bit of a chicken-and-egg problem here.  If I submit a patch
 for parallel sequential scan, it'll (justifiably) get rejected because
 it doesn't solve those problems.  So I'm trying to solve those
 above-enumerated problems first, with working and at least
 somewhat-useful examples that show how the incremental bits of
 infrastructure can be used to do stuff.  But that leads to your
 (understandable) complaint that this isn't very real yet.

 Why am I now thinking about parallel sequential scan instead of
 parallel CREATE INDEX?  You may remember that I posted a patch for a
 new memory allocator some time ago, and it came in for a fair amount
 of criticism and not much approbation.   Some of that criticism was
 certainly justified, and perhaps I was as hard on myself as anyone
 else was.  However you want to look at it, I see the trade-off between
 parallel sort and parallel seq-scan this way: parallel seq-scan
 requires dealing with the planner (ouch!) but parallel sort requires
 dealing with memory allocation in dynamic shared memory segments
 (ouch!).  Both of them require solving the three problems listed
 above.

 And maybe a few others, but I think those are the big ones - and I
 think proper deadlock detection is the hardest of them.  A colleague
 of mine has drafted patches for sharing snapshots and combo CIDs
 between processes, and as you might expect that's pretty easy.
 Sharing the transaction state (so we can test whether a transaction ID
 is our transaction ID inside the worker) is a bit trickier, but I
 think not too hard.  Assessing pushdown-safety will probably boil down
 to adding some equivalent of proisparallel.  Maybe not the most
 elegant, but defensible, and if you're looking for the shortest path
 to something usable, that's probably it.  But deadlock detection ...
 well, I don't see any simpler solution than what I'm trying to build
 here.

OK, I see where you are headed, and perhaps why.

I said I would help and I mean to do so. Here's my thoughts:

Solving your 3 problems requires significant research, coding and
agreement that you're unlikely to get in 9.5 because there are no
solutions happening at the end of it; it's all theoretical design,
which becomes more arguable. And even if you do, Parallel (||) Seq
Scan isn't interesting except for marketing purposes, since its use in
real queries is limited because most queries need to do something else
after the SeqScan, which opens up a fuller discussion of parallel
query which we've never had. My feeling is that releasing only || Seq
Scan would backfire since we don't yet do enough other stuff to be a
full solution.

If you do wish to pursue || Seq Scan, then a working prototype would
help. It allows us to see that there is an open source solution we are
working to solve the problems for. People can benchmark it, understand
the benefits and issues it raises and that would help focus attention
on the problems you are trying to solve in infrastructure. People may
have suggestions on how to solve or avoid those that you hadn't
thought of.

Personally, IMHO, it's quite late in the cycle to be releasing such a
prototype since we have only 6-7 weeks until the major patch
submission deadline. Hence my request: can we get *something* good for
users into 9.5, since, with respect, infrastructure isn't useful. (If
you take such comments negatively, we might discuss similar issues
with BDR - but we are working to make simple features available in
each release on that project also).

My proposal is we do a parallel index build scan... just as we
discussed earlier, so you would be following the direction set by Dev
Meeting, not just a proposal of mine.

As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only thing
you need to share are files. Split the problem into N pieces, sort
them to produce N files and then merge the files using existing code.

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Amit Kapila
On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs si...@2ndquadrant.com wrote:

 My proposal is we do a parallel index build scan... just as we
 discussed earlier, so you would be following the direction set by Dev
 Meeting, not just a proposal of mine.

 As I mentioned previously when you started discussing shared memory
 segments, parallel sort does NOT require shared memory. The only thing
 you need to share are files. Split the problem into N pieces, sort
 them to produce N files and then merge the files using existing code.
 That only applies to large sorts, but then those are the ones you
 cared about doing in parallel anyway.

 CREATE INDEX has a lock on the table. Worker tasks can be simply
 banned from acquiring new locks and doing many other tasks.

Banning worker tasks from taking locks is only possible if the
worker backend doesn't need to acquire any lock which is not
already acquired by main backend, but can we safely assume
the same?  One example as taken by Robert upthread about
bttextcmp() can break that assumption.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Robert Haas
On Wed, Oct 29, 2014 at 4:48 AM, Simon Riggs si...@2ndquadrant.com wrote:
 If you do wish to pursue || Seq Scan, then a working prototype would
 help. It allows us to see that there is an open source solution we are
 working to solve the problems for. People can benchmark it, understand
 the benefits and issues it raises and that would help focus attention
 on the problems you are trying to solve in infrastructure. People may
 have suggestions on how to solve or avoid those that you hadn't
 thought of.

I've mulled that over a bit and it might be worth pursuing further.
Of course there's always the trade-off: doing that means not doing
something else.

 As I mentioned previously when you started discussing shared memory
 segments, parallel sort does NOT require shared memory. The only thing
 you need to share are files. Split the problem into N pieces, sort
 them to produce N files and then merge the files using existing code.
 That only applies to large sorts, but then those are the ones you
 cared about doing in parallel anyway.

A simple implementation of this would work only for simple
pass-by-value types, like integers.  Pass-by-reference types require
the comparator to de-TOAST, and some other types require catalog
lookups.  I don't think that's very useful: Noah previously did some
analysis of this problem and concluded (with apologies if I'm remembering
the details incorrectly here) that the comparator for strings was
something like 1000x as expensive as the comparator for integers, and
that you basically couldn't get the latter to take enough time to be
worth parallelizing.

I care much more about getting the general infrastructure in place to
make parallel programming feasible in PostgreSQL than I do about
getting one particular case working.  And more than feasible: I want
it to be relatively straightforward.  That's not simple, but the
potential rewards are great.  Let's face it: there are people here who
are much better than I am at hacking on the planner and especially the
executor.  Why haven't any of those people implemented
parallel anything?  I think it's because, right now, it's just too
darn hard.  I'm trying to reduce that to something approaching the
difficulty of writing normal PostgreSQL backend code, and I think I'm
6-12 patches away from that.  This is one of them and, yeah, it's not
done, and, yeah, we might not get to parallel anything this release
and, yeah, things would be going faster if I could work on parallelism
full time.  But I think that the progress we are making is meaningful
and the goal is within sight.

I appreciate that you'd probably attack this problem from a different
direction than I'm attacking it from, but I still think that what I'm
trying to do is a legitimate direction of attack which, by the way,
does not preclude anybody else from attacking it from a different
direction and, indeed, such a development would be most welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Simon Riggs
On 29 October 2014 12:08, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs si...@2ndquadrant.com wrote:

 My proposal is we do a parallel index build scan... just as we
 discussed earlier, so you would be following the direction set by Dev
 Meeting, not just a proposal of mine.

 As I mentioned previously when you started discussing shared memory
 segments, parallel sort does NOT require shared memory. The only thing
 you need to share are files. Split the problem into N pieces, sort
 them to produce N files and then merge the files using existing code.
 That only applies to large sorts, but then those are the ones you
 cared about doing in parallel anyway.

 CREATE INDEX has a lock on the table. Worker tasks can be simply
 banned from acquiring new locks and doing many other tasks.

 Banning worker tasks from taking locks is only possible if the
 worker backend doesn't need to acquire any lock which is not
 already acquired by main backend, but can we safely assume
 the same?  One example as taken by Robert upthread about
 bttextcmp() can break that assumption.

I think its reasonable to imagine some prep work will be required in
the main task before workers begin.

Locking the toast table of any main tables we access seems easily
done. Though perhaps we should make weak locking of the toast table
presumed. Do we have cases where the toast table can be accessed when
the main table is not also strong locked first?

As for locking the enums table or collation table, that's simple stuff
also. They're just AccessShareLocks.

I can't imagine we'll be able to presume that all functions of every
kind are parallel-safe.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


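Simon's "prep work" idea — the leader acquiring every lock the workers will need (main table, toast table, catalog AccessShareLocks) before workers begin, with workers banned from new lock acquisitions — might be sketched like this. This is a hypothetical illustration, not PostgreSQL code; the registry class and its names are invented:

```python
class LeaderLockRegistry:
    """Toy sketch: the leader pre-acquires locks; a worker may only
    piggyback on a lock the leader already holds, never take a new one."""

    def __init__(self):
        self.held = set()            # (object, mode) pairs held by leader

    def leader_acquire(self, obj, mode):
        # Prep work in the main task, before any worker starts.
        self.held.add((obj, mode))

    def worker_check(self, obj, mode):
        # Workers are banned from acquiring locks of their own.
        if (obj, mode) not in self.held:
            raise RuntimeError("worker needs %s on %r, not pre-acquired"
                               % (mode, obj))
        return True
```

The catch, as Amit's bttextcmp() example shows, is predicting up front every lock a worker might need — the scheme fails as soon as a worker reaches an object the leader did not anticipate.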


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Simon Riggs
On 29 October 2014 12:28, Robert Haas robertmh...@gmail.com wrote:

 I care much more about getting the general infrastructure in place to
 make parallel programming feasible in PostgreSQL than I do about
 getting one particular case working.  And more than feasible: I want
 it to be relatively straightforward.  That's not simple, but the
 potential rewards are great.  Let's face it: there are people here who
 are much better than I am at hacking on the planner and especially the
 executor.  Why haven't any of those people implemented
 parallel anything?  I think it's because, right now, it's just too
 darn hard.  I'm trying to reduce that to something approaching the
 difficulty of writing normal PostgreSQL backend code, and I think I'm
 6-12 patches away from that.  This is one of them and, yeah, it's not
 done, and, yeah, we might not get to parallel anything this release
 and, yeah, things would be going faster if I could work on parallelism
 full time.  But I think that the progress we are making is meaningful
 and the goal is within sight.

The role of an immediate working solution is to prove the
infrastructure so far is genuinely useful, helping also to build
interest and understanding.

There is a real danger that your ta-dah moment sometime in the
future contains flaws which need to be addressed, but we now have
piles of questionable infrastructure lying around. If you have
similar doubts about anything I'm doing, please just ask.

If we could see the 6-12 patch descriptions and understand where you
are going it would help.

 I appreciate that you'd probably attack this problem from a different
 direction than I'm attacking it from, but I still think that what I'm
 trying to do is a legitimate direction of attack which, by the way,
 does not preclude anybody else from attacking it from a different
 direction and, indeed, such a development would be most welcome.

Well, I don't have time; been a little busy these last 10 years and
for at least one year more yet before projects open up again.

If I did, I don't think it would be that useful to interfere.
Cooperation seems better use of my time, if possible.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Robert Haas
On Wed, Oct 29, 2014 at 10:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 There is a real danger that your ta-dah moment sometime in the
 future contains flaws which need to be addressed, but we now have
 piles of questionable infrastructure lying around. If you have
 similar doubts about anything I'm doing, please just ask.

I think the implication that the infrastructure created to date is of
questionable utility for parallelism is false and unjustified.  Every
infrastructure patch thus far committed, and every patch proposed
other than this one, has been accompanied by code that demonstrates
what it does and how that's useful.  Most of them have had enough
interest by extension authors to generate bug reports from other
people - most notably Petr, who has obviously been playing with all of
the existing modules enough to notice bugs and come to conclusions
about the shortcomings of the existing technology similar to my own.
Interest in dynamic background workers seems to go considerably
further, to the point where it made 9.4's list of major features.

 If we could see the 6-12 patch descriptions and understand where you
 are going it would help.

OK, here's what I think we need (roughly):

1. The remaining pg_background patches.  The main things in there are
(a) the ability to propagate errors from a background worker back to
the user backend and (b) the ability to synchronize the background
worker's GUC state with the user backend.

2. Group locking.

3. Patches to synchronize the (a) snapshot, (b) combo CID hash, and
(c) transaction state from the user backend to a background worker.
Draft patches for (a) and (b) are already complete, but not
published because I didn't think anyone would be very excited about
them just yet.

4. A patch to add parallel mode in which a whole bunch of things are
prohibited in the user backend and even more things are prohibited in
the background worker - specifically, things that won't be safe in
those contexts.  Noah and Amit Kapila worked on this and we have draft
patches for this as well, also not published and for the same reasons.

5. A patch to add a proisparallel flag or similar so that we can
label parallel-safe functions.

6. A patch to add a way for a background worker that hits an error to
kick the user backend, probably via the ProcSignal mechanism, so that
the user doesn't have to poll the shm_mq structures it's using to
communicate with those backends for errors continuously (or
alternatively sit idle while those background workers do all the work).
This might be something that can be skipped for early versions, but
eventually I think we'll need it.

7. Eventually, a dynamic shared memory allocator.  A lot of things can
be done without this, but some can't.

I think that's most of it.  No doubt we'll discover a few other needs
along the way, but if we had that infrastructure, then I think it
would be quite feasible for an experienced hacker to parallelize most
of what we want to parallelize without having to invent too much
that's really fundamental.

 If I did, I don't think it would be that useful to interfere.
 Cooperation seems better use of my time, if possible.

I am not asking anyone to interfere, in the sense of getting in the
way, but I certainly think it would be useful for people to contribute
code and/or reviews.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
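Item 6 above — a background worker proactively kicking the user backend on error, rather than the leader polling its queues — can be sketched with ordinary threads and a blocking queue. This is a toy analogue only; PostgreSQL's real mechanism would involve shm_mq and ProcSignal, and none of these names are backend code:

```python
import queue
import threading

def worker(task_id, results):
    """Background worker: on error, push an error report to the leader's
    queue instead of relying on the leader to notice by polling."""
    try:
        if task_id == 2:
            raise ValueError("bad tuple in worker %d" % task_id)
        results.put(("result", task_id, task_id * task_id))
    except Exception as e:
        results.put(("error", task_id, str(e)))

def leader(n_workers):
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, results))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    outcomes = []
    for _ in range(n_workers):
        # Blocks until some worker reports; no continuous polling loop,
        # because workers "kick" the leader by enqueueing a message.
        outcomes.append(results.get())
    for t in threads:
        t.join()
    return sorted(outcomes, key=lambda m: m[1])   # order by worker id
```

The design point is the blocking `results.get()`: the leader sleeps until woken, which is what the proposed ProcSignal kick would buy over polling the shm_mq structures.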




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Simon Riggs
On 29 October 2014 15:43, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 29, 2014 at 10:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 There is a real danger that your ta-dah moment sometime in the
 future contains flaws which need to be addressed, but we now have
 piles of questionable infrastructure lying around. If you have
 similar doubts about anything I'm doing, please just ask.

 I think the implication that the infrastructure created to date is of
 questionable utility for parallelism is false and unjustified.

Huh? I haven't implied that.

Thanks for your wider explanations.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-29 Thread Amit Kapila
On Wed, Oct 29, 2014 at 7:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 October 2014 12:08, Amit Kapila amit.kapil...@gmail.com wrote:
  On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs si...@2ndquadrant.com
wrote:
 
  CREATE INDEX has a lock on the table. Worker tasks can be simply
  banned from acquiring new locks and doing many other tasks.
 
  Banning worker tasks from taking locks is only possible if the
  worker backend doesn't need to acquire any lock which is not
  already acquired by main backend, but can we safely assume
  the same?  One example as taken by Robert upthread about
  bttextcmp() can break that assumption.

 I think its reasonable to imagine some prep work will be required in
 the main task before workers begin.

 Locking the toast table of any main tables we access seems easily
 done. Though perhaps we should make weak locking of the toast table
 presumed. Do we have cases where the toast table can be accessed when
 the main table is not also strong locked first?

I think it is possible to have a strong lock on toast table before
main table.
Cluster pg_toast.toast_table_name using toast_index;
Vacuum Full pg_toast.toast_table_name;
Reindex Index pg_toast.toast_index_name;
..

Now if we take the lock on the toast table in the main task, it will block
some of the operations before they actually need to be blocked.

 As for locking the enums table or collation table, that's simple stuff
 also. They're just AccessShareLocks.

Yeah, taking AccessShareLocks is not a problem; however, the main
problem is that they block any other operation on those tables which
requires AccessExclusiveLock or any other strong lock, before it is
actually necessary to block any such operation.

 I can't imagine we'll be able to presume that all functions of every
 kind are parallel-safe.

Yeah, that's right, so we need to either block such functions from being
executed in a parallel worker or make arrangements so that they can be
safely executed; I think for a first version blocking might be better.

Now, I think we can find ways to avoid all such things by hacking the code
such that conflicting operations can be banned/blocked; however, eventually
we need to have some kind of group locking to avoid deadlocks which can
arise due to operations in a parallel worker.  So it seems to me it is not
a bad idea to have a strong infrastructure first for the things (like group
locking) which we think are required for any non-trivial useful parallel
operation, without putting too many restrictions on users for using the
feature.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-28 Thread Simon Riggs
On 16 October 2014 16:22, Robert Haas robertmh...@gmail.com wrote:

 Might I gently enquire what the "something usable" we are going to see
 in this release? I'm not up on current plans.

 I don't know how far I'm going to get for this release yet.  I think
 pg_background is a pretty good milestone, and useful in its own right.
 I would like to get something that's truly parallel working sooner
 rather than later, but this group locking issue is one of 2 or 3
 significant hurdles that I need to climb over first.

pg_background is very cute, but it's not really a step forward,
or at least very far. It's sounding like you've already decided that
is as far as we're going to get this release, which I'm disappointed
about.

Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.

You asked for my help, but I'd like to see some concrete steps towards
an interim feature so I can see some benefit in a clear direction.

Can we please have the first step we discussed? Parallel CREATE INDEX?
(Note the please)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-28 Thread Jim Nasby

On 10/28/14, 3:48 PM, Simon Riggs wrote:

Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.


What do you see as being missing for autonomous transactions?

BTW, what I think would make this feature VERY useful is if it provided the 
ability to fire something up in another backend and leave it running in the 
background. I think you can do that with FDW, but then you have the 
authentication PITA to deal with (and perhaps pg_background is a more efficient 
way to move data between backends than FDW, but that's just a guess...)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-28 Thread Robert Haas
On Tue, Oct 28, 2014 at 4:48 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 16 October 2014 16:22, Robert Haas robertmh...@gmail.com wrote:
 Might I gently enquire what the "something usable" we are going to see
 in this release? I'm not up on current plans.

 I don't know how far I'm going to get for this release yet.  I think
 pg_background is a pretty good milestone, and useful in its own right.
 I would like to get something that's truly parallel working sooner
 rather than later, but this group locking issue is one of 2 or 3
 significant hurdles that I need to climb over first.

 pg_background is very cute, but it's not really a step forward,
 or at least very far. It's sounding like you've already decided that
 is as far as we're going to get this release, which I'm disappointed
 about.

 Given your description of pg_background it looks an awful lot like
 infrastructure to make Autonomous Transactions work, but it doesn't
 even do that. I guess it could do in a very small additional patch, so
 maybe it is useful for something.

 You asked for my help, but I'd like to see some concrete steps towards
 an interim feature so I can see some benefit in a clear direction.

 Can we please have the first step we discussed? Parallel CREATE INDEX?
 (Note the please)

What I've been thinking about trying to work towards is parallel
sequential scan.  I think that it would actually be pretty easy to
code up a mostly-working version using the existing infrastructure,
but the patch would be rejected with a bazooka, because the
non-working parts would include things like:

1. The cooperating backends might not all be using the same snapshot,
because that requires sharing the snapshot, combo CID hash, and
transaction state.
2. The quals that got pushed down to the workers might not return the
same answers that they would have produced with a single backend,
because we have no mechanism for assessing pushdown-safety.
3. Deadlock detection would be to some greater or lesser degree
broken, the details depending on the implementation choices you made.

There is a bit of a chicken-and-egg problem here.  If I submit a patch
for parallel sequential scan, it'll (justifiably) get rejected because
it doesn't solve those problems.  So I'm trying to solve those
above-enumerated problems first, with working and at least
somewhat-useful examples that show how the incremental bits of
infrastructure can be used to do stuff.  But that leads to your
(understandable) complaint that this isn't very real yet.

Why am I now thinking about parallel sequential scan instead of
parallel CREATE INDEX?  You may remember that I posted a patch for a
new memory allocator some time ago, and it came in for a fair amount
of criticism and not much approbation.   Some of that criticism was
certainly justified, and perhaps I was as hard on myself as anyone
else was.  However you want to look at it, I see the trade-off between
parallel sort and parallel seq-scan this way: parallel seq-scan
requires dealing with the planner (ouch!) but parallel sort requires
dealing with memory allocation in dynamic shared memory segments
(ouch!).  Both of them require solving the three problems listed
above.

And maybe a few others, but I think those are the big ones - and I
think proper deadlock detection is the hardest of them.  A colleague
of mine has drafted patches for sharing snapshots and combo CIDs
between processes, and as you might expect that's pretty easy.
Sharing the transaction state (so we can test whether a transaction ID
is our transaction ID inside the worker) is a bit trickier, but I
think not too hard.  Assessing pushdown-safety will probably boil down
to adding some equivalent of proisparallel.  Maybe not the most
elegant, but defensible, and if you're looking for the shortest path
to something usable, that's probably it.  But deadlock detection ...
well, I don't see any simpler solution than what I'm trying to build
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
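For a rough picture of the work-distribution side of a parallel sequential scan, here is a toy sketch in which workers atomically claim block ranges from shared scan state, so the relation is scanned exactly once across all workers. Purely illustrative — the names and chunking strategy are invented, and none of the three hard problems enumerated above (shared snapshots, pushdown-safety, deadlock detection) are addressed:

```python
import threading

TABLE = list(range(100))          # toy "heap": 100 blocks, one tuple each
CHUNK = 10                        # blocks claimed per worker per request

class ParallelScanState:
    """Shared scan state: workers atomically grab the next block range."""
    def __init__(self, nblocks):
        self.next_block = 0
        self.nblocks = nblocks
        self.mutex = threading.Lock()

    def claim(self):
        with self.mutex:
            start = self.next_block
            self.next_block = min(start + CHUNK, self.nblocks)
            return start, self.next_block

def scan_worker(state, qual, out, out_lock):
    while True:
        start, end = state.claim()
        if start == end:
            return                        # scan exhausted
        hits = [t for t in TABLE[start:end] if qual(t)]   # pushed-down qual
        with out_lock:
            out.extend(hits)

def parallel_seqscan(nworkers, qual):
    state = ParallelScanState(len(TABLE))
    out, out_lock = [], threading.Lock()
    workers = [threading.Thread(target=scan_worker,
                                args=(state, qual, out, out_lock))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(out)
```

The chunked claim is also why qual pushdown-safety matters: the qual here runs in every worker, so it must produce the same answers it would have produced in a single backend.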




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-28 Thread Robert Haas
On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 10/28/14, 3:48 PM, Simon Riggs wrote:
 Given your description of pg_background it looks an awful lot like
 infrastructure to make Autonomous Transactions work, but it doesn't
 even do that. I guess it could do in a very small additional patch, so
 maybe it is useful for something.

 What do you see as being missing for autonomous transactions?

Personally, I don't see this patch set as having much to do with real
autonomous transactions.

 BTW, what I think would make this feature VERY useful is if it provided the
 ability to fire something up in another backend and leave it running in the
 background.

You *can* do that.  I mean, long-running transactions will have their
usual problems, but if you want to kick off a long-running (or a
short-running query) in the background and forget about it, this patch
lets you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-28 Thread Robert Haas
On Tue, Oct 28, 2014 at 7:26 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 10/28/14, 3:48 PM, Simon Riggs wrote:
 Given your description of pg_background it looks an awful lot like
 infrastructure to make Autonomous Transactions work, but it doesn't
 even do that. I guess it could do in a very small additional patch, so
 maybe it is useful for something.

 What do you see as being missing for autonomous transactions?

 Personally, I don't see this patch set as having much to do with real
 autonomous transactions.

 BTW, what I think would make this feature VERY useful is if it provided the
 ability to fire something up in another backend and leave it running in the
 background.

 You *can* do that.  I mean, long-running transactions will have their
 usual problems, but if you want to kick off a long-running (or a
 short-running query) in the background and forget about it, this patch
 lets you do that.

Err, sorry.  pg_background lets you do that, not this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-16 Thread Robert Haas
On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 17:03, Robert Haas robertmh...@gmail.com wrote:
 Well, I'm fervently in agreement with you on one point: the first
 version of all this needs to be as simple as possible, or the time to
 get to the first version will be longer than we can afford to wait.  I
 think what we're discussing here is which things are important enough
 that it makes sense to have them in the first version, and which
 things can wait.

 I'm guessing we might differ slightly on what constitutes "as simple
 as possible".

Yes, I believe there have been occasions in the past when that has
happened, so definitely possible.  :-)

 Something usable, with severe restrictions, is actually better than we
 have now. I understand the journey this work represents, so don't be
 embarrassed by submitting things with heuristics and good-enoughs in
 it. Our mentor, Mr.Lane, achieved much by spreading work over many
 releases, leaving others to join in the task.

 Might I gently enquire what the "something usable" we are going to see
 in this release? I'm not up on current plans.

I don't know how far I'm going to get for this release yet.  I think
pg_background is a pretty good milestone, and useful in its own right.
I would like to get something that's truly parallel working sooner
rather than later, but this group locking issue is one of 2 or 3
significant hurdles that I need to climb over first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 05:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

 In the background worker case, I imagined that the foreground process
 would hold a lock and the background processes would just assume they
 could access the table without holding locks of their own.  Aren't
 you building a mountain where a molehill would do?

Yeh. Locks should be made in the name of the main transaction and
released by it.

When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.

So all lock requests held until EOX should be made in the name of the
top level process. Any child process wanting a lock should request it,
but on discovering it is already held at parent level should just
update the local lock table. Transient locks, like catalog locks can
be made and released locally; I think there is more detail there but
it shouldn't affect the generalisation.
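A toy sketch of that scheme, with entirely invented names (the real lock
manager deals in lock modes, PROCLOCKs, and wait queues, none of which
appear here): long-lived locks are taken in the leader's name, and a child
that finds the lock already held at parent level simply records that fact
in its local lock table.

```python
# Hypothetical sketch of "locks held in the name of the top-level process".
# None of these names correspond to PostgreSQL's actual structures.

class SharedLockTable:
    """Shared state: which top-level process owns each lock tag."""
    def __init__(self):
        self.holders = {}  # lock tag -> pid of owning top-level process

    def acquire(self, tag, leader_pid):
        owner = self.holders.get(tag)
        if owner is None:
            self.holders[tag] = leader_pid   # granted in the leader's name
            return "granted"
        if owner == leader_pid:
            return "held_by_parent"          # child just caches this fact
        return "conflict"                    # an unrelated process owns it

class Worker:
    """Any member of the parallel group, leader or child."""
    def __init__(self, leader_pid, shared):
        self.leader_pid = leader_pid
        self.shared = shared
        self.local_locks = set()   # local lock table: tags the group holds

    def lock(self, tag):
        result = self.shared.acquire(tag, self.leader_pid)
        if result in ("granted", "held_by_parent"):
            self.local_locks.add(tag)   # update the local table only
        return result
```

In this sketch a straggler's request reads the shared table but never
changes its ownership once the leader holds the lock, matching the
restaurant analogy: it just comes and sits down.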

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 4:18 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 05:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

 In the background worker case, I imagined that the foreground process
 would hold a lock and the background processes would just assume they
 could access the table without holding locks of their own.  Aren't
 you building a mountain where a molehill would do?

 Yeh. Locks should be made in the name of the main transaction and
 released by it.

 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 So all lock requests held until EOX should be made in the name of the
 top level process. Any child process wanting a lock should request it,
 but on discovering it is already held at parent level should just
 update the local lock table. Transient locks, like catalog locks can
 be made and released locally; I think there is more detail there but
 it shouldn't affect the generalisation.

Hmm, interesting idea.  Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction.  The unrelated transaction, in turn, is blocked
trying to acquire some resource, which the top level parallelism
process.  Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.
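The scenario reduces to a wait-for graph with one edge the deadlock
detector cannot currently see. A minimal sketch (process names invented;
the real detector walks the shared lock table, not a dict):

```python
def has_cycle(edges):
    """Depth-first cycle search over a wait-for graph.
    edges maps each process to the set of processes it waits for."""
    def visit(node, path, done):
        if node in path:
            return True          # back-edge: a deadlock cycle
        if node in done:
            return False         # already fully explored, no cycle via here
        done.add(node)
        path.add(node)
        for nxt in edges.get(node, ()):
            if visit(nxt, path, done):
                return True
        path.discard(node)
        return False

    done = set()
    return any(visit(n, set(), done) for n in list(edges))

# Worker waits on the unrelated xact's catalog lock; the unrelated xact
# waits on a resource the leader holds (so it waits on the leader).
waits = {"worker": {"unrelated"}, "unrelated": {"leader"}}
assert not has_cycle(waits)            # the checker sees no deadlock

# The invisible edge: the leader is (or will be) waiting on its worker.
waits["leader"] = {"worker"}
assert has_cycle(waits)                # now the cycle is detectable
```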

Figuring out what to do about that is really the heart of this
project, I think, and there are a variety of designs possible.  One of
the early ideas that I had was to have the parallel workers directly
twaddle the main process's PGPROC and lock table state.  In other
words, instead of taking locks using their own PGPROCs, everybody uses
a single PGPROC.  I made several attempts at getting designs along
these lines off the ground, but it got complicated and invasive: (1)
The processes need to coordinate to make sure that you don't have two
people twaddling the lock state at the same time; (2) The existing
data structures won't support more than one process waiting at a time,
but there's no reason why one parallel worker couldn't be trying to
lock one catalog while another one is trying to lock a different
catalog; (3) On a related note, when a lock wait ends, you can't just
wake up the process that owns the PGPROC; you must wake the one that's
actually waiting; (4) the LWLockReleaseAll algorithm just falls apart
in this environment, as far as I can see.

The alternative design which I've been experimenting with is to have
each process use its own PGPROC and PROCLOCK structures, but to tag
each PROCLOCK with not only the owning PGPROC but also the group
leader's PGPROC.  This has not been entirely smooth sailing, but it
seems to break much less code than trying to have everybody use one
PGPROC.  Most of the changes that seem to be needed to make things
work are pretty well-isolated; rather than totally rearranging the
lock manager, you're just adding extra code that runs only in the
parallel case.
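In miniature, the second design amounts to a conflict check that skips
held locks whose group leader matches the requester's. A sketch with
invented names and a two-mode stand-in for the conflict matrix (the real
lock manager has many modes and lives in shared memory):

```python
# Tiny stand-in for the lock conflict matrix: mode pairs that conflict.
CONFLICTS = {
    ("AccessShareLock", "AccessExclusiveLock"),
    ("AccessExclusiveLock", "AccessShareLock"),
    ("AccessExclusiveLock", "AccessExclusiveLock"),
}

def lock_conflicts(held, req_leader, req_mode, req_tag):
    """held: list of (tag, holder_pid, leader_pid, mode) tuples, i.e.
    each held lock tagged with its owner AND the owner's group leader."""
    for tag, holder, leader, mode in held:
        if tag != req_tag:
            continue
        if leader == req_leader:
            continue   # same locking group: treated as non-conflicting
        if (req_mode, mode) in CONFLICTS:
            return True
    return False

# Leader (pid 100) holds AccessExclusiveLock on a relation; a worker in
# its group can still take AccessShareLock, but an outsider cannot.
held = [("rel:t", 100, 100, "AccessExclusiveLock")]
assert not lock_conflicts(held, 100, "AccessShareLock", "rel:t")
assert lock_conflicts(held, 200, "AccessShareLock", "rel:t")
```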

I'm definitely open to the idea that there's a better, simpler design
out there, but I haven't been able to think of one that doesn't break
deadlock detection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 14:46, Robert Haas robertmh...@gmail.com wrote:

 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 Hmm, interesting idea.  Suppose, though, that the child process
 requests a lock that can't immediately be granted, because the catalog
 it's trying to access is locked in AccessExclusiveLock mode by an
 unrelated transaction.  The unrelated transaction, in turn, is blocked
 trying to acquire some resource, which the top level parallelism
 process holds.  Assuming the top level parallelism process is waiting for
 the child (or will eventually wait), this is a deadlock, but without
 some modification to the deadlock detector, it can't see one of the
 edges.

Family disputes are fairly easily resolved ;-)

The first and basic point is that in most cases the parent should
already hold the required locks. This can only happen for briefly held
locks and/or more complex stuff. In the first case, getting
parallelism to work without that complex stuff would be useful. I'd be
happy if the first version simply throws an error if a child can't
acquire a lock immediately. Don't overthink the first version. Knowing
you'll disagree, let's take a further step...

Second point, the relationship between parent and children is clear.
If we do a deadlock detection, we should be able to search for that as
a special case, since we will know that we are a child and that such a
situation might occur. So just add in an edge so the rest of the
deadlock code works fine.

If that doesn't work, use a heuristic. If the parent is waiting when the
child does its deadlock test, assume it's a deadlock and abort the child
speculatively just in case. You can work out how to do that better in
the future, since it won't happen that often.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 14:46, Robert Haas robertmh...@gmail.com wrote:
 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 Hmm, interesting idea.  Suppose, though, that the child process
 requests a lock that can't immediately be granted, because the catalog
 it's trying to access is locked in AccessExclusiveLock mode by an
 unrelated transaction.  The unrelated transaction, in turn, is blocked
 trying to acquire some resource, which the top level parallelism
 process holds.  Assuming the top level parallelism process is waiting for
 the child (or will eventually wait), this is a deadlock, but without
 some modification to the deadlock detector, it can't see one of the
 edges.

 Family disputes are fairly easily resolved ;-)

 The first and basic point is that in most cases the parent should
 already hold the required locks. This can only happen for briefly held
 locks and/or more complex stuff. In the first case, getting
 parallelism to work without that complex stuff would be useful. I'd be
 happy if the first version simply throws an error if a child can't
 acquire a lock immediately. Don't overthink the first version. Knowing
 you'll disagree, let's take a further step...

Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait.  I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait.  I also think we are in agreement that at least SOME
thought about lock management is needed; the question we're trying to
hash out is whether what I'm proposing to try to do here is
significantly more ambitious than what's really necessary for V1.
Which is a good question.

 Second point, the relationship between parent and children is clear.
 If we do a deadlock detection, we should be able to search for that as
 a special case, since we will know that we are a child and that such a
 situation might occur. So just add in an edge so the rest of the
 deadlock code works fine.

 If that doesn't work, use a heuristic. If the parent is waiting when the
 child does its deadlock test, assume it's a deadlock and abort the child
 speculatively just in case. You can work out how to do that better in
 the future, since it won't happen that often.

Well, the deadlock checker needs to work not only when one of the
parallel processes invokes it, but also when somebody else invokes it.
For example, suppose we have parallel processes A and B.  A holds lock
1 and awaits lock 2, while B holds lock 3.  No problem!  Now
process X, which is holding lock 2, tries to grab lock 1.  X must be
able to detect the deadlock.  Now, that's not necessarily a problem
with what you said: it just means that the parent-child relationship
has to be clear from the contents of shared memory.

Which is pretty much the direction I'm aiming with the incomplete
patch I posted.  For the sake of simplicity, I want to assume that
every process in a locking group waits for every other process in the
same locking group, even though that might not be technically true in
every case.   If the parent is waiting for a lock and the guy who has
that lock is waiting for a parallel worker in the same group, my plan
(which I think matches your suggestion) is to call that a deadlock.
There are a few sticky points here: sometimes, the deadlock checker
doesn't actually abort transactions, but just rearranges the wait
queue, so something at least somewhat sensible needs to happen in that
case.
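That simplifying assumption, every member of a locking group waiting for
every other member, can be folded into the wait-for graph before the
ordinary cycle search runs. A sketch with invented names (the real code
operates on shared-memory lock structures, not dicts):

```python
def expand_group_edges(edges, groups):
    """edges: pid -> set of pids it waits for.
    groups: leader pid -> set of all member pids (leader included).
    Waiting on any one group member becomes waiting on all of them, which
    is how 'leader blocked behind X, while X waits on one of the leader's
    workers' turns into a cycle an ordinary deadlock search can find."""
    member_to_leader = {m: g for g, members in groups.items() for m in members}
    expanded = {}
    for waiter, waited in edges.items():
        out = set()
        for w in waited:
            leader = member_to_leader.get(w, w)
            out |= groups.get(leader, {w})   # non-group pids map to themselves
        expanded[waiter] = out
    return expanded

# X waits on a lock a worker holds; the leader waits on X.
groups = {"leader": {"leader", "worker"}}
edges = {"X": {"worker"}, "leader": {"X"}}
expanded = expand_group_edges(edges, groups)
assert expanded["X"] == {"leader", "worker"}   # X now also waits on the leader
# leader -> X -> leader: a cycle the unmodified deadlock search will see.
```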

The thing that I'm aiming to do in the patch which I think you are
suggesting might not be necessary is to make it possible for the child
to go ahead and request AccessShareLock on the scan relation even though
the parent might already hold some other lock (perhaps even
AccessExclusiveLock).   I want to make the lock manager smart enough
to recognize that those locks are mutually non-conflicting because of
the fact that the two processes are in close cooperation.  Clearly, we
could get by without that, but it adds complexity in other places: the
parent has to never release its lock (even if killed) until the child
is done with the relation; and the scan code itself needs to be
conditional, passing NoLock from children and some other mode in the
parent.  That's all manageable, but it looks to me like doing the
necessary surgery on the lock manager isn't actually going to be that
hard; most of the necessary logic is present in 

Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 17:03, Robert Haas robertmh...@gmail.com wrote:

 Well, I'm fervently in agreement with you on one point: the first
 version of all this needs to be as simple as possible, or the time to
 get to the first version will be longer than we can afford to wait.  I
 think what we're discussing here is which things are important enough
 that it makes sense to have them in the first version, and which
 things can wait.

I'm guessing we might differ slightly on what constitutes "as simple as possible".

Something usable, with severe restrictions, is actually better than we
have now. I understand the journey this work represents, so don't be
embarrassed by submitting things with heuristics and good-enoughs in
it. Our mentor, Mr. Lane, achieved much by spreading work over many
releases, leaving others to join in the task.

Might I gently enquire what "something usable" we are going to see
in this release? I'm not up on current plans.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

In the background worker case, I imagined that the foreground process
would hold a lock and the background processes would just assume they
could access the table without holding locks of their own.  Aren't
you building a mountain where a molehill would do?

 [1] You could argue that one process instead should take locks on
 behalf of the group.  But I think this is a terrible approach.  First,
 it means that whichever process is holding the locks, presumably the
 user backend, had better not try to do any real computation of its
 own, so that it can timely service lock requests, which seems like a
 huge loss of efficiency;

What is "timely service lock requests"?  You got the lock before firing
off the background workers, you hold it till they're done.

This is a truly horrible excuse for the amount of complexity (and,
without doubt, bugs and performance penalties) you propose to load onto
the lock mechanism.

regards, tom lane

