Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-23 Thread Robert Haas
On Mon, Nov 22, 2010 at 6:54 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 21.11.2010 15:18, Robert Haas wrote:

 On Sat, Nov 20, 2010 at 4:07 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

 Robert Haasrobertmh...@gmail.com  writes:

 So what DO we need to guard against here?

 I think the general problem can be stated as process A changes two or
 more values in shared memory in a fairly short span of time, and process
 B, which is concurrently examining the same variables, sees those
 changes occur in a different order than A thought it made them in.

 In practice we do not need to worry about changes made with a kernel
 call in between, as any sort of context swap will cause the kernel to
 force cache synchronization.

 Also, the intention is that the locking primitives will take care of
 this for any shared structures that are protected by a lock.  (There
 were some comments upthread suggesting maybe our lock code is not
 bulletproof; but if so that's something to fix in the lock code, not
 a logic error in code using the locks.)

 So what this boils down to is being an issue for shared data structures
 that we access without using locks.  As, for example, the latch
 structures.

 So is the problem case a race involving owning/disowning a latch vs.
 setting that same latch?

 No (or maybe that as well, but that's not what we've been concerned about
 here). As far as I've understood it, the problem is that process A
 does something like this:

 /* set a shared variable */
 ((volatile bool *) shmem)->variable = true;
 /* Wake up process B to notice that we changed the variable */
 SetLatch();

 And process B does this:

 for (;;)
 {
  ResetLatch();
  if (((volatile bool *) shmem)->variable)
    DoStuff();

  WaitLatch();
 }

 This is the documented usage pattern of latches. The problem arises if
 process A runs just before ResetLatch, but the effect of setting the shared
 variable doesn't become visible until after the if-test in process B.
 Process B will clear the is_set flag in ResetLatch(), but it will not
 DoStuff(), so it in effect misses the wakeup from process A and goes back to
 sleep even though it would have work to do.

 This situation doesn't arise in the current use of latches, because the
 shared state comparable to shmem->variable in the above example is protected
 by a spinlock. But it might become an issue in some future use case.

Eh, so, should we do anything about this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-22 Thread Heikki Linnakangas

On 21.11.2010 15:18, Robert Haas wrote:

On Sat, Nov 20, 2010 at 4:07 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Robert Haasrobertmh...@gmail.com  writes:

So what DO we need to guard against here?


I think the general problem can be stated as process A changes two or
more values in shared memory in a fairly short span of time, and process
B, which is concurrently examining the same variables, sees those
changes occur in a different order than A thought it made them in.

In practice we do not need to worry about changes made with a kernel
call in between, as any sort of context swap will cause the kernel to
force cache synchronization.

Also, the intention is that the locking primitives will take care of
this for any shared structures that are protected by a lock.  (There
were some comments upthread suggesting maybe our lock code is not
bulletproof; but if so that's something to fix in the lock code, not
a logic error in code using the locks.)

So what this boils down to is being an issue for shared data structures
that we access without using locks.  As, for example, the latch
structures.


So is the problem case a race involving owning/disowning a latch vs.
setting that same latch?


No (or maybe that as well, but that's not what we've been concerned
about here). As far as I've understood it, the problem is that
process A does something like this:


/* set a shared variable */
((volatile bool *) shmem)->variable = true;
/* Wake up process B to notice that we changed the variable */
SetLatch();

And process B does this:

for (;;)
{
  ResetLatch();
  if (((volatile bool *) shmem)->variable)
DoStuff();

  WaitLatch();
}

This is the documented usage pattern of latches. The problem arises if 
process A runs just before ResetLatch, but the effect of setting the 
shared variable doesn't become visible until after the if-test in 
process B. Process B will clear the is_set flag in ResetLatch(), but it 
will not DoStuff(), so it in effect misses the wakeup from process A and 
goes back to sleep even though it would have work to do.


This situation doesn't arise in the current use of latches, because the 
shared state comparable to shmem->variable in the above example is 
protected by a spinlock. But it might become an issue in some future use 
case.
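
To make this pattern safe on a machine with weak memory ordering, barriers 
would be needed in both processes. A minimal sketch, assuming a hypothetical 
full-fence primitive pg_memory_barrier() that does not exist in the tree today:

/* Process A: the flag must become visible before the wakeup */
((volatile bool *) shmem)->variable = true;
pg_memory_barrier();      /* hypothetical fence */
SetLatch();

/* Process B: don't check the flag until the reset is ordered first */
for (;;)
{
  ResetLatch();
  pg_memory_barrier();    /* hypothetical fence */
  if (((volatile bool *) shmem)->variable)
    DoStuff();

  WaitLatch();
}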


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-21 Thread Robert Haas
On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So what DO we need to guard against here?

 I think the general problem can be stated as process A changes two or
 more values in shared memory in a fairly short span of time, and process
 B, which is concurrently examining the same variables, sees those
 changes occur in a different order than A thought it made them in.

 In practice we do not need to worry about changes made with a kernel
 call in between, as any sort of context swap will cause the kernel to
 force cache synchronization.

 Also, the intention is that the locking primitives will take care of
 this for any shared structures that are protected by a lock.  (There
 were some comments upthread suggesting maybe our lock code is not
 bulletproof; but if so that's something to fix in the lock code, not
 a logic error in code using the locks.)

 So what this boils down to is being an issue for shared data structures
 that we access without using locks.  As, for example, the latch
 structures.

So is the problem case a race involving owning/disowning a latch vs.
setting that same latch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-20 Thread Robert Haas
On Fri, Nov 19, 2010 at 5:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 But what about timings vs. random other stuff?  Like in this case
 there's a problem if the signal arrives before the memory update to
 latch->is_set becomes visible.  I don't know what we need to do to
 guarantee that.

 I don't believe there's an issue there.  A context swap into the kernel
 is certainly going to include msync.  If you're afraid otherwise, you
 could put an msync before the kill() call, but I think it's a waste of
 effort.

So what DO we need to guard against here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So what DO we need to guard against here?

I think the general problem can be stated as process A changes two or
more values in shared memory in a fairly short span of time, and process
B, which is concurrently examining the same variables, sees those
changes occur in a different order than A thought it made them in.

In practice we do not need to worry about changes made with a kernel
call in between, as any sort of context swap will cause the kernel to
force cache synchronization.

Also, the intention is that the locking primitives will take care of
this for any shared structures that are protected by a lock.  (There
were some comments upthread suggesting maybe our lock code is not
bulletproof; but if so that's something to fix in the lock code, not
a logic error in code using the locks.)

So what this boils down to is being an issue for shared data structures
that we access without using locks.  As, for example, the latch
structures.

The other case that I can think of offhand is the signal multiplexing
flags.  I think we're all right there so far as the flags themselves are
concerned because only one atomic update is involved on each side:
there's no possibility of inconsistency due to cache visibility skew.
But we'd be at some risk if we were using any such flag as a cue to go
look at some other shared-memory state.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 05:38:14 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'm all in favor of having some memory ordering primitives so that we
  can try to implement better algorithms, but if we use it here it
  amounts to a fairly significant escalation in the minimum requirements
  to compile PG (which is bad) rather than just a performance
  optimization (which is good).
 
 I don't believe there would be any escalation in compilation
 requirements: we already have the ability to invoke stronger primitives
 than these.  What is needed is research to find out what the primitives
 are called, on platforms where we aren't relying on direct asm access.
 
 My feeling is it's time to bite the bullet and do that work.  We
 shouldn't cripple the latch operations because of laziness at the
 outset.
I don't think developing the actual code is that hard - s_lock.c 
contains nearly everything necessary.
A 'lock xchg' or similar is only marginally slower than the barrier-only 
implementation. So doing a TAS() on a slock_t in private memory should be an 
easy enough fallback implementation.
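
A rough sketch of that fallback, assuming the existing SpinLockInit/
SpinLockAcquire/SpinLockRelease macros and a platform with a real TAS (the 
function name is made up, and whether TAS really fences everywhere is exactly 
what would need checking):

#include "storage/spin.h"

static void
barrier_via_private_spinlock(void)
{
  /* The lock lives in private memory and is never contended; we only
   * rely on the TAS/S_UNLOCK pair acting as a full fence. */
  volatile slock_t dummy;

  SpinLockInit(&dummy);
  SpinLockAcquire(&dummy);
  SpinLockRelease(&dummy);
}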

So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses 
spinlocks for that purpose - no idea where that is true these days.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm all in favor of having some memory ordering primitives so that we
 can try to implement better algorithms, but if we use it here it
 amounts to a fairly significant escalation in the minimum requirements
 to compile PG (which is bad) rather than just a performance
 optimization (which is good).

 I don't believe there would be any escalation in compilation
 requirements: we already have the ability to invoke stronger primitives
 than these.  What is needed is research to find out what the primitives
 are called, on platforms where we aren't relying on direct asm access.

I don't believe that's correct, although it's possible that I may be
missing something.  On any platform where we have TAS(), that should
be sufficient to set the flag, but how will we read the flag?  A
simple fetch isn't guaranteed to be sufficient; for some
architectures, you might need to insert a read fence, and I don't
think we have anything like that defined right now.  We've got
special-cases in s_lock.h for all kinds of crazy architectures; and
it's not obvious what would be needed.  For example some operating
system I've never heard of called SINIX has this:

#include <abi_mutex.h>
typedef abilock_t slock_t;

#define TAS(lock)   (!acquire_lock(lock))
#define S_UNLOCK(lock)  release_lock(lock)
#define S_INIT_LOCK(lock)   init_lock(lock)
#define S_LOCK_FREE(lock)   (stat_lock(lock) == UNLOCKED)

It's far from obvious to me how to make this do what we need - I have
a sneaking suspicion it can't be done with those primitives at all -
and I bet neither of us has a machine on which it can be tested.  Now
maybe we no longer care about supporting SINIX anyway, but the point
is that if we make this change, every platform for which we don't have
working TAS and read-fence operations becomes an unsupported platform.
 Forget about --disable-spinlocks; there is no such thing.  That
strikes me as an utterly unacceptable amount of collateral damage to
avoid a basically harmless API change, not to mention a ton of work.
You might be able to convince me that it's no longer important to
support platforms without a working spinlock implementation (although
I think it's rather nice that we can - might encourage someone to try
out PG and then contribute an implementation for their favorite
platform) but this is also going to break platforms that nominally
have TAS now (some of the TAS implementations aren't really TAS, as in
the above case, and we may not be able to easily determine what's
required for a read-fence even where TAS is a real TAS).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

Me neither, which is exactly the problem.  Under Tom's proposal, any
architecture we don't explicitly provide for, breaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:16:24 Robert Haas wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
  So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
  spinlocks for that purpose - no idea where that is true these days.
 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.
I doubt it's that much of a problem as !defined(HAS_TEST_AND_SET) will be so 
slow that there would be noise from that side more often...

Besides, we can just jump into the kernel and back in that case (which the TAS 
implementation already does), that does more than just a fence...

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:27 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.

 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

Sure you do.  But so what?  Are you going to compile PostgreSQL and
implement TAS as a simple store and read-fence as a simple load?  How
likely is that to work out well?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.

Just a small point of clarification - you need to have both that
unknown architecture, and that architecture has to have postgres
processes running simultaneously on different CPUs with different
caches that are incoherent to have those problems.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:29:10 Andres Freund wrote:
 Besides, we can just jump into the kernel and back in that case (which the
 TAS  implementation already does), that does more than just a fence...
Or if you don't believe that is enough initialize a lock on the stack, lock 
and forget it...

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:29 AM, Andres Freund and...@anarazel.de wrote:
 On Friday 19 November 2010 15:16:24 Robert Haas wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
  So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
  spinlocks for that purpose - no idea where that is true these days.
 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.
 I doubt it's that much of a problem as !defined(HAS_TEST_AND_SET) will be so
 slow that there would be noise from that side more often...

 Besides, we can just jump into the kernel and back in that case (which the TAS
 implementation already does), that does more than just a fence...

Eh, really?  If there's a workaround for platforms for which we don't
know what the appropriate read-fencing incantation is, then I'd feel
more comfortable about doing this.  But I don't see how to make that
work.  The whole problem here is that the API is designed in such a way
that the signal handler might be invoked when the lock that it needs
to grab is already held by the same process.  The reason memory
barriers solve the problem is because they'll be atomically released
when we jump into the signal handler, but that is not true of a
spin-lock or a semaphore.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:35 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

 Sure you do.  But so what?  Are you going to compile PostgreSQL and
 implement TAS as a simple store and read-fence as a simple load?  How
 likely is that to work out well?

 If I was trying to port PostgreSQL to some strange architecture, and
 my strange architecture didn't have all the normal TAS and memory
 barriers stuff because it was only a UP system with no cache, then yes,
 and it would work out well ;-)

I get your point, but obviously this case isn't very interesting or
likely in 2010.

 If it was some strange SMP architecture, I wouldn't expect *anything*
 to work out well if the architecture doesn't have some sort of
 TAS/memory barrier/cache-coherency stuff in it ;-)

Well, you'd be pleasantly surprised to find that you could at least
get it to compile using --disable-spinlocks.  Yeah, the performance
would probably be lousy and you might run out of semaphores, but at
least for basic stuff it would run.  Ripping that out just to avoid an
API change in code we committed two months ago seems a bit extreme,
especially since it's also going to require implementing a read-fence
operation on every platform we want to continue supporting.  Maybe you
could default the read-fence to just a simple read for platforms that
are not known to have an issue, but all the platforms where TAS is
calling some OS-provided routine that does mysterious magic under the
covers are going to need attention; and I just don't think that
cleaning up everything that's going to break is a very worthwhile
investment of our limited development resources, even if it doesn't
result in needlessly dropping platform support.

If we're going to work on memory primitives, I would much rather see
us put that effort into, say, implementing more efficient LWLock
algorithms to solve the bottlenecks that the MOSBENCH guys found,
rather than spending it on trying to avoid a minor API complication
for the latch facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:38:37 Robert Haas wrote:
 Eh, really?  If there's a workaround for platforms for which we don't
 know what the appropriate read-fencing incantation is, then I'd feel
 more comfortable about doing this.  But I don't see how to make that
 work.  The whole problem here is that the API is designed in such a way
 that the signal handler might be invoked when the lock that it needs
 to grab is already held by the same process.  The reason memory
 barriers solve the problem is because they'll be atomically released
 when we jump into the signal handler, but that is not true of a
 spin-lock or a semaphore.
Well, it's not generally true - you are right there. But there is a wide range 
of syscalls available where it's inherently true (which is what I sloppily 
referred to). And you are allowed to call a, although quite restricted, set of 
system calls even in signal handlers. I don't have the list for older POSIX 
versions in mind, but for 2003 you can choose something from several like 
write, lseek, setpgid, which inherently have to serialize. And I am quite sure 
there were sensible calls for earlier versions.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:14:58 Robert Haas wrote:
 On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  I'm all in favor of having some memory ordering primitives so that we
  can try to implement better algorithms, but if we use it here it
  amounts to a fairly significant escalation in the minimum requirements
  to compile PG (which is bad) rather than just a performance
  optimization (which is good).
  
  I don't believe there would be any escalation in compilation
  requirements: we already have the ability to invoke stronger primitives
  than these.  What is needed is research to find out what the primitives
  are called, on platforms where we aren't relying on direct asm access.
 
 I don't believe that's correct, although it's possible that I may be
 missing something.  On any platform where we have TAS(), that should
 be sufficient to set the flag, but how will we read the flag?  A
 simple fetch isn't guaranteed to be sufficient; for some
 architectures, you might need to insert a read fence, and I don't
 think we have anything like that defined right now. 
A TAS is both a read and write fence. After that you don't *need* to fetch it.
And even if it were only a write fence on some platforms  - if we consistently 
issue write fences at the relevant places that ought to be enough.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:49:45 Robert Haas wrote:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.
But for that you will need more infrastructure in that area anyway.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:51 AM, Andres Freund and...@anarazel.de wrote:
 On Friday 19 November 2010 15:49:45 Robert Haas wrote:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.
 But for that you will need more infrastructure in that area anyway.

True, but you don't have to do it all at once.  You can continue to do
the same old stuff on the platforms you currently support, and use the
newer stuff on platforms where the right thing to do is readily
apparent, like x86 and x86_64.  And people can add support for their
favorite platforms gradually over time, rather than having a flag day
where we stop supporting everything we don't know what to do with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote:
 Well, it's not generally true - you are right there. But there is a wide range
 of syscalls available where it's inherently true (which is what I sloppily
 referred to). And you are allowed to call a, although quite restricted, set of
 system calls even in signal handlers. I don't have the list for older POSIX
 versions in mind, but for 2003 you can choose something from several like
 write, lseek, setpgid, which inherently have to serialize. And I am quite sure
 there were sensible calls for earlier versions.

Well, it's not quite enough just to call into the kernel to serialize
on some point of memory, because your point is to make sure that
*this particular piece of memory* is coherent.  It doesn't matter if
the kernel has proper fencing in its stuff if the memory it's
guarding is in another cacheline, because that won't *necessarily*
force cache coherency in your local lock/variable memory.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.

I haven't read all of this very long thread yet, but I will point out
that you seem to be arguing from the position that memory ordering
primitives will only be useful for the latch code.  This is nonsense
of the first order.  We already know that the sinval signalling
mechanism could use it to avoid needing a spinlock.  I submit that
it's very likely that fixing communication bottlenecks elsewhere
will similarly require memory ordering primitives if we are to avoid
the stupid "use a lock" approach.  I think it's time to build that
infrastructure.

BTW, I agree with Andres' point that we can probably default memory
barriers to be no-ops on unknown platforms.  Weak memory ordering
isn't a common architectural choice.  A look through s_lock.h suggests
that PPC and MIPS are the only supported arches that need to worry
about this.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.

 I haven't read all of this very long thread yet, but I will point out
 that you seem to be arguing from the position that memory ordering
 primitives will only be useful for the latch code.  This is nonsense
 of the first order.  We already know that the sinval signalling
 mechanism could use it to avoid needing a spinlock.  I submit that
 it's very likely that fixing communication bottlenecks elsewhere
 will similarly require memory ordering primitives if we are to avoid
 the stupid "use a lock" approach.  I think it's time to build that
 infrastructure.

I completely agree, but I'm not too sure I want to drop support for
any platform for which we haven't yet implemented such primitives.
What's different about this case is that fall back to taking the spin
lock is not a workable option.

 BTW, I agree with Andres' point that we can probably default memory
 barriers to be no-ops on unknown platforms.  Weak memory ordering
 isn't a common architectural choice.  A look through s_lock.h suggests
 that PPC and MIPS are the only supported arches that need to worry
 about this.

That's good to hear.  I'm more worried, however, about architectures
where we supposedly have TAS but it isn't really TAS but some
OS-provided "acquire a lock" primitive.  That won't generalize nicely
to what we need for this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:

 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

 Sure you do.  But so what?  Are you going to compile PostgreSQL and
 implement TAS as a simple store and read-fence as a simple load?  How
 likely is that to work out well?

If I was trying to port PostgreSQL to some strange architecture, and
my strange architecture didn't have all the normal TAS and memory
barriers stuff because it was only a UP system with no cache, then yes,
and it would work out well ;-)

If it was some strange SMP architecture, I wouldn't expect *anything*
to work out well if the architecture doesn't have some sort of
TAS/memory barrier/cache-coherency stuff in it ;-)

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:58:39 Aidan Van Dyk wrote:
 On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote:
  Well, it's not generally true - you are right there. But there is a wide
  range of syscalls available where it's inherently true (which is what I
  sloppily referred to). And you are allowed to call a, although quite
  restricted, set of system calls even in signal handlers. I don't have
  the list for older POSIX versions in mind, but for 2003 you can choose
  something from several like write, lseek, setpgid, which inherently have
  to serialize. And I am quite sure there were sensible calls for earlier
  versions.
 
 Well, it's not quite enough just to call into the kernel to serialize
 on some point of memory, because your point is to make sure that
 *this particular piece of memory* is coherent.  It doesn't matter if
 the kernel has proper fencing in its stuff if the memory it's
 guarding is in another cacheline, because that won't *necessarily*
 force cache coherency in your local lock/variable memory.
Yes and no. It provides the same guarantees as our current approach of using 
spinlocks for exactly that - the fact that it theoretically is not enough is an 
independent issue (but *definitely* an issue).

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Markus Wanner
On 11/19/2010 03:58 PM, Aidan Van Dyk wrote:
 Well, it's not quite enough just to call into the kernel to serialize
 on some point of memory, because your point is to make sure that
 *this particular piece of memory* is coherent.

Well, that certainly doesn't apply to full fences, that are not specific
to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
'mf' on ia64.

Regards

Markus Wanner



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... The reason memory
 barriers solve the problem is because they'll be atomically released
 when we jump into the signal handler, but that is not true of a
 spin-lock or a semaphore.

Hm, I wonder whether your concern is stemming from a wrong mental
model.  There is nothing to release.  In my view, a memory barrier
primitive is a sequence point, having the properties that all writes
issued before the barrier shall become visible to other processors
before any writes after it, and also that no reads issued after the
barrier shall be executed until those writes have become visible.
(PPC can separate those two aspects, but I think we probably don't
need to get that detailed for our purposes.)  On most processors,
the barrier primitive will just be ((void) 0) because they don't
deal in out-of-order writes anyway.
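
In code form, such a primitive might look roughly like this (the name and the 
exact instruction are only an illustration, not anything that exists in the 
tree today):

#if defined(__ppc__) || defined(__powerpc__) || defined(__powerpc64__)
/* weak memory ordering: emit a real fence instruction */
#define pg_memory_barrier()	__asm__ __volatile__ ("sync" : : : "memory")
#else
/* strongly ordered processors: nothing to emit */
#define pg_memory_barrier()	((void) 0)
#endif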

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I completely agree, but I'm not too sure I want to drop support for
 any platform for which we haven't yet implemented such primitives.
 What's different about this case is that fall back to taking the spin
 lock is not a workable option.

The point I was trying to make is that the fallback position can
reasonably be a no-op.

 That's good to hear.  I'm more worried, however, about architectures
 where we supposedly have TAS but it isn't really TAS but some
 OS-provided "acquire a lock" primitive.  That won't generalize nicely
 to what we need for this case.

I did say we need some research ;-).  We need to look into what's the
appropriate primitive for any such OSes that are available for PPC or
MIPS.  I don't feel a need to be paranoid about it for other
architectures.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 Well, that certainly doesn't apply to full fences, that are not specific
 to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
 'mf' on ia64.

Hm, what do those do exactly?  We've never had any such thing in the
Intel-ish spinlock asm, but if out-of-order writes are possible I should
think we'd need 'em.  Or does lock xchgb imply an mfence?

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Markus Wanner
On 11/19/2010 04:51 PM, Tom Lane wrote:
 Hm, what do those do exactly?

"Performs a serializing operation on all load-from-memory and
store-to-memory instructions that were issued prior the MFENCE
instruction." [1]

Given the memory ordering guarantees of x86, this instruction might only
be relevant for SMP systems, though.

 Or does lock xchgb imply an mfence?

Probably on older architectures (given the name "bus locked exchange"),
but OTOH I wouldn't bet on that still being true. Locking the entire bus
sounds like a prohibitively expensive operation with today's number of
cores per system.

Regards

Markus Wanner


[1]: random google hit on 'mfence':
http://siyobik.info/index.php?module=x86id=170



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 16:51:00 Tom Lane wrote:
 Markus Wanner mar...@bluegap.ch writes:
  Well, that certainly doesn't apply to full fences, that are not specific
  to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
  'mf' on ia64.
 Hm, what do those do exactly?  We've never had any such thing in the
 Intel-ish spinlock asm, but if out-of-order writes are possible I should
 think we'd need 'em.  Or does lock xchgb imply an mfence?
Out of order writes are definitely possible if you consider multiple 
processors.
Locked statements like 'lock xaddl;' guarantee that the specific operands (or 
their cachelines) are visible on all processors and are done atomically - but 
it's not influencing the whole cache like mfence would.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Locked statements like 'lock xaddl;' guarantee that the specific operands (or
 their cachelines) are visible on all processors and are done atomically - but
 it's not influencing the whole cache like mfence would.

Where is this "locking the whole cache" meme coming from?  What we're
looking for has nothing to do with locking anything.  It's primarily
a directive to the processor to flush any dirty cache lines out to
main memory.  It's not going to block any other processors.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 17:25:57 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  Locked statements like 'lock xaddl;' guarantee that the specific operands
  (or their cachelines) are visible on all processors and are done
  atomically - but it's not influencing the whole cache like mfence would.
 Where is this "locking the whole cache" meme coming from?  What we're
 looking for has nothing to do with locking anything.  It's primarily
 a directive to the processor to flush any dirty cache lines out to
 main memory.  It's not going to block any other processors.
I was never talking about 'locking the whole cache' - I was talking about 
flushing/fencing it like a global read/write barrier would. And lock 
xchgb/xaddl does not imply anything for other cachelines but its own.

I only used 'locked' in the context of 'lock xaddl'. 

Am I misunderstanding you?

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I was never talking about 'locking the whole cache' - I was talking about 
 flushing/fencing it like a global read/write barrier would. And lock 
 xchgb/xaddl does not imply anything for other cachelines but its own.

If that's the case, why aren't the parallel regression tests falling
over constantly?  My recollection is that when I broke the sinval code
by assuming strong memory ordering without spinlocks, it didn't take
long at all for the PPC buildfarm members to expose the problem.
If it's possible for Intel-ish processors to exhibit weak memory
ordering behavior, I'm quite sure that our current code would be showing
bugs everywhere.

The impression I had of current Intel designs is that they ensure global
cache coherency, ie if one processor has a dirty cache line the others
know that, and will go get the updated data before attempting to access
that piece of memory.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I completely agree, but I'm not too sure I want to drop support for
 any platform for which we haven't yet implemented such primitives.
 What's different about this case is that fall back to taking the spin
 lock is not a workable option.

 The point I was trying to make is that the fallback position can
 reasonably be a no-op.

Hmm, maybe you're right.  I was assuming weak memory ordering was a
reasonably common phenomenon, but if it only applies to a very small
number of architectures and we're pretty confident we know which ones
they are, your approach would be far less frightening than I
originally thought.  But is that really true?

I think it would be useful to try to build up a library of primitives
in this area.  For this particular task, we really only need a
write-with-fence primitive and a read-with-fence primitive.  On strong
memory ordering machines, these can just do a store and a read,
respectively; on weak memory ordering machines, they can insert
whatever fencing operations are needed on either the store side or the
load side.  I think it would also be useful to provide macros for
compare-and-swap and fetch-and-add on platforms where they are
available.  Then we could potentially write code like this:

#ifdef HAVE_COMPARE_AND_SWAP
...do it the lock-free way...
#else
...oh, well, do it with spinlocks...
#endif
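
On compilers that already expose such operations as intrinsics, the macros 
could be thin wrappers. A sketch of the idea, with made-up names, assuming a 
GCC new enough (4.1 or later) to provide the __sync builtins:

#if defined(HAVE_GCC_SYNC_BUILTINS)	/* made-up configure symbol */
#define HAVE_COMPARE_AND_SWAP 1
/* atomically: if (*ptr == oldval) { *ptr = newval; return true; } */
#define pg_compare_and_swap(ptr, oldval, newval) \
	__sync_bool_compare_and_swap((ptr), (oldval), (newval))

#define HAVE_FETCH_AND_ADD 1
/* atomically add 'addend' to *ptr, returning the previous value */
#define pg_fetch_and_add(ptr, addend) \
	__sync_fetch_and_add((ptr), (addend))
#endif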

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think it would be useful to try to build up a library of primitives
 in this area.  For this particular task, we really only need a
 write-with-fence primitive and a read-with-fence primitive.

That's really entirely the wrong way to think about it.  You need a
fence primitive, full stop.  It's a sequence point, not an operation
in itself.  It guarantees that reads/writes occurring before or after
it aren't resequenced around it.  I don't even understand what "write
with fence" means --- is the write supposed to be fenced against other
writes before it, or other writes after it?

 I think it would also be useful to provide macros for
 compare-and-swap and fetch-and-add on platforms where they are
 available.

That would be a great deal more work, because it's not a no-op anywhere;
and our need for it is still rather hypothetical.  I'm surprised to see
you advocating that when you didn't want to touch fencing a moment ago.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Florian Weimer
* Andres Freund:

 I was never talking about 'locking the whole cache' - I was talking about 
 flushing/fencing it like a global read/write barrier would. And lock 
 xchgb/xaddl does not imply anything for other cachelines but its own.

My understanding is that once you've seen the result of an atomic
operation on i386 and amd64, you are guaranteed to observe all prior
writes performed by the thread which did the atomic operation, too.
Explicit fencing is only necessary if you need synchronization without
atomic operations.

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
I wrote:
 Markus Wanner mar...@bluegap.ch writes:
 Well, that certainly doesn't apply to full fences, that are not specific
 to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
 'mf' on ia64.

 Hm, what do those do exactly?

I poked around in the Intel manuals a bit.  They do have mfence (also
lfence and sfence) but so far as I can tell, those are only used to
manage loads and stores that are issued by special instructions that
explicitly mark the operation as weakly ordered.  So the reason we're
not seeing bugs is presumably that C compilers don't generate such
instructions.  Also, Intel architectures do guarantee cache consistency
across multiple processors (and it costs them a lot...)

I found a fairly interesting and detailed paper about memory fencing
in the Linux kernel:
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think it would be useful to try to build up a library of
 primitives in this area.  For this particular task, we really
 only need a write-with-fence primitive and a read-with-fence
 primitive.
 
 That's really entirely the wrong way to think about it.  You need
 a fence primitive, full stop.  It's a sequence point, not an
 operation in itself.  It guarantees that reads/writes occurring
 before or after it aren't resequenced around it.  I don't even
 understand what write with fence means --- is the write supposed
 to be fenced against other writes before it, or other writes after
 it?
 
I was taking it to mean something similar to the memory guarantees
around synchronized blocks in Java.  At the start of a synchronized
block you discard any cached data which you've previously read from
or written to main memory, and must read everything fresh from that
point.  At the end of a synchronized block you must write any
locally written values to main memory, although you retain them in
your thread-local cache for possible re-use.  Reads or writes from
outside the synchronized block can be pulled into the block and
reordered in among the reads and writes within the block (which may
also be reordered) unless there's another block to contain them.
 
It works fine once you have your head around it, and allows for
significant optimization in a heavily multi-threaded application.  I
have no idea whether such a model would be useful for PostgreSQL. If
I understand Tom he is proposing what sounds roughly like what could
be achieved in the Java memory model by keeping all code for a
process within a single synchronized block, with the fence being a
point where you end it (flushing all local writes to main memory)
and start a new one (forcing a discard of locally cached data).
 
Of course I'm ignoring the locking aspect of synchronized blocks and
just discussing the memory access aspect of them.  (A synchronized
block in Java always references some [any] Object, and causes an
exclusive lock to be held on the object from one end of the block to
the other.)
 
-Kevin



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 That's really entirely the wrong way to think about it.  You need
 a fence primitive, full stop.  It's a sequence point, not an
 operation in itself.

 I was taking it to mean something similar to the memory guarantees
 around synchronized blocks in Java.  At the start of a synchronized
 block you discard any cached data which you've previously read from
 or written to main memory, and must read everything fresh from that
 point.  At the end of a synchronized block you must write any
 locally written values to main memory, although you retain them in
 your thread-local cache for possible re-use.

That is basically the model that we have implemented in the spinlock
primitives: taking a spinlock corresponds to starting a synchronized
block and releasing the spinlock ends it.  On processors that need
it, the spinlock macros include memory fence instructions that implement
the above semantics.

However, for lock-free interactions I think this model isn't terribly
helpful: it's not clear what is inside and what is outside the sync
block, and forcing your code into that model doesn't improve either
clarity or performance.  What you typically need is a guarantee about
the order in which writes become visible.  To give a concrete example,
the sinval bug I was mentioning earlier boiled down to assuming that a
write into an element of the sinval message array would become visible
to other processors before the change of the last-message pointer
variable became visible to them.  Without a fence instruction, that
doesn't hold on WMO processors, and so they were able to fetch a stale
message value.  In some cases you also need to guarantee the order of
reads.
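
Spelled out in code, the ordering requirement looks something like this (the 
variable names are illustrative rather than the actual sinval identifiers, and 
pg_memory_barrier() is still a hypothetical primitive):

/* writer: fill in the message slot before advertising it */
msgArray[n] = newMessage;
pg_memory_barrier();	/* slot contents must become visible... */
lastMsgNum = n;		/* ...before the pointer that announces them */

/* reader: don't read the slot earlier than the pointer */
limit = lastMsgNum;
pg_memory_barrier();
msg = msgArray[limit];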

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 What you typically need is a guarantee about the order in which
 writes become visible.
 
 In some cases you also need to guarantee the order of reads.
 
Doesn't that suggest different primitives?
 
-Kevin



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 18:46:00 Tom Lane wrote:
 I wrote:
  Markus Wanner mar...@bluegap.ch writes:
  Well, that certainly doesn't apply to full fences, that are not specific
  to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
  'mf' on ia64.
  
  Hm, what do those do exactly?
 
 I poked around in the Intel manuals a bit.  They do have mfence (also
 lfence and sfence) but so far as I can tell, those are only used to
 manage loads and stores that are issued by special instructions that
 explicitly mark the operation as weakly ordered.  So the reason we're
 not seeing bugs is presumably that C compilers don't generate such
 instructions.
Well. Some memcpy() implementations use string (or SIMD) operations which are 
weakly ordered though.

 Also, Intel architectures do guarantee cache consistency
 across multiple processors (and it costs them a lot...)
Only if you are talking about the *same* locations though. See example 8.2.3.4

Combined with:

For the Intel486 and Pentium processors, the LOCK# signal is always asserted 
on the bus during a LOCK operation, even if the area of memory being locked is 
cached in the processor.  For the P6 and more recent processor families, if 
the area of memory being locked during a LOCK operation is cached in the 
processor that is performing the LOCK operation as write-back memory and is 
completely contained in a cache line, the processor may not assert the LOCK# 
signal on the bus. Instead, it will modify the memory location internally and 
allow its cache coherency mechanism to ensure that the operation is carried 
out atomically. This operation is called “cache locking.” The cache coherency 
mechanism automatically prevents two or more processors that have cached the 
same area of memory from simultaneously modifying data in that area.


Which means that something like the following (in Intel's terminology) can
happen:

initially [_X] = 0

P1: mov [_X], 1
P1: lock xchg [_Y], 1

P2: lock xchg [_Z], 1
P2: mov r1, [_X]

A valid result is that r1 on P2 is 0.

I think that is not biting PG because it always uses the same spinlock on the
reading and the writing side - but I am not entirely sure about that.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 20:03:27 Andres Freund wrote:
 Which means something like (in intel's terminology) can happen:
 
 initially x = 0
 
 P1: mov [_X], 1
 P1: lock xchg Y, 1
 
 P2. lock xchg [_Z], 1
 P2: mov r1, [_X]
 
 A valid result is that r1 on P2 is 0.
 
 I think that is not biting PG because it always uses the same spinlock on
 the reading and the writing side - but I am not entirely sure about that.
Which also seems to mean that a simple read memory barrier that does
__asm__ __volatile__("lock; xaddl $0, ???") is not enough unless you use the
same address for all those barriers, which would cause horrible cacheline
bouncing.
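
Concretely, the construct in question would look something like this (x86-64,
GCC inline assembly, shown only to make the discussion concrete; the stack
slot is just one arbitrary choice of address):

    /* "read barrier" candidate: a locked add of zero to a dummy location */
    #define READ_BARRIER() \
        __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")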

Am I missing something?

Andres




Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, for lock-free interactions I think this model isn't terribly
 helpful: it's not clear what is inside and what is outside the sync
 block, and forcing your code into that model doesn't improve either
 clarity or performance.  What you typically need is a guarantee about
 the order in which writes become visible.  To give a concrete example,
 the sinval bug I was mentioning earlier boiled down to assuming that a
 write into an element of the sinval message array would become visible
 to other processors before the change of the last-message pointer
 variable became visible to them.  Without a fence instruction, that
 doesn't hold on WMO processors, and so they were able to fetch a stale
 message value.  In some cases you also need to guarantee the order of
 reads.

But what about timings vs. random other stuff?  Like in this case
there's a problem if the signal arrives before the memory update to
latch->is_set becomes visible.  I don't know what we need to do to
guarantee that.

This page seems to indicate that x86 is OK as far as this is concerned
- we can simply store a 1 and everyone will see it:

http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2004-08/0979.html

...but if we were to, say, increment a counter at that location, it
would not be safe without a LOCK prefix (further messages in the
thread indicate that you might also have a problem if the address in
question is unaligned).
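
In code, the distinction being drawn is roughly this (GCC on x86; 'counter'
is just an illustrative shared variable):

    /* a plain aligned store: every other CPU will eventually see the 1 */
    latch->is_set = true;

    /* a read-modify-write, by contrast, needs a LOCK prefix to be atomic;
     * GCC's builtin emits a locked instruction for this */
    __sync_fetch_and_add(&counter, 1);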

It's not obvious to me, however, what might be required on other processors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But what about timings vs. random other stuff?  Like in this case
 there's a problem if the signal arrives before the memory update to
 latch->is_set becomes visible.  I don't know what we need to do to
 guarantee that.

I don't believe there's an issue there.  A context swap into the kernel
is certainly going to include msync.  If you're afraid otherwise, you
could put an msync before the kill() call, but I think it's a waste of
effort.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Friday 19 November 2010 18:46:00 Tom Lane wrote:
 I poked around in the Intel manuals a bit.  They do have mfence (also
 lfence and sfence) but so far as I can tell, those are only used to
 manage loads and stores that are issued by special instructions that
 explicitly mark the operation as weakly ordered.  So the reason we're
 not seeing bugs is presumably that C compilers don't generate such
 instructions.

 Well. Some memcpy() implementations use string (or SIMD) operations which are 
 weakly ordered though.

I'd expect memcpy to msync at completion of the move if it does that
kind of thing.  Otherwise it's failing to ensure that the move is really
done before it returns.

 
 For the Intel486 and Pentium processors, the LOCK# signal is always asserted
 on the bus during a LOCK operation, even if the area of memory being locked
 is cached in the processor.  For the P6 and more recent processor families,
 if the area of memory being locked during a LOCK operation is cached in the
 processor that is performing the LOCK operation as write-back memory and is
 completely contained in a cache line, the processor may not assert the LOCK#
 signal on the bus. Instead, it will modify the memory location internally
 and allow its cache coherency mechanism to ensure that the operation is
 carried out atomically. This operation is called “cache locking.” The cache
 coherency mechanism automatically prevents two or more processors that have
 cached the same area of memory from simultaneously modifying data in that
 area.
 

Like it says, the cache coherency mechanism prevents this from being a
problem for us.  Once the change is made in a processor's cache, it's
the cache's job to ensure that all processors see it --- and on Intel
architectures, the cache does take care of that.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Saturday 20 November 2010 00:08:07 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Friday 19 November 2010 18:46:00 Tom Lane wrote:
  I poked around in the Intel manuals a bit.  They do have mfence (also
  lfence and sfence) but so far as I can tell, those are only used to
  manage loads and stores that are issued by special instructions that
  explicitly mark the operation as weakly ordered.  So the reason we're
  not seeing bugs is presumably that C compilers don't generate such
  instructions.
  
  Well. Some memcpy() implementations use string (or SIMD) operations which
  are weakly ordered though.

 Like it says, the cache coherency mechanism prevents this from being a
 problem for us.  Once the change is made in a processor's cache, it's
 the cache's job to ensure that all processors see it --- and on Intel
 architectures, the cache does take care of that.
Check example 8.2.3.4 of volume 3A - in my opinion that makes my example
correct.

Andres




Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-18 Thread Robert Haas
On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 In SetLatch, is it enough to add the SpinLockAcquire() call *after*
 checking that is_set is not already set? Ie. still do the quick exit
 without holding a lock. Or do we need a memory barrier operation before
 the fetch, to ensure that we see if the other process has just cleared
 the flag with ResetLatch()?  Presumably ResetLatch() needs to call
 SpinLockAcquire() anyway to ensure that other processes see the clearing
 of the flag.

 Hmm ... I just remembered the reason why we didn't use a spinlock in
 these functions already.  Namely, that it's unsafe for a signal handler
 to try to acquire a spinlock that the interrupted code might be holding.
 So I think a bit more thought is needed here.  Maybe we need to bite the
 bullet and do memory barriers ...

The signal handler just checks a process-local, volatile variable
called waiting (which should be fine) and then sends a self-pipe
byte.  It deliberately *doesn't* take a spinlock.  So unless I'm
missing something (which is perfectly possible), protecting a few more
things with a spinlock should be safe enough.
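
(For reference, the handler being described boils down to essentially the
following; this is a sketch based on the description above, so details may
differ slightly from the actual unix_latch.c code.)

    void
    latch_sigusr1_handler(void)
    {
        /* 'waiting' is a process-local volatile flag set inside WaitLatch() */
        if (waiting)
            sendSelfPipeByte();     /* write one byte to the self-pipe */
    }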

Of course, there's still a potential *performance* problem if we end
up doing a kernel call while holding a spin lock, but I'm uncertain
whether we should worry about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm ... I just remembered the reason why we didn't use a spinlock in
 these functions already.  Namely, that it's unsafe for a signal handler
 to try to acquire a spinlock that the interrupted code might be holding.

 The signal handler just checks a process-local, volatile variable
 called waiting (which should be fine) and then sends a self-pipe
 byte.  It deliberately *doesn't* take a spinlock.

I'm not talking about latch_sigusr1_handler.  I'm talking about the
several already-existing signal handlers that call SetLatch.  Now maybe
it's possible to prove that none of those can fire in a process that's
mucking with the same latch in-line, but that sounds fragile as heck;
and anyway what of future usage?  Given that precedent, somebody is
going to write something unsafe at some point, and it'll fail only often
enough to be seen in the field but not in our testing.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-18 Thread Robert Haas
On Thu, Nov 18, 2010 at 5:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm ... I just remembered the reason why we didn't use a spinlock in
 these functions already.  Namely, that it's unsafe for a signal handler
 to try to acquire a spinlock that the interrupted code might be holding.

 The signal handler just checks a process-local, volatile variable
 called waiting (which should be fine) and then sends a self-pipe
 byte.  It deliberately *doesn't* take a spinlock.

 I'm not talking about latch_sigusr1_handler.  I'm talking about the
 several already-existing signal handlers that call SetLatch.  Now maybe
 it's possible to prove that none of those can fire in a process that's
 mucking with the same latch in-line, but that sounds fragile as heck;
 and anyway what of future usage?  Given that precedent, somebody is
 going to write something unsafe at some point, and it'll fail only often
 enough to be seen in the field but not in our testing.

Oh, I get it.  You're right.  We can't possibly assume that we're
not trying to set the latch for our own process, because that's the
whole point of having the self-pipe code in the first place.

How about changing the API so that the caller must use one function,
say, SetOwnedLatch(), to set a latch that they own, and another
function, say, SetNonOwnedLatch(), to set a latch that they do not
own?  The first can simply set is_set (another process might fail to
see that the latch has been set, but the worst thing they can do is
set it over again, which should be fairly harmless) and send a
self-pipe byte.  The second can take the spin lock, set is_set, read
the owner PID, release the spin lock, and send a signal.  WaitLatch()
can similarly take the spin lock before reading is_set.  This requires
the caller to know whether they are setting their own latch or someone
else's latch, but I don't believe that's a problem given current
usage.
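
A rough sketch of what that split might look like (not a patch; the 'mutex'
field in Latch and the helper names are assumptions for illustration):

    void
    SetOwnedLatch(volatile Latch *latch)
    {
        /* setting our own latch: a duplicate set is harmless, no lock needed */
        latch->is_set = true;
        sendSelfPipeByte();                /* wake up our own WaitLatch() */
    }

    void
    SetNonOwnedLatch(volatile Latch *latch)
    {
        pid_t    owner;

        SpinLockAcquire(&latch->mutex);    /* 'mutex' would be a new field */
        latch->is_set = true;
        owner = latch->owner_pid;
        SpinLockRelease(&latch->mutex);

        if (owner != 0)
            kill(owner, SIGUSR1);          /* signal the owning process */
    }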

I'm all in favor of having some memory ordering primitives so that we
can try to implement better algorithms, but if we use it here it
amounts to a fairly significant escalation in the minimum requirements
to compile PG (which is bad) rather than just a performance
optimization (which is good).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm all in favor of having some memory ordering primitives so that we
 can try to implement better algorithms, but if we use it here it
 amounts to a fairly significant escalation in the minimum requirements
 to compile PG (which is bad) rather than just a performance
 optimization (which is good).

I don't believe there would be any escalation in compilation
requirements: we already have the ability to invoke stronger primitives
than these.  What is needed is research to find out what the primitives
are called, on platforms where we aren't relying on direct asm access.

My feeling is it's time to bite the bullet and do that work.  We
shouldn't cripple the latch operations because of laziness at the
outset.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Can you elaborate?

 Weak memory ordering means that stores into shared memory initiated by
 one processor are not guaranteed to be observed to occur in the same
 sequence by another processor.  This implies first that the latch code
 could malfunction all by itself, if two processes manipulate a latch at
 about the same time, and second (probably much less likely) that there
 could be a malfunction involving a process that's waited on a latch not
 seeing the shared-memory status updates that another process did before
 setting the latch.

 This is not at all hypothetical --- my first attempt at rewriting the
 sinval signaling code, a couple years back, failed on PPC machines in
 the buildfarm because of exactly this type of issue.

 Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
 all by itself. And I would've thought that declaring the Latch variable
 volatile prevents rearrangements.

It's not a question of code rearrangement.  Suppose at time zero, the
latch is unset, but owned.  At approximately the same time, SetLatch()
is called in one process and WaitLatch() in another process.
SetLatch() sees that the latch is not set and sends SIGUSR1 to the
other process.  The other process receives the signal but, since
waiting is not yet set, it ignores the signal.  It then drains the
self-pipe and examines latch->is_set.  But as it turns out, the update
by the process which called SetLatch() isn't yet visible to this
process, because this process has a copy of those bytes in some
internal cache that isn't guaranteed to be fully coherent.  So even
though SetLatch() already changed latch->is_set to true, it still
looks false here.  Therefore, we go to sleep on the latch.

At this point, we are very likely screwed.  If we're lucky, yet a
third process will come along, also see the latch as still unset (even
though it is), and set it again, waking up the owner.  But if we're
unlucky, by the time that third process comes along, the memory update
will have become visible everywhere and all future calls to SetLatch()
will exit quickly, leaving the poor shmuck who waited on the latch
sleeping for all eternity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Heikki Linnakangas

On 15.11.2010 15:22, Robert Haas wrote:

On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Can you elaborate?


Weak memory ordering means that stores into shared memory initiated by
one processor are not guaranteed to be observed to occur in the same
sequence by another processor.  This implies first that the latch code
could malfunction all by itself, if two processes manipulate a latch at
about the same time, and second (probably much less likely) that there
could be a malfunction involving a process that's waited on a latch not
seeing the shared-memory status updates that another process did before
setting the latch.

This is not at all hypothetical --- my first attempt at rewriting the
sinval signaling code, a couple years back, failed on PPC machines in
the buildfarm because of exactly this type of issue.


Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
all by itself. And I would've thought that declaring the Latch variable
volatile prevents rearrangements.


It's not a question of code rearrangement.


Rearrangement of code, rearrangement of CPU instructions, or 
rearrangement of the order the changes in the memory become visible to 
other processes. The end result is the same.



 Suppose at time zero, the
latch is unset, but owned.  At approximately the same time, SetLatch()
is called in one process and WaitLatch() in another process.
SetLatch() sees that the latch is not set and sends SIGUSR1 to the
other process.  The other process receives the signal but, since
waiting is not yet set, it ignores the signal.  It then drains the
self-pipe and examines latch->is_set.  But as it turns out, the update
by the process which called SetLatch() isn't yet visible to this
process, because this process has a copy of those bytes in some
internal cache that isn't guaranteed to be fully coherent.  So even
though SetLatch() already changed latch->is_set to true, it still
looks false here.  Therefore, we go to sleep on the latch.


Surely marking the latch pointer volatile would force the store to 
is_set to be flushed, if not immediately, at least before the kill() 
system call. No?


Looking at Tom's patch in sinvaladt.c, the problem there was that
the store of the shared maxMsgNum variable could become visible to other 
processes after the store of the message itself. Using a volatile 
pointer for maxMsgNum would not have helped with that, because the 
operations on other variables might still be rearranged with respect to 
the store of maxMsgNum. SetLatch is simpler, there is only one variable 
(ok, two, but your scenario didn't involve a change in owner_pid). It 
seems safe to assume that the store becomes visible before the system call.


Tom's other scenario, where changing some other variable in shared 
memory might not have become visible to other processes when SetLatch() 
runs, seems more plausible (if harder to run into in practice). But if 
the variable is meant to be examined by other processes, then you should 
use a lock to protect it. Otherwise you'll have concurrency issues 
anyway. Or at least use a volatile pointer; I believe it's safe to 
assume that two operations using a volatile pointer will not be 
rearranged wrt. each other.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 It's not a question of code rearrangement.

 Rearrangement of code, rearrangement of CPU instructions, or rearrangement
 of the order the changes in the memory become visible to other processes.
 The end result is the same.

I'll let Tom speak to this, because he understands it better than I
do, but I don't think this is really true.  Rearrangement of code is
something that the compiler has control over, and volatile addresses
that issue by preventing the compiler from making certain assumptions,
but the order in which memory operations become visible is a result of
the architecture of the memory bus, and the compiler doesn't directly
do anything about that.  It won't, for example, emit a cmpxchg
instruction rather than a simple store just because you declared the
variable volatile.
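
For instance, with GCC on x86, a store through a volatile pointer still
compiles to an ordinary mov:

    volatile Latch *vlatch = latch;    /* 'latch' assumed to be in scope */
    vlatch->is_set = true;             /* an ordinary store instruction; the
                                        * volatile qualifier adds no LOCK
                                        * prefix and no fence */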

  Suppose at time zero, the
 latch is unset, but owned.  At approximately the same time, SetLatch()
 is called in one process and WaitLatch() in another process.
 SetLatch() sees that the latch is not set and sends SIGUSR1 to the
 other process.  The other process receives the signal but, since
 waiting is not yet set, it ignores the signal.  It then drains the
 self-pipe and examines latch->is_set.  But as it turns out, the update
 by the process which called SetLatch() isn't yet visible to this
 process, because this process has a copy of those bytes in some
 internal cache that isn't guaranteed to be fully coherent.  So even
 though SetLatch() already changed latch->is_set to true, it still
 looks false here.  Therefore, we go to sleep on the latch.

 Surely marking the latch pointer volatile would force the store to is_set to
 be flushed, if not immediately, at least before the kill() system call. No?

I don't think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
 all by itself. And I would've thought that declaring the Latch variable
 volatile prevents rearrangements.
 
 It's not a question of code rearrangement.

Precisely.  volatile prevents the compiler from rearranging the
instruction sequence in a way that would *issue* stores out-of-order.
However, that doesn't prevent the hardware from *executing* the stores
out-of-order from the point of view of a different processor.  As Robert
noted, the risk cases here come from caching; in particular, that a
dirty cache line might get flushed to main memory later than some other
dirty cache line.  There are some architectures that guarantee that this
won't happen (no doubt at significant expenditure of gates).  There are
others that don't, preferring to optimize the single-processor case.
On those, you need an msync type of instruction to force dirty cache
lines out to main memory between any two stores whose effects had better
become visible to another processor in a certain order.  I'm not sure if
it's universal, but on PPC there are actually two different sync
concepts involved, a write barrier that does the above and a read
barrier that ensures the reading processor is up-to-date.
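
For the PPC case, the two concepts typically map to something like the
following (illustrative only; the exact instruction choice would be a
porting decision):

    /* write barrier: order our earlier stores before our later stores */
    #define PPC_WRITE_BARRIER() __asm__ __volatile__("lwsync" ::: "memory")

    /* read barrier: order our earlier loads before our later loads */
    #define PPC_READ_BARRIER()  __asm__ __volatile__("lwsync" ::: "memory")

    /* full barrier, also ordering stores against later loads */
    #define PPC_FULL_BARRIER()  __asm__ __volatile__("sync" ::: "memory")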

 I believe it's safe to 
 assume that two operations using a volatile pointer will not be 
 rearranged wrt. each other.

This is entirely wrong, so far as cross-processor visibility of changes
is concerned.  Whether it should be true in some ideal reading of the C
spec is not relevant: there are common architectures in which it is not
true.

The window for trouble is normally pretty small, and in particular any
kernel call or context swap is usually going to force an msync.  So I'm
not sure that there are any risk spots in practice right now with
SetLatch.  But if we expand our use of it, we can be 100% certain we'll
get bit eventually.

 Tom's other scenario, where changing some other variable in shared 
 memory might not have become visible to other processes when SetLatch() 
 runs, seems more plausible (if harder to run into in practice). But if 
 the variable is meant to be examined by other processes, then you should 
 use a lock to protect it.

In that case, of what use is the latch stuff?  The whole point with that
(or at least a lot of the point) is to not have to take locks.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
 all by itself. And I would've thought that declaring the Latch variable
 volatile prevents rearrangements.

 It's not a question of code rearrangement.

 Precisely.  volatile prevents the compiler from rearranging the
 instruction sequence in a way that would *issue* stores out-of-order.
 However, that doesn't prevent the hardware from *executing* the stores
 out-of-order from the point of view of a different processor.  As Robert
 noted, the risk cases here come from caching; in particular, that a
 dirty cache line might get flushed to main memory later than some other
 dirty cache line.  There are some architectures that guarantee that this
 won't happen (no doubt at significant expenditure of gates).

And in fact if this (interesting!) video is any indication, that
problem is only going to get worse as core counts go up.  This guy
built a lock-free, wait-free hash table implementation that can run on
a system with hundreds of cores.  I'm just guessing here, but I
strongly suspect that keeping memory in full sync across that many
processors would just kill performance, so they shrug their shoulders
and don't.  The application programmer gets to pick up the pieces.

http://video.google.com/videoplay?docid=2139967204534450862#

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Heikki Linnakangas

On 15.11.2010 16:51, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

I believe it's safe to
assume that two operations using a volatile pointer will not be
rearranged wrt. each other.


This is entirely wrong, so far as cross-processor visibility of changes
is concerned.


Ok.

In SetLatch, is it enough to add the SpinLockAcquire() call *after* 
checking that is_set is not already set? Ie. still do the quick exit 
without holding a lock. Or do we need a memory barrier operation before 
the fetch, to ensure that we see if the other process has just cleared 
the flag with ResetLatch()?  Presumably ResetLatch() needs to call 
SpinLockAcquire() anyway to ensure that other processes see the clearing 
of the flag.



Tom's other scenario, where changing some other variable in shared
memory might not have become visible to other processes when SetLatch()
runs, seems more plausible (if harder to run into in practice). But if
the variable is meant to be examined by other processes, then you should
use a lock to protect it.


In that case, of what use is the latch stuff?  The whole point with that
(or at least a lot of the point) is to not have to take locks.


The use case for a latch is to wake up another process to examine another 
piece of shared memory (or a file or something else), and take action 
based on that other state if needed. Access to that other piece of 
shared memory needs locking or some other means of concurrency control, 
regardless of the mechanism used to wake up the other process.


Take the walsender latches for example. The other piece of shared 
memory is the current WAL flush location. The latch is used to wake up 
a walsender after flushing some WAL. The latch itself doesn't protect 
the access to the WAL flush pointer in any way, GetFlushRecPtr() uses a 
spinlock for that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 In SetLatch, is it enough to add the SpinLockAcquire() call *after* 
 checking that is_set is not already set? Ie. still do the quick exit 
 without holding a lock. Or do we need a memory barrier operation before 
 the fetch, to ensure that we see if the other process has just cleared 
 the flag with ResetLatch()?  Presumably ResetLatch() needs to call 
 SpinLockAcquire() anyway to ensure that other processes see the clearing 
 of the flag.

Hmm ... I just remembered the reason why we didn't use a spinlock in
these functions already.  Namely, that it's unsafe for a signal handler
to try to acquire a spinlock that the interrupted code might be holding.
So I think a bit more thought is needed here.  Maybe we need to bite the
bullet and do memory barriers ...

regards, tom lane



Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-14 Thread Heikki Linnakangas

On 14.11.2010 22:55, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 13.11.2010 17:07, Tom Lane wrote:

Robert Haasrobertmh...@gmail.com   writes:

Come to think of it, I'm not really sure I understand what protects
SetLatch() against memory ordering hazards.  Is that actually safe?


Hmm ... that's a good question.  It certainly *looks* like it could
malfunction on machines with weak memory ordering.



Can you elaborate?


Weak memory ordering means that stores into shared memory initiated by
one processor are not guaranteed to be observed to occur in the same
sequence by another processor.  This implies first that the latch code
could malfunction all by itself, if two processes manipulate a latch at
about the same time, and second (probably much less likely) that there
could be a malfunction involving a process that's waited on a latch not
seeing the shared-memory status updates that another process did before
setting the latch.

This is not at all hypothetical --- my first attempt at rewriting the
sinval signaling code, a couple years back, failed on PPC machines in
the buildfarm because of exactly this type of issue.


Hmm, SetLatch only sets one flag, so I don't see how it could 
malfunction all by itself. And I would've thought that declaring the 
Latch variable volatile prevents rearrangements.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
