Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

Hi,

On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:
 I have done a few performance tests for the above patches; the results
 are below:

Cool, thanks.

 Performance Data
 --
 IBM POWER-7 16 cores, 64 hardware threads
 RAM = 64GB
 max_connections =210
 Database Locale =C
 checkpoint_segments=256
 checkpoint_timeout=35min
 shared_buffers=8GB
 Client Count = number of concurrent sessions and threads (e.g. -c 8 -j 8)
 Duration of each individual run = 5 mins
 Test type - read only pgbench with -M prepared
 Other Related information about test
 a. This is the data for the median of 3 runs; the detailed data of each
 individual run is attached to the mail.
 b. I applied both patches to take the performance data.
 
 Scale Factor - 100
 
 Patch_ver/Client_count       1       8      16      32      64     128
 HEAD                     13344  106921  196629  295123  377846  333928
 PATCH                    13662  106179  203960  298955  452638  465671
 
 Scale Factor - 3000
 
 Patch_ver/Client_count       8      16      32      64     128     160
 HEAD                     86920  152417  231668  280827  257093  255122
 PATCH                    87552  160313  230677  276186  248609  244372
 
 
 Observations
 --
 a. The patch performs really well (an increase of up to ~40%) when all
 the data fits in shared buffers (scale factor 100).
 b. When data doesn't fit in shared buffers but fits in RAM
 (scale factor 3000), performance increases up to a client count of 16;
 after that it starts dipping (by up to ~4.4% in the above config).
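
The relative changes in these tables are easy to recompute; below is a small illustrative helper (pct_change() and worst_dip() are hypothetical names, not part of the patches). Measured against HEAD, the 128-client gain at scale factor 100 is about +39.5% and the 160-client dip at scale factor 3000 is about -4.2%; the ~4.4% figure corresponds to computing (HEAD - PATCH) / PATCH instead. The 175-client result reported later in the thread (248374 vs. 235669 tps) is about -5.1% against HEAD.

```c
#include <assert.h>
#include <stddef.h>

/* Relative throughput change of `patched` versus `base`, in percent.
 * Positive: the patch is faster.  Negative: a dip. */
double pct_change(double base, double patched)
{
    assert(base != 0.0);
    return (patched - base) / base * 100.0;
}

/* Index of the run with the largest dip (most negative change). */
size_t worst_dip(const double *head, const double *patch, size_t n)
{
    size_t worst = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (pct_change(head[i], patch[i]) < pct_change(head[worst], patch[worst]))
            worst = i;
    return worst;
}
```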

Hm. Interesting. I don't see that dip on x86.

 The above data shows that the patch improves performance when there is
 shared LWLock contention; however, there is a slight performance dip
 with exclusive LWLocks (at scale factor 3000, exclusive LWLocks are
 needed for the buf mapping tables). I am not sure whether this is the
 worst-case dip or whether the dip can be higher under similar
 configurations, because the trend shows the dip increasing with more
 clients.
 
 Brief analysis of the code w.r.t. the performance dip
 -
 Extra instructions w.r.t. HEAD in the exclusive lock acquire path:
 a. The lock is attempted twice.
 b. Atomic operations for nwaiters in LWLockQueueSelf() and
 LWLockAcquireCommon().
 c. The spinlock now has to be taken twice, once for self-queuing and then
 again for setting releaseOK.
 d. A few extra function calls and some extra checks.
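
To make items a. and b. more concrete, here is a minimal sketch of a lock word manipulated with C11 <stdatomic.h>. It is not the patch's code: the layout (high bit for an exclusive holder, low bits counting shared holders) and all names are assumptions for illustration. A shared attempt costs one fetch_add, plus a fetch_sub to back out if an exclusive holder is present; an exclusive attempt is a compare-and-swap that only succeeds on a completely free lock. A real LWLock also needs a wait queue, and a backend that fails to get the lock typically has to queue itself and then retry once more before sleeping, so that a release in between is not missed; that is presumably what the "attempt lock twice" item refers to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word layout (not PostgreSQL's): the high bit marks an
 * exclusive holder, the remaining bits count shared holders. */
#define SKETCH_EXCLUSIVE ((uint32_t) 1 << 31)

typedef struct { _Atomic uint32_t state; } sketch_lwlock;

/* Shared attempt: one fetch_add; back it out if an exclusive holder exists. */
bool sketch_attempt_shared(sketch_lwlock *l)
{
    uint32_t old = atomic_fetch_add(&l->state, 1);

    if (old & SKETCH_EXCLUSIVE)
    {
        atomic_fetch_sub(&l->state, 1);   /* undo our increment */
        return false;
    }
    return true;
}

/* Exclusive attempt: compare-and-swap that only succeeds on a free lock. */
bool sketch_attempt_exclusive(sketch_lwlock *l)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected,
                                          SKETCH_EXCLUSIVE);
}

void sketch_release_shared(sketch_lwlock *l)
{
    uint32_t old = atomic_fetch_sub(&l->state, 1);

    assert((old & ~SKETCH_EXCLUSIVE) > 0);   /* we must have held it */
    (void) old;
}

void sketch_release_exclusive(sketch_lwlock *l)
{
    uint32_t old = atomic_fetch_sub(&l->state, SKETCH_EXCLUSIVE);

    assert(old & SKETCH_EXCLUSIVE);          /* we must have held it */
    (void) old;
}
```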

Hm. I can't really see the number of atomics itself mattering; a spinning
lock will do many more atomic ops than this. But I wonder whether we
could get rid of the releaseOK lock. That should be quite possible.
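
For comparison, here is a minimal test-and-test-and-set spinlock in C11 atomics (a simplified, hypothetical stand-in, not PostgreSQL's s_lock). Each attempt to grab the lock is one atomic read-modify-write, so under contention a single acquisition can easily issue more atomic operations than the few extra ones added to the LWLock paths. spin_acquire() returns how many exchanges it issued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified test-and-test-and-set spinlock (hypothetical names). */
typedef struct { atomic_bool locked; } spin_sketch;

/* One attempt = one atomic exchange (a read-modify-write). */
bool spin_try_acquire(spin_sketch *s)
{
    return !atomic_exchange_explicit(&s->locked, true, memory_order_acquire);
}

/* Spins until acquired; returns how many atomic exchanges were issued.
 * Under contention this number grows with every lost race. */
unsigned spin_acquire(spin_sketch *s)
{
    unsigned rmw_ops = 0;

    for (;;)
    {
        /* "test": wait with plain atomic loads while the lock looks taken */
        while (atomic_load_explicit(&s->locked, memory_order_relaxed))
            ;                      /* real code would pause / back off here */
        /* "test-and-set": the actual atomic read-modify-write */
        rmw_ops++;
        if (spin_try_acquire(s))
            return rmw_ops;
    }
}

void spin_release(spin_sketch *s)
{
    assert(atomic_load_explicit(&s->locked, memory_order_relaxed));
    atomic_store_explicit(&s->locked, false, memory_order_release);
}
```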

 These probably shouldn't matter much when a backend has to wait for
 another exclusive locker anyway, but I am not sure what else could be
 the reason for the dip when exclusive LWLocks are needed.

Any chance to get a profile?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-10-08 20:07:35 -0400, Robert Haas wrote:
 On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote:
  So, what makes it work for me (among other unrelated stuff) seems to be
  the following in .gdbinit, defining away some things that gdb doesn't
  handle:
  macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
  macro define __extension__
  macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
 
  Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about
  postgres' macros. At least if you're in the right scope.
 
  As an example, the following works:
  (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)
 
 Ah, cool.  I'll try that.

If that works for you, should we put it somewhere in the docs? If so,
where?

Greetings,

Andres Freund


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

Hi Robert,

On 2014-10-08 16:01:53 -0400, Robert Haas wrote:
 [ comment fixes ]

Thanks, I've incorporated these + a bit more.

Could you otherwise make sense of the explanation and the algorithm?

 +/* yipeyyahee */
 
 Although this will be clear to individuals with a good command of
 English, I suggest avoiding such usages.

I've removed them with a heavy heart. These are heartfelt emotions from
getting the algorithm to work (:P)

I've attached these fixes + the removal of spinlocks around releaseOK as
follow-up patches. Obviously they'll be merged into the other patch, but
it seems useful to be able to see them separately.
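
One way to drop the spinlock around releaseOK, sketched here under the assumption that the flag is kept as a bit in the lock's atomic state word (the bit position and the function names are hypothetical, not taken from the attached patches): set it with an atomic fetch-or and clear it with an atomic fetch-and, so each update is a single atomic instruction that leaves the other bits alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit inside a lock's atomic state word. */
#define SKETCH_FLAG_RELEASE_OK ((uint32_t) 1 << 30)

/* Set the flag with a single atomic OR; other bits are untouched. */
void sketch_set_release_ok(_Atomic uint32_t *state)
{
    atomic_fetch_or(state, SKETCH_FLAG_RELEASE_OK);
}

/* Clear the flag with a single atomic AND; returns whether it was set. */
bool sketch_clear_release_ok(_Atomic uint32_t *state)
{
    uint32_t old = atomic_fetch_and(state, ~SKETCH_FLAG_RELEASE_OK);

    return (old & SKETCH_FLAG_RELEASE_OK) != 0;
}

/* Read the flag without modifying the state word. */
bool sketch_release_ok(_Atomic uint32_t *state)
{
    return (atomic_load(state) & SKETCH_FLAG_RELEASE_OK) != 0;
}
```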

Greetings,

Andres Freund

From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/4] Convert the PGPROC-lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read, it:

* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than before,
  as we always allowed new shared lockers to jump the queue.

* adds a pg_write_barrier() memory barrier in the wakeup paths between
  dequeuing a waiter and unsetting its ->lwWaiting. That was always
  required on weakly ordered machines, but f4077cda2 made it more urgent.
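
The two behaviours described above, waking all shared waiters at once and ordering the dequeue before clearing the waiter's flag, can be illustrated with a minimal intrusive doubly linked list. This is a simplified, hypothetical model using C11 atomics, not PostgreSQL's dlist API or the patch's LWLockRelease(): in the real code the queue is manipulated while holding the lock's mutex, and each woken process is also signalled through its semaphore. A release fence stands in for pg_write_barrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular intrusive list with a sentinel head (hypothetical names,
 * loosely modelled on the idea behind PostgreSQL's dlist). */
typedef struct node { struct node *prev, *next; } node;
typedef struct { node head; } list;

#define CONTAINER_OF(ptr, type, field) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))

static void list_init(list *l) { l->head.prev = l->head.next = &l->head; }
static bool list_empty(const list *l) { return l->head.next == &l->head; }

static void list_push_tail(list *l, node *n)
{
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
}

static void list_delete(node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

typedef enum { MODE_SHARED, MODE_EXCLUSIVE } wait_mode;

typedef struct {
    wait_mode   mode;
    atomic_bool waiting;   /* the waiter sleeps until this becomes false */
    node        link;      /* membership in the lock's wait queue */
} waiter;

/* If the first waiter is exclusive, wake only it; otherwise wake every
 * shared waiter in the queue at once.  Returns the number woken. */
static int wake_waiters(list *queue)
{
    int   woken = 0;
    bool  first_exclusive;
    node *cur, *next;

    if (list_empty(queue))
        return 0;
    first_exclusive =
        CONTAINER_OF(queue->head.next, waiter, link)->mode == MODE_EXCLUSIVE;

    for (cur = queue->head.next; cur != &queue->head; cur = next)
    {
        waiter *w = CONTAINER_OF(cur, waiter, link);

        next = cur->next;              /* fetch before unlinking cur */
        if (first_exclusive && woken > 0)
            break;                     /* exclusive: wake just the first */
        if (!first_exclusive && w->mode != MODE_SHARED)
            continue;                  /* shared: skip exclusive waiters */

        list_delete(cur);
        /* Make the unlink visible before the waiter can observe
         * waiting == false and reuse its link for another queue. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&w->waiting, false, memory_order_relaxed);
        woken++;
    }
    return woken;
}
```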

Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c | 151 +-
 src/backend/storage/lmgr/proc.c   |   2 -
 src/include/storage/lwlock.h  |   5 +-
 src/include/storage/proc.h|   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
 
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
 
 
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -997,13 +987,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		 */
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-
 		/* waiters are added to the front of the queue */
-		proc->lwWaitLink = lock->head;
-		if (lock->head == NULL)
-			lock->tail = proc;
-		lock->head = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -1079,9 +1064,10 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 void
 LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 {
-	PGPROC	   *head;
-	PGPROC	   *proc;
-	PGPROC	   *next;
+	dlist_head	wakeup;
+	dlist_mutable_iter iter;
+
+	dlist_init(&wakeup);
 
 	/* Acquire mutex.  Time spent holding mutex

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:
  I have done a few performance tests for the above patches; the results
  are below:

 Cool, thanks.

  Performance Data
  --
  IBM POWER-7 16 cores, 64 hardware threads
  RAM = 64GB
  max_connections =210
  Database Locale =C
  checkpoint_segments=256
  checkpoint_timeout=35min
  shared_buffers=8GB
  Client Count = number of concurrent sessions and threads (e.g. -c 8 -j 8)
  Duration of each individual run = 5 mins
  Test type - read only pgbench with -M prepared
  Other Related information about test
  a. This is the data for the median of 3 runs; the detailed data of each
  individual run is attached to the mail.
  b. I applied both patches to take the performance data.
 
 
  Observations
  --
  a. The patch performs really well (an increase of up to ~40%) when all
  the data fits in shared buffers (scale factor 100).
  b. When data doesn't fit in shared buffers but fits in RAM
  (scale factor 3000), performance increases up to a client count of 16;
  after that it starts dipping (by up to ~4.4% in the above config).

 Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is costlier
on a particular architecture?

I tried scale factor 3000 again and could still see the dip. This time I
also tried a client count of 175, and the dip is approximately 5%, slightly
more than at a client count of 160.


  Patch_ver/Client_count     175
  HEAD                    248374
  PATCH                   235669
  These probably shouldn't matter much when a backend has to wait for
  another exclusive locker anyway, but I am not sure what else could be
  the reason for the dip when exclusive LWLocks are needed.

 Any chance to get a profile?

Here it goes..

HEAD - client_count=128
-

+   7.53% postgres  postgres   [.] GetSnapshotData
+   3.41% postgres  postgres   [.] AllocSetAlloc
+   2.61% postgres  postgres   [.] AllocSetFreeIndex
+   2.49% postgres  postgres   [.] _bt_compare
+   2.43% postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
+   2.40% postgres  postgres   [.] hash_search_with_hash_value
+   1.83% postgres  postgres   [.] tas
+   1.29% postgres  postgres   [.] pg_encoding_mbcliplen
+   1.27% postgres  postgres   [.] MemoryContextCreate
+   1.22% postgres  postgres   [.] MemoryContextAllocZeroAligned
+   1.17% postgres  postgres   [.] hash_seq_search
+   0.97% postgres  postgres   [.] LWLockRelease
+   0.96% postgres  postgres   [.] MemoryContextAllocZero
+   0.91% postgres  postgres   [.] GetPrivateRefCountEntry
+   0.82% postgres  postgres   [.] AllocSetFree
+   0.79% postgres  postgres   [.] LWLockAcquireCommon
+   0.78% postgres  postgres   [.] pfree



Detailed Data
-
-   7.53% postgres  postgres   [.] GetSnapshotData
   - GetSnapshotData
  - 7.46% GetSnapshotData
 - 7.46% GetTransactionSnapshot
- 3.74% exec_bind_message
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 generic_start_main.isra.0
 __libc_start_main
 0
- 3.72% PortalStart
 exec_bind_message
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 generic_start_main.isra.0
 __libc_start_main
 0
-   3.41% postgres  postgres   [.] AllocSetAlloc
   - AllocSetAlloc
  - 2.01% AllocSetAlloc
   0.81% palloc
   0.63% MemoryContextAlloc
-   2.61% postgres  postgres   [.] AllocSetFreeIndex
   - AllocSetFreeIndex
1.59% AllocSetAlloc
0.79% AllocSetFree
-   2.49% postgres  postgres   [.] _bt_compare
   - _bt_compare
  - 1.80% _bt_binsrch
 - 1.80% _bt_binsrch
- 1.21% _bt_search
 _bt_first

Lwlock_contention patches - client_count=128
--

+   7.95%  postgres  postgres   [.] GetSnapshotData
+   3.58%  postgres  postgres   [.] AllocSetAlloc
+   2.51%  postgres  postgres   [.] _bt_compare
+   2.44%  postgres  postgres   [.] hash_search_with_hash_value
+   2.33%  postgres  [kernel.kallsyms]  [k]

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
 On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com
 wrote:
   Observations
   --
   a. The patch performs really well (an increase of up to ~40%) when all
   the data fits in shared buffers (scale factor 100).
   b. When data doesn't fit in shared buffers but fits in RAM
   (scale factor 3000), performance increases up to a client count of 16;
   after that it starts dipping (by up to ~4.4% in the above config).
 
  Hm. Interesting. I don't see that dip on x86.

 Is it possible that the implementation of some atomic operation is costlier
 on a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for
POWER8...

 I tried scale factor 3000 again and could still see the dip. This time I
 also tried a client count of 175, and the dip is approximately 5%, slightly
 more than at a client count of 160.

FWIW, the profile always looks like
-  48.61%  postgres  postgres  [.] s_lock
   - s_lock
  + 96.67% StrategyGetBuffer
  + 1.19% UnpinBuffer
  + 0.90% PinBuffer
  + 0.70% hash_search_with_hash_value
+   3.11%  postgres  postgres  [.] GetSnapshotData
+   2.47%  postgres  postgres  [.] StrategyGetBuffer
+   1.93%  postgres  [kernel.kallsyms] [k] copy_user_generic_string
+   1.28%  postgres  postgres  [.] hash_search_with_hash_value
-   1.27%  postgres  postgres  [.] LWLockAttemptLock
   - LWLockAttemptLock
  - 97.78% LWLockAcquire
 + 38.76% ReadBuffer_common
 + 28.62% _bt_getbuf
 + 8.59% _bt_relandgetbuf
 + 6.25% GetSnapshotData
 + 5.93% VirtualXactLockTableInsert
 + 3.95% VirtualXactLockTableCleanup
 + 2.35% index_fetch_heap
 + 1.66% StartBufferIO
 + 1.56% LockReleaseAll
 + 1.55% _bt_next
 + 0.78% LockAcquireExtended
  + 1.47% _bt_next
  + 0.75% _bt_relandgetbuf

to me. Now that's with the client count 496, but it's similar with lower
counts.

BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
smarter.

   Patch_ver/Client_count     175
   HEAD                    248374
   PATCH                   235669
   These probably shouldn't matter much when a backend has to wait for
   another exclusive locker anyway, but I am not sure what else could be
   the reason for the dip when exclusive LWLocks are needed.
 
  Any chance to get a profile?

 Here it goes..

 Lwlock_contention patches - client_count=128
 --

 +   7.95%  postgres  postgres   [.] GetSnapshotData
 +   3.58%  postgres  postgres   [.] AllocSetAlloc
 +   2.51%  postgres  postgres   [.] _bt_compare
 +   2.44%  postgres  postgres   [.] hash_search_with_hash_value
 +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
 +   2.24%  postgres  postgres   [.] AllocSetFreeIndex
 +   1.75%  postgres  postgres   [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
the compiler settings you used?

Greetings,

Andres Freund


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
 FWIW, the profile always looks like
 -  48.61%  postgres  postgres  [.] s_lock
- s_lock
   + 96.67% StrategyGetBuffer
   + 1.19% UnpinBuffer
   + 0.90% PinBuffer
   + 0.70% hash_search_with_hash_value
 +   3.11%  postgres  postgres  [.] GetSnapshotData
 +   2.47%  postgres  postgres  [.] StrategyGetBuffer
 +   1.93%  postgres  [kernel.kallsyms] [k] copy_user_generic_string
 +   1.28%  postgres  postgres  [.] hash_search_with_hash_value
 -   1.27%  postgres  postgres  [.] LWLockAttemptLock
- LWLockAttemptLock
   - 97.78% LWLockAcquire
  + 38.76% ReadBuffer_common
  + 28.62% _bt_getbuf
  + 8.59% _bt_relandgetbuf
  + 6.25% GetSnapshotData
  + 5.93% VirtualXactLockTableInsert
  + 3.95% VirtualXactLockTableCleanup
  + 2.35% index_fetch_heap
  + 1.66% StartBufferIO
  + 1.56% LockReleaseAll
  + 1.55% _bt_next
  + 0.78% LockAcquireExtended
   + 1.47% _bt_next
   + 0.75% _bt_relandgetbuf
 
 to me. Now that's with the client count 496, but it's similar with lower
 counts.
 
 BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
 smarter.

Which is nearly trivial now that atomics are in. Check out the attached
WIP patch, which eliminates the spinlock from StrategyGetBuffer() unless
there are buffers on the freelist.
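
The key idea, spelled out in the patch's comment on nextVictimBuffer, is a clock hand that only ever increases and is used modulo NBuffers. A minimal sketch with C11 atomics follows; the struct and function names are hypothetical, not those in freelist.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock-sweep state: the hand is a counter that only ever
 * increases; the actual buffer is the counter modulo nbuffers. */
typedef struct {
    _Atomic uint32_t next_victim;
    uint32_t         nbuffers;
} sweep_hand;

/* Claims the next tick without any lock.  Returns the buffer index to
 * inspect and, if passes != NULL, how many complete passes of the clock
 * precede this tick. */
uint32_t sweep_next(sweep_hand *h, uint32_t *passes)
{
    uint32_t tick;

    assert(h->nbuffers > 0);
    tick = atomic_fetch_add(&h->next_victim, 1);
    if (passes != NULL)
        *passes = tick / h->nbuffers;
    return tick % h->nbuffers;
}
```

Because each backend claims a distinct tick with a single fetch_add, advancing the hand needs no spinlock. One caveat the sketch ignores: when the 32-bit counter wraps around, tick % nbuffers jumps unless nbuffers is a power of two, so a real implementation has to handle wraparound explicitly.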

Test:
pgbench  -M prepared -P 5 -S -c 496 -j 496 -T 5000
on a scale=1000 database, with 4GB of shared buffers.

Before:
progress: 40.0 s, 136252.3 tps, lat 3.628 ms stddev 4.547
progress: 45.0 s, 135049.0 tps, lat 3.660 ms stddev 4.515
progress: 50.0 s, 135788.9 tps, lat 3.640 ms stddev 4.398
progress: 55.0 s, 135268.4 tps, lat 3.654 ms stddev 4.469
progress: 60.0 s, 134991.6 tps, lat 3.661 ms stddev 4.739

after:
progress: 40.0 s, 207701.1 tps, lat 2.382 ms stddev 3.018
progress: 45.0 s, 208022.4 tps, lat 2.377 ms stddev 2.902
progress: 50.0 s, 209187.1 tps, lat 2.364 ms stddev 2.970
progress: 55.0 s, 206462.7 tps, lat 2.396 ms stddev 2.871
progress: 60.0 s, 210263.8 tps, lat 2.351 ms stddev 2.914

Yes, no kidding.
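
Averaging the five progress samples of each run gives roughly 135,470 tps before and 208,327 tps after, about 1.54 times the throughput, while the mean latency drops from about 3.65 ms to about 2.37 ms. The arithmetic is trivial; an illustrative helper (mean() is just a hypothetical name) for anyone re-running the comparison:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be positive. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}
```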

The results are similar, but less extreme, for smaller client counts
like 80 or 160.

Amit, since your test currently seems to be completely bottlenecked
within StrategyGetBuffer(), could you compare HEAD and the LW_SHARED
patch, both with that patch applied, for one client count? That may
allow us to see a more meaningful profile...

Greetings,

Andres Freund

From 6b486e5b467e94ab9297d7656a5b39b816c5c55a Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 10 Oct 2014 17:36:46 +0200
Subject: [PATCH] WIP: lockless StrategyGetBuffer hotpath

---
 src/backend/storage/buffer/freelist.c | 154 --
 1 file changed, 90 insertions(+), 64 deletions(-)

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5966beb..0c634a0 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -18,6 +18,12 @@
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 
+#include "port/atomics.h"
+
+
+#define LATCHPTR_ACCESS_ONCE(var)	((Latch *)(*((volatile Latch **)&(var))))
+#define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
+
 
 /*
  * The shared freelist control information.
@@ -27,8 +33,12 @@ typedef struct
 	/* Spinlock: protects the values below */
 	slock_t		buffer_strategy_lock;
 
-	/* Clock sweep hand: index of next buffer to consider grabbing */
-	int			nextVictimBuffer;
+	/*
+	 * Clock sweep hand: index of next buffer to consider grabbing. Note that
+	 * this isn't a concrete buffer - we only ever increase the value. So, to
+	 * get an actual buffer, it needs to be used modulo NBuffers.
+	 */
+	pg_atomic_uint32 nextVictimBuffer;
 
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
@@ -42,8 +52,8 @@ typedef struct
 	 * Statistics.  These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
 	 */
-	uint32		completePasses; /* Complete cycles of the clock sweep */
-	uint32		numBufferAllocs;	/* Buffers allocated since last reset */
+	pg_atomic_uint32 completePasses; /* Complete cycles of the clock sweep */
+	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */
 
 	/*
 	 * Notification latch, or NULL if none.  See StrategyNotifyBgWriter.
@@ -124,87 +134,107 @@ StrategyGetBuffer(BufferAccessStrategy strategy)
 			return buf;
 	}
 
-	/* Nope, so lock the freelist */
-	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
 	 * the rate of buffer consumption.

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
  On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com
  wrote:
    Observations
    --
    a. The patch performs really well (an increase of up to ~40%) when all
    the data fits in shared buffers (scale factor 100).
    b. When data doesn't fit in shared buffers but fits in RAM
    (scale factor 3000), performance increases up to a client count of 16;
    after that it starts dipping (by up to ~4.4% in the above config).

   Hm. Interesting. I don't see that dip on x86.

  Is it possible that the implementation of some atomic operation is costlier
  on a particular architecture?

 Yes, sure. And IIRC POWER improved atomics performance considerably for
 POWER8...

  I tried scale factor 3000 again and could still see the dip. This time I
  also tried a client count of 175, and the dip is approximately 5%, slightly
  more than at a client count of 160.

 FWIW, the profile always looks like:

On Power8, the profile from my tests looks somewhat similar to the one you
posted; please see this mail:
http://www.postgresql.org/message-id/caa4ek1je9zblhsfiavhd18gdwxux21zfqpjgq_dz_zoa35n...@mail.gmail.com

However, on Power7 the profile looks different, as I posted earlier in
this thread.

 BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
 smarter.

Yeah, the bgreclaimer patch is also able to achieve that; however, after
that the contention moves somewhere else, as you can see in the link above.

  Here it goes..

  Lwlock_contention patches - client_count=128
  --

  + 7.95% postgres postgres [.] GetSnapshotData
  + 3.58% postgres postgres [.] AllocSetAlloc
  + 2.51% postgres postgres [.] _bt_compare
  + 2.44% postgres postgres [.] hash_search_with_hash_value
  + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
  + 2.24% postgres postgres [.] AllocSetFreeIndex
  + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl

 Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
 the compiler settings you used?

Nothing specific. For performance tests where I have to take profiles,
I use the following:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
 On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
   On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com
   wrote:
     Observations
     --
     a. The patch performs really well (an increase of up to ~40%) when all
     the data fits in shared buffers (scale factor 100).
     b. When data doesn't fit in shared buffers but fits in RAM
     (scale factor 3000), performance increases up to a client count of 16;
     after that it starts dipping (by up to ~4.4% in the above config).

    Hm. Interesting. I don't see that dip on x86.

   Is it possible that the implementation of some atomic operation is costlier
   on a particular architecture?

  Yes, sure. And IIRC POWER improved atomics performance considerably for
  POWER8...

   I tried scale factor 3000 again and could still see the dip. This time I
   also tried a client count of 175, and the dip is approximately 5%, slightly
   more than at a client count of 160.

I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)

8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)

8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)

8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.

Note that all these are noticeably faster than your results.

  
   Lwlock_contention patches - client_count=128
   --
  
   +   7.95%  postgres  postgres   [.] GetSnapshotData
   +   3.58%  postgres  postgres   [.] AllocSetAlloc
   +   2.51%  postgres  postgres   [.] _bt_compare
   +   2.44%  postgres  postgres   [.] hash_search_with_hash_value
   +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
   +   2.24%  postgres  postgres   [.] AllocSetFreeIndex
   +   1.75%  postgres  postgres   [.] pg_atomic_fetch_add_u32_impl
 
  Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
  the compiler settings you used?
 
 Nothing specific. For performance tests where I have to take profiles,
 I use the following:
 ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
 make

Hah. Doing so overwrites the CFLAGS configure normally sets. Check
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else:  If the template file set something, that is used.
# else:  If coverage was enabled, don't set anything.
# else:  If the compiler is GCC, then we use -O2.
# else:  If the compiler is something else, then we use -O, unless debugging.

So, if you do it like above, you're compiling without optimizations... So,
include at least -O2 as well.
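
A cheap guard against accidentally benchmarking or profiling an unoptimized build, sketched below with hypothetical function names, relies on GCC and Clang predefining __OPTIMIZE__ whenever optimization is enabled. At -O0, GCC does not inline ordinary static inline functions, which would explain a helper like pg_atomic_fetch_add_u32_impl showing up as its own symbol in the profile. Passing something like CFLAGS="-O2 -fno-omit-frame-pointer" to configure keeps both the optimization and the frame pointers.

```c
#include <assert.h>

/* Returns 1 if this translation unit was compiled with optimization
 * enabled, 0 otherwise.  Relies on the __OPTIMIZE__ macro that GCC and
 * Clang predefine when optimizing. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical start-up check for benchmark builds: in assert-enabled
 * builds, abort early instead of silently measuring an -O0 binary. */
void require_optimized_build(void)
{
    assert(built_with_optimization());
}
```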

Greetings,

Andres Freund


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
 I've run some short tests on hydra:

 scale 1000:

 base:
 4GB:
 tps = 296273.004800 (including connections establishing)
 tps = 296373.978100 (excluding connections establishing)

 8GB:
 tps = 338001.455970 (including connections establishing)
 tps = 338177.439106 (excluding connections establishing)

 base + freelist:
 4GB:
 tps = 297057.523528 (including connections establishing)
 tps = 297156.987418 (excluding connections establishing)

 8GB:
 tps = 335123.867097 (including connections establishing)
 tps = 335239.122472 (excluding connections establishing)

 base + LW_SHARED:
 4GB:
 tps = 296262.164455 (including connections establishing)
 tps = 296357.524819 (excluding connections establishing)
 8GB:
 tps = 336988.744742 (including connections establishing)
 tps = 337097.836395 (excluding connections establishing)

 base + LW_SHARED + freelist:
 4GB:
 tps = 296887.981743 (including connections establishing)
 tps = 296980.231853 (excluding connections establishing)

 8GB:
 tps = 345049.062898 (including connections establishing)
 tps = 345161.947055 (excluding connections establishing)

 I've also run some preliminary tests using scale=3000 - and I couldn't
 see a performance difference either.

 Note that all these are noticeably faster than your results.

What is the client count?
Could you please post the numbers you are getting for scale factor 3000
with client counts 128 and 175?

  Nothing specific. For performance tests where I have to take profiles,
  I use the following:
  ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
  make

 Hah. Doing so overwrites the CFLAGS configure normally sets. Check
 # CFLAGS are selected so:
 # If the user specifies something in the environment, that is used.
 # else:  If the template file set something, that is used.
 # else:  If coverage was enabled, don't set anything.
 # else:  If the compiler is GCC, then we use -O2.
 # else:  If the compiler is something else, then we use -O, unless debugging.

 So, if you do it like above, you're compiling without optimizations... So,
 include at least -O2 as well.

Hmm, okay. But is this required when we do actual performance
tests? For those I currently don't use CFLAGS.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On 2014-10-11 06:49:54 +0530, Amit Kapila wrote:
 On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
  I've run some short tests on hydra:
 
  scale 1000:
 
  base:
  4GB:
  tps = 296273.004800 (including connections establishing)
  tps = 296373.978100 (excluding connections establishing)
 
  8GB:
  tps = 338001.455970 (including connections establishing)
  tps = 338177.439106 (excluding connections establishing)
 
  base + freelist:
  4GB:
  tps = 297057.523528 (including connections establishing)
  tps = 297156.987418 (excluding connections establishing)
 
  8GB:
  tps = 335123.867097 (including connections establishing)
  tps = 335239.122472 (excluding connections establishing)
 
  base + LW_SHARED:
  4GB:
  tps = 296262.164455 (including connections establishing)
  tps = 296357.524819 (excluding connections establishing)
  8GB:
  tps = 336988.744742 (including connections establishing)
  tps = 337097.836395 (excluding connections establishing)
 
  base + LW_SHARED + freelist:
  4GB:
  tps = 296887.981743 (including connections establishing)
  tps = 296980.231853 (excluding connections establishing)
 
  8GB:
  tps = 345049.062898 (including connections establishing)
  tps = 345161.947055 (excluding connections establishing)
 
  I've also run some preliminary tests using scale=3000 - and I couldn't
  see a performance difference either.
 
  Note that all these are noticeably faster than your results.
 
 What is the client count?

160, because that was the one you reported the biggest regression.

 Could you please post the numbers you are getting for scale factor 3000
 with client counts 128 and 175?

Yes, although not tonight. Also from hydra?

   Nothing specific. For performance tests where I have to take profiles,
   I use the following:
   ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
   make
 
  Hah. Doing so overwrites the CFLAGS configure normally sets. Check
  # CFLAGS are selected so:
  # If the user specifies something in the environment, that is used.
  # else:  If the template file set something, that is used.
  # else:  If coverage was enabled, don't set anything.
  # else:  If the compiler is GCC, then we use -O2.
   # else:  If the compiler is something else, then we use -O, unless debugging.
 
  so, if you do like above, you're compiling without optimizations... So,
  include at least -O2 as well.
 
 Hmm. okay, but is this required when we do actual performance
 tests, because for that currently I don't use CFLAGS.

I'm not sure what you mean? You need to include -O2 in CFLAGS whenever
you override it. Your profile was clearly without inlining... And since
your general performance numbers are a fair bit lower than what I see
with, hopefully, the same code on the same machine...
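A quick way to confirm that an overridden CFLAGS still enables optimization is to test the __OPTIMIZE__ macro, which GCC and Clang predefine for every optimization level other than -O0. A minimal sketch, not part of the PostgreSQL tree; both helper names are hypothetical:

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if this translation unit was compiled with
 * optimization enabled, 0 otherwise.  GCC and Clang define __OPTIMIZE__ for
 * every -O level except -O0, so a CFLAGS override that drops -O2 shows up
 * here as 0.
 */
static int
built_with_optimization(void)
{
#ifdef __OPTIMIZE__
	return 1;
#else
	return 0;
#endif
}

/* Print a one-line report, e.g. from a throwaway test program. */
static void
report_optimization(void)
{
	printf("optimized build: %s\n", built_with_optimization() ? "yes" : "no");
}
```

Built with CFLAGS=-fno-omit-frame-pointer alone, report_optimization() should print "optimized build: no"; with CFLAGS="-O2 -fno-omit-frame-pointer" it should print "yes".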

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-10-11 06:49:54 +0530, Amit Kapila wrote:
  On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com
  wrote:
  
   On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
   I've run some short tests on hydra:
  

  Could you please post numbers you are getting for 3000 scale
  factor for client count 128 and 175?

 Yes, although not tonight

No issues, whenever you get it.

 Also from hydra?

Yes. One more thing I would like to share with you: while doing these
tests, I also changed some other settings in postgresql.conf:

maintenance_work_mem = 1GB
synchronous_commit = off
wal_writer_delay = 20ms
checkpoint_segments=256
checkpoint_timeout=35min

I don't think these parameters matter for the tests we are doing, but
still I thought it is good to share, because I forgot to send some of
these non-default settings in previous mail.

    Nothing specific, for performance tests where I have to take profiles
    I use below:
    ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
    make
  
   Hah. Doing so overwrites the CFLAGS configure normally sets. Check
   # CFLAGS are selected so:
   # If the user specifies something in the environment, that is used.
   # else:  If the template file set something, that is used.
   # else:  If coverage was enabled, don't set anything.
   # else:  If the compiler is GCC, then we use -O2.
   # else:  If the compiler is something else, then we use -O, unless debugging.
  
   so, if you do like above, you're compiling without optimizations... So,
   include at least -O2 as well.
 
  Hmm. okay, but is this required when we do actual performance
  tests, because for that currently I don't use CFLAGS.

 I'm not sure what you mean? You need to include -O2 in CFLAGS whenever
 you override it.

Okay, that's what I wanted to ask you, so that we don't see different
numbers due to the way the code is built.

When I do performance tests where I don't want to see profile,
I use below statement:
./configure --prefix=installation_path

 And since
 your general performance numbers are a fair bit lower than what I see
 with, hopefully, the same code on the same machine...

You have reported numbers at 1000 scale factor and mine were
at 3000 scale factor, so I think the difference is expected.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Jim Nasby


On 10/8/14, 8:35 AM, Andres Freund wrote:

+#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
+
+/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
+#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)


There should at least be a comment where we define MAX_BACKENDS about the 
relationship here... or better yet, validate that MAX_BACKENDS < 
SHARED_LOCK_MASK during postmaster startup. (For those that think that's too 
pedantic, I'll argue that it's no worse than the patch verifying that MyProc != 
NULL in LWLockQueueSelf()).



+/*
+ * Internal function that tries to atomically acquire the lwlock in the passed
+ * in mode.
+ *
+ * This function will not block waiting for a lock to become free - that's the
+ * callers job.
+ *
+ * Returns true if the lock isn't free and we need to wait.
+ *
+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.
+ */
+static bool
+LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)


We should invert the return of this function. Current code returns true if the 
lock is actually acquired (see below), and I think that's true of other locking 
code as well. IMHO it makes more sense that way, plus consistency is good.

(From 9.3)
 * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, return FALSE with no side-effects.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Andres Freund

On 2014-10-09 16:52:46 -0500, Jim Nasby wrote:
 On 10/8/14, 8:35 AM, Andres Freund wrote:
 +#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
 +
 +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
 +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)
 
 There should at least be a comment where we define MAX_BACKENDS about the 
 relationship here... or better yet, validate that MAX_BACKENDS < 
 SHARED_LOCK_MASK during postmaster startup. (For those that think that's too 
 pedantic, I'll argue that it's no worse than the patch verifying that MyProc 
 != NULL in LWLockQueueSelf()).

If you modify either, you better grep for them... I don't think that's
going to happen anyway. Requiring it during startup would mean exposing
SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly
stick a StaticAssert() someplace in lwlock.c.
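A compile-time check along those lines could look like the following sketch. It uses C11 _Static_assert (PostgreSQL's own StaticAssertStmt() in c.h would be the in-tree equivalent) and restates MAX_BACKENDS as 2^23-1 so it compiles standalone. It assumes the shared-locker count grows upward from bit 0, which makes "MAX_BACKENDS < EXCLUSIVE_LOCK" the condition that actually matters; the helper function is hypothetical.

```c
#include <stdint.h>

/* Restated for a standalone build; the thread gives MAX_BACKENDS as 2^23-1. */
#define MAX_BACKENDS     0x7fffff
#define EXCLUSIVE_LOCK   (((uint32_t) 1) << (31 - 1))
#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)

/*
 * Assumption: shared holders are counted upward from bit 0, so even
 * MAX_BACKENDS simultaneous shared holders must stay below the exclusive
 * bit.  This fails the build if someone raises MAX_BACKENDS too far.
 */
_Static_assert(MAX_BACKENDS < EXCLUSIVE_LOCK,
			   "MAX_BACKENDS too large for the lwlock state layout");

/* Hypothetical helper: would a shared count of 'nshared' touch the exclusive bit? */
static int
shared_count_collides(uint32_t nshared)
{
	return (nshared & EXCLUSIVE_LOCK) != 0;
}
```

Keeping the assertion inside lwlock.c would avoid exposing SHARED_LOCK_MASK elsewhere, while still failing loudly if MAX_BACKENDS changes.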

And no, it's not comparable at all to MyProc != NULL - the lwlock code
initially *does* run when MyProc isn't setup. We just better not
conflict against any other lockers at that stage.

 +/*
 + * Internal function that tries to atomically acquire the lwlock in the passed
 + * in mode.
 + *
 + * This function will not block waiting for a lock to become free - that's the
 + * callers job.
 + *
 + * Returns true if the lock isn't free and we need to wait.
 + *
 + * When acquiring shared locks it's possible that we disturb an exclusive
 + * waiter. If that's a problem for the specific user, pass in a valid pointer
 + * for 'potentially_spurious'. Its value will be set to true if we possibly
 + * did so. The caller then has to handle that scenario.
 + */
 +static bool
 +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)
 
 We should invert the return of this function. Current code returns
 true if the lock is actually acquired (see below), and I think that's
 true of other locking code as well. IMHO it makes more sense that way,
 plus consistency is good.

I don't think so. I've wondered about it as well, but the way the
function is used it's more consistent imo if it returns whether we must
wait. Note that it's not an exported function.
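Both conventions can coexist: an internal helper can report "must wait" while the exported entry point keeps the LWLockConditionalAcquire() convention of returning true on success. A minimal sketch, using C11 atomics in place of PostgreSQL's pg_atomic_* API; SketchLock and the sketch_* functions are hypothetical, and only the exclusive path is shown:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCLUSIVE_LOCK (((uint32_t) 1) << 30)

/* Hypothetical stand-in for an LWLock: just the atomic state word. */
typedef struct
{
	_Atomic uint32_t state;
} SketchLock;

/*
 * Internal helper, following the patch's convention: returns true if the
 * lock is NOT free and the caller must wait.  A single compare-and-swap
 * moves the state from 0 (free) to EXCLUSIVE_LOCK.
 */
static bool
sketch_attempt_exclusive(SketchLock *lock)
{
	uint32_t	expected = 0;

	if (atomic_compare_exchange_strong(&lock->state, &expected, EXCLUSIVE_LOCK))
		return false;			/* acquired, no need to wait */
	return true;				/* held by someone, must wait */
}

/* Exported-style wrapper: returns true if the lock was acquired. */
static bool
sketch_conditional_acquire_exclusive(SketchLock *lock)
{
	bool		must_wait = sketch_attempt_exclusive(lock);

	return !must_wait;
}
```

The inversion costs nothing, so the choice is mostly about which reads better at the call sites.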

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Jim Nasby


On 10/9/14, 4:57 PM, Andres Freund wrote:

If you modify either, you better grep for them... I don't think that's
going to happen anyway. Requiring it during startup would mean exposing
SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly
stick a StaticAssert() someplace in lwlock.c.


Ahh, yeah, exposing it would be ugly.

I just get the heebie-jeebies when I see assumptions like this though. I fear 
there's a bunch of cases where changing something will break a completely 
unrelated part of the system with no warning.

Maybe add an assert() to check it?


And no, it's not comparable at all to MyProc != NULL - the lwlock code
initially *does* run when MyProc isn't setup. We just better not
conflict against any other lockers at that stage.


Ahh, can you maybe add that detail to the comment? That wasn't clear to me.


 +/*
 + * Internal function that tries to atomically acquire the lwlock in the passed
 + * in mode.
 + *
 + * This function will not block waiting for a lock to become free - that's the
 + * callers job.
 + *
 + * Returns true if the lock isn't free and we need to wait.
 + *
 + * When acquiring shared locks it's possible that we disturb an exclusive
 + * waiter. If that's a problem for the specific user, pass in a valid pointer
 + * for 'potentially_spurious'. Its value will be set to true if we possibly
 + * did so. The caller then has to handle that scenario.
 + */
 +static bool
 +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)


We should invert the return of this function. Current code returns
true if the lock is actually acquired (see below), and I think that's
true of other locking code as well. IMHO it makes more sense that way,
plus consistency is good.

I don't think so. I've wondered about it as well, but the way the
function is used it's more consistent imo if it returns whether we must
wait. Note that it's not an exported function.


ISTM that a function attempting a lock would return success, not failure. Even 
though it's internal now it could certainly be made external at some point in 
the future. But I suppose it's ultimately a matter of preference...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Amit Kapila

On Wed, Oct 8, 2014 at 7:05 PM, Andres Freund and...@2ndquadrant.com
wrote:

 Hi,

 Attached you can find the next version of my LW_SHARED patchset. Now
 that atomics are committed, it seems like a good idea to also add their
 raison d'être.

 Since the last public version I have:
 * Addressed lots of Amit's comments. Thanks!
 * Performed a fair amount of testing.
 * Rebased the code. The volatile removal made that not entirely
   trivial...
 * Significantly cleaned up and simplified the code.
 * Updated comments and such
 * Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case)

 The feature currently consists of two patches:
 1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
    verbose. This also does:
 * changes the logic in LWLockRelease() to release all shared lockers
   when waking up any. This can yield some significant performance
   improvements - and the fairness isn't really much worse than
   before,
   as we always allowed new shared lockers to jump the queue.

 * adds a memory pg_write_barrier() in the wakeup paths between
   dequeuing and unsetting ->lwWaiting. That was always required on
   weakly ordered machines, but f4077cda2 made it more urgent. I can
   reproduce crashes without it.
 2) Implement the wait free LW_SHARED algorithm.


I have done a few performance tests for the above patches; the results
are below:

Performance Data
--
IBM POWER-7 16 cores, 64 hardware threads
RAM = 64GB
max_connections =210
Database Locale =C
checkpoint_segments=256
checkpoint_timeout=35min
shared_buffers=8GB
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins
Test type - read only pgbench with -M prepared
Other Related information about test
a. This is the median of 3 runs; the detailed data for each individual
run is attached to this mail.
b. I have applied both the patches to take performance data.

Scale Factor - 100

Patch_ver/Client_count       1       8      16      32      64     128
HEAD                     13344  106921  196629  295123  377846  333928
PATCH                    13662  106179  203960  298955  452638  465671

Scale Factor - 3000

Patch_ver/Client_count       8      16      32      64     128     160
HEAD                     86920  152417  231668  280827  257093  255122
PATCH                    87552  160313  230677  276186  248609  244372


Observations
--
a. The patch performs really well (an increase of up to ~40%) when all the
data fits in shared buffers (scale factor 100).
b. When the data doesn't fit in shared buffers but fits in RAM
(scale factor 3000), there is a performance increase up to 16 clients;
after that it starts dipping (in the above config, by up to ~4.4%).

The above data shows that the patch improves performance for cases
when there is shared LWLock contention; however, there is a slight
performance dip in the case of Exclusive LWLocks (at scale factor 3000,
exclusive LWLocks are needed for the buffer mapping tables). Now I am not
sure whether this is the worst-case dip or whether under similar
configurations the dip can be higher, because the trend shows that the
dip increases with client count.

Brief Analysis of code w.r.t performance dip
-
Extra Instructions w.r.t Head in Acquire Exclusive lock path
a. Attempt lock twice
b. atomic operations for nwaiters in LWLockQueueSelf() and
LWLockAcquireCommon()
c. Now we need to take the spinlock twice, once for self-queuing and then
again for setting releaseOK.
d. a few function calls and some extra checks

Similarly, there seem to be a few additional instructions in the
LWLockRelease() path.

These probably shouldn't matter much when a backend needs to
wait for another Exclusive locker, but I am not sure what else could be
the reason for the dip in cases where we need Exclusive LWLocks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


perf_lwlock_contention_data_v1.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
 2.
 LWLockWakeup()
 {
 ..
 #ifdef LWLOCK_STATS
 lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
 #else
 SpinLockAcquire(&lock->mutex);
 #endif
 ..
 }
 
 Earlier while releasing lock, we don't count it towards LWLock stats
 spin_delay_count.  I think if we see other places in lwlock.c, it only gets
 counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified
things so all spinlock acquisitions are counted.

 3.
 LWLockRelease()
 {
 ..
 /* grant permission to run, even if a spurious share lock increases
 lockcount */
 else if (mode == LW_EXCLUSIVE && have_waiters)
 check_waiters = true;
 /* nobody has this locked anymore, potential exclusive lockers get a chance
 */
 else if (lockcount == 0 && have_waiters)
 check_waiters = true;
 ..
 }
 
 It seems comments have been reversed in above code.

No, they look right. But I've expanded them in the version I'm going to
post in a couple minutes.
 
 5.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) &wakeup)
 {
 PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
 LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
 dlist_delete(&waiter->lwWaitLink);
 pg_write_barrier();
 waiter->lwWaiting = false;
 PGSemaphoreUnlock(&waiter->sem);
 }
 ..
 }
 
 Why can't we decrement the nwaiters after waking up? I don't think
 there is any major problem even if callers do that themselves, but
 in some rare cases LWLockRelease() might spuriously assume that
 there are some waiters and tries to call LWLockWakeup().  Although
 this doesn't create any problem, keeping the value sane is good unless
 there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when
necessary - precisely because nwaiters is 0.

 6.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
 {
 ..
 if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
 continue;
 ..
 if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
 {
 ..
 wokeup_somebody = true;
 }
 ..
 }
 ..
 }
 
 a.
 IIUC above logic, if the waiter queue is as follows:
 (S-Shared; X-Exclusive) S X S S S X S S
 
 it can skip the exclusive waiters and release shared waiter.
 
 If my understanding is right, then I think instead of continue, there
 should be *break* in above logic.

No, it looks correct to me. What happened is that the first S was woken
up. So there's no point in waking up an exclusive locker, but further
non-exclusive lockers can be woken up.

 b.
 Consider below sequence of waiters:
 (S-Shared; X-Exclusive) S S X S S
 
 I think as per the un-patched code, it will wake up waiters up to and
 including the first Exclusive, but the patch will wake them up only up to
 (*excluding*) the first Exclusive.

I don't think the current code does that. And it'd be a pretty pointless
behaviour, leading to useless increased contention. The only time it'd
make sense for X to be woken up is when it gets run faster than the S
processes.
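The wake-up rule discussed in a. and b. can be modelled over a plain array of wait modes. This is a simplified sketch, not the patch's LWLockWakeup(): the SketchMode values and the function name are hypothetical, and it assumes that the part of the loop elided in the quote stops after waking an exclusive waiter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified wait modes; the real code uses LWLockMode. */
typedef enum { W_SHARED, W_EXCLUSIVE, W_UNTIL_FREE } SketchMode;

/*
 * Walk the queue in order and set woken[i] for every waiter that would be
 * released.  Returns the number of waiters woken.
 */
static size_t
sketch_select_wakeups(const SketchMode *queue, size_t n, bool *woken)
{
	bool		wokeup_somebody = false;
	size_t		count = 0;

	for (size_t i = 0; i < n; i++)
		woken[i] = false;

	for (size_t i = 0; i < n; i++)
	{
		/* Once some locker has been woken, exclusive waiters stay queued. */
		if (wokeup_somebody && queue[i] == W_EXCLUSIVE)
			continue;

		woken[i] = true;
		count++;

		/* Wait-until-free waiters don't take the lock themselves. */
		if (queue[i] != W_UNTIL_FREE)
			wokeup_somebody = true;

		/* Assumption: nobody else can get the lock after an exclusive waiter. */
		if (queue[i] == W_EXCLUSIVE)
			break;
	}
	return count;
}
```

For the queue S X S S S X S S this wakes every S and neither X; for S S X S S it wakes the four S waiters and leaves X queued; for X S S it wakes only the X.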

 7.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
 {
 ..
 dlist_delete(&waiter->lwWaitLink);
 dlist_push_tail(&wakeup, &waiter->lwWaitLink);
 ..
 }
 ..
 }
 
 Use of dlist has simplified the code, but I think there might be a slight
 overhead in maintaining the wakeup queue compared to the un-patched
 mechanism, especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or
two - in the slow path we'll enter the kernel and sleep. This doesn't
matter in comparison.
And the code is *so* much more readable.

 8.
 LWLockConditionalAcquire()
 {
 ..
 /*
  * We ran into an exclusive lock and might have blocked another
  * exclusive lock from taking a shot because it took a time to back
  * off. Retry till we are either sure we didn't block somebody (because
  * somebody else certainly has the lock) or till we got it.
  *
  * We cannot rely on the two-step lock-acquisition protocol as in
  * LWLockAcquire because we're not using it.
  */
 if (potentially_spurious)
 {
 SPIN_DELAY();
 goto retry;
 }
 ..
 }
 
 Due to the above logic, I think it can keep retrying for a long time before
 it actually concludes whether it got the lock or not, in case other backends
 take the Exclusive lock after *double_check* and release it before the
 unconditional increment of the shared lock in LWLockAttemptLock().
 I understand that such a scenario might be hard to hit in practice,
 however there is still a theoretical possibility of it.

I'm not particularly concerned. We could optimize it a bit, but I really
don't think it's necessary.

 Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked
an exclusive acquirer (by spuriously incrementing/decrementing lockcount
by 1). We need to be sure to either get the lock (in which case we can
wake up the waiter on release), or be sure that we didn't disturb
anyone.
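The retry argument can be illustrated with a simplified sketch of the increment-then-back-off shared path. It is not the patch's code: C11 atomics stand in for pg_atomic_*, SketchLock and the sketch_* names are hypothetical, and the stopping rule (give up once an exclusive holder is visible after backing off, since that holder will run the wakeup logic on release) is a simplification of LWLockConditionalAcquire().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCLUSIVE_LOCK (((uint32_t) 1) << 30)

/* Hypothetical lock: exclusive bit plus a count of shared holders. */
typedef struct
{
	_Atomic uint32_t state;
} SketchLock;

/*
 * Shared attempt: bump the count unconditionally and undo the bump if an
 * exclusive holder was present.  Returns true if the caller must wait.
 * When it backs off, the transient increment may have made a concurrent
 * exclusive acquirer see the lock as busy, so the failure is potentially
 * spurious.
 */
static bool
sketch_attempt_shared(SketchLock *lock)
{
	uint32_t	old = atomic_fetch_add(&lock->state, 1);

	if ((old & EXCLUSIVE_LOCK) == 0)
		return false;			/* we now hold a shared lock */

	atomic_fetch_sub(&lock->state, 1);
	return true;
}

/*
 * Conditional acquire: returns true if the shared lock was obtained.  After
 * backing off we only give up if an exclusive holder is visible; otherwise
 * the holder vanished in the meantime and we retry.
 */
static bool
sketch_conditional_acquire_shared(SketchLock *lock)
{
	for (;;)
	{
		if (!sketch_attempt_shared(lock))
			return true;
		if (atomic_load(&lock->state) & EXCLUSIVE_LOCK)
			return false;		/* somebody else certainly has the lock */
	}
}
```

As noted above, the loop has no fixed bound if exclusive holders keep appearing and disappearing in exactly that window; the thread treats that as acceptable in practice.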

 9.
 LWLockAcquireOrWait()

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-10-08 14:47:44 +0200, Andres Freund wrote:
 On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
  5.
  LWLockWakeup()
  {
  ..
  dlist_foreach_modify(iter, (dlist_head *) &wakeup)
  {
  PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
  LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
  dlist_delete(&waiter->lwWaitLink);
  pg_write_barrier();
  waiter->lwWaiting = false;
  PGSemaphoreUnlock(&waiter->sem);
  }
  ..
  }
  
  Why can't we decrement the nwaiters after waking up? I don't think
  there is any major problem even if callers do that themselves, but
  in some rare cases LWLockRelease() might spuriously assume that
  there are some waiters and tries to call LWLockWakeup().  Although
  this doesn't create any problem, keeping the value sane is good unless
  there is some problem in doing so.
 
 That won't work because then LWLockWakeup() wouldn't be called when
 necessary - precisely because nwaiters is 0.

Err, this is bogus. Memory fail.

The reason I've done so is that it's otherwise much harder to debug
issues where there are backends that have been woken up already, but
haven't rerun yet. Without this there's simply no evidence of that
state. I can't see this being relevant for performance, so I'd rather
have it stay that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

Hi,

Attached you can find the next version of my LW_SHARED patchset. Now
that atomics are committed, it seems like a good idea to also add their
raison d'être.

Since the last public version I have:
* Addressed lots of Amit's comments. Thanks!
* Performed a fair amount of testing.
* Rebased the code. The volatile removal made that not entirely
  trivial...
* Significantly cleaned up and simplified the code.
* Updated comments and such
* Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case)

The feature currently consists of two patches:
1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
   verbose. This also does:
* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than
  before,
  as we always allowed new shared lockers to jump the queue.

* adds a memory pg_write_barrier() in the wakeup paths between
  dequeuing and unsetting ->lwWaiting. That was always required on
  weakly ordered machines, but f4077cda2 made it more urgent. I can
  reproduce crashes without it.
2) Implement the wait free LW_SHARED algorithm.
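The ordering requirement behind the added pg_write_barrier() can be sketched with C11 fences standing in for PostgreSQL's barrier macros. SketchProc and the sketch_* functions are hypothetical; the release fence plays the role of pg_write_barrier(), and the waiter's acquire fence is the read-side barrier it pairs with.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PGPROC fields a wakeup touches. */
typedef struct SketchProc
{
	struct SketchProc *next;	/* wait-queue link, like lwWaitLink */
	atomic_bool waiting;		/* like lwWaiting */
} SketchProc;

/*
 * Release side: unlink each waiter, then clear its waiting flag.  The
 * link reset must be visible before the flag, because the waiter may
 * proceed (and reuse its link) as soon as it sees waiting == false.
 * Hence one fence per iteration.
 */
static void
sketch_wake_all(SketchProc *head)
{
	while (head != NULL)
	{
		SketchProc *proc = head;

		head = proc->next;
		proc->next = NULL;
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&proc->waiting, false, memory_order_relaxed);
		/* the real code would now call PGSemaphoreUnlock(&proc->sem) */
	}
}

/* Waiter side: this acquire fence is the read barrier the one above pairs with. */
static bool
sketch_still_waiting(SketchProc *me)
{
	bool		w = atomic_load_explicit(&me->waiting, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return w;
}
```

One plausible failure without the fence: the store clearing waiting becomes visible first, the woken backend queues itself on another lock, and the late next = NULL store then clobbers its new link.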

Personally I'm quite happy with the new state. I think it needs more
review, but I personally don't know of anything that needs
changing. There's lots of further improvements that could be done, but
let's get this in first.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/2] Convert the PGPROC->lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read it:

* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than before,
  as we always allowed new shared lockers to jump the queue.

* adds a memory pg_write_barrier() in the wakeup paths between
  dequeuing and unsetting ->lwWaiting. That was always required on
  weakly ordered machines, but f4077cda2 made it more urgent.

Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c | 151 +-
 src/backend/storage/lmgr/proc.c   |   2 -
 src/include/storage/lwlock.h  |   5 +-
 src/include/storage/proc.h|   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
 
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
 
 
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas

On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't see that as being relevant. The difference is an instruction or
 two - in the slow path we'll enter the kernel and sleep. This doesn't
 matter in comparison.
 And the code is *so* much more readable.

I find the slist/dlist stuff actually quite difficult to get right
compared to a hand-rolled linked list.  But the really big problem is
that the debugger can't do anything useful with it.  You have to work
out the structure-member offset in order to walk the list and manually
cast to char *, adjust the pointer, and cast back.  That sucks.
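For reference, the offset arithmetic in question is what a dlist_container()-style macro does. A minimal standalone sketch; the node type, the SketchProc container and the sketch_* names are hypothetical and only loosely modelled on lib/ilist.h:

```c
#include <stddef.h>

/* Minimal circular doubly-linked list node, embedded in its container. */
typedef struct sketch_node
{
	struct sketch_node *prev;
	struct sketch_node *next;
} sketch_node;

/* Hypothetical container: the link is embedded and not at offset 0. */
typedef struct
{
	int			pid;
	int			wait_mode;
	sketch_node wait_link;
} SketchProc;

/* Back from the embedded node to its container: subtract the member offset. */
#define sketch_container(type, member, ptr) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

static void
sketch_list_init(sketch_node *head)
{
	head->prev = head->next = head;
}

/* Append 'node' just before the list head, i.e. at the tail. */
static void
sketch_push_tail(sketch_node *head, sketch_node *node)
{
	node->next = head;
	node->prev = head->prev;
	head->prev->next = node;
	head->prev = node;
}
```

In a debugger without macro information the same step has to be done by hand: cast the node address to char *, subtract offsetof(SketchProc, wait_link), and cast back to SketchProc *.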

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-10-08 13:13:33 -0400, Robert Haas wrote:
 On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't see that as being relevant. The difference is an instruction or
  two - in the slow path we'll enter the kernel and sleep. This doesn't
  matter in comparison.
  And the code is *so* much more readable.
 
 I find the slist/dlist stuff actually quite difficult to get right
 compared to a hand-rolled linked list.

Really? I've spent more than a day debugging things with the current
code. And Heikki introduced a bug in it. If you look at how the code
looks before/after I find the difference pretty clear.

 But the really big problem is
 that the debugger can't do anything useful with it.  You have to work
 out the structure-member offset in order to walk the list and manually
 cast to char *, adjust the pointer, and cast back.  That sucks.

Hm. I can just do that with the debugger here. Not sure if that's
because I added the right thing to my .gdbinit or because I use the
correct compiler flags.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Alvaro Herrera

Robert Haas wrote:
 On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't see that as being relevant. The difference is an instruction or
  two - in the slow path we'll enter the kernel and sleep. This doesn't
  matter in comparison.
  And the code is *so* much more readable.
 
 I find the slist/dlist stuff actually quite difficult to get right
 compared to a hand-rolled linked list.  But the really big problem is
 that the debugger can't do anything useful with it.  You have to work
 out the structure-member offset in order to walk the list and manually
 cast to char *, adjust the pointer, and cast back.  That sucks.

As far as I recall you can get gdb to understand those pointer games
by defining some structs or macros.  Maybe we can improve by documenting
this.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote:
 Robert Haas wrote:
  On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   I don't see that as being relevant. The difference is an instruction or
   two - in the slow path we'll enter the kernel and sleep. This doesn't
   matter in comparison.
   And the code is *so* much more readable.
  
  I find the slist/dlist stuff actually quite difficult to get right
  compared to a hand-rolled linked list.  But the really big problem is
  that the debugger can't do anything useful with it.  You have to work
  out the structure-member offset in order to walk the list and manually
  cast to char *, adjust the pointer, and cast back.  That sucks.
 
 As far as I recall you can get gdb to understand those pointer games
 by defining some structs or macros.  Maybe we can improve by documenting
 this.

So, what makes it work for me (among other unrelated stuff) seems to be
the following in .gdbinit, defining away some things that gdb doesn't
handle:
macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about
postgres' macros. At least if you're in the right scope.

As an example, the following works:
(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

On 2014-10-08 15:23:22 -0400, Robert Haas wrote:
 On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
  1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
 verbose. This also does:
  * changes the logic in LWLockRelease() to release all shared lockers
when waking up any. This can yield some significant performance
improvements - and the fairness isn't really much worse than
before,
as we always allowed new shared lockers to jump the queue.
 
  * adds a memory pg_write_barrier() in the wakeup paths between
dequeuing and unsetting -lwWaiting. That was always required on
weakly ordered machines, but f4077cda2 made it more urgent. I can
reproduce crashes without it.

 I think it's a really bad idea to mix a refactoring change (like
 converting PGPROC->lwWaitLink into a dlist) with an attempted
 performance enhancement (like changing the rules for jumping the lock
 queue) and a bug fix (like adding pg_write_barrier where needed).  I'd
 suggest that the last of those be done first, and perhaps
 back-patched.

I think it makes sense to separate out the write barrier one. I don't
really see the point of separating the other two changes.

I've indeed previously started a thread
(http://archives.postgresql.org/message-id/20140210134625.GA15246%40awork2.anarazel.de)
about the barrier issue. IIRC you argued that that might be too
expensive.

 The current coding, using a hand-rolled list, touches shared memory
 fewer times.  When many waiters are awoken at once, we clip them all
 out of the list at one go.  Your revision moves them to a
 backend-private list one at a time, and then pops them off one at a
 time.  The backend-private memory accesses don't seem like they matter
 much, but the shared memory accesses would be nice to avoid.

I can't imagine this mattering.  We're entering the kernel for each PROC
for the PGSemaphoreUnlock(), and we're dirtying the cacheline for
proc->lwWaiting = false anyway. This really is the slow path.

 Does LWLockUpdateVar's wake-up loop need a write barrier per
 iteration, or just one before the loop starts?  How about commenting
 the pg_write_barrier() with the read-fence to which it pairs?

Hm. Are you picking out LWLockUpdateVar for a reason or just as an
example? I don't see a difference between the various wakeup loops.

It needs to be a barrier per iteration.

Currently the loop looks like this:
while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}

Consider what happens if either the compiler or the CPU reorders this
to:
    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);

As soon as lwWaiting = false, 'proc' can wake up and acquire a new
lock. Backends can wake up prematurely because proc->sem is used for
purposes other than this (that's why the loops around PGSemaphoreLock
exist). It could then reset lwWaitLink while acquiring a new lock, and
some processes would never be woken up.

The barrier it pairs with is the spinlock acquisition before
requeuing. To be more obviously correct we could add a read barrier
before
    if (!proc->lwWaiting)
        break;
but I don't think it's needed.
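The per-iteration ordering argument can be illustrated with C11 atomics. This is a hedged sketch, not PostgreSQL code: DemoProc and the demo_* functions are hypothetical stand-ins for PGPROC and the wake-up loop, a release fence stands in for pg_write_barrier(), the semaphore is reduced to a counter, and the waiter's acquire load is a simplification of the spinlock-based pairing described above. The fence sits inside the loop, between unlinking a process and clearing its lwWaiting flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PGPROC fields involved (not the real struct). */
typedef struct DemoProc
{
    struct DemoProc *lwWaitLink;    /* next process in the detached list */
    atomic_bool lwWaiting;          /* polled by the sleeping process */
    int         wakeups;            /* counts simulated semaphore unlocks */
} DemoProc;

/* Stand-in for PGSemaphoreUnlock(&proc->sem). */
static void
demo_semaphore_unlock(DemoProc *proc)
{
    proc->wakeups++;
}

/*
 * Wake every process on an already-detached list.  The release fence plays
 * the role of pg_write_barrier(): the read and reset of lwWaitLink cannot be
 * moved after the store that clears lwWaiting.  It is issued once per
 * iteration, because each process has its own link/flag pair.
 */
static int
demo_wake_all(DemoProc *head)
{
    int         woken = 0;

    while (head != NULL)
    {
        DemoProc   *proc = head;

        assert(atomic_load_explicit(&proc->lwWaiting, memory_order_relaxed));
        head = proc->lwWaitLink;
        proc->lwWaitLink = NULL;
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&proc->lwWaiting, false, memory_order_relaxed);
        demo_semaphore_unlock(proc);
        woken++;
    }
    return woken;
}

/* Waiter side: an acquire load that pairs with the release fence above. */
static bool
demo_still_waiting(DemoProc *proc)
{
    return atomic_load_explicit(&proc->lwWaiting, memory_order_acquire);
}
```

A single fence before the loop would not be enough: for the second and later processes, both the read of lwWaitLink and the store to lwWaiting would come after that fence, so nothing would stop them from being reordered relative to each other.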

Greetings,

Andres Freund




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-08 Thread Robert Haas

On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
 2) Implement the wait free LW_SHARED algorithm.

+ * too high for workloads/locks that were locked in shared mode very

s/locked/taken/?

+ * frequently. Often we were spinning in the (obviously exlusive) spinlock,

exclusive.

+ * acquiration for locks that aren't exclusively locked.

acquisition.

+ * For exlusive lock acquisition we use an atomic compare-and-exchange on the

exclusive.

+ * lockcount variable swapping in EXCLUSIVE_LOCK/131-1/0x7FFF if and only

Add comma after variable.  Find some way of describing the special
value (maybe a sentinel value, EXCLUSIVE_LOCK) just once, instead of
three times.

+ * if the current value of lockcount is 0. If the swap was not successfull, we

successful.

+ * by 1 again. If so, we have to wait for the exlusive locker to release the

exclusive.

+ * The attentive reader probably might have noticed that naively doing the

'probably might' is redundant.  Delete 'probably'.

+ * notice that we have to wait. Unfortunately until we have finished queuing,

until -> by the time

+ *   Phase 2: Add us too the waitqueue of the lock

too -> to.  And maybe us -> ourselves.

+ *get queued in Phase 2 and we can wake them up if neccessary or they will

necessary.

+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.

'disturb' is not clear enough.

+/* yipeyyahee */

Although this will be clear to individuals with a good command of
English, I suggest avoiding such usages.
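The lockcount scheme described in the patch comments under review can be sketched with C11 atomics. This is an illustrative model only: DemoLWLock, the demo_* functions and the DEMO_EXCLUSIVE_LOCK value are hypothetical and not taken from the patch, and queuing, waiting and wake-ups are left out entirely. Exclusive acquisition swaps in the sentinel only when lockcount is 0; shared acquisition adds 1 unconditionally and subtracts it again if an exclusive holder was present.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sentinel for "exclusively locked".  Any value larger than the
 * maximum number of concurrent shared holders works for this sketch; the
 * patch's actual constant is not reproduced here.
 */
#define DEMO_EXCLUSIVE_LOCK ((uint32_t) 1 << 24)

typedef struct DemoLWLock
{
    /* 0: free; 1..N: N shared holders; >= DEMO_EXCLUSIVE_LOCK: exclusive */
    _Atomic uint32_t lockcount;
} DemoLWLock;

/* Exclusive: swap in the sentinel if and only if lockcount is 0. */
static bool
demo_try_exclusive(DemoLWLock *lock)
{
    uint32_t    expected = 0;

    return atomic_compare_exchange_strong(&lock->lockcount, &expected,
                                          DEMO_EXCLUSIVE_LOCK);
}

/*
 * Shared: add 1 unconditionally.  If an exclusive holder was present, back
 * the increment out again; a real caller would then queue itself and wait.
 */
static bool
demo_try_shared(DemoLWLock *lock)
{
    uint32_t    old = atomic_fetch_add(&lock->lockcount, 1);

    if (old >= DEMO_EXCLUSIVE_LOCK)
    {
        atomic_fetch_sub(&lock->lockcount, 1);
        return false;
    }
    return true;
}

static void
demo_release_exclusive(DemoLWLock *lock)
{
    uint32_t    old = atomic_fetch_sub(&lock->lockcount, DEMO_EXCLUSIVE_LOCK);

    assert(old >= DEMO_EXCLUSIVE_LOCK);
}

static void
demo_release_shared(DemoLWLock *lock)
{
    uint32_t    old = atomic_fetch_sub(&lock->lockcount, 1);

    assert(old > 0 && old < DEMO_EXCLUSIVE_LOCK);
}
```

Releasing the exclusive lock by subtracting the sentinel, rather than storing 0, keeps a concurrent shared locker's temporary increment intact. That temporary increment can also leave lockcount briefly non-zero after the exclusive holder is gone, which appears to be the "disturb an exclusive waiter" case the potentially_spurious out-parameter is meant to report.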

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas

On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 So, what makes it work for me (among other unrelated stuff) seems to be
 the following in .gdbinit, defining away some things that gdb doesn't
 handle:
 macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
 macro define __extension__
 macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

 Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about
 postgres' macros. At least if you're in the right scope.

 As an example, the following works:
 (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)

Ah, cool.  I'll try that.

-- 
Robert Haas



Re: [HACKERS] Wait free LW_SHARED acquisition

2014-07-01 Thread Andres Freund

Hi,

Over at -performance Mark Kirkwood tested a recent version of this
(http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz).
I thought it would be interesting to add the numbers to this thread:

 Test: pgbench
 Options: scale 500
  read only
 Os: Ubuntu 14.04
 Pg: 9.3.4
 Pg Options:
 max_connections = 200
 shared_buffers = 10GB
 maintenance_work_mem = 1GB
 effective_io_concurrency = 10
 wal_buffers = 32MB
 checkpoint_segments = 192
 checkpoint_completion_target = 0.8
 
 
 Results
 
 Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
 --------+------------------+-----------------
       6 |            70400 |            71028
      12 |            98918 |           129140
      24 |           230345 |           240631
      48 |           324042 |           409510
      96 |           346929 |           120464
     192 |           312621 |            92663
 
 So we have anti scaling with 60 cores as we increase the client connections.
 Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1]
 - cherry picking the last 5 commits into 9.4 branch and building a package
 from that and retesting:
 
 Clients | 9.4 tps 60 cores (rwlock)
 --------+--------------------------
       6 |                     70189
      12 |                    128894
      24 |                    233542
      48 |                    422754
      96 |                    590796
     192 |                    630672

Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still
interesting.

Greetings,

Andres Freund




Re: [HACKERS] Wait free LW_SHARED acquisition

2014-07-01 Thread Heikki Linnakangas


On 07/01/2014 01:08 PM, Andres Freund wrote:

Hi,

Over at -performance Mark Kirkwood tested a recent version of this
(http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz)
. I thought it's interesting to add the numbers to this thread:


Test: pgbench
Options: scale 500
  read only
Os: Ubuntu 14.04
Pg: 9.3.4
Pg Options:
 max_connections = 200
 shared_buffers = 10GB
 maintenance_work_mem = 1GB
 effective_io_concurrency = 10
 wal_buffers = 32MB
 checkpoint_segments = 192
 checkpoint_completion_target = 0.8


Results

Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
--------+------------------+-----------------
      6 |            70400 |            71028
     12 |            98918 |           129140
     24 |           230345 |           240631
     48 |           324042 |           409510
     96 |           346929 |           120464
    192 |           312621 |            92663

So we have anti scaling with 60 cores as we increase the client connections.
Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1]
- cherry picking the last 5 commits into 9.4 branch and building a package
from that and retesting:

Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
      6 |                     70189
     12 |                    128894
     24 |                    233542
     48 |                    422754
     96 |                    590796
    192 |                    630672


Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still
interesting.


It looks like the issue I reported here:

http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com

fixed by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b03d196be055450c7260749f17347c2d066b4254.

So, definitely need to compare plain 9.4 vs patched 9.4, not 9.3.

- Heikki




Re: [HACKERS] Wait free LW_SHARED acquisition

2014-07-01 Thread Mark Kirkwood


On 01/07/14 23:25, Heikki Linnakangas wrote:

On 07/01/2014 01:08 PM, Andres Freund wrote:

Hi,

Over at -performance Mark Kirkwood tested a recent version of this
(http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz)

. I thought it's interesting to add the numbers to this thread:


Test: pgbench
Options: scale 500
  read only
Os: Ubuntu 14.04
Pg: 9.3.4
Pg Options:
 max_connections = 200
 shared_buffers = 10GB
 maintenance_work_mem = 1GB
 effective_io_concurrency = 10
 wal_buffers = 32MB
 checkpoint_segments = 192
 checkpoint_completion_target = 0.8


Results

Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
--------+------------------+-----------------
      6 |            70400 |            71028
     12 |            98918 |           129140
     24 |           230345 |           240631
     48 |           324042 |           409510
     96 |           346929 |           120464
    192 |           312621 |            92663

So we have anti scaling with 60 cores as we increase the client
connections.
Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4
branch [1]
- cherry picking the last 5 commits into 9.4 branch and building a
package
from that and retesting:

Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
      6 |                     70189
     12 |                    128894
     24 |                    233542
     48 |                    422754
     96 |                    590796
    192 |                    630672


Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still
interesting.


It looks like the issue I reported here:

http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com

fixed by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b03d196be055450c7260749f17347c2d066b4254.


So, definitely need to compare plain 9.4 vs patched 9.4, not 9.3.



Here's plain 9.4 vs patched 9.4:

Clients | 9.4 tps 60 cores | 9.4 tps 60 cores (rwlock)
--------+------------------+--------------------------
      6 |            69490 |                     70189
     12 |           128200 |                    128894
     24 |           232243 |                    233542
     48 |           417689 |                    422754
     96 |           464037 |                    590796
    192 |           418252 |                    630672

It appears that plain 9.4 does not exhibit the dramatic anti-scaling
that 9.3 showed, but there is still evidence of some contention at the
higher client counts, and throughput peaks at 96 clients. The patched
variant looks pretty much free from this, still scaling at 192
connections (it might have been interesting to try more, but I had
max_connections set to 200)!


Cheers

Mark



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-25 Thread Amit Kapila

On Tue, Jun 24, 2014 at 9:33 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
 On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com
wrote:
  On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
   7.
   LWLockWaitForVar()
   {
   ..
   /*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing.
* NB: We're using nearly the same twice-in-a-row lock acquisition
* protocol as LWLockAcquire(). Check its comments for details.
*/
   LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
  
   /* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
   &potentially_spurious);
   }
  
   Why is it important to Attempt lock after queuing in this case, can't
   we just re-check exclusive lock as done before queuing?
 
  Well, that's how Heikki designed LWLockWaitForVar().

 In that case I might be missing some point here, un-patched code of
 LWLockWaitForVar() never tries to acquire the lock, but the new code
 does so.  Basically I am not able to think what is the problem if we just
 do below after queuing:
 mustwait = pg_atomic_read_u32(&lock->lockcount) != 0;

 Could you please explain what is the problem in just rechecking?


I have further reviewed the lwlock-related changes and thought
it's good to share my findings with you. This completes my initial
review of the lwlock-related changes; my findings are below:

1.
LWLockRelease()
{
..
TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(l), T_ID(l));
}

The dynamic tracing macro seems to have been omitted from
LWLockRelease().

2.
LWLockWakeup()
{
..
#ifdef LWLOCK_STATS
lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
SpinLockAcquire(&lock->mutex);
#endif
..
}

Previously, taking the spinlock while releasing the lock was not counted
towards the LWLock stats' spin_delay_count.  Looking at other places in
lwlock.c, I think it only gets counted when we try to acquire it in a loop.

3.
LWLockRelease()
{
..
/* grant permission to run, even if a spurious share lock increases
 * lockcount */
else if (mode == LW_EXCLUSIVE && have_waiters)
	check_waiters = true;
/* nobody has this locked anymore, potential exclusive lockers get a chance
 */
else if (lockcount == 0 && have_waiters)
	check_waiters = true;
..
}

It seems the comments in the above code have been swapped.

4.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
..
}

Shouldn't we use the volatile variable in the above loop (lock instead of
l)?

5.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &wakeup)
{
	PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
	LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
	dlist_delete(&waiter->lwWaitLink);
	pg_write_barrier();
	waiter->lwWaiting = false;
	PGSemaphoreUnlock(&waiter->sem);
}
..
}

Why can't we decrement nwaiters after waking up? I don't think
there is any major problem even if callers do that themselves, but
in some rare cases LWLockRelease() might spuriously assume that
there are waiters and try to call LWLockWakeup().  Although
this doesn't create any problem, keeping the value sane is good unless
there is some problem in doing so.

6.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
{
..
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
..
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
{
..
wokeup_somebody = true;
}
..
}
..
}

a.
IIUC the above logic, if the waiter queue is as follows:
(S-Shared; X-Exclusive) S X S S S X S S

it can skip the exclusive waiters and release the shared waiters behind
them.

If my understanding is right, then I think there should be a *break*
instead of the continue in the above logic.

b.
Consider below sequence of waiters:
(S-Shared; X-Exclusive) S S X S S

I think the un-patched code will wake up waiters up to and (including)
the first exclusive waiter, but the patch will wake up waiters only up to
(*excluding*) the first exclusive waiter.

7.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
{
..
dlist_delete(&waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
..
}
..
}

Use of dlist has simplified the code, but I think there might be a slight
overhead in maintaining the wakeup queue compared to the un-patched
mechanism, especially when there is a long waiter queue.

8.
LWLockConditionalAcquire()
{
..
/*
 * We ran into an exclusive lock and might have blocked another
 * exclusive lock from taking a shot because it took a time to back
 * off. Retry till we are either sure we didn't block somebody (because
 * somebody else certainly has the lock) or till we got it.
 *
 * We cannot rely on the two-step lock-acquisition protocol as in
 * LWLockAcquire because we're not using it.
 */
if (potentially_spurious)
{
SPIN_DELAY();
goto retry;
}
..
}

Due to the above logic, I think it can keep retrying for a long time before
it actually concludes whether it got the lock or not, in case other backends
take the exclusive lock after *double_check* and release it before the
unconditional increment of the shared lock in function

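The difference between continue and break in point 6 of the review above can be made concrete with a small model of the wake-up scan. This is a hypothetical sketch, not the patch's LWLockWakeup(): it ignores LW_WAIT_UNTIL_FREE waiters, and the rule that an exclusive waiter at the front is woken alone is an assumption made for illustration. It returns the queue positions that would be woken under either variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    DEMO_SHARED,
    DEMO_EXCLUSIVE
} DemoMode;

/*
 * Hypothetical model of the wake-up scan over a wait queue (front first).
 * Writes the positions of the waiters to wake into woken[] and returns how
 * many there are.  An exclusive waiter at the front is woken alone (an
 * assumption for this sketch).  Once somebody has been woken, an exclusive
 * waiter either ends the scan (stop_at_exclusive, i.e. "break") or is just
 * skipped while later shared waiters are still woken ("continue").
 */
static size_t
demo_select_wakeups(const DemoMode *queue, size_t n, bool stop_at_exclusive,
                    size_t *woken)
{
    size_t      nwoken = 0;
    bool        woke_somebody = false;

    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (woke_somebody && queue[i] == DEMO_EXCLUSIVE)
        {
            if (stop_at_exclusive)
                break;
            continue;
        }
        woken[nwoken++] = i;
        woke_somebody = true;
        if (queue[i] == DEMO_EXCLUSIVE)
            break;              /* exclusive waiter is woken alone */
    }
    return nwoken;
}
```

For the queue S X S S S X S S, the continue variant wakes positions 0, 2, 3, 4, 6 and 7, skipping both exclusive waiters, while the break variant wakes only position 0.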
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila

On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
  On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com
  wrote:
 
  You have followed it pretty well as far as I can understand from your
  replies, as there is no reproducible test (which I think is bit tricky to
  prepare), so it becomes difficult to explain by theory.

 I'm working an updated patch that moves the releaseOK into the
 spinlocks. Maybe that's the problem already - it's certainly not correct
 as is.

Sure, I will run the tests/performance tests with the updated patch; you
might want to include some more changes based on the comments
below:

  You are right, it will wakeup the existing waiters, but I think the
  new logic has one difference which is that it can allow the backend to
  take Exclusive lock when there are already waiters in queue.  As per
  above example even though Session-2 and Session-3 are in wait
  queue, Session-4 will be able to acquire Exclusive lock which I think
  was previously not possible.

 I think that was previously possible as well - in a slightly different
 set of circumstances though. After a process releases a lock and wakes
 up some of several waiters another process can come in and acquire the
 lock. Before the woken up process gets scheduled again. lwlocks aren't
 fair locks...

Okay, but I think changing the behaviour of lwlocks might impact some
tests/applications.  As they are not fair, I think defining the exact
behaviour is not easy, and we don't have any concrete scenario that
would be affected, so there should not be a problem in accepting
slightly different behaviour.

Few more comments:

1.
LWLockAcquireCommon()
{
..
iterations++;
}

In the current logic, I could not see any use of this *iterations* variable.

2.
LWLockAcquireCommon()
{
..
if (!LWLockDequeueSelf(l))
{
/*
* Somebody else dequeued us and has or will..
 ..
*/
for (;;)
{
 PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
 extraWaits++;
}
lock->releaseOK = true;
}

Do we want to set result = false; after waking in the above code?
The idea behind setting it to false is to indicate whether we got the
lock immediately or not, which was previously decided based on whether
it needed to queue itself.

3.
LWLockAcquireCommon()
{
..
/*
 * Ok, at this point we couldn't grab the lock on the first try. We
 * cannot simply queue ourselves to the end of the list and wait to be
 * woken up because by now the lock could long have been released.
 * Instead add us to the queue and try to grab the lock again. If we
 * suceed we need to revert the queuing and be happy, otherwise we
 * recheck the lock. If we still couldn't grab it, we know that the
 * other lock will see our queue entries when releasing since they
 * existed before we checked for the lock.
 */
/* add to the queue */
LWLockQueueSelf(l, mode);

/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
..
}

a. From reading the above code and comments, it is not quite clear why
the second attempt is important unless somebody thinks it through or refers
to your comments in the *Notes* section at the top of the file.  I think it's
better to indicate this in some way, so that the reader can refer to the
Notes section or wherever you are planning to keep those comments.

b. There is a typo in the above comment: suceed/succeed.


4.
LWLockAcquireCommon()
{
..
if (!LWLockDequeueSelf(l))
{
 for (;;)
{
PGSemaphoreLock(&proc->sem, false);
 if (!proc->lwWaiting)
break;
extraWaits++;
 }
lock->releaseOK = true;
..
}

Setting releaseOK in the above context might not be required, because if
control reaches this part of the code, it will not try to acquire the
lock again.

5.
LWLockWaitForVar()
{
..
else
mustwait = false;

if (!mustwait)
break;
..
}

I think we can break directly in the else branch of the above code.

6.
LWLockWaitForVar()
{
..
/*
 * Quick test first to see if it the slot is free right now.
 *
 * XXX: the caller uses a spinlock before this,...
 */
if (pg_atomic_read_u32(&lock->lockcount) == 0)
 return true;
}

Is the part of the comment that refers to the spinlock still relevant
after switching to atomic ops?

7.
LWLockWaitForVar()
{
..
/*
 * Add myself to wait queue. Note that this is racy, somebody else
 * could wakeup before we're finished queuing.
 * NB: We're using nearly the same twice-in-a-row lock acquisition
 * protocol as LWLockAcquire(). Check its comments for details.
 */
LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);

/* we're now guaranteed to be woken up if necessary */
 mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
&potentially_spurious);
}

Why is it important to attempt the lock after queuing in this case? Can't
we just re-check the exclusive lock as done before queuing?

8.
LWLockWaitForVar()
{
..
PRINT_LWDEBUG("LWLockAcquire undo queue", lock, mode);
 break;
}
else
{
 PRINT_LWDEBUG("LWLockAcquire waiting 4", lock, mode);
}
..
}

a. I think we should use LWLockWaitForVar here instead of
   LWLockAcquire.
b. Isn't it better

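The reason for the second attempt after queuing (point 3 of the review above) can be shown with a single-threaded model in which a concurrent release is simulated at the racy point between the first failed attempt and the queuing step. DemoLock and the demo_* functions are hypothetical, and the model deliberately omits the atomics and spinlock the real code needs; it only captures the ordering argument. A releaser wakes waiters only if it sees them queued, so if the release happens before we queue and we do not re-check, nobody is left to wake us.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of an lwlock and its wait queue. */
typedef struct DemoLock
{
    bool        held;
    int         nqueued;        /* waiters currently in the queue */
    int         wakeups_sent;   /* wakeups issued by releasers */
} DemoLock;

static bool
demo_attempt(DemoLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

/* A releaser only wakes waiters it can see in the queue at release time. */
static void
demo_release(DemoLock *lock)
{
    assert(lock->held);
    lock->held = false;
    if (lock->nqueued > 0)
        lock->wakeups_sent++;
}

/*
 * Acquire via attempt / queue / re-attempt.  'concurrent' simulates another
 * backend acting at the racy point between the first failed attempt and
 * queuing.  Returns true if the lock was obtained, false if the caller is
 * left queued and must sleep until some later release wakes it.
 */
static bool
demo_acquire(DemoLock *lock, void (*concurrent) (DemoLock *), bool recheck)
{
    if (demo_attempt(lock))
        return true;
    if (concurrent != NULL)
        concurrent(lock);       /* e.g. the holder releases right here */
    lock->nqueued++;            /* phase 2: queue ourselves */
    if (recheck && demo_attempt(lock))
    {
        lock->nqueued--;        /* got it after all: undo the queuing */
        return true;
    }
    return false;
}
```

With recheck enabled, the second attempt finds the lock free and the queue entry is undone; without it, the backend ends up queued on a free lock with no pending wakeup, which is the lost-wakeup case the twice-in-a-row protocol is meant to rule out.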
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Andres Freund

On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
 On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
   On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com
   wrote:
  
   You have followed it pretty well as far as I can understand from your
   replies, as there is no reproducible test (which I think is bit tricky to
   prepare), so it becomes difficult to explain by theory.
 
  I'm working an updated patch that moves the releaseOK into the
  spinlocks. Maybe that's the problem already - it's certainly not correct
  as is.
 
 Sure, I will do the test/performance test with updated patch; you
 might want to include some more changes based on comments
 in mail below:

I'm nearly finished cleaning up the atomics part of the patch, which
also includes a bit of cleanup of the lwlocks code.

 Few more comments:
 
 1.
 LWLockAcquireCommon()
 {
 ..
 iterations++;
 }
 
 In current logic, I could not see any use of these *iterations* variable.

It's useful for debugging. Should be gone in the final code.

 2.
 LWLockAcquireCommon()
 {
 ..
 if (!LWLockDequeueSelf(l))
 {
 /*
 * Somebody else dequeued us and has or will..
  ..
 */
 for (;;)
 {
  PGSemaphoreLock(&proc->sem, false);
 if (!proc->lwWaiting)
 break;
  extraWaits++;
 }
 lock->releaseOK = true;
 }
 
 Do we want to set result = false; after waking in above code?
 The idea behind setting false is to indicate whether we get the lock
 immediately or not which previously was decided based on if it needs
 to queue itself?

Hm. I don't think it's clear which version is better.

 3.
 LWLockAcquireCommon()
 {
 ..
 /*
  * Ok, at this point we couldn't grab the lock on the first try. We
  * cannot simply queue ourselves to the end of the list and wait to be
  * woken up because by now the lock could long have been released.
  * Instead add us to the queue and try to grab the lock again. If we
  * suceed we need to revert the queuing and be happy, otherwise we
  * recheck the lock. If we still couldn't grab it, we know that the
  * other lock will see our queue entries when releasing since they
  * existed before we checked for the lock.
  */
 /* add to the queue */
 LWLockQueueSelf(l, mode);
 
 /* we're now guaranteed to be woken up if necessary */
 mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
 ..
 }
 
 a. By reading above code and comments, it is not quite clear why
 second attempt is important unless somebody thinks on it or refer
 your comments in *Notes* section at top of file.  I think it's better to
 indicate in some way so that code reader can refer to Notes section or
 whereever you are planing to keep those comments.

Ok.

 b. There is typo in above comment suceed/succeed.

Thanks, fixed.

 
 4.
 LWLockAcquireCommon()
 {
 ..
 if (!LWLockDequeueSelf(l))
 {
  for (;;)
 {
 PGSemaphoreLock(&proc->sem, false);
  if (!proc->lwWaiting)
 break;
 extraWaits++;
  }
 lock->releaseOK = true;
 ..
 }
 
 Setting releaseOK in above context might not be required  because if the
 control comes in this part of code, it will not retry to acquire another
 time.

Hm. You're probably right.

 5.
 LWLockWaitForVar()
 {
 ..
 else
 mustwait = false;
 
 if (!mustwait)
 break;
 ..
 }
 
 I think we can directly break in else part in above code.

Well, there's another case of mustwait=false above which is triggered
while the spinlock is held. Don't think it'd get simpler.

 6.
 LWLockWaitForVar()
 {
 ..
 /*
  * Quick test first to see if it the slot is free right now.
  *
  * XXX: the caller uses a spinlock before this,...
  */
 if (pg_atomic_read_u32(&lock->lockcount) == 0)
  return true;
 }
 
 Does the part of comment that refers to spinlock is still relevant
 after using atomic ops?

Yes. pg_atomic_read_u32() isn't a memory barrier (and explicitly
documented not to be).

 7.
 LWLockWaitForVar()
 {
 ..
 /*
  * Add myself to wait queue. Note that this is racy, somebody else
  * could wakeup before we're finished queuing.
  * NB: We're using nearly the same twice-in-a-row lock acquisition
  * protocol as LWLockAcquire(). Check its comments for details.
  */
 LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
 
 /* we're now guaranteed to be woken up if necessary */
  mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
 &potentially_spurious);
 }
 
 Why is it important to Attempt lock after queuing in this case, can't
 we just re-check exclusive lock as done before queuing?

Well, that's how Heikki designed LWLockWaitForVar().

 8.
 LWLockWaitForVar()
 {
 ..
 PRINT_LWDEBUG("LWLockAcquire undo queue", lock, mode);
  break;
 }
 else
 {
  PRINT_LWDEBUG("LWLockAcquire waiting 4", lock, mode);
 }
 ..
 }
 
 a. I think instead of LWLockAcquire, here we should use
LWLockWaitForVar

right.

 b. Isn't it better to use LOG_LWDEBUG instead of PRINT_LWDEBUG(),
 as PRINT_LWDEBUG() is generally used in file at entry of functions to
 log info about locks?

Fine with me.

 9.
 LWLockUpdateVar()
 {
 /*

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila

On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
  On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com
  wrote:
  2.
  LWLockAcquireCommon()
  {
  ..
  if (!LWLockDequeueSelf(l))
  {
  /*
  * Somebody else dequeued us and has or will..
   ..
  */
  for (;;)
  {
   PGSemaphoreLock(&proc->sem, false);
  if (!proc->lwWaiting)
  break;
   extraWaits++;
  }
  lock->releaseOK = true;
  }
 
  Do we want to set result = false; after waking in above code?
  The idea behind setting false is to indicate whether we get the lock
  immediately or not which previously was decided based on if it needs
  to queue itself?

 Hm. I don't think it's clear which version is better.

I thought that if we get the lock on the first attempt, the result should
be true, which seems clear, but for the second attempt you are right that
it's not clear.  In such a case, I think we can go either way, and if any
problem is discovered later during tests or otherwise, we can revert it.

  7.
  LWLockWaitForVar()
  {
  ..
  /*
   * Add myself to wait queue. Note that this is racy, somebody else
   * could wakeup before we're finished queuing.
   * NB: We're using nearly the same twice-in-a-row lock acquisition
   * protocol as LWLockAcquire(). Check its comments for details.
   */
  LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
 
  /* we're now guaranteed to be woken up if necessary */
   mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
  &potentially_spurious);
  }
 
  Why is it important to Attempt lock after queuing in this case, can't
  we just re-check exclusive lock as done before queuing?

 Well, that's how Heikki designed LWLockWaitForVar().

In that case I might be missing something here: the un-patched
LWLockWaitForVar() never tries to acquire the lock, but the new code
does.  Basically, I can't see what the problem would be if we just
did the following after queuing:
mustwait = pg_atomic_read_u32(&lock->lockcount) != 0;

Could you please explain what the problem is with just rechecking?

   I think both are actually critical for performance... Otherwise even a
   only lightly contended lock would require scheduler activity when a
   processes tries to lock something twice. Given the frequency we acquire
   some locks with that'd be disastrous...
 
  Do you have any suggestion how both behaviours can be retained?

 Not sure what you mean.

I just wanted to say that the current behaviour of releaseOK seems to
be useful in some cases, and if you want to change it, would the change
retain the behaviour we currently get from releaseOK?

I understand that so far your patch has not changed anything specific
to releaseOK, but from the above discussion I got the impression that you
are planning to change it; that's why I asked the above question.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila

On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
  12.
  #ifdef LWLOCK_DEBUG
  lock->owner = MyProc;
  #endif
 
  Shouldn't it be reset in LWLockRelease?

 That's actually intentional. It's quite useful to know the last owner
 when debugging lwlock code.

Won't it cause any problem if the last owner process exits?

With Regards,
Amit Kapila.

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila

On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com
wrote:
  I've pushed a rebased version of the patchset to
  http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
  branch rwlock contention.
  220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
  ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.

 As per discussion in developer meeting, I wanted to test shared
 buffer scaling patch with this branch.  I am getting merge
 conflicts as per HEAD.  Could you please get it resolved, so that
 I can get the data.

I have started looking into this patch and have a few questions/
findings, which are shared below:

1. I think lwstats->ex_acquire_count will be counted twice: it is
incremented first in LWLockAcquireCommon() and then in
LWLockAttemptLock().

2.
Handling of the potentially_spurious case seems to be pending
in LWLock functions like LWLockAcquireCommon().

LWLockAcquireCommon()
{
..
/* add to the queue */
LWLockQueueSelf(l, mode);

/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);

}

I think it can lead to problems, for example:

Session-1
1. Acquire Exclusive LWLock

Session-2
1. Acquire Shared LWLock

1a. Unconditionally increment the shared count (Session-2)

Session-1
2. Release Exclusive lock

Session-3
1. Acquire Exclusive LWLock
   It will put itself on the wait queue, seeing the lock count incremented
   by Session-2

Session-2
1b. Decrement the shared count and add itself to the wait queue.

Session-4
1. Acquire Exclusive lock
   This session will get the exclusive lock because, even
   though other lockers are waiting, the lock count is zero.

Session-2
2. Try a second time to take the shared lock; it won't get it,
   as Session-4 already has an exclusive lock, so it will
   start waiting

Session-4
2. Release Exclusive lock
   It will not wake the waiters, because the waiters were added
   before it acquired this lock.

So in the above scenario, Session-3 and Session-2 are waiting in the queue
with nobody to wake them.

I have not reproduced the exact scenario above,
so I might be missing something that prevents
this situation.
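The lock-word protocol this scenario relies on can be sketched with C11 atomics. This is a minimal model under stated assumptions, not the patch's code: it assumes one 32-bit state word whose top bit marks an exclusive holder and whose low bits count shared holders, and the names (lock_state, try_shared, try_exclusive, release_shared, release_exclusive, replay_spurious_window) are hypothetical. It shows steps 1a/1b: the shared locker increments unconditionally and backs out if an exclusive holder was present. In between, a concurrent exclusive locker (Session-3 above) sees a nonzero word and fails even though nobody actually holds a shared lock; that window is what makes the increment "potentially spurious".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 31 = exclusive holder, bits 0..30 = shared count. */
#define EXCLUSIVE_FLAG ((uint32_t) 1 << 31)
#define SHARED_MASK    (EXCLUSIVE_FLAG - 1)

typedef struct
{
    _Atomic uint32_t state;
} lock_state;

/*
 * Shared acquisition without a spinlock: unconditionally bump the shared
 * count, then inspect the value we raced against. If an exclusive holder
 * was present, undo the increment. Between the two steps the increment is
 * visible to others even though we do not hold the lock.
 */
static bool
try_shared(lock_state *l)
{
    uint32_t old = atomic_fetch_add(&l->state, 1);

    if ((old & EXCLUSIVE_FLAG) == 0)
        return true;                    /* got the shared lock */

    atomic_fetch_sub(&l->state, 1);     /* back out the spurious increment */
    return false;                       /* caller must queue and retry */
}

/* Exclusive acquisition succeeds only if the whole word is zero. */
static bool
try_exclusive(lock_state *l)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, EXCLUSIVE_FLAG);
}

static void
release_shared(lock_state *l)
{
    uint32_t old = atomic_fetch_sub(&l->state, 1);

    assert((old & SHARED_MASK) > 0);
}

static void
release_exclusive(lock_state *l)
{
    uint32_t old = atomic_fetch_and(&l->state, ~EXCLUSIVE_FLAG);

    assert(old & EXCLUSIVE_FLAG);
}

/*
 * Replays Session-1 steps 1-2, Session-2 steps 1a/1b and Session-3's
 * attempt, with Session-2's try_shared() split into its two halves.
 * Returns true if Session-3 was refused only because of the transient
 * increment and the word is back to zero afterwards.
 */
static bool
replay_spurious_window(void)
{
    lock_state l;
    bool s3_got_lock;

    atomic_init(&l.state, 0);
    if (!try_exclusive(&l))             /* Session-1: step 1 */
        return false;
    atomic_fetch_add(&l.state, 1);      /* Session-2: step 1a */
    release_exclusive(&l);              /* Session-1: step 2 */
    s3_got_lock = try_exclusive(&l);    /* Session-3: sees count 1 */
    atomic_fetch_sub(&l.state, 1);      /* Session-2: step 1b */
    return !s3_got_lock && atomic_load(&l.state) == 0;
}
```

Because the exclusive path in this model looks only at the lock word and never at the wait queue, nothing stops Session-4 from succeeding once the word returns to zero; whether the queued Session-3 is woken afterwards depends entirely on the release path.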

3.
LWLockAcquireCommon()
{
for (;;)
{
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
..
}
proc->lwWaiting is checked and updated without the spinlock, whereas
previously it was done under the spinlock; won't it be unsafe?

4.
LWLockAcquireCommon()
{
..
for (;;)
{
/* false means cannot accept cancel/die interrupt here. */
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
extraWaits++;
}
lock->releaseOK = true;
}

lock->releaseOK is updated/checked without the spinlock, whereas
previously it was done under the spinlock; won't it be unsafe?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund

On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
 On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com
 wrote:
   I've pushed a rebased version of the patchset to
   http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
   branch rwlock contention.
   220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
   ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
 
  As per discussion in developer meeting, I wanted to test shared
  buffer scaling patch with this branch.  I am getting merge
  conflicts as per HEAD.  Could you please get it resolved, so that
  I can get the data.

 I have started looking into this patch and have few questions/
 findings which are shared below:

 1. I think stats for lwstats->ex_acquire_count will be counted twice,
 first it is incremented in LWLockAcquireCommon() and then in
 LWLockAttemptLock()

Hrmpf. Will fix.

 2.
 Handling of potentially_spurious case seems to be pending
 in LWLock functions like LWLockAcquireCommon().

 LWLockAcquireCommon()
 {
 ..
 /* add to the queue */
 LWLockQueueSelf(l, mode);

 /* we're now guaranteed to be woken up if necessary */
 mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);

 }

 I think it can lead to some problems, example:

 Session -1
 1. Acquire Exclusive LWlock

 Session -2
 1. Acquire Shared LWlock

 1a. Unconditionally incrementing shared count by session-2

 Session -1
 2. Release Exclusive lock

 Session -3
 1. Acquire Exclusive LWlock
 It will put itself to wait queue by seeing the lock count incremented
 by Session-2

 Session-2
 1b. Decrement the shared count and add it to wait queue.

 Session-4
 1. Acquire Exclusive lock
This session will get the exclusive lock, because even
though other lockers are waiting, lockcount is zero.

 Session-2
 2. Try second time to take shared lock, it won't get
as session-4 already has an exclusive lock, so it will
start waiting

 Session-4
 2. Release Exclusive lock
it will not wake the waiters because waiters have been added
before acquiring this lock.

I don't understand this step. When releasing the lock, won't it notice
that waiters is > 0 and acquire the spinlock, which should protect
against badness here?

 3.
 LWLockAcquireCommon()
 {
 for (;;)
 {
 PGSemaphoreLock(&proc->sem, false);
 if (!proc->lwWaiting)
 ..
 }
 proc->lwWaiting is checked, updated without spinlock where
 as previously it was done under spinlock, won't it be unsafe?

It was previously checked/unset without a spinlock as well:
/*
 * Awaken any waiters I removed from the queue.
 */
while (head != NULL)
{
    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}
I don't think there are dangers here; lwWaiting will only ever be
manipulated by the PGPROC's owner. As discussed elsewhere, there
needs to be a write barrier before the proc->lwWaiting = false, even in
upstream code.
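The ordering requirement mentioned here can be illustrated with C11 atomics. This is a sketch under assumptions: the waiter struct is a hypothetical stand-in for the relevant PGPROC fields (next for lwWaitLink, waiting for lwWaiting), the semaphore is omitted, and the release store plays the role of a write barrier such as pg_write_barrier() placed before proc->lwWaiting = false. Without that ordering, the woken process could observe waiting == false, return, and queue itself again while the waker's earlier store of next = NULL is not yet visible, so that store could later clobber the new link.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant PGPROC fields. */
typedef struct waiter
{
    struct waiter *next;        /* like lwWaitLink, written by the waker */
    _Atomic bool waiting;       /* like lwWaiting */
} waiter;

/*
 * Waker side: detach each waiter from the private list, then clear its
 * flag with release semantics so the "next = NULL" store is visible before
 * the waiter can observe waiting == false and reuse its link field.
 * Real code would follow each store with a semaphore unlock.
 */
static int
wake_all(waiter *head)
{
    int woken = 0;

    while (head != NULL)
    {
        waiter *w = head;

        head = w->next;
        w->next = NULL;
        atomic_store_explicit(&w->waiting, false, memory_order_release);
        woken++;
    }
    return woken;
}

/* Waiter side: this acquire load pairs with the release store above. */
static bool
still_waiting(waiter *self)
{
    return atomic_load_explicit(&self->waiting, memory_order_acquire);
}
```

The acquire load is the matching half: once the waiter sees waiting == false, it is also guaranteed to see next == NULL.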

 4.
 LWLockAcquireCommon()
 {
 ..
 for (;;)
 {
 /* false means cannot accept cancel/die interrupt here. */
 PGSemaphoreLock(&proc->sem, false);
 if (!proc->lwWaiting)
 break;
 extraWaits++;
 }
 lock->releaseOK = true;
 }

 lock->releaseOK is updated/checked without spinlock where
 as previously it was done under spinlock, won't it be unsafe?

Hm. That's probably buggy. Good catch. Especially if you have a compiler
that does byte manipulation by reading e.g. 4 bytes from a struct and
then writing the wider variable back... So the releaseOK bit needs to move
into LWLockDequeueSelf().
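The concern about byte-sized fields can be replayed deterministically. In this hypothetical sketch, releaseOK and an unrelated flag share one 32-bit word, and a plain assignment is modeled as the load/modify/store sequence a compiler may emit for it. Splitting that sequence lets the interleaving be replayed step by step: the stale write-back loses the other backend's update. The fix proposed here is to update releaseOK while holding the spinlock (in LWLockDequeueSelf()); the atomic fetch-or variant below is only an alternative illustration, not what the patch does, and the flag names are invented for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLAG_RELEASE_OK ((uint32_t) 1 << 0)     /* hypothetical bit layout */
#define FLAG_OTHER      ((uint32_t) 1 << 1)

/* First half of a non-atomic read-modify-write: load the whole word. */
static uint32_t
rmw_load(const uint32_t *word)
{
    return *word;
}

/* Second half: store the (possibly stale) word back with our bits set. */
static void
rmw_store(uint32_t *word, uint32_t loaded, uint32_t set_bits)
{
    *word = loaded | set_bits;  /* overwrites anything changed since the load */
}

/* Atomic alternative: the read-modify-write is indivisible. */
static void
atomic_set_bits(_Atomic uint32_t *word, uint32_t set_bits)
{
    atomic_fetch_or(word, set_bits);
}

/* Replay: A loads, B completes its update, A writes back its stale copy. */
static uint32_t
replay_plain(void)
{
    uint32_t word = 0;
    uint32_t a_loaded = rmw_load(&word);            /* backend A reads 0 */

    rmw_store(&word, rmw_load(&word), FLAG_OTHER);  /* backend B finishes */
    rmw_store(&word, a_loaded, FLAG_RELEASE_OK);    /* A clobbers B's bit */
    return word;
}

/* Same two updates done atomically: both bits survive. */
static uint32_t
replay_atomic(void)
{
    _Atomic uint32_t word = 0;

    atomic_set_bits(&word, FLAG_OTHER);
    atomic_set_bits(&word, FLAG_RELEASE_OK);
    return atomic_load(&word);
}
```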

Thanks for looking!

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila

On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
  On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com
  wrote:
   On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com

  wrote:
I've pushed a rebased version of the patchset to
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
branch rwlock contention.
220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small
problem,
ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
  
   As per discussion in developer meeting, I wanted to test shared
   buffer scaling patch with this branch.  I am getting merge
   conflicts as per HEAD.  Could you please get it resolved, so that
   I can get the data.
 
  I have started looking into this patch and have few questions/
  findings which are shared below:
 
  1. I think stats for lwstats->ex_acquire_count will be counted twice,
  first it is incremented in LWLockAcquireCommon() and then in
  LWLockAttemptLock()

 Hrmpf. Will fix.

  2.
  Handling of potentially_spurious case seems to be pending
  in LWLock functions like LWLockAcquireCommon().
 
  LWLockAcquireCommon()
  {
  ..
  /* add to the queue */
  LWLockQueueSelf(l, mode);
 
  /* we're now guaranteed to be woken up if necessary */
  mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
 
  }
 
  I think it can lead to some problems, example:
 
  Session -1
  1. Acquire Exclusive LWlock
 
  Session -2
  1. Acquire Shared LWlock
 
  1a. Unconditionally incrementing shared count by session-2
 
  Session -1
  2. Release Exclusive lock
 
  Session -3
  1. Acquire Exclusive LWlock
  It will put itself to wait queue by seeing the lock count incremented
  by Session-2
 
  Session-2
  1b. Decrement the shared count and add it to wait queue.
 
  Session-4
  1. Acquire Exclusive lock
 This session will get the exclusive lock, because even
 though other lockers are waiting, lockcount is zero.
 
  Session-2
  2. Try second time to take shared lock, it won't get
 as session-4 already has an exclusive lock, so it will
 start waiting
 
  Session-4
  2. Release Exclusive lock
 it will not wake the waiters because waiters have been added
 before acquiring this lock.

 I don't understand this step here? When releasing the lock it'll notice
 that the waiters is > 0 and acquire the spinlock which should protect
 against badness here?

While releasing the lock, I think it will not go on to wake the waiters
(LWLockWakeup), because releaseOK will be false.  releaseOK
can be set to false when Session-1 has released the Exclusive lock
and woken up some previous waiter.  Once it is set to false, it can
be reset to true only by the retry logic (after getting the semaphore).

  3.
 I don't think there's dangers here, lwWaiting will only ever be
 manipulated by the PGPROC's owner. As discussed elsewhere there
 needs to be a write barrier before the proc->lwWaiting = false, even in
 upstream code.

Agreed.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund

On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
 On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com
  On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
   2.
   Handling of potentially_spurious case seems to be pending
   in LWLock functions like LWLockAcquireCommon().
  
   LWLockAcquireCommon()
   {
   ..
   /* add to the queue */
   LWLockQueueSelf(l, mode);
  
   /* we're now guaranteed to be woken up if necessary */
   mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
  
   }
  
   I think it can lead to some problems, example:
  
   Session -1
   1. Acquire Exclusive LWlock
  
   Session -2
   1. Acquire Shared LWlock
  
   1a. Unconditionally incrementing shared count by session-2
  
   Session -1
   2. Release Exclusive lock
  
   Session -3
   1. Acquire Exclusive LWlock
   It will put itself to wait queue by seeing the lock count incremented
   by Session-2
  
   Session-2
   1b. Decrement the shared count and add it to wait queue.
  
   Session-4
   1. Acquire Exclusive lock
  This session will get the exclusive lock, because even
  though other lockers are waiting, lockcount is zero.
  
   Session-2
   2. Try second time to take shared lock, it won't get
  as session-4 already has an exclusive lock, so it will
  start waiting
  
   Session-4
   2. Release Exclusive lock
  it will not wake the waiters because waiters have been added
  before acquiring this lock.
 
  I don't understand this step here? When releasing the lock it'll notice
  that the waiters is > 0 and acquire the spinlock which should protect
  against badness here?
 
 While Releasing lock, I think it will not go to Wakeup waiters
 (LWLockWakeup), because releaseOK will be false.  releaseOK
 can be set as false when Session-1 has Released Exclusive lock
 and woken up some previous waiter.  Once it is set to false, it can
 be reset to true only for retry logic(after getting semaphore).

I unfortunately still can't follow. If Session-1 woke up some previous
waiter, won't the woken-up process set releaseOK to true again when it
loops to acquire the lock?


Somewhat unrelated:

I have a fair amount of doubt about the effectiveness of the releaseOK
logic (which imo also is pretty poorly documented).
Essentially its intent is to avoid unnecessary scheduling when other
processes have already been woken up (i.e. releaseOK has been set to
false). I believe the theory is that if any process has already been
woken up, it's pointless to wake up additional processes
(i.e. PGSemaphoreUnlock()) because the originally woken-up process will
wake up at some point. But if the to-be-woken-up process is scheduled
out because it fully used its last timeslice, we won't
wake up other waiters for a relatively long time.

It was introduced in the course of
5b9a058384e714b89e050fc0b6381f97037c665a, whose logic is generally rather
sound; I just doubt that the releaseOK part is necessary.

It'd certainly be interesting to rip releaseOK out and benchmark the
result... My theory is that the average latency will go down on busy
systems that aren't IO bound.
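The releaseOK heuristic described above can be reduced to a tiny single-threaded model. Everything here is a simplification and an assumption: lw_model and its fields are hypothetical, a release wakes at most one waiter, and "the woken process finally gets scheduled" is represented by an explicit call. It shows the behaviour in question: once a release has woken someone, later releases issue no wakeups until that woken process actually runs, no matter how many waiters remain queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the releaseOK heuristic. */
typedef struct
{
    int  nwaiters;      /* processes sleeping on the lock */
    bool releaseOK;     /* may a releaser wake someone? */
    int  wakeups;       /* semaphore unlocks issued so far */
} lw_model;

/*
 * Release path: wake a waiter only if releaseOK is set; after waking one,
 * clear releaseOK so later releasers skip the wakeup until the woken
 * process has run and retried the lock.
 */
static void
model_release(lw_model *m)
{
    if (m->nwaiters > 0 && m->releaseOK)
    {
        m->nwaiters--;
        m->wakeups++;
        m->releaseOK = false;
    }
}

/* The woken process is finally scheduled and loops to retry the lock. */
static void
model_woken_runs(lw_model *m)
{
    m->releaseOK = true;
}
```

If model_woken_runs() is delayed (the woken process was descheduled), every intervening model_release() is a no-op for the remaining waiters, which is the latency concern raised above.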

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila

On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
  On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com
   On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
2.
Handling of potentially_spurious case seems to be pending
in LWLock functions like LWLockAcquireCommon().
   
LWLockAcquireCommon()
{
..
/* add to the queue */
LWLockQueueSelf(l, mode);
   
/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
   
}
   
I think it can lead to some problems, example:
   
Session -1
1. Acquire Exclusive LWlock
   
Session -2
1. Acquire Shared LWlock
   
1a. Unconditionally incrementing shared count by session-2
   
Session -1
2. Release Exclusive lock
   
Session -3
1. Acquire Exclusive LWlock
It will put itself to wait queue by seeing the lock count incremented
by Session-2
   
Session-2
1b. Decrement the shared count and add it to wait queue.
   
Session-4
1. Acquire Exclusive lock
   This session will get the exclusive lock, because even
   though other lockers are waiting, lockcount is zero.
   
Session-2
2. Try second time to take shared lock, it won't get
   as session-4 already has an exclusive lock, so it will
   start waiting
   
Session-4
2. Release Exclusive lock
   it will not wake the waiters because waiters have been added
   before acquiring this lock.
  
   I don't understand this step here? When releasing the lock it'll notice
   that the waiters is > 0 and acquire the spinlock which should protect
   against badness here?
 
  While Releasing lock, I think it will not go to Wakeup waiters
  (LWLockWakeup), because releaseOK will be false.  releaseOK
  can be set as false when Session-1 has Released Exclusive lock
  and woken up some previous waiter.  Once it is set to false, it can
  be reset to true only for retry logic(after getting semaphore).

 I unfortunately still can't follow.

You have followed it pretty well as far as I can understand from your
replies. As there is no reproducible test (which I think is a bit tricky to
prepare), it is difficult to explain in theory.

 If Session-1 woke up some previous
 waiter the woken up process will set releaseOK to true again when it
 loops to acquire the lock?

You are right, it will wake up the existing waiters, but I think the
new logic has one difference: it can allow a backend to
take the Exclusive lock when there are already waiters in the queue.  As per
the above example, even though Session-2 and Session-3 are in the wait
queue, Session-4 will be able to acquire the Exclusive lock, which I think
was previously not possible.


 Somewhat unrelated:

 I have a fair amount of doubt about the effectiveness of the releaseOK
 logic (which imo also is pretty poorly documented).
 Essentially its intent is to avoid unnecessary scheduling when other
 processes have already been woken up (i.e. releaseOK has been set to
 false). I believe the theory is that if any process has already been
 woken up it's pointless to wake up additional processes
 (i.e. PGSemaphoreUnlock()) because the originally woken up process will
 wake up at some point. But if the to-be-woken up process is scheduled
 out because it used all his last timeslices fully that means we'll not
 wakeup other waiters for a relatively long time.

I think it also ensures that the woken-up process won't stall for a
very long time: if we wake new waiters, the previously woken
process can end up back in the wait queue, and the same thing can repeat
for a long time.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund

On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
 On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
   On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com
On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
  I unfortunately still can't follow.
 
 You have followed it pretty well as far as I can understand from your
 replies, as there is no reproducible test (which I think is bit tricky to
 prepare), so it becomes difficult to explain by theory.

I'm working on an updated patch that moves the releaseOK into the
spinlocks. Maybe that's already the problem; it's certainly not correct
as is.

  If Session-1 woke up some previous
  waiter the woken up process will set releaseOK to true again when it
  loops to acquire the lock?
 
 You are right, it will wakeup the existing waiters, but I think the
 new logic has one difference which is that it can allow the backend to
 take Exclusive lock when there are already waiters in queue.  As per
 above example even though Session-2 and Session-3 are in wait
 queue, Session-4 will be able to acquire Exclusive lock which I think
 was previously not possible.

I think that was previously possible as well, in a slightly different
set of circumstances though. After a process releases a lock and wakes
up some of several waiters, another process can come in and acquire the
lock before the woken-up process gets scheduled again. lwlocks aren't
fair locks...

  Somewhat unrelated:
 
  I have a fair amount of doubt about the effectiveness of the releaseOK
  logic (which imo also is pretty poorly documented).
  Essentially its intent is to avoid unnecessary scheduling when other
  processes have already been woken up (i.e. releaseOK has been set to
  false). I believe the theory is that if any process has already been
  woken up it's pointless to wake up additional processes
  (i.e. PGSemaphoreUnlock()) because the originally woken up process will
  wake up at some point. But if the to-be-woken up process is scheduled
  out because it used all his last timeslices fully that means we'll not
  wakeup other waiters for a relatively long time.
 
 I think it will also maintain that the woken-up process won't stall for
 very long time, because if we wake new waiters, then previously woken
 process can again enter into wait queue and similar thing can repeat
 for long time.

I don't think it effectively does that: newly incoming lockers ignore
the queue and just acquire the lock, even if there's some other backend
scheduled to wake up. And shared locks can be acquired when there are
exclusive lockers waiting.

I think both are actually critical for performance... Otherwise even an
only lightly contended lock would require scheduler activity when a
process tries to lock something twice. Given the frequency with which we
acquire some locks, that'd be disastrous...
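The fairness trade-off can be made concrete with a sequential sketch. The unfair_lock struct and both functions are hypothetical: the barging variant checks only who holds the lock, matching the "newly incoming lockers ignore the queue" behaviour described above, while the FIFO-fair variant also refuses whenever anyone is queued. The fair variant is what would force a trip through the scheduler on lightly contended acquisitions, which is the cost argued against here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: holder state plus a count of queued waiters. */
typedef struct
{
    bool exclusive_held;
    int  shared_held;
    int  queued;        /* waiters sleeping in the queue */
} unfair_lock;

/*
 * Barging exclusive attempt: looks only at the holder state and never
 * consults the queue, so a newcomer can win the lock right after a
 * release, before any woken waiter has been scheduled.
 */
static bool
try_exclusive_barging(unfair_lock *l)
{
    if (l->exclusive_held || l->shared_held > 0)
        return false;
    l->exclusive_held = true;
    return true;
}

/* FIFO-fair variant: additionally refuses while anyone is queued. */
static bool
try_exclusive_fair(unfair_lock *l)
{
    if (l->queued > 0)
        return false;
    return try_exclusive_barging(l);
}
```

With two waiters queued and the lock free (the Session-2/Session-3 situation above), the fair variant makes the newcomer wait, while the barging variant lets it in immediately, as Session-4 did.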

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-05-23 Thread Amit Kapila

On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com
wrote:
 I've pushed a rebased version of the patchset to
 http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
 branch rwlock contention.
 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
 ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.

As per the discussion in the developer meeting, I wanted to test the shared
buffer scaling patch with this branch.  I am getting merge
conflicts against HEAD.  Could you please resolve them, so that
I can get the data.

From git://git.postgresql.org/git/users/andresfreund/postgres
 * branch            rwlock-contention -> FETCH_HEAD
Auto-merging src/test/regress/regress.c
CONFLICT (content): Merge conflict in src/test/regress/regress.c
Auto-merging src/include/storage/proc.h
Auto-merging src/include/storage/lwlock.h
CONFLICT (content): Merge conflict in src/include/storage/lwlock.h
Auto-merging src/include/storage/ipc.h
CONFLICT (content): Merge conflict in src/include/storage/ipc.h
Auto-merging src/include/storage/barrier.h
CONFLICT (content): Merge conflict in src/include/storage/barrier.h
Auto-merging src/include/pg_config_manual.h
Auto-merging src/include/c.h
Auto-merging src/backend/storage/lmgr/spin.c
Auto-merging src/backend/storage/lmgr/proc.c
Auto-merging src/backend/storage/lmgr/lwlock.c
CONFLICT (content): Merge conflict in src/backend/storage/lmgr/lwlock.c
Auto-merging src/backend/storage/ipc/shmem.c
Auto-merging src/backend/storage/ipc/ipci.c
Auto-merging src/backend/access/transam/xlog.c
CONFLICT (content): Merge conflict in src/backend/access/transam/xlog.c
Auto-merging src/backend/access/transam/twophase.c
Auto-merging configure.in
Auto-merging configure
Auto-merging config/c-compiler.m4
Automatic merge failed; fix conflicts and then commit the result.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-10 Thread Heikki Linnakangas


On 01/31/2014 11:54 AM, Andres Freund wrote:

Hi,

On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote:

On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote:

1) I've added an abstracted atomic ops implementation. Needs a fair
amount of work, also submitted as a separate CF entry. (Patch 1 & 2)


Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when
applying 0002-Very-basic-atomic-ops-implementation.patch. Please
rebase.


I've pushed a rebased version of the patchset to
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
branch rwlock contention.
220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.

I plan to split the atomics patch into smaller chunks before
reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
instead of open coding it." patch is worth being applied independently from
the rest of the series; it simplifies code and it fixes a bug...


I committed a fix for the WakeupWaiters() bug now, without the rest of
the open coding patch. Converting lwWaitLink into a dlist is probably
a good idea, but it seems better to fix the bug separately, for the sake of
git history if nothing else.
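For readers unfamiliar with the proposed conversion, here is a minimal sketch of an intrusive circular doubly-linked list with a sentinel head, modeled loosely on the dlist in PostgreSQL's src/include/lib/ilist.h. The names (node, list_init, list_push_tail, list_delete, list_length, fake_proc, proc_of) are local to this sketch and are not the ilist.h API. The general advantage over an open-coded singly linked lwWaitLink chain is that a waiter can be unlinked in O(1) without walking the queue to find its predecessor; the thread does not say which bug the conversion fixes, so none is implied here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list node; the head is a sentinel. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

static void
list_init(node *head)
{
    head->prev = head->next = head;
}

static void
list_push_tail(node *head, node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal from anywhere: no walk from the head to find prev. */
static void
list_delete(node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

static int
list_length(const node *head)
{
    int len = 0;

    for (const node *it = head->next; it != head; it = it->next)
        len++;
    return len;
}

/* Hypothetical waiter embedding the list link, as a PGPROC might. */
typedef struct
{
    int  pid;
    node link;
} fake_proc;

/* Recover the containing struct from its embedded link. */
static fake_proc *
proc_of(node *n)
{
    return (fake_proc *) ((char *) n - offsetof(fake_proc, link));
}
```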


- Heikki



Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

On 2014-02-03 17:51:20 -0800, Peter Geoghegan wrote:
 On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
  Here are the results of a benchmark on Nathan Boley's 64-core, 4
  socket server: 
  http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
 
  That's interesting. The maximum number of what you see here (~293125)
  is markedly lower than what I can get.
 
  ... poke around ...
 
  Hm, that's partially because you're using pgbench without -M prepared if
  I see that correctly. The bottleneck in that case is primarily memory
  allocation. But even after that I am getting higher
  numbers: ~342497.
 
  Trying to nail down the difference it oddly seems to be your
  max_connections=80 vs my 100. The profile in both cases is markedly
  different, way much more spinlock contention with 80. All in
  Pin/UnpinBuffer().
 
 I updated this benchmark, with your BufferDescriptors alignment patch
 [1] applied on top of master (while still not using -M prepared in
 order to keep the numbers comparable). So once again, that's:
 
 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
 
 It made a bigger, fairly noticeable difference, but not so big a
 difference as you describe here. Are you sure that you saw this kind
 of difference with only 64 clients, as you mentioned elsewhere [1]
 (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously
 max_connections is still 80 in the above. Should I have gone past 64
 clients to see the problem? The best numbers I see with the [1] patch
 applied on master is only ~327809 for -S 10 64 clients. Perhaps I've
 misunderstood.

That's likely -M prepared.  It was with -c 64 -j 64...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition

2014-02-04 Thread Christian Kruse

Hi,

I'm doing some benchmarks regarding this problem: one set with
baseline and one set with your patch. Machine was a 32 core machine (4
CPUs with 8 cores), 252 GiB RAM. Both versions have the type align
patch applied. pgbench-tools config:

SCALES=100
SETCLIENTS=1 4 8 16 32 48 64 96 128
SETTIMES=2

I added -M prepared to the pgbench call in the benchwarmer script.

The read-only tests are finished; I come to similar results as yours:

http://wwwtech.de/pg/benchmarks-lwlock-read-only/

I think the small differences are caused by the fact that I use TCP
connections and not Unix domain sockets.

The results are pretty impressive… I will post the read-write results
as soon as they are finished.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition

On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse
christ...@2ndquadrant.com wrote:
 I'm doing some benchmarks regarding this problem: one set with
 baseline and one set with your patch. Machine was a 32 core machine (4
 CPUs with 8 cores), 252 GiB RAM. Both versions have the type align
 patch applied.

It certainly seems as if the interesting cases are where clients > cores.


-- 
Peter Geoghegan



Re: [HACKERS] Wait free LW_SHARED acquisition

On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse
christ...@2ndquadrant.com wrote:
 I added -M prepared to the pgbench call in the benchwarmer script.

 The read-only tests are finished, I come to similar results as yours:

 http://wwwtech.de/pg/benchmarks-lwlock-read-only/

Note that Christian ran this test with max_connections=201, presumably
to exercise the alignment problem.


-- 
Peter Geoghegan



Re: [HACKERS] Wait free LW_SHARED acquisition

On 2014-02-04 11:48:14 -0800, Peter Geoghegan wrote:
 On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse
 christ...@2ndquadrant.com wrote:
  I added -M prepared to the pgbench call in the benchwarmer script.
 
  The read-only tests are finished, I come to similar results as yours:
 
  http://wwwtech.de/pg/benchmarks-lwlock-read-only/
 
 Note that Christian ran this test with max_connections=201, presumably
 to exercise the alignment problem.

I think he has applied the patch to hack around the alignment issue I
pushed to git for both branches. It's not nice enough to be applied yet,
but it should fix the issue.
I think the 201 is just a remembrance of debugging the issue.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition

On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think he has applied the patch to hack around the alignment issue I
 pushed to git for both branches. It's not nice enough to be applied yet,
 but it should fix the issue.
 I think the 201 is just a remembrance of debugging the issue.

I guess that given that *both* cases tested had the patch applied,
that makes sense. However, I would have liked to see a real master
baseline.


-- 
Peter Geoghegan



Re: [HACKERS] Wait free LW_SHARED acquisition

On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse
christ...@2ndquadrant.com wrote:
 I'm doing some benchmarks regarding this problem: one set with
 baseline and one set with your patch. Machine was a 32 core machine (4
 CPUs with 8 cores), 252 GiB RAM.

What CPU model? Can you post /proc/cpuinfo? The distinction between
logical and physical cores matters here.


-- 
Peter Geoghegan



Re: [HACKERS] Wait free LW_SHARED acquisition

On February 4, 2014 8:53:36 PM CET, Peter Geoghegan p...@heroku.com wrote:
On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com
wrote:
 I think he has applied the patch to hack around the alignment issue I
 pushed to git for both branches. It's not nice enough to be applied
yet,
 but it should fix the issue.
 I think the 201 is just a remembrance of debugging the issue.

I guess that given that *both* cases tested had the patch applied,
that makes sense. However, I would have liked to see a real master
baseline.

Christian, could you rerun with master (the commit on which the branch is
based), the alignment patch, and then the lwlock patch? Best with max_connections
200.
That's probably more important than the write tests as a first step..

Thanks,
Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wait free LW_SHARED acquisition

2014-02-04 Thread Christian Kruse

Hi,

On 04/02/14 12:02, Peter Geoghegan wrote:
 On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse
 christ...@2ndquadrant.com wrote:
  I'm doing some benchmarks regarding this problem: one set with
  baseline and one set with your patch. Machine was a 32 core machine (4
  CPUs with 8 cores), 252 GiB RAM.
 
 What CPU model? Can you post /proc/cpuinfo? The distinction between
 logical and physical cores matters here.

model name  : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz

32 physical cores, 64 logical cores. /proc/cpuinfo is attached.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition

2014-02-04 Thread Christian Kruse

Hi,

On 04/02/14 21:03, Andres Freund wrote:
 Christian, could you rerun with master (the commit on which the
 branch is based on), the alignment patch, and then the lwlock patch?
 Best with max_connections 200.  That's probably more important than
 the write tests as a first step..

Ok, benchmark for baseline+alignment patch is running. This will take
a couple of hours and since I have to get up at about 05:00 I won't be
able to post it before tomorrow.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition

On Tue, Feb 4, 2014 at 12:30 PM, Christian Kruse
christ...@2ndquadrant.com wrote:
 Ok, benchmark for baseline+alignment patch is running.

I see that you have enabled latency information. For this kind of
thing I prefer to hack pgbench-tools to not collect this (i.e. to not
pass the -l flag, Per-Transaction Logging). Just remove it and
pgbench-tools rolls with it. It may well be that the overhead added is
completely insignificant, but for something like this, where the
latency information is unlikely to add any value, I prefer to not take
the chance. This is a fairly minor point, however, especially since
these are only 60 second runs where you're unlikely to accumulate
enough transaction latency information to notice any effect.


-- 
Peter Geoghegan



Re: [HACKERS] Wait free LW_SHARED acquisition