Hello all,

Consider the following stack trace:

(gdb) bt
#0  0x00007f77c81fae87 in semop () syscall-template.S:81
#1  0x000000000063b721 in PGSemaphoreLock pg_sema.c:387
#2  0x0000000000692e1b in LWLockAcquire lwlock.c:1026
#3  0x000000000068d4c5 in LockAcquireExtended lock.c:881
#4  0x000000000068dcc1 in LockAcquire lock.c:672
#5  0x000000000068b7a8 in LockRelationOid lmgr.c:112
#6  0x0000000000501d18 in find_inheritance_children pg_inherits.c:120
#7  0x0000000000501d80 in find_all_inheritors pg_inherits.c:182
#8  0x000000000062db8d in expand_inherited_rtentry prepunion.c:1285
#9  expand_inherited_tables prepunion.c:1212
#10 0x0000000000622705 in subquery_planner planner.c:501
#11 0x0000000000622d31 in standard_planner planner.c:285
#12 0x000000000069ef0c in pg_plan_query postgres.c:809
#13 0x000000000069f004 in pg_plan_queries postgres.c:868
#14 0x00000000006a0fc2 in exec_simple_query postgres.c:1033
#15 PostgresMain postgres.c:4032
#16 0x0000000000467479 in BackendRun postmaster.c:4237
#17 BackendStartup postmaster.c:3913
#18 ServerLoop () postmaster.c:1684
#19 0x000000000064c828 in PostmasterMain postmaster.c:1292
#20 0x0000000000467f3e in main main.c:223

It turns out PostgreSQL can spend a lot of time waiting for a lock in
this particular place, especially when running on a 60-core server,
which is obviously a pretty bad sign.

You can easily reproduce this issue on a regular Core i7 server. Use
the attached schema.sql file to create a database and run:

pgbench -j 8 -c 8 -f pgbench.sql -T 300 my_database 2>/dev/null &

While this example is running, connect to one of the PostgreSQL backend
processes with gdb and run bt/c from time to time. You will see that
PostgreSQL waits on this particular lock quite often.

The problem is that the code between LWLockAcquire (lock.c:881) and
LWLockRelease (lock.c:1020) can _sometimes_ take 3-5 ms to run. Using
the good old gettimeofday-and-logging method I managed to find the
bottleneck:

-- proclock = SetupLockInTable [lock.c:892]
 `-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
   `-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
     `-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]

If my measurements are not wrong (I didn't place gettimeofday between
SpinLockAcquire/SpinLockRelease, etc.), we sometimes spend about 3 ms
here waiting for a spinlock, which doesn't seem right.

I managed to fix this behaviour by modifying the choose_nelem_alloc
procedure in dynahash.c (see attached patch). The idea is to double the
number of items we allocate when there are no free items left in the
hash table. This halves the number of allocations, which reduces
contention.

According to pgbench this patch doesn't cause any performance
degradation, `make check` passes, etc.

Best regards,
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index eacffc4..48def5e 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -544,19 +544,19 @@ choose_nelem_alloc(Size entrysize)
 	elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(entrysize);
 
 	/*
-	 * The idea here is to choose nelem_alloc at least 32, but round up so
+	 * The idea here is to choose nelem_alloc at least 64, but round up so
 	 * that the allocation request will be a power of 2 or just less. This
 	 * makes little difference for hash tables in shared memory, but for hash
 	 * tables managed by palloc, the allocation request will be rounded up to
 	 * a power of 2 anyway.  If we fail to take this into account, we'll waste
 	 * as much as half the allocated space.
 	 */
-	allocSize = 32 * 4;			/* assume elementSize at least 8 */
+	allocSize = 64 * 4;			/* assume elementSize at least 8 */
 	do
 	{
 		allocSize <<= 1;
 		nelem_alloc = allocSize / elementSize;
-	} while (nelem_alloc < 32);
+	} while (nelem_alloc < 64);
 
 	return nelem_alloc;
 }

Attachment: schema.sql
Description: application/sql

Attachment: pgbench.sql
Description: application/sql

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)