Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 2015-01-26 18:30:13 -0600, Jim Nasby wrote: On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is complex application and we don't know which part of application code causes problem. I'm proposing to do model it after errhidestatement(). I.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The usecases wher eI see this as being useful is high volume debug logging, not normal messages... I think usecase is valid, it is really difficult to dig such a log especially when statement size is big. Right, that was what triggered to look me into it. I'd cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have clien_min_context and log_min_context that operate akin to *_min_messages but control context output. I've pushed it already IIRC. I don't think we can just apply it without regard for possible users to that many cases - I think the RAISE DEBUG case is better addressed by a plpgsql option to *optionall* suppress context, than do it unconditionally. That's discussed somewhere nearby. -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is complex application and we don't know which part of application code causes problem. I'm proposing to do model it after errhidestatement(). I.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The usecases wher eI see this as being useful is high volume debug logging, not normal messages... I think usecase is valid, it is really difficult to dig such a log especially when statement size is big. Right, that was what triggered to look me into it. I'd cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have clien_min_context and log_min_context that operate akin to *_min_messages but control context output. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 2014-12-22 10:35:35 +0530, Amit Kapila wrote: On Fri, Dec 19, 2014 at 9:36 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, When debugging lwlock issues I found PRINT_LWDEBUG/LOG_LWDEBUG rather painful to use because of the amount of elog contexts/statements emitted. Given the number of lwlock acquirations that's just not doable. To solve that during development I've solved that by basically replacing: if (Trace_lwlocks) elog(LOG, %s(%s %d): %s, where, name, index, msg); with something like if (Trace_lwlocks) { ErrorContextCallback *old_error_context_stack; ... old_error_context_stack = error_context_stack; error_context_stack = NULL; ereport(LOG, (errhidestmt(true), errmsg(%s(%s %d): %s, where, T_NAME(lock), T_ID(lock), msg))); I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is complex application and we don't know which part of application code causes problem. I'm proposing to do model it after errhidestatement(). I.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The usecases wher eI see this as being useful is high volume debug logging, not normal messages... I think usecase is valid, it is really difficult to dig such a log especially when statement size is big. Right, that was what triggered to look me into it. I'd cases where the same context was printed thousands of times. Also I think even with above, the number of logs generated are high for any statement which could still make debugging difficult, do you think it would be helpful if PRINT_LWDEBUG and LOG_LWDEBUG are used with separate defines (LOCK_DEBUG and LOCK_BLOCK_DEBUG) as in certain cases we might want to print info about locks which are acquired after waiting or in other words that gets blocked. Hm, that seems like a separate thing. Personally I don't find it interesting enough to write a patch for it, although I could see it being interesting for somebody. If you're looking at that level it's easy enough to just add a early return in either routine... Greetings, Andres Freund From 5e6af912377a1b43ea0168951238cb6a5e0b233e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 21 Dec 2014 15:45:55 +0100 Subject: [PATCH 1/4] Add capability to suppress CONTEXT: messages to elog machinery. Hiding context messages usually is not a good idea - except for rather verbose debugging/development utensils like LOG_DEBUG. There the amount of repeated context messages just bloat the log without adding information. --- src/backend/utils/error/elog.c | 24 ++-- src/include/utils/elog.h | 2 ++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 2316464..8f04b19 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -1081,6 +1081,25 @@ errhidestmt(bool hide_stmt) return 0; /* return value does not matter */ } +/* + * errhidestmt --- optionally suppress CONTEXT: field of log entry + * + * This should only be used for verbose debugging messages where the repeated + * inclusion of CONTEXT: bloats the log volume too much. + */ +int +errhidecontext(bool hide_ctx) +{ + ErrorData *edata = errordata[errordata_stack_depth]; + + /* we don't bother incrementing recursion_depth */ + CHECK_STACK_DEPTH(); + + edata-hide_ctx = hide_ctx; + + return 0; /* return value does not matter */ +} + /* * errfunction --- add reporting function name to the current error @@ -2724,7 +2743,8 @@ write_csvlog(ErrorData *edata) appendStringInfoChar(buf, ','); /* errcontext */ - appendCSVLiteral(buf, edata-context); + if (!edata-hide_ctx) + appendCSVLiteral(buf, edata-context); appendStringInfoChar(buf, ','); /* user query --- only reported if not disabled by the caller */ @@ -2856,7 +2876,7 @@ send_message_to_server_log(ErrorData *edata) append_with_tabs(buf, edata-internalquery); appendStringInfoChar(buf, '\n'); } - if (edata-context) + if (edata-context !edata-hide_ctx) { log_line_prefix(buf, edata); appendStringInfoString(buf, _(CONTEXT: )); diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index 87438b8..ec7ed22 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -221,6 +221,7 @@ errcontext_msg(const char *fmt,...) __attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2))); extern int errhidestmt(bool hide_stmt); +extern int errhidecontext(bool hide_ctx); extern int errfunction(const char *funcname); extern int errposition(int
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.10
Hi, Attached is a new version of the patchset which I intend to commit soon. Stuff changed since 0.9: * Greatly simplified locking logic - the whole concept that a lock could be spuriously acquired is gone. That cost a small bit of performance (0.5%, I thought it'd be much bigger) on x86, but is a noticeable performance *benefit* on PPC. * releaseOK (and other internal flags) are rolled into the former 'lockcount' variable which is now named state. By having it inside the same atomic reasoning about the state gets easier as there's no skew between observing the lockcount and other variables. * The number of queued waiters isn't required anymore, it's only a debugging aid (#ifdef LOCK_DEBUG) at this point. Patches: 0001: errhidecontext() patch 0002: dlist()ify lwWaitLink 0003: LW_SHARED scalability I've done a fair amount of benchmarking and on bigger system the new code seems to be a win pretty much generally. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 5e6af912377a1b43ea0168951238cb6a5e0b233e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 21 Dec 2014 15:45:55 +0100 Subject: [PATCH 1/4] Add capability to suppress CONTEXT: messages to elog machinery. Hiding context messages usually is not a good idea - except for rather verbose debugging/development utensils like LOG_DEBUG. There the amount of repeated context messages just bloat the log without adding information. --- src/backend/utils/error/elog.c | 24 ++-- src/include/utils/elog.h | 2 ++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 2316464..8f04b19 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -1081,6 +1081,25 @@ errhidestmt(bool hide_stmt) return 0; /* return value does not matter */ } +/* + * errhidestmt --- optionally suppress CONTEXT: field of log entry + * + * This should only be used for verbose debugging messages where the repeated + * inclusion of CONTEXT: bloats the log volume too much. + */ +int +errhidecontext(bool hide_ctx) +{ + ErrorData *edata = errordata[errordata_stack_depth]; + + /* we don't bother incrementing recursion_depth */ + CHECK_STACK_DEPTH(); + + edata-hide_ctx = hide_ctx; + + return 0; /* return value does not matter */ +} + /* * errfunction --- add reporting function name to the current error @@ -2724,7 +2743,8 @@ write_csvlog(ErrorData *edata) appendStringInfoChar(buf, ','); /* errcontext */ - appendCSVLiteral(buf, edata-context); + if (!edata-hide_ctx) + appendCSVLiteral(buf, edata-context); appendStringInfoChar(buf, ','); /* user query --- only reported if not disabled by the caller */ @@ -2856,7 +2876,7 @@ send_message_to_server_log(ErrorData *edata) append_with_tabs(buf, edata-internalquery); appendStringInfoChar(buf, '\n'); } - if (edata-context) + if (edata-context !edata-hide_ctx) { log_line_prefix(buf, edata); appendStringInfoString(buf, _(CONTEXT: )); diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index 87438b8..ec7ed22 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -221,6 +221,7 @@ errcontext_msg(const char *fmt,...) __attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2))); extern int errhidestmt(bool hide_stmt); +extern int errhidecontext(bool hide_ctx); extern int errfunction(const char *funcname); extern int errposition(int cursorpos); @@ -385,6 +386,7 @@ typedef struct ErrorData bool output_to_client; /* will report to client? */ bool show_funcname; /* true to force funcname inclusion */ bool hide_stmt; /* true to prevent STATEMENT: inclusion */ + bool hide_ctx; /* true to prevent CONTEXT: inclusion */ const char *filename; /* __FILE__ of ereport() call */ int lineno; /* __LINE__ of ereport() call */ const char *funcname; /* __func__ of ereport() call */ -- 2.2.0.rc0.18.ga1ad247 From 0ef51c826faa53620fc7d8d39d5df6206be729f3 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 7 Oct 2014 15:32:34 +0200 Subject: [PATCH 2/4] Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read it changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. --- src/backend/access/transam/twophase.c | 1 - src/backend/storage/lmgr/lwlock.c | 146 -- src/backend/storage/lmgr/proc.c | 2 - src/include/storage/lwlock.h | 5 +- src/include/storage/proc.h| 3 +- 5 files changed, 57 insertions(+), 100 deletions(-) diff --git
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On Fri, Dec 19, 2014 at 9:36 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, When debugging lwlock issues I found PRINT_LWDEBUG/LOG_LWDEBUG rather painful to use because of the amount of elog contexts/statements emitted. Given the number of lwlock acquirations that's just not doable. To solve that during development I've solved that by basically replacing: if (Trace_lwlocks) elog(LOG, %s(%s %d): %s, where, name, index, msg); with something like if (Trace_lwlocks) { ErrorContextCallback *old_error_context_stack; ... old_error_context_stack = error_context_stack; error_context_stack = NULL; ereport(LOG, (errhidestmt(true), errmsg(%s(%s %d): %s, where, T_NAME(lock), T_ID(lock), msg))); I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is complex application and we don't know which part of application code causes problem. The usecases wher eI see this as being useful is high volume debug logging, not normal messages... I think usecase is valid, it is really difficult to dig such a log especially when statement size is big. Also I think even with above, the number of logs generated are high for any statement which could still make debugging difficult, do you think it would be helpful if PRINT_LWDEBUG and LOG_LWDEBUG are used with separate defines (LOCK_DEBUG and LOCK_BLOCK_DEBUG) as in certain cases we might want to print info about locks which are acquired after waiting or in other words that gets blocked. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
Hi, When debugging lwlock issues I found PRINT_LWDEBUG/LOG_LWDEBUG rather painful to use because of the amount of elog contexts/statements emitted. Given the number of lwlock acquirations that's just not doable. To solve that during development I've solved that by basically replacing: if (Trace_lwlocks) elog(LOG, %s(%s %d): %s, where, name, index, msg); with something like if (Trace_lwlocks) { ErrorContextCallback *old_error_context_stack; ... old_error_context_stack = error_context_stack; error_context_stack = NULL; ereport(LOG, (errhidestmt(true), errmsg(%s(%s %d): %s, where, T_NAME(lock), T_ID(lock), msg))); I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. The usecases wher eI see this as being useful is high volume debug logging, not normal messages... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Dec 3, 2014 at 4:03 PM, Michael Paquier michael.paqu...@gmail.com wrote: Ping? This patch is in a stale state for a couple of weeks and still marked as waiting on author for this CF. Marked as returned with feedback. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Nov 18, 2014 at 12:33 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-11-17 10:21:04 -0500, Robert Haas wrote: Andres, where are we with this patch? 1. You're going to commit it, but haven't gotten around to it yet. 2. You're going to modify it some more and repost, but haven't gotten around to it yet. 3. You're willing to see it modified if somebody else does the work, but are out of time to spend on it yourself. 4. Something else? I'm working on it. Amit had found a hang on PPC that I couldn't reproduce on x86. Since then I've reproduced it and I think yesterday I found the problem. Unfortunately it always took a couple hours to trigger... I've also made some, in my opinion, cleanups to the patch since then. Those have the nice side effect of making the size of struct LWLock smaller, but that wasn't actually the indended effect. I'll repost once I've verified the problem is fixed and I've updated all commentary. The current problem is that I seem to have found a problem that's also reproducible with master :(. After a couple of hours a pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S against a -c max_connections=200 -c shared_buffers=4GB cluster seems to hang on PPC. With all the backends waiting in buffer mapping locks. I'm now making sure it's really master and not my patch causing the problem - it's just not trivial with 180 processes involved. Ah, OK. Thanks for the update. Ping? This patch is in a stale state for a couple of weeks and still marked as waiting on author for this CF. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Oct 25, 2014 at 1:50 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Oct 24, 2014 at 4:05 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-24 15:59:30 +0530, Amit Kapila wrote: and w.r.t performance it can lead extra function call, few checks and I think in some cases even can acquire/release spinlock. I fail to see how that could be the case. Won't it happen incase first backend sets releaseOK to true and another backend which tries to wakeup waiters on lock will acquire spinlock and tries to release the waiters. Sure, that can happen. And again, this is code that's only executed around a couple syscalls. And the cacheline will be touched around there *anyway*. Sure, but I think syscalls are required in case we need to wake any waiter. It won't wake up a waiter if there's none on the list. Yeap, but still it will acquire/release spinlock. And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes. Do we get any major benefit by changing the logic of waking up waiters? Yes. I think one downside I could see of new strategy is that the chance of Exclusive waiter to take more time before getting woked up is increased as now it will by pass Exclusive waiters in queue. Note that that *already* happens for any *new* shared locker that comes in. It doesn't really make sense to have share lockers queued behind the exclusive locker if others just go in front of it anyway. Yeah, but I think it is difficult to avoid that behaviour as even when it wakes Exclusive locker, some new shared locker can comes in and acquire the lock before Exclusive locker. I think it is difficult to say what is the best waking strategy, as priority for Exclusive lockers is not clearly defined incase of LWLocks. Andres, where are we with this patch? 1. You're going to commit it, but haven't gotten around to it yet. 2. You're going to modify it some more and repost, but haven't gotten around to it yet. 3. You're willing to see it modified if somebody else does the work, but are out of time to spend on it yourself. 4. Something else? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-11-17 10:21:04 -0500, Robert Haas wrote: Andres, where are we with this patch? 1. You're going to commit it, but haven't gotten around to it yet. 2. You're going to modify it some more and repost, but haven't gotten around to it yet. 3. You're willing to see it modified if somebody else does the work, but are out of time to spend on it yourself. 4. Something else? I'm working on it. Amit had found a hang on PPC that I couldn't reproduce on x86. Since then I've reproduced it and I think yesterday I found the problem. Unfortunately it always took a couple hours to trigger... I've also made some, in my opinion, cleanups to the patch since then. Those have the nice side effect of making the size of struct LWLock smaller, but that wasn't actually the indended effect. I'll repost once I've verified the problem is fixed and I've updated all commentary. The current problem is that I seem to have found a problem that's also reproducible with master :(. After a couple of hours a pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S against a -c max_connections=200 -c shared_buffers=4GB cluster seems to hang on PPC. With all the backends waiting in buffer mapping locks. I'm now making sure it's really master and not my patch causing the problem - it's just not trivial with 180 processes involved. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-11-17 10:21:04 -0500, Robert Haas wrote: Andres, where are we with this patch? 1. You're going to commit it, but haven't gotten around to it yet. 2. You're going to modify it some more and repost, but haven't gotten around to it yet. 3. You're willing to see it modified if somebody else does the work, but are out of time to spend on it yourself. 4. Something else? I'm working on it. Amit had found a hang on PPC that I couldn't reproduce on x86. Since then I've reproduced it and I think yesterday I found the problem. Unfortunately it always took a couple hours to trigger... I've also made some, in my opinion, cleanups to the patch since then. Those have the nice side effect of making the size of struct LWLock smaller, but that wasn't actually the indended effect. I'll repost once I've verified the problem is fixed and I've updated all commentary. The current problem is that I seem to have found a problem that's also reproducible with master :(. After a couple of hours a pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S against a -c max_connections=200 -c shared_buffers=4GB cluster seems to hang on PPC. With all the backends waiting in buffer mapping locks. I'm now making sure it's really master and not my patch causing the problem - it's just not trivial with 180 processes involved. Ah, OK. Thanks for the update. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: While doing performance tests, I noticed a hang at higher client counts with patch. I have tried to check call stack for few of processes and it is as below: #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () #9 0x100a8e84 in .btgettuple () .. #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () ... The test configuration is as below: Test env - Power - 7 (hydra) scale_factor - 3000 shared_buffers - 8GB test mode - pgbench read only test execution - ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres I have ran it for half an hour, but it doesn't came out even after ~2 hours. It doesn't get reproduced every time, currently I am able to reproduce it and the m/c is in same state, if you want any info, let me know (unfortunately binaries are in release mode, so might not get enough information). Hm. What commit did you apply the series ontop? I managed to reproduce a hang, but it was just something that heikki had already fixed... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: While doing performance tests, I noticed a hang at higher client counts with patch. I have tried to check call stack for few of processes and it is as below: #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () #9 0x100a8e84 in .btgettuple () .. #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () ... The test configuration is as below: Test env - Power - 7 (hydra) scale_factor - 3000 shared_buffers - 8GB test mode - pgbench read only test execution - ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres I have ran it for half an hour, but it doesn't came out even after ~2 hours. It doesn't get reproduced every time, currently I am able to reproduce it and the m/c is in same state, if you want any info, let me know (unfortunately binaries are in release mode, so might not get enough information). Hm. What commit did you apply the series ontop? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue, in that case I will check once by including that fix? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: I have ran it for half an hour, but it doesn't came out even after ~2 hours. It doesn't get reproduced every time, currently I am able to reproduce it and the m/c is in same state, if you want any info, let me know (unfortunately binaries are in release mode, so might not get enough information). Hm. What commit did you apply the series ontop? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue, in that case I will check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw lot of content locks waiting as well, but the origin seems to have a backend waiting for a xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switch to a readonly one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: Hm. What commit did you apply the series ontop? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue, in that case I will check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw lot of content locks waiting as well, but the origin seems to have a backend waiting for a xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switch to a readonly one. So what exactly you mean by 'switch to'? Is it that both read-write and readonly pgbench were running together or after read-write got finished and then by running read-only pgbench, you are able to reproduce it? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-30 19:05:06 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: Hm. What commit did you apply the series ontop? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue, in that case I will check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw lot of content locks waiting as well, but the origin seems to have a backend waiting for a xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switch to a readonly one. So what exactly you mean by 'switch to'? Is it that both read-write and readonly pgbench were running together or after read-write got finished and then by running read-only pgbench, you are able to reproduce it? I don't think that matters all that much. In this case I first had a read-write one (accidentally, by leaving of -S), and then aborted and ran a readonly pgbench. That turned out to trigger it relatively fast. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 22, 2014 at 7:12 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-22 13:32:07 +0530, Amit Kapila wrote: On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: Today, I have verified all previous comments raised by me and looked at new code and below are my findings: 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; .. } Setting releaseOK in above context might not be required because if the control comes in this part of code, it will not retry to acquire another time. Hm. You're probably right. You have agreed to fix this comment, but it seems you have forgot to change it. After I've thought more about it, it's is actually required. This essentially *is* a retry. Won't it needs to be set before retry? Whats the use of setting it when we have got the lock and we are not going to retry. Someobdy woke us up, which is where releaseOK is supposed to be set. I think that is true only in case when we are again going to retry or atleast that seems to be the mechanism used currently in LWLockAcquireCommon. 11. LWLockRelease() { .. PRINT_LWDEBUG(LWLockRelease, lock, mode); } Shouldn't this be in begining of LWLockRelease function rather than after processing held_lwlocks array? Ok. You have agreed to fix this comment, but it seems you have forgot to change it. Below comment doesn't seem to be adressed? LWLockAcquireOrWait() { .. LOG_LWDEBUG(LWLockAcquireOrWait, lockid, mode, succeeded); .. } a. such a log is not there in any other LWLock.. variants, if we want to introduce it, then shouldn't it be done at other places as well. I think you're placing unneccessarily high consistency constraints on a debugging feature here. This was just a very minor suggestion to keep code consistent, which if you want to ignore is okay. I understand that having or not having code consistent for this doesn't matter. Below point is yet to be resolved. 12. #ifdef LWLOCK_DEBUG lock-owner = MyProc; #endif Shouldn't it be reset in LWLockRelease? That's actually intentional. It's quite useful to know the last owner when debugging lwlock code. Won't it cause any problem if the last owner process exits? No. PGPROCs aren't deallocated or anything. And it's a debugging only variable. Thats right, the problem I was thinking is of wrong information. Ex. if process holding Exclusive locker has exited and then lot of other processes took shared locks and one new Exclusive locker waits on getting the lock, at that moment during debugging we can get wrong information about lock owner. However I think you are mainly worried about situtions when many backends are waiting for Exclusive locker which is probably the most common scenario. Can you explain how pg_read_barrier() in below code makes this access safe? LWLockWakeup() { .. + pg_read_barrier(); /* pairs with nwaiters-- */ + if (!BOOL_ACCESS_ONCE(lock-releaseOK)) .. } What's the concern you have? Full memory barriers (the atomic nwaiters--) pair with read memory barriers. IIUC, then pairing with nwaiters in LWLockAcquireCommon() ensures that releaseOK is set before again attemting for lock as atomic operation provides the necessary barrier. The point I am not getting is what kind of guarantee pg_read_barrier() provides us or why is it required? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 22, 2014 at 8:04 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-21 19:56:05 +0530, Amit Kapila wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: spin_delay_count gives how much delay has happened to acquire spinlock which when combined with other stats gives the clear situation about the contention around aquisation of corresponding LWLock. Now if we want to count the spin lock delay for Release call as well, then the meaning of the stat is getting changed. It might be that new meaning of spin_delay_count stat is more useful in some situations, however the older one has its own benefits, so I am not sure if changing this as part of this patch is the best thing to do. In which case does the old definition make sense, where the new one doesn't? I don't think it exists. And changing it here seems to make sense because spinlock contention fundamentally changes it meaning for lwlocks anyway as in most paths we don't take a spinlock anymore. On second thought, I think probably you are right here. Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. The reason I've done so is that it's otherwise much harder to debug issues where there are backend that have been woken up already, but haven't rerun yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way. I am not sure what useful information we can get during debugging by not doing this in LWLockWakeup() It's useful because you can detect backends that have been scheduled to acquire the lock, but haven't yet. They're otherwise invisible. and w.r.t performance it can lead extra function call, few checks and I think in some cases even can acquire/release spinlock. I fail to see how that could be the case. Won't it happen incase first backend sets releaseOK to true and another backend which tries to wakeup waiters on lock will acquire spinlock and tries to release the waiters. And again, this is code that's only executed around a couple syscalls. And the cacheline will be touched around there *anyway*. Sure, but I think syscalls are required in case we need to wake any waiter. In the above code, if the first waiter to be woken up is Exclusive waiter, then it will woke that waiter, else shared waiters till it got the first exclusive waiter and then first exlusive waiter. That's would be bug then. I am not sure of it, but I think it's more important to validate the new waking startegy as you see benefits by doing so. Per the comment you quoted If the front waiter wants exclusive lock, awaken him only. Otherwise awaken as many waiters as want shared access.. But I don't think it's what's happening. Note that 'proc = proc-lwWaitLink;' is only executed if 'proc-lwWaitLink-lwWaitMode != LW_EXCLUSIVE'. Which is the next waiter... And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes. Do we get any major benefit by changing the logic of waking up waiters? Yes. I think one downside I could see of new strategy is that the chance of Exclusive waiter to take more time before getting woked up is increased as now it will by pass Exclusive waiters in queue. I don't have any concrete proof that it can do any harm to performance, so may be it's okay to have this new mechanism, however I think it might be helpful if you could add a comment in code to explain the benefit by skipping Exclusive lockers. Code is more readable, but I don't understand why you want to do refactoring as part of this patch which ideally doesn't get any benefit from the same. I did it first without. But there's required stuff like LWLockDequeueSelf(). And I had several bugs because of the list stuff. And I did separate the conversion into a separate patch? Yeah, but the main patch for wait free LW_SHARED also uses it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-24 15:59:30 +0530, Amit Kapila wrote: and w.r.t performance it can lead extra function call, few checks and I think in some cases even can acquire/release spinlock. I fail to see how that could be the case. Won't it happen incase first backend sets releaseOK to true and another backend which tries to wakeup waiters on lock will acquire spinlock and tries to release the waiters. Sure, that can happen. And again, this is code that's only executed around a couple syscalls. And the cacheline will be touched around there *anyway*. Sure, but I think syscalls are required in case we need to wake any waiter. It won't wake up a waiter if there's none on the list. And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes. Do we get any major benefit by changing the logic of waking up waiters? Yes. I think one downside I could see of new strategy is that the chance of Exclusive waiter to take more time before getting woked up is increased as now it will by pass Exclusive waiters in queue. Note that that *already* happens for any *new* shared locker that comes in. It doesn't really make sense to have share lockers queued behind the exclusive locker if others just go in front of it anyway. Code is more readable, but I don't understand why you want to do refactoring as part of this patch which ideally doesn't get any benefit from the same. I did it first without. But there's required stuff like LWLockDequeueSelf(). And I had several bugs because of the list stuff. And I did separate the conversion into a separate patch? Yeah, but the main patch for wait free LW_SHARED also uses it. Well, the only thing that it could have done given that the other patch is a preqrequisite is reverting the behaviour? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: Today, I have verified all previous comments raised by me and looked at new code and below are my findings: 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; .. } Setting releaseOK in above context might not be required because if the control comes in this part of code, it will not retry to acquire another time. Hm. You're probably right. You have agreed to fix this comment, but it seems you have forgot to change it. 11. LWLockRelease() { .. PRINT_LWDEBUG(LWLockRelease, lock, mode); } Shouldn't this be in begining of LWLockRelease function rather than after processing held_lwlocks array? Ok. You have agreed to fix this comment, but it seems you have forgot to change it. Below comment doesn't seem to be adressed? LWLockAcquireOrWait() { .. LOG_LWDEBUG(LWLockAcquireOrWait, lockid, mode, succeeded); .. } a. such a log is not there in any other LWLock.. variants, if we want to introduce it, then shouldn't it be done at other places as well. Below point is yet to be resolved. 12. #ifdef LWLOCK_DEBUG lock-owner = MyProc; #endif Shouldn't it be reset in LWLockRelease? That's actually intentional. It's quite useful to know the last owner when debugging lwlock code. Won't it cause any problem if the last owner process exits? Can you explain how pg_read_barrier() in below code makes this access safe? LWLockWakeup() { .. + pg_read_barrier(); /* pairs with nwaiters-- */ + if (!BOOL_ACCESS_ONCE(lock-releaseOK)) .. } With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-22 13:32:07 +0530, Amit Kapila wrote: On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: Today, I have verified all previous comments raised by me and looked at new code and below are my findings: 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; .. } Setting releaseOK in above context might not be required because if the control comes in this part of code, it will not retry to acquire another time. Hm. You're probably right. You have agreed to fix this comment, but it seems you have forgot to change it. After I've thought more about it, it's is actually required. This essentially *is* a retry. Someobdy woke us up, which is where releaseOK is supposed to be set. 11. LWLockRelease() { .. PRINT_LWDEBUG(LWLockRelease, lock, mode); } Shouldn't this be in begining of LWLockRelease function rather than after processing held_lwlocks array? Ok. You have agreed to fix this comment, but it seems you have forgot to change it. Below comment doesn't seem to be adressed? LWLockAcquireOrWait() { .. LOG_LWDEBUG(LWLockAcquireOrWait, lockid, mode, succeeded); .. } a. such a log is not there in any other LWLock.. variants, if we want to introduce it, then shouldn't it be done at other places as well. I think you're placing unneccessarily high consistency constraints on a debugging feature here. Below point is yet to be resolved. 12. #ifdef LWLOCK_DEBUG lock-owner = MyProc; #endif Shouldn't it be reset in LWLockRelease? That's actually intentional. It's quite useful to know the last owner when debugging lwlock code. Won't it cause any problem if the last owner process exits? No. PGPROCs aren't deallocated or anything. And it's a debugging only variable. Can you explain how pg_read_barrier() in below code makes this access safe? LWLockWakeup() { .. + pg_read_barrier(); /* pairs with nwaiters-- */ + if (!BOOL_ACCESS_ONCE(lock-releaseOK)) .. } What's the concern you have? Full memory barriers (the atomic nwaiters--) pair with read memory barriers. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-21 19:56:05 +0530, Amit Kapila wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: 2. LWLockWakeup() { .. #ifdef LWLOCK_STATS lwstats-spin_delay_count += SpinLockAcquire(lock-mutex); #else SpinLockAcquire(lock-mutex); #endif .. } Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop. I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquirations are counted. Code has mainly 4 stats (sh_acquire_count, ex_acquire_count, block_count, spin_delay_count) to track, if I try to see all stats together to understand the contention situation, the unpatched code makes sense. I don't think it does. It completely disregards that the contention may actually be in LWLockRelease(). That contributes to to spinlock contention just as much as LWLockAcquire(). spin_delay_count gives how much delay has happened to acquire spinlock which when combined with other stats gives the clear situation about the contention around aquisation of corresponding LWLock. Now if we want to count the spin lock delay for Release call as well, then the meaning of the stat is getting changed. It might be that new meaning of spin_delay_count stat is more useful in some situations, however the older one has its own benefits, so I am not sure if changing this as part of this patch is the best thing to do. In which case does the old definition make sense, where the new one doesn't? I don't think it exists. And changing it here seems to make sense because spinlock contention fundamentally changes it meaning for lwlocks anyway as in most paths we don't take a spinlock anymore. 5. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG(LWLockRelease, l, mode, release waiter); dlist_delete(waiter-lwWaitLink); pg_write_barrier(); waiter-lwWaiting = false; PGSemaphoreUnlock(waiter-sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. The reason I've done so is that it's otherwise much harder to debug issues where there are backend that have been woken up already, but haven't rerun yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way. I am not sure what useful information we can get during debugging by not doing this in LWLockWakeup() It's useful because you can detect backends that have been scheduled to acquire the lock, but haven't yet. They're otherwise invisible. and w.r.t performance it can lead extra function call, few checks and I think in some cases even can acquire/release spinlock. I fail to see how that could be the case. And again, this is code that's only executed around a couple syscalls. And the cacheline will be touched around there *anyway*. 6. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. if (wokeup_somebody waiter-lwWaitMode == LW_EXCLUSIVE) continue; .. if (waiter-lwWaitMode != LW_WAIT_UNTIL_FREE) { .. wokeup_somebody = true; } .. } .. } a. IIUC above logic, if the waiter queue is as follows: (S-Shared; X-Exclusive) S X S S S X S S it can skip the exclusive waiters and release shared waiter. If my understanding is right, then I think instead of continue, there should be *break* in above logic. No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up. Okay, even then it makes the current logic of wakingup different which I am not sure is what this patch is intended for. It's already done in a separate patch... b. Consider below sequence of waiters: (S-Shared; X-Exclusive) S S X S S I think as per un-patched code, it will wakeup waiters uptill (including) first Exclusive, but patch will wake up uptill (*excluding*) first Exclusive. I don't think the current code does that. LWLockRelease() { .. /* * If the front waiter wants exclusive lock, awaken him only. * Otherwise awaken as many waiters as want
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 17, 2014 at 11:41 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-17 17:14:16 +0530, Amit Kapila wrote: On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila amit.kapil...@gmail.com wrote: HEAD – commit 494affb + wait free lw_shared_v2 Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 64 128 Run-1 286209 274922 Run-2 289101 274495 Run-3 289639 273633 So here the results with LW_SHARED were consistently better, right? Yes. You saw performance degradations here earlier? Yes. So I am planning to proceed further with the review/test of your latest patch. According to me, below things are left from myside: a. do some basic tpc-b tests with patch I have done few tests, the results of which are below, the data indicates that neither there is any noticeable gain nor any noticeable loss on tpc-b tests which I think is what could have been expected of this patch. There is slight variation at few client counts (for sync_commit =off, at 32 and 128), however I feel that is just noise as I don't see any general trend. Performance Data IBM POWER-8 24 cores, 192 hardware threads RAM = 492GB Database Locale =C max_connections =300 checkpoint_segments=300 checkpoint_timeout=15min maintenance_work_mem = 1GB checkpoint_completion_target = 0.9 Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 30mins Test mode - tpc-b Below data is median of 3 runs, detailed data is attached with this mail. Scale_factor =3000; shared_buffers=8GB; Patch/Client_count 8 16 32 64 128 HEAD 3849 4889 3569 3845 4547 LW_SHARED 3844 4787 3532 3814 4408 Scale_factor =3000; shared_buffers=8GB; synchronous_commit=off; Patch/Client_count 8 16 32 64 128 HEAD 5966 8297 10084 9348 8836 LW_SHARED 6070 8612 8839 9503 8584 While doing performance tests, I noticed a hang at higher client counts with patch. I have tried to check call stack for few of processes and it is as below: #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () #9 0x100a8e84 in .btgettuple () .. #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () ... The test configuration is as below: Test env - Power - 7 (hydra) scale_factor - 3000 shared_buffers - 8GB test mode - pgbench read only test execution - ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres I have ran it for half an hour, but it doesn't came out even after ~2 hours. It doesn't get reproduced every time, currently I am able to reproduce it and the m/c is in same state, if you want any info, let me know (unfortunately binaries are in release mode, so might not get enough information). b. re-review latest version posted by you Cool! I will post my feedback for code separately, once I am able to completely review the new versions. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com perf_lwlock_contention_tpcb_data_v1.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: 2. LWLockWakeup() { .. #ifdef LWLOCK_STATS lwstats-spin_delay_count += SpinLockAcquire(lock-mutex); #else SpinLockAcquire(lock-mutex); #endif .. } Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop. I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquirations are counted. Code has mainly 4 stats (sh_acquire_count, ex_acquire_count, block_count, spin_delay_count) to track, if I try to see all stats together to understand the contention situation, the unpatched code makes sense. spin_delay_count gives how much delay has happened to acquire spinlock which when combined with other stats gives the clear situation about the contention around aquisation of corresponding LWLock. Now if we want to count the spin lock delay for Release call as well, then the meaning of the stat is getting changed. It might be that new meaning of spin_delay_count stat is more useful in some situations, however the older one has its own benefits, so I am not sure if changing this as part of this patch is the best thing to do. 3. LWLockRelease() { .. /* grant permission to run, even if a spurious share lock increases lockcount */ else if (mode == LW_EXCLUSIVE have_waiters) check_waiters = true; /* nobody has this locked anymore, potential exclusive lockers get a chance */ else if (lockcount == 0 have_waiters) check_waiters = true; .. } It seems comments have been reversed in above code. No, they look right. But I've expanded them in the version I'm going to post in a couple minutes. okay. 5. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG(LWLockRelease, l, mode, release waiter); dlist_delete(waiter-lwWaitLink); pg_write_barrier(); waiter-lwWaiting = false; PGSemaphoreUnlock(waiter-sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. The reason I've done so is that it's otherwise much harder to debug issues where there are backend that have been woken up already, but haven't rerun yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way. I am not sure what useful information we can get during debugging by not doing this in LWLockWakeup() and w.r.t performance it can lead extra function call, few checks and I think in some cases even can acquire/release spinlock. 6. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. if (wokeup_somebody waiter-lwWaitMode == LW_EXCLUSIVE) continue; .. if (waiter-lwWaitMode != LW_WAIT_UNTIL_FREE) { .. wokeup_somebody = true; } .. } .. } a. IIUC above logic, if the waiter queue is as follows: (S-Shared; X-Exclusive) S X S S S X S S it can skip the exclusive waiters and release shared waiter. If my understanding is right, then I think instead of continue, there should be *break* in above logic. No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up. Okay, even then it makes the current logic of wakingup different which I am not sure is what this patch is intended for. b. Consider below sequence of waiters: (S-Shared; X-Exclusive) S S X S S I think as per un-patched code, it will wakeup waiters uptill (including) first Exclusive, but patch will wake up uptill (*excluding*) first Exclusive. I don't think the current code does that. LWLockRelease() { .. /* * If the front waiter wants exclusive lock, awaken him only. * Otherwise awaken as many waiters as want shared access. */ if (proc- lwWaitMode != LW_EXCLUSIVE) { while (proc-lwWaitLink != NULL proc-lwWaitLink-lwWaitMode != LW_EXCLUSIVE) { if (proc-lwWaitMode != LW_WAIT_UNTIL_FREE) releaseOK = false; proc = proc-lwWaitLink; } } /* proc is now the last PGPROC to be released */ lock-head = proc-lwWaitLink; proc-lwWaitLink = NULL; .. } In the above code, if the first waiter to be woken up is Exclusive waiter, then it will woke that waiter, else shared waiters till it
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila amit.kapil...@gmail.com wrote: I am not sure why we are seeing difference even though running on same m/c with same configuration. I have tried many times, but I could not get the numbers you have posted above with HEAD, however now trying with the latest version [1] posted by you, everything seems to be fine at this workload. The data at higher client count is as below: HEAD – commit 494affb Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 64 128 Run-1 271799 24 Run-2 274341 245207 Run-3 275019 252258 HEAD – commit 494affb + wait free lw_shared_v2 Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 64 128 Run-1 286209 274922 Run-2 289101 274495 Run-3 289639 273633 So I am planning to proceed further with the review/test of your latest patch. According to me, below things are left from myside: a. do some basic tpc-b tests with patch b. re-review latest version posted by you I know that you have posted optimization into StrategyGetBuffer() in this thread, however I feel we can evaluate it separately unless you are of opinion that both the patches should go together. [1] http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-17 17:14:16 +0530, Amit Kapila wrote: On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila amit.kapil...@gmail.com wrote: I am not sure why we are seeing difference even though running on same m/c with same configuration. I have tried many times, but I could not get the numbers you have posted above with HEAD, however now trying with the latest version [1] posted by you, everything seems to be fine at this workload. The data at higher client count is as below: I'll try to reproduce it next week. But I don't think it matters all that much. Personally so far the performance numbers don't seem to indicate much reason to wait any further. We sure improve further, but I don't see much reason to wait because of that. HEAD – commit 494affb Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 64 128 Run-1 271799 24 Run-2 274341 245207 Run-3 275019 252258 HEAD – commit 494affb + wait free lw_shared_v2 Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 64 128 Run-1 286209 274922 Run-2 289101 274495 Run-3 289639 273633 So here the results with LW_SHARED were consistently better, right? You saw performance degradations here earlier? So I am planning to proceed further with the review/test of your latest patch. According to me, below things are left from myside: a. do some basic tpc-b tests with patch b. re-review latest version posted by you Cool! I know that you have posted optimization into StrategyGetBuffer() in this thread, however I feel we can evaluate it separately unless you are of opinion that both the patches should go together. [1] http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de No, I don't think they should go together - I wrote that patch because it was the bottleneck in the possibly regressing test and I wanted to see the full effect. Although I do think we should apply it ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 7:02 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Sat, Oct 11, 2014 at 6:40 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 07:26:57 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected. The numbers for 3000 show pretty much the same: SCALE 128 160 175 HEAD352113 339005 336491 LW_SHARED 365874 347931 342528 Hm. I wonder if you're using pgbench without -M prepared? No, I use below statement: ./pgbench -c 128 -j 128 -T 300 -S -M prepared postgres That'd about explain the difference. Here I think first thing to clarify is why the numbers on HEAD are different? I have taken the latest code and recreated the database and tried again on power-7 m/c (hydra) and below is the result: Result with -M prepared: Duration of each individual run - 5 mins HEAD – commit 494affb Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 128 160 Run-1 258385 239908 Run-2 257835 238624 Run-3 255967 237905 Result without -M prepared: Duration of each individual run - 5 mins HEAD – commit 494affb Shared_buffers=8GB; Scale Factor = 3000 Client Count/No. Of Runs (tps) 128 160 Run-1 228747 220961 Run-2 229817 214464 Run-3 227386 216619 I am not sure why we are seeing difference even though running on same m/c with same configuration. I think there is some difference in the way we are running tests, if you don't mind could you please share the exact steps and non-default postgresql.conf settings with me. The below list of things could be useful for me to reproduce the numbers you are seeing: a. build steps (any script you are using) b. non-default postgresql.conf settings c. Exact pgbench statements used With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 16:41:39 +0200, Andres Freund wrote: FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Is this safe? + victim = pg_atomic_fetch_add_u32(StrategyControl-nextVictimBuffer, 1); - if (++StrategyControl-nextVictimBuffer = NBuffers) + buf = BufferDescriptors[victim % NBuffers]; + + if (victim % NBuffers == 0) snip I don't think there's any guarantee you could keep nextVictimBuffer from wandering off the end. You could buy a little safety by CAS'ing zero in instead, followed by atomic increment. However that's still pretty dodgy IMO and requires two atomic ops which will underperform the spin in some situations. I like Robert's idea to keep the spinlock but preallocate a small amount of buffers, say 8 or 16. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote: On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 16:41:39 +0200, Andres Freund wrote: FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Is this safe? + victim = pg_atomic_fetch_add_u32(StrategyControl-nextVictimBuffer, 1); - if (++StrategyControl-nextVictimBuffer = NBuffers) + buf = BufferDescriptors[victim % NBuffers]; + + if (victim % NBuffers == 0) snip I don't think there's any guarantee you could keep nextVictimBuffer from wandering off the end. Not sure what you mean? It'll cycle around at 2^32. The code doesn't try to avoid nextVictimBuffer from going above NBuffers. To avoid overrunning BufferDescriptors I'm doing % NBuffers. Yes, that'll have the disadvantage of the first buffers being slightly more likely to be hit, but for that to be relevant you'd need unrealistically large amounts of shared_buffers. You could buy a little safety by CAS'ing zero in instead, followed by atomic increment. However that's still pretty dodgy IMO and requires two atomic ops which will underperform the spin in some situations. I like Robert's idea to keep the spinlock but preallocate a small amount of buffers, say 8 or 16. I think that's pretty much orthogonal. I *do* think it makes sense to increment nextVictimBuffer in bigger steps. But the above doesn't prohibit doing so - and it'll still be be much better without the spinlock. Same number of atomics, but no potential of spinning and no potential of being put to sleep while holding the spinlock. This callsite has a comparatively large likelihood of being put to sleep while holding a spinlock because it can run for a very long time doing nothing but spinlocking. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Tue, Oct 14, 2014 at 8:58 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote: On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund and...@2ndquadrant.com wrote: Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Is this safe? + victim = pg_atomic_fetch_add_u32(StrategyControl-nextVictimBuffer, 1); - if (++StrategyControl-nextVictimBuffer = NBuffers) + buf = BufferDescriptors[victim % NBuffers]; + + if (victim % NBuffers == 0) snip I don't think there's any guarantee you could keep nextVictimBuffer from wandering off the end. Not sure what you mean? It'll cycle around at 2^32. The code doesn't try to avoid nextVictimBuffer from going above NBuffers. To avoid overrunning BufferDescriptors I'm doing % NBuffers. Yes, that'll have the disadvantage of the first buffers being slightly more likely to be hit, but for that to be relevant you'd need unrealistically large amounts of shared_buffers. Right -- my mistake. That's clever. I think that should work well. I think that's pretty much orthogonal. I *do* think it makes sense to increment nextVictimBuffer in bigger steps. But the above doesn't prohibit doing so - and it'll still be be much better without the spinlock. Same number of atomics, but no potential of spinning and no potential of being put to sleep while holding the spinlock. This callsite has a comparatively large likelihood of being put to sleep while holding a spinlock because it can run for a very long time doing nothing but spinlocking. A while back, I submitted a minor tweak to the clock sweep so that, instead of spinlocking every single buffer header as it swept it just did a single TAS as a kind of a trylock and punted to the next buffer if the test failed on the principle there's not good reason to hang around. You only spin if you passed the first test; that should reduce the likelihood of actual spinning to approximately zero. I still maintain there's no reason not to do that (I couldn't show a benefit but that was because mapping list locking was masking any clock sweep contention at that time). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 15, 2014 at 12:06 AM, Merlin Moncure mmonc...@gmail.com wrote: A while back, I submitted a minor tweak to the clock sweep so that, instead of spinlocking every single buffer header as it swept it just did a single TAS as a kind of a trylock and punted to the next buffer if the test failed on the principle there's not good reason to hang around. You only spin if you passed the first test; that should reduce the likelihood of actual spinning to approximately zero. I still maintain there's no reason not to do that (I couldn't show a benefit but that was because mapping list locking was masking any clock sweep contention at that time). If you feel that can now show the benefit, then I think you can rebase it for the coming commit fest (which is going to start today). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi, On 2014-10-11 07:26:57 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected. The numbers for 3000 show pretty much the same: SCALE 128 160 175 HEAD352113 339005 336491 LW_SHARED 365874 347931 342528 Hm. I wonder if you're using pgbench without -M prepared? That'd about explain the difference. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 6:40 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 07:26:57 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected. The numbers for 3000 show pretty much the same: SCALE 128 160 175 HEAD352113 339005 336491 LW_SHARED 365874 347931 342528 Hm. I wonder if you're using pgbench without -M prepared? No, I use below statement: ./pgbench -c 128 -j 128 -T 300 -S -M prepared postgres That'd about explain the difference. Here I think first thing to clarify is why the numbers on HEAD are different? Another thing is that I generally see difference in numbers at 1000 and 3000 scale factor (although I have not run lately), but in your case the numbers are almost same. I will try once more by cleaning every thing(installation, data_dir, etc..) but not today... With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 15:10:45 +0200, Andres Freund wrote: Hi, On 2014-10-11 07:26:57 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected. The numbers for 3000 show pretty much the same: SCALE 128 160 175 HEAD 352113 339005 336491 LW_SHARED 365874 347931 342528 Hm. I wonder if you're using pgbench without -M prepared? That'd about explain the difference. Started a test for a run without -M prepared (i.e. -M simple): SCALE 1 2 4 8 16 32 64 96 128 160 196 HEAD796815132 31147 63395 123551 180436 242098 263625 249652 244240 232679 LW_SHARED 826815032 31267 63911 118943 180447 247067 269262 259165 247166 231292 This really doesn't look exciting to me. scale 200, -M prepared: SCALE 1 2 4 8 16 32 64 96 128 160 196 HEAD13032 24712 50967 103801 201745 279226 380844 392623 379953 357206 361072 LW_SHARED 12997 25134 51119 102920 199597 282511 422424 460222 447662 436959 418519 My guess is that the primary benefit on systems with few sockets, like this, is that with the new code there's far fewer problems with processes sleeping (being scheduled out) while holding a spinlock. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi, On 2014-10-10 10:13:03 +0530, Amit Kapila wrote: I have done few performance tests for above patches and results of same is as below: Cool, thanks. Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections =210 Database Locale =C checkpoint_segments=256 checkpoint_timeout=35min shared_buffers=8GB Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins Test type - read only pgbench with -M prepared Other Related information about test a. This is the data for median of 3 runs, the detailed data of individual run is attached with mail. b. I have applied both the patches to take performance data. Scale Factor - 100 Patch_ver/Client_count 1 8 16 32 64 128 HEAD 13344 106921 196629 295123 377846 333928 PATCH 13662 106179 203960 298955 452638 465671 Scale Factor - 3000 Patch_ver/Client_count 8 16 32 64 128 160 HEAD 86920 152417 231668 280827 257093 255122 PATCH 87552 160313 230677 276186 248609 244372 Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). Hm. Interesting. I don't see that dip on x86. The above data shows that the patch improves performance for cases when there is shared LWLock contention, however there is a slight performance dip in case of Exclusive LWLocks (at scale factor 3000, it needs exclusive LWLocks for buf mapping tables). Now I am not sure if this is the worst case dip or under similar configurations the performance dip can be higher, because the trend shows that dip is increasing with more client counts. Brief Analysis of code w.r.t performance dip - Extra Instructions w.r.t Head in Acquire Exclusive lock path a. Attempt lock twice b. atomic operations for nwaiters in LWLockQueueSelf() and LWLockAcquireCommon() c. Now we need to take spinlock twice, once for self queuing and then again for setting releaseOK. d. few function calls and some extra checks Hm. I can't really see the number of atomics itself matter - a spinning lock will do many more atomic ops than this. But I wonder whether we could get rid of the releaseOK lock. Should be quite possible. Now probably these shouldn't matter much in case backend needs to wait for other Exclusive locker, but I am not sure what else could be the reason for dip in case we need to have Exclusive LWLocks. Any chance to get a profile? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 20:07:35 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defineing away some things that gdb doesn't handle: macro define __builtin_offsetof(T, F) ((int) (((T *) 0)-F)) macro define __extension__ macro define AssertVariableIsOfTypeMacro(x, y) ((void)0) Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works: (gdb) p dlist_is_empty(BackendList) ? NULL : dlist_head_element(Backend, elem, BackendList) Ah, cool. I'll try that. If that works for you, should we put it somewhere in the docs? If so, where? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi Robert, On 2014-10-08 16:01:53 -0400, Robert Haas wrote: [ comment fixes ] Thanks, I've incorporated these + a bit more. Could you otherwise make sense of the explanation and the algorithm? +/* yipeyyahee */ Although this will be clear to individuals with a good command of English, I suggest avoiding such usages. I've removed them with a heavy heart. These are heartfelt emotions from getting the algorithm to work (:P) I've attached these fixes + the removal of spinlocks around releaseOK as follow up patches. Obviously they'll be merged into the other patch, but sounds useful to be able see them separately. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 7 Oct 2014 15:32:34 +0200 Subject: [PATCH 1/4] Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read it: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting -lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. Author: Andres Freund --- src/backend/access/transam/twophase.c | 1 - src/backend/storage/lmgr/lwlock.c | 151 +- src/backend/storage/lmgr/proc.c | 2 - src/include/storage/lwlock.h | 5 +- src/include/storage/proc.h| 3 +- 5 files changed, 60 insertions(+), 102 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d5409a6..6401943 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid, proc-roleId = owner; proc-lwWaiting = false; proc-lwWaitMode = 0; - proc-lwWaitLink = NULL; proc-waitLock = NULL; proc-waitProcLock = NULL; for (i = 0; i NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 9fe6855..e6f9158 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -35,6 +35,7 @@ #include miscadmin.h #include pg_trace.h #include replication/slot.h +#include storage/barrier.h #include storage/ipc.h #include storage/predicate.h #include storage/proc.h @@ -115,9 +116,9 @@ inline static void PRINT_LWDEBUG(const char *where, const LWLock *lock) { if (Trace_lwlocks) - elog(LOG, %s(%s %d): excl %d shared %d head %p rOK %d, + elog(LOG, %s(%s %d): excl %d shared %d rOK %d, where, T_NAME(lock), T_ID(lock), - (int) lock-exclusive, lock-shared, lock-head, + (int) lock-exclusive, lock-shared, (int) lock-releaseOK); } @@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id) lock-exclusive = 0; lock-shared = 0; lock-tranche = tranche_id; - lock-head = NULL; - lock-tail = NULL; + dlist_init(lock-waiters); } @@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val) proc-lwWaiting = true; proc-lwWaitMode = mode; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -997,13 +987,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) */ proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - /* waiters are added to the front of the queue */ - proc-lwWaitLink = lock-head; - if (lock-head == NULL) - lock-tail = proc; - lock-head = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -1079,9 +1064,10 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val) { - PGPROC *head; - PGPROC *proc; - PGPROC *next; + dlist_head wakeup; + dlist_mutable_iter iter; + + dlist_init(wakeup); /* Acquire mutex. Time spent holding mutex
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 10:13:03 +0530, Amit Kapila wrote: I have done few performance tests for above patches and results of same is as below: Cool, thanks. Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections =210 Database Locale =C checkpoint_segments=256 checkpoint_timeout=35min shared_buffers=8GB Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins Test type - read only pgbench with -M prepared Other Related information about test a. This is the data for median of 3 runs, the detailed data of individual run is attached with mail. b. I have applied both the patches to take performance data. Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that implementation of some atomic operation is costlier for particular architecture? I have tried again for scale factor 3000 and could see the dip and this time I have even tried with 175 client count and the dip is approximately 5% which is slightly more than 160 client count. Patch_ver/Client_count 175 HEAD 248374 PATCH 235669 Now probably these shouldn't matter much in case backend needs to wait for other Exclusive locker, but I am not sure what else could be the reason for dip in case we need to have Exclusive LWLocks. Any chance to get a profile? Here it goes.. HEAD - client_count=128 - + 7.53% postgres postgres [.] GetSnapshotData + 3.41% postgres postgres [.] AllocSetAlloc + 2.61% postgres postgres [.] AllocSetFreeIndex + 2.49% postgres postgres [.] _bt_compare + 2.43% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.40% postgres postgres [.] hash_search_with_hash_value + 1.83% postgres postgres [.] tas + 1.29% postgres postgres [.] pg_encoding_mbcliplen + 1.27% postgres postgres [.] MemoryContextCreate + 1.22% postgres postgres [.] MemoryContextAllocZeroAligned + 1.17% postgres postgres [.] hash_seq_search + 0.97% postgres postgres [.] LWLockRelease + 0.96% postgres postgres [.] MemoryContextAllocZero + 0.91% postgres postgres [.] GetPrivateRefCountEntry + 0.82% postgres postgres [.] AllocSetFree + 0.79% postgres postgres [.] LWLockAcquireCommon + 0.78% postgres postgres [.] pfree Detailed Data - - 7.53% postgres postgres [.] GetSnapshotData - GetSnapshotData - 7.46% GetSnapshotData - 7.46% GetTransactionSnapshot - 3.74% exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.72% PortalStart exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.41% postgres postgres [.] AllocSetAlloc - AllocSetAlloc - 2.01% AllocSetAlloc 0.81% palloc 0.63% MemoryContextAlloc - 2.61% postgres postgres [.] AllocSetFreeIndex - AllocSetFreeIndex 1.59% AllocSetAlloc 0.79% AllocSetFree - 2.49% postgres postgres [.] _bt_compare - _bt_compare - 1.80% _bt_binsrch - 1.80% _bt_binsrch - 1.21% _bt_search _bt_first Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k]
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that implementation of some atomic operation is costlier for particular architecture? Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8... I have tried again for scale factor 3000 and could see the dip and this time I have even tried with 175 client count and the dip is approximately 5% which is slightly more than 160 client count. FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Patch_ver/Client_count 175 HEAD 248374 PATCH 235669 Now probably these shouldn't matter much in case backend needs to wait for other Exclusive locker, but I am not sure what else could be the reason for dip in case we need to have Exclusive LWLocks. Any chance to get a profile? Here it goes.. Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.24% postgres postgres [.] AllocSetFreeIndex + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were the compiler settings you used? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 16:41:39 +0200, Andres Freund wrote: FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Test: pgbench -M prepared -P 5 -S -c 496 -j 496 -T 5000 on a scale=1000 database, with 4GB of shared buffers. Before: progress: 40.0 s, 136252.3 tps, lat 3.628 ms stddev 4.547 progress: 45.0 s, 135049.0 tps, lat 3.660 ms stddev 4.515 progress: 50.0 s, 135788.9 tps, lat 3.640 ms stddev 4.398 progress: 55.0 s, 135268.4 tps, lat 3.654 ms stddev 4.469 progress: 60.0 s, 134991.6 tps, lat 3.661 ms stddev 4.739 after: progress: 40.0 s, 207701.1 tps, lat 2.382 ms stddev 3.018 progress: 45.0 s, 208022.4 tps, lat 2.377 ms stddev 2.902 progress: 50.0 s, 209187.1 tps, lat 2.364 ms stddev 2.970 progress: 55.0 s, 206462.7 tps, lat 2.396 ms stddev 2.871 progress: 60.0 s, 210263.8 tps, lat 2.351 ms stddev 2.914 Yes, no kidding. The results are similar, but less extreme, for smaller client counts like 80 or 160. Amit, since your test seems to be currently completely bottlenecked within StrategyGetBuffer(), could you compare with that patch applied to HEAD and the LW_SHARED patch for one client count? That'll may allow us to see a more meaningful profile... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6b486e5b467e94ab9297d7656a5b39b816c5c55a Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Fri, 10 Oct 2014 17:36:46 +0200 Subject: [PATCH] WIP: lockless StrategyGetBuffer hotpath --- src/backend/storage/buffer/freelist.c | 154 -- 1 file changed, 90 insertions(+), 64 deletions(-) diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 5966beb..0c634a0 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -18,6 +18,12 @@ #include storage/buf_internals.h #include storage/bufmgr.h +#include port/atomics.h + + +#define LATCHPTR_ACCESS_ONCE(var) ((Latch *)(*((volatile Latch **)(var +#define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)(var + /* * The shared freelist control information. @@ -27,8 +33,12 @@ typedef struct /* Spinlock: protects the values below */ slock_t buffer_strategy_lock; - /* Clock sweep hand: index of next buffer to consider grabbing */ - int nextVictimBuffer; + /* + * Clock sweep hand: index of next buffer to consider grabbing. Note that + * this isn't a concrete buffer - we only ever increase the value. So, to + * get an actual buffer, it needs to be used modulo NBuffers. + */ + pg_atomic_uint32 nextVictimBuffer; int firstFreeBuffer; /* Head of list of unused buffers */ int lastFreeBuffer; /* Tail of list of unused buffers */ @@ -42,8 +52,8 @@ typedef struct * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. */ - uint32 completePasses; /* Complete cycles of the clock sweep */ - uint32 numBufferAllocs; /* Buffers allocated since last reset */ + pg_atomic_uint32 completePasses; /* Complete cycles of the clock sweep */ + pg_atomic_uint32 numBufferAllocs; /* Buffers allocated since last reset */ /* * Notification latch, or NULL if none. See StrategyNotifyBgWriter. @@ -124,87 +134,107 @@ StrategyGetBuffer(BufferAccessStrategy strategy) return buf; } - /* Nope, so lock the freelist */ - SpinLockAcquire(StrategyControl-buffer_strategy_lock); - /* * We count buffer allocation requests so that the bgwriter can estimate * the rate of buffer consumption.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that implementation of some atomic operation is costlier for particular architecture? Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8... I have tried again for scale factor 3000 and could see the dip and this time I have even tried with 175 client count and the dip is approximately 5% which is slightly more than 160 client count. FWIW, the profile always looks like: For my tests on Power8, the profile looks somewhat similar to below profile mentioned by you, please see this mail: http://www.postgresql.org/message-id/caa4ek1je9zblhsfiavhd18gdwxux21zfqpjgq_dz_zoa35n...@mail.gmail.com However on Power7, the profile looks different which I have posted above thread. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Yeah, even bgreclaimer patch is able to achieve the same, however after that the contention moves to somewhere else as you can see in above link. Here it goes.. Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.24% postgres postgres [.] AllocSetFreeIndex + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were the compiler settings you used? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that implementation of some atomic operation is costlier for particular architecture? Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8... I have tried again for scale factor 3000 and could see the dip and this time I have even tried with 175 client count and the dip is approximately 5% which is slightly more than 160 client count. I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.24% postgres postgres [.] AllocSetFreeIndex + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were the compiler settings you used? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? 160, because that was the one you reported the biggest regression. Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Yes, although not tonight Also from hydra? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it. Your profile was clearly without inlining... And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Yes, although not tonight No issues, whenever you get it. Also from hydra? Yes. One more thing I would like to share with you is that while doing tests, there are some other settings change in postgresql.conf maintenance_work_mem = 1GB synchronous_commit = off wal_writer_delay = 20ms checkpoint_segments=256 checkpoint_timeout=35min I don't think these parameters matter for the tests we are doing, but still I thought it is good to share, because I forgot to send some of these non-default settings in previous mail. Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it. okay, thats what I wanted to ask you, so that we should not see different numbers due to the way code is built. When I do performance tests where I don't want to see profile, I use below statement: ./configure --prefix=installation_path And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 10/8/14, 8:35 AM, Andres Freund wrote: +#define EXCLUSIVE_LOCK (((uint32) 1) (31 - 1)) + +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */ +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK) There should at least be a comment where we define MAX_BACKENDS about the relationship here... or better yet, validate that MAX_BACKENDS SHARED_LOCK_MASK during postmaster startup. (For those that think that's too pedantic, I'll argue that it's no worse than the patch verifying that MyProc != NULL in LWLockQueueSelf()). +/* + * Internal function that tries to atomically acquire the lwlock in the passed + * in mode. + * + * This function will not block waiting for a lock to become free - that's the + * callers job. + * + * Returns true if the lock isn't free and we need to wait. + * + * When acquiring shared locks it's possible that we disturb an exclusive + * waiter. If that's a problem for the specific user, pass in a valid pointer + * for 'potentially_spurious'. Its value will be set to true if we possibly + * did so. The caller then has to handle that scenario. + */ +static bool +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious) We should invert the return of this function. Current code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way, plus consistency is good. (From 9.3) * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode * * If the lock is not available, return FALSE with no side-effects. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-09 16:52:46 -0500, Jim Nasby wrote: On 10/8/14, 8:35 AM, Andres Freund wrote: +#define EXCLUSIVE_LOCK (((uint32) 1) (31 - 1)) + +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */ +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK) There should at least be a comment where we define MAX_BACKENDS about the relationship here... or better yet, validate that MAX_BACKENDS SHARED_LOCK_MASK during postmaster startup. (For those that think that's too pedantic, I'll argue that it's no worse than the patch verifying that MyProc != NULL in LWLockQueueSelf()). If you modify either, you better grep for them... I don't think that's going to happen anyway. Requiring it during startup would mean exposing SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly stick a StaticAssert() someplace in lwlock.c. And no, it's not comparable at all to MyProc != NULL - the lwlock code initially *does* run when MyProc isn't setup. We just better not conflict against any other lockers at that stage. +/* + * Internal function that tries to atomically acquire the lwlock in the passed + * in mode. + * + * This function will not block waiting for a lock to become free - that's the + * callers job. + * + * Returns true if the lock isn't free and we need to wait. + * + * When acquiring shared locks it's possible that we disturb an exclusive + * waiter. If that's a problem for the specific user, pass in a valid pointer + * for 'potentially_spurious'. Its value will be set to true if we possibly + * did so. The caller then has to handle that scenario. + */ +static bool +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious) We should invert the return of this function. Current code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way, plus consistency is good. I don't think so. I've wondered about it as well, but the way the function is used its more consistent imo if it returns whether we must wait. Note that it's not an exported function. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 10/9/14, 4:57 PM, Andres Freund wrote: If you modify either, you better grep for them... I don't think that's going to happen anyway. Requiring it during startup would mean exposing SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly stick a StaticAssert() someplace in lwlock.c. Ahh, yeah, exposing it would be ugly. I just get the heeby-jeebies when I see assumptions like this though. I fear there's a bunch of cases where changing something will break a completely unrelated part of the system with no warning. Maybe add an assert() to check it? And no, it's not comparable at all to MyProc != NULL - the lwlock code initially*does* run when MyProc isn't setup. We just better not conflict against any other lockers at that stage. Ahh, can you maybe add that detail to the comment? That wasn't clear to me. +/* + * Internal function that tries to atomically acquire the lwlock in the passed + * in mode. + * + * This function will not block waiting for a lock to become free - that's the + * callers job. + * + * Returns true if the lock isn't free and we need to wait. + * + * When acquiring shared locks it's possible that we disturb an exclusive + * waiter. If that's a problem for the specific user, pass in a valid pointer + * for 'potentially_spurious'. Its value will be set to true if we possibly + * did so. The caller then has to handle that scenario. + */ +static bool +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious) We should invert the return of this function. Current code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way, plus consistency is good. I don't think so. I've wondered about it as well, but the way the function is used its more consistent imo if it returns whether we must wait. Note that it's not an exported function. ISTM that a function attempting a lock would return success, not failure. Even though it's internal now it could certainly be made external at some point in the future. But I suppose it's ultimately a matter of preference... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 8, 2014 at 7:05 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, Attached you can find the next version of my LW_SHARED patchset. Now that atomics are committed, it seems like a good idea to also add their raison d'être. Since the last public version I have: * Addressed lots of Amit's comments. Thanks! * Peformed a fair amount of testing. * Rebased the code. The volatile removal made that not entirely trivial... * Significantly cleaned up and simplified the code. * Updated comments and such * Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case) The feature currently consists out of two patches: 1) Convert PGPROC-lwWaitLink into a dlist. The old code was frail and verbose. This also does: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting -lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it. 2) Implement the wait free LW_SHARED algorithm. I have done few performance tests for above patches and results of same is as below: Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections =210 Database Locale =C checkpoint_segments=256 checkpoint_timeout=35min shared_buffers=8GB Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins Test type - read only pgbench with -M prepared Other Related information about test a. This is the data for median of 3 runs, the detailed data of individual run is attached with mail. b. I have applied both the patches to take performance data. Scale Factor - 100 Patch_ver/Client_count 1 8 16 32 64 128 HEAD 13344 106921 196629 295123 377846 333928 PATCH 13662 106179 203960 298955 452638 465671 Scale Factor - 3000 Patch_ver/Client_count 8 16 32 64 128 160 HEAD 86920 152417 231668 280827 257093 255122 PATCH 87552 160313 230677 276186 248609 244372 Observations -- a. The patch performs really well (increase upto ~40%) incase all the data fits in shared buffers (scale factor -100). b. Incase data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is performance increase upto 16 client count, however after that it starts dipping (in above config unto ~4.4%). The above data shows that the patch improves performance for cases when there is shared LWLock contention, however there is a slight performance dip in case of Exclusive LWLocks (at scale factor 3000, it needs exclusive LWLocks for buf mapping tables). Now I am not sure if this is the worst case dip or under similar configurations the performance dip can be higher, because the trend shows that dip is increasing with more client counts. Brief Analysis of code w.r.t performance dip - Extra Instructions w.r.t Head in Acquire Exclusive lock path a. Attempt lock twice b. atomic operations for nwaiters in LWLockQueueSelf() and LWLockAcquireCommon() c. Now we need to take spinlock twice, once for self queuing and then again for setting releaseOK. d. few function calls and some extra checks Similarly there seems to be few additional instructions in LWLockRelease() path. Now probably these shouldn't matter much in case backend needs to wait for other Exclusive locker, but I am not sure what else could be the reason for dip in case we need to have Exclusive LWLocks. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com perf_lwlock_contention_data_v1.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: 2. LWLockWakeup() { .. #ifdef LWLOCK_STATS lwstats-spin_delay_count += SpinLockAcquire(lock-mutex); #else SpinLockAcquire(lock-mutex); #endif .. } Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop. I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquirations are counted. 3. LWLockRelease() { .. /* grant permission to run, even if a spurious share lock increases lockcount */ else if (mode == LW_EXCLUSIVE have_waiters) check_waiters = true; /* nobody has this locked anymore, potential exclusive lockers get a chance */ else if (lockcount == 0 have_waiters) check_waiters = true; .. } It seems comments have been reversed in above code. No, they look right. But I've expanded them in the version I'm going to post in a couple minutes. 5. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG(LWLockRelease, l, mode, release waiter); dlist_delete(waiter-lwWaitLink); pg_write_barrier(); waiter-lwWaiting = false; PGSemaphoreUnlock(waiter-sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. 6. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. if (wokeup_somebody waiter-lwWaitMode == LW_EXCLUSIVE) continue; .. if (waiter-lwWaitMode != LW_WAIT_UNTIL_FREE) { .. wokeup_somebody = true; } .. } .. } a. IIUC above logic, if the waiter queue is as follows: (S-Shared; X-Exclusive) S X S S S X S S it can skip the exclusive waiters and release shared waiter. If my understanding is right, then I think instead of continue, there should be *break* in above logic. No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up. b. Consider below sequence of waiters: (S-Shared; X-Exclusive) S S X S S I think as per un-patched code, it will wakeup waiters uptill (including) first Exclusive, but patch will wake up uptill (*excluding*) first Exclusive. I don't think the current code does that. And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes. 7. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. dlist_delete(waiter-lwWaitLink); dlist_push_tail(wakeup, waiter-lwWaitLink); .. } .. } Use of dlist has simplified the code, but I think there might be a slight overhead of maintaining wakeup queue as compare to un-patched mechanism especially when there is a long waiter queue. I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. 8. LWLockConditionalAcquire() { .. /* * We ran into an exclusive lock and might have blocked another * exclusive lock from taking a shot because it took a time to back * off. Retry till we are either sure we didn't block somebody (because * somebody else certainly has the lock) or till we got it. * * We cannot rely on the two-step lock-acquisition protocol as in * LWLockAcquire because we're not using it. */ if (potentially_spurious) { SPIN_DELAY(); goto retry; } .. } Due to above logic, I think it can keep on retrying for long time before it actually concludes whether it got lock or not incase other backend/'s takes Exclusive lock after *double_check* and release before unconditional increment of shared lock in function LWLockAttemptLock. I understand that it might be difficult to have such a practical scenario, however still there is a theoratical possibility of same. I'm not particularly concerned. We could optimize it a bit, but I really don't think it's necessary. Is there any advantage of retrying in LWLockConditionalAcquire()? It's required for correctness. We only retry if we potentially blocked an exclusive acquirer (by spuriously incrementing/decrementing lockcount with 1). We need to be sure to either get the lock (in which case we can wake up the waiter on release), or be sure that we didn't disturb anyone. 9. LWLockAcquireOrWait()
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:47:44 +0200, Andres Freund wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: 5. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG(LWLockRelease, l, mode, release waiter); dlist_delete(waiter-lwWaitLink); pg_write_barrier(); waiter-lwWaiting = false; PGSemaphoreUnlock(waiter-sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. Err, this is bogus. Memory fail. The reason I've done so is that it's otherwise much harder to debug issues where there are backend that have been woken up already, but haven't rerun yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi, Attached you can find the next version of my LW_SHARED patchset. Now that atomics are committed, it seems like a good idea to also add their raison d'être. Since the last public version I have: * Addressed lots of Amit's comments. Thanks! * Peformed a fair amount of testing. * Rebased the code. The volatile removal made that not entirely trivial... * Significantly cleaned up and simplified the code. * Updated comments and such * Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case) The feature currently consists out of two patches: 1) Convert PGPROC-lwWaitLink into a dlist. The old code was frail and verbose. This also does: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting -lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it. 2) Implement the wait free LW_SHARED algorithm. Personally I'm quite happy with the new state. I think it needs more review, but I personally don't know of anything that needs changing. There's lots of further improvements that could be done, but let's get this in first. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 7 Oct 2014 15:32:34 +0200 Subject: [PATCH 1/2] Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read it: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting -lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. Author: Andres Freund --- src/backend/access/transam/twophase.c | 1 - src/backend/storage/lmgr/lwlock.c | 151 +- src/backend/storage/lmgr/proc.c | 2 - src/include/storage/lwlock.h | 5 +- src/include/storage/proc.h| 3 +- 5 files changed, 60 insertions(+), 102 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d5409a6..6401943 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid, proc-roleId = owner; proc-lwWaiting = false; proc-lwWaitMode = 0; - proc-lwWaitLink = NULL; proc-waitLock = NULL; proc-waitProcLock = NULL; for (i = 0; i NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 9fe6855..e6f9158 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -35,6 +35,7 @@ #include miscadmin.h #include pg_trace.h #include replication/slot.h +#include storage/barrier.h #include storage/ipc.h #include storage/predicate.h #include storage/proc.h @@ -115,9 +116,9 @@ inline static void PRINT_LWDEBUG(const char *where, const LWLock *lock) { if (Trace_lwlocks) - elog(LOG, %s(%s %d): excl %d shared %d head %p rOK %d, + elog(LOG, %s(%s %d): excl %d shared %d rOK %d, where, T_NAME(lock), T_ID(lock), - (int) lock-exclusive, lock-shared, lock-head, + (int) lock-exclusive, lock-shared, (int) lock-releaseOK); } @@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id) lock-exclusive = 0; lock-shared = 0; lock-tranche = tranche_id; - lock-head = NULL; - lock-tail = NULL; + dlist_init(lock-waiters); } @@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val) proc-lwWaiting = true; proc-lwWaitMode = mode; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex);
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 13:13:33 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. Really? I've spent more than a day debugging things with the current code. And Heikki introduced a bug in it. If you look at how the code looks before/after I find the difference pretty clear. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. Hm. I can just do that with the debugger here. Not sure if that's because I added the right thing to my .gdbinit or because I use the correct compiler flags. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote: Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this. So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defineing away some things that gdb doesn't handle: macro define __builtin_offsetof(T, F) ((int) (((T *) 0)-F)) macro define __extension__ macro define AssertVariableIsOfTypeMacro(x, y) ((void)0) Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works: (gdb) p dlist_is_empty(BackendList) ? NULL : dlist_head_element(Backend, elem, BackendList) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-08 15:23:22 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 1) Convert PGPROC-lwWaitLink into a dlist. The old code was frail and verbose. This also does: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting -lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it. I think it's a really bad idea to mix a refactoring change (like converting PGPROC-lwWaitLink into a dlist) with an attempted performance enhancement (like changing the rules for jumping the lock queue) and a bug fix (like adding pg_write_barrier where needed). I'd suggest that the last of those be done first, and perhaps back-patched. I think it makes sense to separate out the write barrier one. I don't really see the point of separating the other two changes. I've indeed previously started a thread (http://archives.postgresql.org/message-id/20140210134625.GA15246%40awork2.anarazel.de) about the barrier issue. IIRC you argued that that might be to expensive. The current coding, using a hand-rolled list, touches shared memory fewer times. When many waiters are awoken at once, we clip them all out of the list at one go. Your revision moves them to a backend-private list one at a time, and then pops them off one at a time. The backend-private memory accesses don't seem like they matter much, but the shared memory accesses would be nice to avoid. I can't imagine this to matter. We're entering the kernel for each PROC for the PGSemaphoreUnlock() and we're dirtying the cacheline for proc-lwWaiting = false anyway. This really is the slow path. Does LWLockUpdateVar's wake-up loop need a write barrier per iteration, or just one before the loop starts? How about commenting the pg_write_barrier() with the read-fence to which it pairs? Hm. Are you picking out LWLockUpdateVar for a reason or just as an example? Because I don't see a difference between the different wakeup loops? It needs to be a barrier per iteration. Currently the loop looks like while (head != NULL) { proc = head; head = proc-lwWaitLink; proc-lwWaitLink = NULL; proc-lwWaiting = false; PGSemaphoreUnlock(proc-sem); } Consider what happens if either the compiler or the cpu reorders this to: proc-lwWaiting = false; head = proc-lwWaitLink; proc-lwWaitLink = NULL; PGSemaphoreUnlock(proc-sem); as soon as lwWaiting = false, 'proc' can wake up and acquire a new lock. Backends can wake up prematurely because proc-sem is used for other purposes than this (that's why the loops around PGSemaphoreLock exist). Then it could reset lwWaitLink while acquiring a new lock. And some processes wouldn't be woken up anymore. The barrier it pairs with is the spinlock acquiration before requeuing. To be more obviously correct we could add a read barrier before if (!proc-lwWaiting) break; but I don't think it's needed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 2) Implement the wait free LW_SHARED algorithm. + * too high for workloads/locks that were locked in shared mode very s/locked/taken/? + * frequently. Often we were spinning in the (obviously exlusive) spinlock, exclusive. + * acquiration for locks that aren't exclusively locked. acquisition. + * For exlusive lock acquisition we use an atomic compare-and-exchange on the exclusive. + * lockcount variable swapping in EXCLUSIVE_LOCK/131-1/0x7FFF if and only Add comma after variable. Find some way of describing the special value (maybe a sentinel value, EXCLUSIVE_LOCK) just once, instead of three times. + * if the current value of lockcount is 0. If the swap was not successfull, we successful. + * by 1 again. If so, we have to wait for the exlusive locker to release the exclusive. + * The attentive reader probably might have noticed that naively doing the probably might is redundant. Delete probably. + * notice that we have to wait. Unfortunately until we have finished queuing, until - by the time + * Phase 2: Add us too the waitqueue of the lock too - to. And maybe us - ourselves. + *get queued in Phase 2 and we can wake them up if neccessary or they will necessary. + * When acquiring shared locks it's possible that we disturb an exclusive + * waiter. If that's a problem for the specific user, pass in a valid pointer + * for 'potentially_spurious'. Its value will be set to true if we possibly + * did so. The caller then has to handle that scenario. disturb is not clear enough. +/* yipeyyahee */ Although this will be clear to individuals with a good command of English, I suggest avoiding such usages. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defineing away some things that gdb doesn't handle: macro define __builtin_offsetof(T, F) ((int) (((T *) 0)-F)) macro define __extension__ macro define AssertVariableIsOfTypeMacro(x, y) ((void)0) Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works: (gdb) p dlist_is_empty(BackendList) ? NULL : dlist_head_element(Backend, elem, BackendList) Ah, cool. I'll try that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, Over at -performance Mark Kirkwood tested a recent version of this (http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz) . I thought it's interesting to add the numbers to this thread: Test: pgbench Options: scale 500 read only Os: Ubuntu 14.04 Pg: 9.3.4 Pg Options: max_connections = 200 shared_buffers = 10GB maintenance_work_mem = 1GB effective_io_concurrency = 10 wal_buffers = 32MB checkpoint_segments = 192 checkpoint_completion_target = 0.8 Results Clients | 9.3 tps 32 cores | 9.3 tps 60 cores +--+- 6 | 70400 | 71028 12 | 98918 | 129140 24 | 230345 | 240631 48 | 324042 | 409510 96 | 346929 | 120464 192 | 312621 | 92663 So we have anti scaling with 60 cores as we increase the client connections. Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 07/01/2014 01:08 PM, Andres Freund wrote: Hi, Over at -performance Mark Kirkwood tested a recent version of this (http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz) . I thought it's interesting to add the numbers to this thread: Test: pgbench Options: scale 500 read only Os: Ubuntu 14.04 Pg: 9.3.4 Pg Options: max_connections = 200 shared_buffers = 10GB maintenance_work_mem = 1GB effective_io_concurrency = 10 wal_buffers = 32MB checkpoint_segments = 192 checkpoint_completion_target = 0.8 Results Clients | 9.3 tps 32 cores | 9.3 tps 60 cores +--+- 6 | 70400 | 71028 12 | 98918 | 129140 24 | 230345 | 240631 48 | 324042 | 409510 96 | 346929 | 120464 192 | 312621 | 92663 So we have anti scaling with 60 cores as we increase the client connections. Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still interesting. It looks like the issue I reported here: http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com fixed by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b03d196be055450c7260749f17347c2d066b4254. So, definitely need to compare plain 9.4 vs patched 9.4, not 9.3. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 01/07/14 23:25, Heikki Linnakangas wrote: On 07/01/2014 01:08 PM, Andres Freund wrote: Hi, Over at -performance Mark Kirkwood tested a recent version of this (http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz) . I thought it's interesting to add the numbers to this thread: Test: pgbench Options: scale 500 read only Os: Ubuntu 14.04 Pg: 9.3.4 Pg Options: max_connections = 200 shared_buffers = 10GB maintenance_work_mem = 1GB effective_io_concurrency = 10 wal_buffers = 32MB checkpoint_segments = 192 checkpoint_completion_target = 0.8 Results Clients | 9.3 tps 32 cores | 9.3 tps 60 cores +--+- 6 | 70400 | 71028 12 | 98918 | 129140 24 | 230345 | 240631 48 | 324042 | 409510 96 | 346929 | 120464 192 | 312621 | 92663 So we have anti scaling with 60 cores as we increase the client connections. Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still interesting. It looks like the issue I reported here: http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com fixed by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b03d196be055450c7260749f17347c2d066b4254. So, definitely need to compare plain 9.4 vs patched 9.4, not 9.3. Here's plain 9.4 vs patched 9.4: Clients | 9.4 tps 60 cores | 9.4 tps 60 cores (rwlock) +--+-- 6 | 69490 | 70189 12 | 128200 | 128894 24 | 232243 | 233542 48 | 417689 | 422754 96 | 464037 | 590796 192 | 418252 | 630672 It appears that plain 9.4 does not exhibit the dramatic anti scaling that 9.3 showed, but there is still evidence of some contention in the higher client numbers, and we peak at the 96 client mark. The patched variant looks pretty much free from this, still scaling at 192 connections (might have been interesting to try more, but had max_connections set to 200)! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Jun 24, 2014 at 9:33 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-23 19:59:10 +0530, Amit Kapila wrote: 7. LWLockWaitForVar() { .. /* * Add myself to wait queue. Note that this is racy, somebody else * could wakeup before we're finished queuing. * NB: We're using nearly the same twice-in-a-row lock acquisition * protocol as LWLockAcquire(). Check its comments for details. */ LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false, potentially_spurious); } Why is it important to Attempt lock after queuing in this case, can't we just re-check exclusive lock as done before queuing? Well, that's how Heikki designed LWLockWaitForVar(). In that case I might be missing some point here, un-patched code of LWLockWaitForVar() never tries to acquire the lock, but the new code does so. Basically I am not able to think what is the problem if we just do below after queuing: mustwait = pg_atomic_read_u32(lock-lockcount) != 0; Could you please explain what is the problem in just rechecking? I have further reviewed the lwlock related changes and thought its good to share my findings with you. This completes my initial review for lwlock related changes and below are my findings: 1. LWLockRelease() { .. TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(l), T_ID(l)); } Dynamic tracing macro seems to be omitted from LWLockRelease() call. 2. LWLockWakeup() { .. #ifdef LWLOCK_STATS lwstats-spin_delay_count += SpinLockAcquire(lock-mutex); #else SpinLockAcquire(lock-mutex); #endif .. } Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop. 3. LWLockRelease() { .. /* grant permission to run, even if a spurious share lock increases lockcount */ else if (mode == LW_EXCLUSIVE have_waiters) check_waiters = true; /* nobody has this locked anymore, potential exclusive lockers get a chance */ else if (lockcount == 0 have_waiters) check_waiters = true; .. } It seems comments have been reversed in above code. 4. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) .. } Shouldn't we need to use volatile variable in above loop (lock instead of l)? 5. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG(LWLockRelease, l, mode, release waiter); dlist_delete(waiter-lwWaitLink); pg_write_barrier(); waiter-lwWaiting = false; PGSemaphoreUnlock(waiter-sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and tries to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. 6. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. if (wokeup_somebody waiter-lwWaitMode == LW_EXCLUSIVE) continue; .. if (waiter-lwWaitMode != LW_WAIT_UNTIL_FREE) { .. wokeup_somebody = true; } .. } .. } a. IIUC above logic, if the waiter queue is as follows: (S-Shared; X-Exclusive) S X S S S X S S it can skip the exclusive waiters and release shared waiter. If my understanding is right, then I think instead of continue, there should be *break* in above logic. b. Consider below sequence of waiters: (S-Shared; X-Exclusive) S S X S S I think as per un-patched code, it will wakeup waiters uptill (including) first Exclusive, but patch will wake up uptill (*excluding*) first Exclusive. 7. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) l-waiters) { .. dlist_delete(waiter-lwWaitLink); dlist_push_tail(wakeup, waiter-lwWaitLink); .. } .. } Use of dlist has simplified the code, but I think there might be a slight overhead of maintaining wakeup queue as compare to un-patched mechanism especially when there is a long waiter queue. 8. LWLockConditionalAcquire() { .. /* * We ran into an exclusive lock and might have blocked another * exclusive lock from taking a shot because it took a time to back * off. Retry till we are either sure we didn't block somebody (because * somebody else certainly has the lock) or till we got it. * * We cannot rely on the two-step lock-acquisition protocol as in * LWLockAcquire because we're not using it. */ if (potentially_spurious) { SPIN_DELAY(); goto retry; } .. } Due to above logic, I think it can keep on retrying for long time before it actually concludes whether it got lock or not incase other backend/'s takes Exclusive lock after *double_check* and release before unconditional increment of shared lock in function
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-17 20:47:51 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote: You have followed it pretty well as far as I can understand from your replies, as there is no reproducible test (which I think is bit tricky to prepare), so it becomes difficult to explain by theory. I'm working an updated patch that moves the releaseOK into the spinlocks. Maybe that's the problem already - it's certainly not correct as is. Sure, I will do the test/performance test with updated patch; you might want to include some more changes based on comments in mail below: You are right, it will wakeup the existing waiters, but I think the new logic has one difference which is that it can allow the backend to take Exclusive lock when there are already waiters in queue. As per above example even though Session-2 and Session-3 are in wait queue, Session-4 will be able to acquire Exclusive lock which I think was previously not possible. I think that was previously possible as well - in a slightly different set of circumstances though. After a process releases a lock and wakes up some of several waiters another process can come in and acquire the lock. Before the woken up process gets scheduled again. lwlocks aren't fair locks... Okay, but I think changing behaviour for lwlocks might impact some tests/applications. As they are not fair, I think defining exact behaviour is not easy and we don't have any concrete scenario which can be effected, so there should not be problem in accepting slightly different behaviour. Few more comments: 1. LWLockAcquireCommon() { .. iterations++; } In current logic, I could not see any use of these *iterations* variable. 2. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { /* * Somebody else dequeued us and has or will.. .. */ for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; } Do we want to set result = false; after waking in above code? The idea behind setting false is to indicate whether we get the lock immediately or not which previously was decided based on if it needs to queue itself? 3. LWLockAcquireCommon() { .. /* * Ok, at this point we couldn't grab the lock on the first try. We * cannot simply queue ourselves to the end of the list and wait to be * woken up because by now the lock could long have been released. * Instead add us to the queue and try to grab the lock again. If we * suceed we need to revert the queuing and be happy, otherwise we * recheck the lock. If we still couldn't grab it, we know that the * other lock will see our queue entries when releasing since they * existed before we checked for the lock. */ /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); .. } a. By reading above code and comments, it is not quite clear why second attempt is important unless somebody thinks on it or refer your comments in *Notes* section at top of file. I think it's better to indicate in some way so that code reader can refer to Notes section or whereever you are planing to keep those comments. b. There is typo in above comment suceed/succeed. 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; .. } Setting releaseOK in above context might not be required because if the control comes in this part of code, it will not retry to acquire another time. 5. LWLockWaitForVar() { .. else mustwait = false; if (!mustwait) break; .. } I think we can directly break in else part in above code. 6. LWLockWaitForVar() { .. /* * Quick test first to see if it the slot is free right now. * * XXX: the caller uses a spinlock before this,... */ if (pg_atomic_read_u32(lock-lockcount) == 0) return true; } Does the part of comment that refers to spinlock is still relevant after using atomic ops? 7. LWLockWaitForVar() { .. /* * Add myself to wait queue. Note that this is racy, somebody else * could wakeup before we're finished queuing. * NB: We're using nearly the same twice-in-a-row lock acquisition * protocol as LWLockAcquire(). Check its comments for details. */ LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false, potentially_spurious); } Why is it important to Attempt lock after queuing in this case, can't we just re-check exclusive lock as done before queuing? 8. LWLockWaitForVar() { .. PRINT_LWDEBUG(LWLockAcquire undo queue, lock, mode); break; } else { PRINT_LWDEBUG(LWLockAcquire waiting 4, lock, mode); } .. } a. I think instead of LWLockAcquire, here we should use LWLockWaitForVar b. Isn't it better
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-23 19:59:10 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-17 20:47:51 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote: You have followed it pretty well as far as I can understand from your replies, as there is no reproducible test (which I think is bit tricky to prepare), so it becomes difficult to explain by theory. I'm working an updated patch that moves the releaseOK into the spinlocks. Maybe that's the problem already - it's certainly not correct as is. Sure, I will do the test/performance test with updated patch; you might want to include some more changes based on comments in mail below: I'm nearly finished in cleaning up the atomics part of the patch which also includes a bit of cleanup of the lwlocks code. Few more comments: 1. LWLockAcquireCommon() { .. iterations++; } In current logic, I could not see any use of these *iterations* variable. It's useful for debugging. Should be gone in the final code. 2. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { /* * Somebody else dequeued us and has or will.. .. */ for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; } Do we want to set result = false; after waking in above code? The idea behind setting false is to indicate whether we get the lock immediately or not which previously was decided based on if it needs to queue itself? Hm. I don't think it's clear which version is better. 3. LWLockAcquireCommon() { .. /* * Ok, at this point we couldn't grab the lock on the first try. We * cannot simply queue ourselves to the end of the list and wait to be * woken up because by now the lock could long have been released. * Instead add us to the queue and try to grab the lock again. If we * suceed we need to revert the queuing and be happy, otherwise we * recheck the lock. If we still couldn't grab it, we know that the * other lock will see our queue entries when releasing since they * existed before we checked for the lock. */ /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); .. } a. By reading above code and comments, it is not quite clear why second attempt is important unless somebody thinks on it or refer your comments in *Notes* section at top of file. I think it's better to indicate in some way so that code reader can refer to Notes section or whereever you are planing to keep those comments. Ok. b. There is typo in above comment suceed/succeed. Thanks, fixed. 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; .. } Setting releaseOK in above context might not be required because if the control comes in this part of code, it will not retry to acquire another time. Hm. You're probably right. 5. LWLockWaitForVar() { .. else mustwait = false; if (!mustwait) break; .. } I think we can directly break in else part in above code. Well, there's another case of mustwait=false above which is triggered while the spinlock is held. Don't think it'd get simpler. 6. LWLockWaitForVar() { .. /* * Quick test first to see if it the slot is free right now. * * XXX: the caller uses a spinlock before this,... */ if (pg_atomic_read_u32(lock-lockcount) == 0) return true; } Does the part of comment that refers to spinlock is still relevant after using atomic ops? Yes. pg_atomic_read_u32() isn't a memory barrier (and explicitly documented not to be). 7. LWLockWaitForVar() { .. /* * Add myself to wait queue. Note that this is racy, somebody else * could wakeup before we're finished queuing. * NB: We're using nearly the same twice-in-a-row lock acquisition * protocol as LWLockAcquire(). Check its comments for details. */ LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false, potentially_spurious); } Why is it important to Attempt lock after queuing in this case, can't we just re-check exclusive lock as done before queuing? Well, that's how Heikki designed LWLockWaitForVar(). 8. LWLockWaitForVar() { .. PRINT_LWDEBUG(LWLockAcquire undo queue, lock, mode); break; } else { PRINT_LWDEBUG(LWLockAcquire waiting 4, lock, mode); } .. } a. I think instead of LWLockAcquire, here we should use LWLockWaitForVar right. b. Isn't it better to use LOG_LWDEBUG instead of PRINT_LWDEBUG(), as PRINT_LWDEBUG() is generally used in file at entry of functions to log info about locks? Fine with me. 9. LWLockUpdateVar() { /*
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-23 19:59:10 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund and...@2ndquadrant.com wrote: 2. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { /* * Somebody else dequeued us and has or will.. .. */ for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; } Do we want to set result = false; after waking in above code? The idea behind setting false is to indicate whether we get the lock immediately or not which previously was decided based on if it needs to queue itself? Hm. I don't think it's clear which version is better. I thought if we get the lock at first attempt, then result should be true which seems to be clear, but for the case of second attempt you are right that it's not clear. In such a case, I think we can go either way and then later during tests or otherwise if any problem is discovered, we can revert it. 7. LWLockWaitForVar() { .. /* * Add myself to wait queue. Note that this is racy, somebody else * could wakeup before we're finished queuing. * NB: We're using nearly the same twice-in-a-row lock acquisition * protocol as LWLockAcquire(). Check its comments for details. */ LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false, potentially_spurious); } Why is it important to Attempt lock after queuing in this case, can't we just re-check exclusive lock as done before queuing? Well, that's how Heikki designed LWLockWaitForVar(). In that case I might be missing some point here, un-patched code of LWLockWaitForVar() never tries to acquire the lock, but the new code does so. Basically I am not able to think what is the problem if we just do below after queuing: mustwait = pg_atomic_read_u32(lock-lockcount) != 0; Could you please explain what is the problem in just rechecking? I think both are actually critical for performance... Otherwise even a only lightly contended lock would require scheduler activity when a processes tries to lock something twice. Given the frequency we acquire some locks with that'd be disastrous... Do you have any suggestion how both behaviours can be retained? Not sure what you mean. I just wanted to say that current behaviour of releaseOK seems to be of use for some cases and if you want to change it, then would it retain the current behaviour we get by releaseOK? I understand that till now your patch has not changed anything specific to releaseOK, but by above discussion I got the impression that you are planing to change it, that's why I had asked above question. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-23 19:59:10 +0530, Amit Kapila wrote: 12. #ifdef LWLOCK_DEBUG lock-owner = MyProc; #endif Shouldn't it be reset in LWLockRelease? That's actually intentional. It's quite useful to know the last owner when debugging lwlock code. Won't it cause any problem if the last owner process exits? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com wrote: I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. As per discussion in developer meeting, I wanted to test shared buffer scaling patch with this branch. I am getting merge conflicts as per HEAD. Could you please get it resolved, so that I can get the data. I have started looking into this patch and have few questions/ findings which are shared below: 1. I think stats for lwstats-ex_acquire_count will be counted twice, first it is incremented in LWLockAcquireCommon() and then in LWLockAttemptLock() 2. Handling of potentialy_spurious case seems to be pending in LWLock functions like LWLockAcquireCommon(). LWLockAcquireCommon() { .. /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); } I think it can lead to some problems, example: Session -1 1. Acquire Exclusive LWlock Session -2 1. Acquire Shared LWlock 1a. Unconditionally incrementing shared count by session-2 Session -1 2. Release Exclusive lock Session -3 1. Acquire Exclusive LWlock It will put itself to wait queue by seeing the lock count incremented by Session-2 Session-2 1b. Decrement the shared count and add it to wait queue. Session-4 1. Acquire Exclusive lock This session will get the exclusive lock, because even though other lockers are waiting, lockcount is zero. Session-2 2. Try second time to take shared lock, it won't get as session-4 already has an exclusive lock, so it will start waiting Session-4 2. Release Exclusive lock it will not wake the waiters because waiters have been added before acquiring this lock. So in above scenario, Session-3 and Session-2 are waiting in queue with nobody to awake them. I have not reproduced the exact scenario above, so I might be missing some thing which will not lead to above situation. 3. LWLockAcquireCommon() { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) .. } proc-lwWaiting is checked, updated without spinklock where as previously it was done under spinlock, won't it be unsafe? 4. LWLockAcquireCommon() { .. for (;;) { /* false means cannot accept cancel/die interrupt here. */ PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; } lock-releaseOK is updated/checked without spinklock where as previously it was done under spinlock, won't it be unsafe? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-17 12:41:26 +0530, Amit Kapila wrote: On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com wrote: I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. As per discussion in developer meeting, I wanted to test shared buffer scaling patch with this branch. I am getting merge conflicts as per HEAD. Could you please get it resolved, so that I can get the data. I have started looking into this patch and have few questions/ findings which are shared below: 1. I think stats for lwstats-ex_acquire_count will be counted twice, first it is incremented in LWLockAcquireCommon() and then in LWLockAttemptLock() Hrmpf. Will fix. 2. Handling of potentialy_spurious case seems to be pending in LWLock functions like LWLockAcquireCommon(). LWLockAcquireCommon() { .. /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); } I think it can lead to some problems, example: Session -1 1. Acquire Exclusive LWlock Session -2 1. Acquire Shared LWlock 1a. Unconditionally incrementing shared count by session-2 Session -1 2. Release Exclusive lock Session -3 1. Acquire Exclusive LWlock It will put itself to wait queue by seeing the lock count incremented by Session-2 Session-2 1b. Decrement the shared count and add it to wait queue. Session-4 1. Acquire Exclusive lock This session will get the exclusive lock, because even though other lockers are waiting, lockcount is zero. Session-2 2. Try second time to take shared lock, it won't get as session-4 already has an exclusive lock, so it will start waiting Session-4 2. Release Exclusive lock it will not wake the waiters because waiters have been added before acquiring this lock. I don't understand this step here? When releasing the lock it'll notice that the waiters is 0 and acquire the spinlock which should protect against badness here? 3. LWLockAcquireCommon() { for (;;) { PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) .. } proc-lwWaiting is checked, updated without spinklock where as previously it was done under spinlock, won't it be unsafe? It was previously checked/unset without a spinlock as well: /* * Awaken any waiters I removed from the queue. */ while (head != NULL) { LOG_LWDEBUG(LWLockRelease, T_NAME(l), T_ID(l), release waiter); proc = head; head = proc-lwWaitLink; proc-lwWaitLink = NULL; proc-lwWaiting = false; PGSemaphoreUnlock(proc-sem); } I don't think there's dangers here, lwWaiting will only ever be manipulated by the the PGPROC's owner. As discussed elsewhere there needs to be a write barrier before the proc-lwWaiting = false, even in upstream code. 4. LWLockAcquireCommon() { .. for (;;) { /* false means cannot accept cancel/die interrupt here. */ PGSemaphoreLock(proc-sem, false); if (!proc-lwWaiting) break; extraWaits++; } lock-releaseOK = true; } lock-releaseOK is updated/checked without spinklock where as previously it was done under spinlock, won't it be unsafe? Hm. That's probably buggy. Good catch. Especially if you have a compiler that does byte manipulation by reading e.g. 4 bytes from a struct and then write the wider variable back... So the releaseOk bit needs to move into LWLockDequeueSelf(). Thanks for looking! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-17 12:41:26 +0530, Amit Kapila wrote: On Fri, May 23, 2014 at 10:01 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com wrote: I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. As per discussion in developer meeting, I wanted to test shared buffer scaling patch with this branch. I am getting merge conflicts as per HEAD. Could you please get it resolved, so that I can get the data. I have started looking into this patch and have few questions/ findings which are shared below: 1. I think stats for lwstats-ex_acquire_count will be counted twice, first it is incremented in LWLockAcquireCommon() and then in LWLockAttemptLock() Hrmpf. Will fix. 2. Handling of potentialy_spurious case seems to be pending in LWLock functions like LWLockAcquireCommon(). LWLockAcquireCommon() { .. /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); } I think it can lead to some problems, example: Session -1 1. Acquire Exclusive LWlock Session -2 1. Acquire Shared LWlock 1a. Unconditionally incrementing shared count by session-2 Session -1 2. Release Exclusive lock Session -3 1. Acquire Exclusive LWlock It will put itself to wait queue by seeing the lock count incremented by Session-2 Session-2 1b. Decrement the shared count and add it to wait queue. Session-4 1. Acquire Exclusive lock This session will get the exclusive lock, because even though other lockers are waiting, lockcount is zero. Session-2 2. Try second time to take shared lock, it won't get as session-4 already has an exclusive lock, so it will start waiting Session-4 2. Release Exclusive lock it will not wake the waiters because waiters have been added before acquiring this lock. I don't understand this step here? When releasing the lock it'll notice that the waiters is 0 and acquire the spinlock which should protect against badness here? While Releasing lock, I think it will not go to Wakeup waiters (LWLockWakeup), because releaseOK will be false. releaseOK can be set as false when Session-1 has Released Exclusive lock and wakedup some previous waiter. Once it is set to false, it can be reset to true only for retry logic(after getting semaphore). 3. I don't think there's dangers here, lwWaiting will only ever be manipulated by the the PGPROC's owner. As discussed elsewhere there needs to be a write barrier before the proc-lwWaiting = false, even in upstream code. Agreed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-17 18:01:58 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com On 2014-06-17 12:41:26 +0530, Amit Kapila wrote: 2. Handling of potentialy_spurious case seems to be pending in LWLock functions like LWLockAcquireCommon(). LWLockAcquireCommon() { .. /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); } I think it can lead to some problems, example: Session -1 1. Acquire Exclusive LWlock Session -2 1. Acquire Shared LWlock 1a. Unconditionally incrementing shared count by session-2 Session -1 2. Release Exclusive lock Session -3 1. Acquire Exclusive LWlock It will put itself to wait queue by seeing the lock count incremented by Session-2 Session-2 1b. Decrement the shared count and add it to wait queue. Session-4 1. Acquire Exclusive lock This session will get the exclusive lock, because even though other lockers are waiting, lockcount is zero. Session-2 2. Try second time to take shared lock, it won't get as session-4 already has an exclusive lock, so it will start waiting Session-4 2. Release Exclusive lock it will not wake the waiters because waiters have been added before acquiring this lock. I don't understand this step here? When releasing the lock it'll notice that the waiters is 0 and acquire the spinlock which should protect against badness here? While Releasing lock, I think it will not go to Wakeup waiters (LWLockWakeup), because releaseOK will be false. releaseOK can be set as false when Session-1 has Released Exclusive lock and wakedup some previous waiter. Once it is set to false, it can be reset to true only for retry logic(after getting semaphore). I unfortunately still can't follow. If Session-1 woke up some previous waiter the woken up process will set releaseOK to true again when it loops to acquire the lock? Somewhat unrelated: I have a fair amount of doubt about the effectiveness of the releaseOK logic (which imo also is pretty poorly documented). Essentially its intent is to avoid unneccessary scheduling when other processes have already been woken up (i.e. releaseOK has been set to false). I believe the theory is that if any process has already been woken up it's pointless to wake up additional processes (i.e. PGSemaphoreUnlock()) because the originally woken up process will wake up at some point. But if the to-be-woken up process is scheduled out because it used all his last timeslices fully that means we'll not wakeup other waiters for a relatively long time. It's been introduced in the course of 5b9a058384e714b89e050fc0b6381f97037c665a whose logic generally is rather sound - I just doubt that the releaseOK part is necessary. It'd certainly interesting to rip releaseOK out and benchmark the result... My theory is that the average latency will go down on busy systems that aren't IO bound. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-17 18:01:58 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com On 2014-06-17 12:41:26 +0530, Amit Kapila wrote: 2. Handling of potentialy_spurious case seems to be pending in LWLock functions like LWLockAcquireCommon(). LWLockAcquireCommon() { .. /* add to the queue */ LWLockQueueSelf(l, mode); /* we're now guaranteed to be woken up if necessary */ mustwait = LWLockAttemptLock(l, mode, false, potentially_spurious); } I think it can lead to some problems, example: Session -1 1. Acquire Exclusive LWlock Session -2 1. Acquire Shared LWlock 1a. Unconditionally incrementing shared count by session-2 Session -1 2. Release Exclusive lock Session -3 1. Acquire Exclusive LWlock It will put itself to wait queue by seeing the lock count incremented by Session-2 Session-2 1b. Decrement the shared count and add it to wait queue. Session-4 1. Acquire Exclusive lock This session will get the exclusive lock, because even though other lockers are waiting, lockcount is zero. Session-2 2. Try second time to take shared lock, it won't get as session-4 already has an exclusive lock, so it will start waiting Session-4 2. Release Exclusive lock it will not wake the waiters because waiters have been added before acquiring this lock. I don't understand this step here? When releasing the lock it'll notice that the waiters is 0 and acquire the spinlock which should protect against badness here? While Releasing lock, I think it will not go to Wakeup waiters (LWLockWakeup), because releaseOK will be false. releaseOK can be set as false when Session-1 has Released Exclusive lock and wakedup some previous waiter. Once it is set to false, it can be reset to true only for retry logic(after getting semaphore). I unfortunately still can't follow. You have followed it pretty well as far as I can understand from your replies, as there is no reproducible test (which I think is bit tricky to prepare), so it becomes difficult to explain by theory. If Session-1 woke up some previous waiter the woken up process will set releaseOK to true again when it loops to acquire the lock? You are right, it will wakeup the existing waiters, but I think the new logic has one difference which is that it can allow the backend to take Exclusive lock when there are already waiters in queue. As per above example even though Session-2 and Session-3 are in wait queue, Session-4 will be able to acquire Exclusive lock which I think was previously not possible. Somewhat unrelated: I have a fair amount of doubt about the effectiveness of the releaseOK logic (which imo also is pretty poorly documented). Essentially its intent is to avoid unneccessary scheduling when other processes have already been woken up (i.e. releaseOK has been set to false). I believe the theory is that if any process has already been woken up it's pointless to wake up additional processes (i.e. PGSemaphoreUnlock()) because the originally woken up process will wake up at some point. But if the to-be-woken up process is scheduled out because it used all his last timeslices fully that means we'll not wakeup other waiters for a relatively long time. I think it will also maintain that the wokedup process won't stall for very long time, because if we wake new waiters, then previously woked process can again enter into wait queue and similar thing can repeat for long time. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-17 20:47:51 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-17 18:01:58 +0530, Amit Kapila wrote: On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com On 2014-06-17 12:41:26 +0530, Amit Kapila wrote: I unfortunately still can't follow. You have followed it pretty well as far as I can understand from your replies, as there is no reproducible test (which I think is bit tricky to prepare), so it becomes difficult to explain by theory. I'm working an updated patch that moves the releaseOK into the spinlocks. Maybe that's the problem already - it's certainly not correct as is. If Session-1 woke up some previous waiter the woken up process will set releaseOK to true again when it loops to acquire the lock? You are right, it will wakeup the existing waiters, but I think the new logic has one difference which is that it can allow the backend to take Exclusive lock when there are already waiters in queue. As per above example even though Session-2 and Session-3 are in wait queue, Session-4 will be able to acquire Exclusive lock which I think was previously not possible. I think that was previously possible as well - in a slightly different set of circumstances though. After a process releases a lock and wakes up some of several waiters another process can come in and acquire the lock. Before the woken up process gets scheduled again. lwlocks aren't fair locks... Somewhat unrelated: I have a fair amount of doubt about the effectiveness of the releaseOK logic (which imo also is pretty poorly documented). Essentially its intent is to avoid unneccessary scheduling when other processes have already been woken up (i.e. releaseOK has been set to false). I believe the theory is that if any process has already been woken up it's pointless to wake up additional processes (i.e. PGSemaphoreUnlock()) because the originally woken up process will wake up at some point. But if the to-be-woken up process is scheduled out because it used all his last timeslices fully that means we'll not wakeup other waiters for a relatively long time. I think it will also maintain that the wokedup process won't stall for very long time, because if we wake new waiters, then previously woked process can again enter into wait queue and similar thing can repeat for long time. I don't think it effectively does that - newly incoming lockers ignore the queue and just acquire the lock. Even if there's some other backend scheduled to wake up. And shared locks can be acquired when there's exclusive locks waiting. I think both are actually critical for performance... Otherwise even a only lightly contended lock would require scheduler activity when a processes tries to lock something twice. Given the frequency we acquire some locks with that'd be disastrous... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund and...@2ndquadrant.com wrote: I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. As per discussion in developer meeting, I wanted to test shared buffer scaling patch with this branch. I am getting merge conflicts as per HEAD. Could you please get it resolved, so that I can get the data. From git://git.postgresql.org/git/users/andresfreund/postgres * branchrwlock-contention - FETCH_HEAD Auto-merging src/test/regress/regress.c CONFLICT (content): Merge conflict in src/test/regress/regress.c Auto-merging src/include/storage/proc.h Auto-merging src/include/storage/lwlock.h CONFLICT (content): Merge conflict in src/include/storage/lwlock.h Auto-merging src/include/storage/ipc.h CONFLICT (content): Merge conflict in src/include/storage/ipc.h Auto-merging src/include/storage/barrier.h CONFLICT (content): Merge conflict in src/include/storage/barrier.h Auto-merging src/include/pg_config_manual.h Auto-merging src/include/c.h Auto-merging src/backend/storage/lmgr/spin.c Auto-merging src/backend/storage/lmgr/proc.c Auto-merging src/backend/storage/lmgr/lwlock.c CONFLICT (content): Merge conflict in src/backend/storage/lmgr/lwlock.c Auto-merging src/backend/storage/ipc/shmem.c Auto-merging src/backend/storage/ipc/ipci.c Auto-merging src/backend/access/transam/xlog.c CONFLICT (content): Merge conflict in src/backend/access/transam/xlog.c Auto-merging src/backend/access/transam/twophase.c Auto-merging configure.in Auto-merging configure Auto-merging config/c-compiler.m4 Automatic merge failed; fix conflicts and then commit the result. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 01/31/2014 11:54 AM, Andres Freund wrote: Hi, On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote: On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: 1) I've added an abstracted atomic ops implementation. Needs a fair amount of work, also submitted as a separate CF entry. (Patch 1 2) Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when applying 0002-Very-basic-atomic-ops-implementation.patch. Please rebase. I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. I plan to split the atomics patch into smaller chunks before reposting. Imo the Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. is worth being applied independently from the rest of the series, it simplies code and it fixes a bug... I committed a fix for the WakeupWaiters() bug now, without the rest of the open coding patch. Converting lwWaitLInk into a dlist is probably a good idea, but seems better to fix the bug separately, for the sake of git history if nothing else. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-02-03 17:51:20 -0800, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote: Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ That's interesting. The maximum number of what you see here (~293125) is markedly lower than what I can get. ... poke around ... Hm, that's partially because you're using pgbench without -M prepared if I see that correctly. The bottleneck in that case is primarily memory allocation. But even after that I am getting higher numbers: ~342497. Trying to nail down the differnce it oddly seems to be your max_connections=80 vs my 100. The profile in both cases is markedly different, way much more spinlock contention with 80. All in Pin/UnpinBuffer(). I updated this benchmark, with your BufferDescriptors alignment patch [1] applied on top of master (while still not using -M prepared in order to keep the numbers comparable). So once again, that's: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ It made a bigger, fairly noticeable difference, but not so big a difference as you describe here. Are you sure that you saw this kind of difference with only 64 clients, as you mentioned elsewhere [1] (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously max_connections is still 80 in the above. Should I have gone past 64 clients to see the problem? The best numbers I see with the [1] patch applied on master is only ~327809 for -S 10 64 clients. Perhaps I've misunderstood. That's likely -M prepared. It was with -c 64 -j 64... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 gib RAM. Both versions have the type align patch applied. pgbench-tools config: SCALES=100 SETCLIENTS=1 4 8 16 32 48 64 96 128 SETTIMES=2 I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similiar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ I think the small differences are caused by the fact that I use TCP connections and not Unix domain sockets. The results are pretty impressive… I will post the read-write results as soon as they are finished. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpKL7vRo_Vp0.pgp Description: PGP signature
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 gib RAM. Both versions have the type align patch applied. It certainly seems as if the interesting cases are where clients cores. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similiar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ Note that Christian ran this test with max_connections=201, presumably to exercise the alignment problem. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 2014-02-04 11:48:14 -0800, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similiar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ Note that Christian ran this test with max_connections=201, presumably to exercise the alignment problem. I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a remembrance of debugging the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a remembrance of debugging the issue. I guess that given that *both* cases tested had the patch applied, that makes sense. However, I would have liked to see a real master baseline. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 gib RAM. What CPU model? Can you post /proc/cpuinfo? The distinction between logical and physical cores matters here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On February 4, 2014 8:53:36 PM CET, Peter Geoghegan p...@heroku.com wrote: On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a remembrance of debugging the issue. I guess that given that *both* cases tested had the patch applied, that makes sense. However, I would have liked to see a real master baseline. Christian, could you rerun with master (the commit on which the branch is based on), the alignment patch, and then the lwlock patch? Best with max_connections 200. That's probably more important than the write tests as a first step.. Thanks, Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, On 04/02/14 12:02, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 gib RAM. What CPU model? Can you post /proc/cpuinfo? The distinction between logical and physical cores matters here. model name : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz 32 physical cores, 64 logical cores. /proc/cpuinfo is applied. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpZpWnKfQtb4.pgp Description: PGP signature
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, On 04/02/14 21:03, Andres Freund wrote: Christian, could you rerun with master (the commit on which the branch is based on), the alignment patch, and then the lwlock patch? Best with max_connections 200. That's probably more important than the write tests as a first step.. Ok, benchmark for baseline+alignment patch is running. This will take a couple of hours and since I have to get up at about 05:00 I won't be able to post it before tomorrow. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpsN5kcUiOgQ.pgp Description: PGP signature
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 12:30 PM, Christian Kruse christ...@2ndquadrant.com wrote: Ok, benchmark for baseline+alignment patch is running. I see that you have enabled latency information. For this kind of thing I prefer to hack pgbench-tools to not collect this (i.e. to not pass the -l flag, Per-Transaction Logging). Just remove it and pgbench-tools rolls with it. It may well be that the overhead added is completely insignificant, but for something like this, where the latency information is unlikely to add any value, I prefer to not take the chance. This is a fairly minor point, however, especially since these are only 60 second runs where you're unlikely to accumulate enough transaction latency information to notice any effect. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 2014-02-04 13:42:51 -0800, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 12:30 PM, Christian Kruse christ...@2ndquadrant.com wrote: Ok, benchmark for baseline+alignment patch is running. I see that you have enabled latency information. For this kind of thing I prefer to hack pgbench-tools to not collect this (i.e. to not pass the -l flag, Per-Transaction Logging). Just remove it and pgbench-tools rolls with it. It may well be that the overhead added is completely insignificant, but for something like this, where the latency information is unlikely to add any value, I prefer to not take the chance. This is a fairly minor point, however, especially since these are only 60 second runs where you're unlikely to accumulate enough transaction latency information to notice any effect. Hm, I don't find that convincing. If you look at the results from the last run the latency information is actually quite interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.comwrote: Some background: The setups that triggered me into working on the patchset didn't really have a pgbench like workload, the individual queries were/are more complicated even though it's still an high throughput OLTP workload. And the contention was *much* higher than what I can reproduce with pgbench -S, there was often nearly all time spent in the lwlock's spinlock, and it was primarily the buffer mapping lwlocks, being locked in shared mode. The difference is that instead of locking very few buffers per query like pgbench does, they touched much more. Perhaps I should try to argue for this extension to pgbench again: http://www.postgresql.org/message-id/CAMkU=1w0K3RNhtPuLF8WQoVi6gxgG6mcnpC=-ivjwkjkydp...@mail.gmail.com I think it would go a good job of exercising what you want, provided you set the scale so that all data fit in RAM but not in shared_buffers. Or maybe you want it to fit in shared_buffers, since the buffer mapping lock was contended in shared mode--that suggests the problem is finding the buffer that already has the page, not making a buffer to have the page. Cheers, Jeff
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: The changed algorithm for lwlock imo is an *algorithmic* improvement, not one for a particular architecture. The advantage being that locking a lwlock which is primarily taken in shared mode will never need need to wait or loop. I agree. My point was only that the messaging ought to be that this is something that those with multi-socket Intel systems should take note of. Yes, that branch is used by some of them. But to make that clear to all that are still reading, I have *first* presented the patch findings to -hackers and *then* backported it, and I have referenced the existance of the patch for 9.2 on list before. This isn't some kind of secret sauce deal... No, of course not. I certainly didn't mean to imply that. My point was only that anyone that is affected to the same degree as the party with the 4 socket server might be left with a very poor impression of Postgres if we failed to fix the problem. It clearly rises to the level of a bugfix. That might be something to do later, as it *really* can hurt in practice. We had one server go from load 240 to 11... Well, we have to commit something on master first. But it should be a priority to avoid having this hurt users further, since the problems are highly predictable for certain types of servers. But I think we should first focus on getting the patch ready for master, then we can see where it's going. At the very least I'd like to split of the part modifying the current spinlocks to use the atomics, that seems far to invasive. Agreed. I unfortunately can't tell you that much more, not because it's private, but because it mostly was diagnosed by remote hand debugging, limiting insights considerably. Of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote: Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ That's interesting. The maximum number of what you see here (~293125) is markedly lower than what I can get. ... poke around ... Hm, that's partially because you're using pgbench without -M prepared if I see that correctly. The bottleneck in that case is primarily memory allocation. But even after that I am getting higher numbers: ~342497. Trying to nail down the differnce it oddly seems to be your max_connections=80 vs my 100. The profile in both cases is markedly different, way much more spinlock contention with 80. All in Pin/UnpinBuffer(). I updated this benchmark, with your BufferDescriptors alignment patch [1] applied on top of master (while still not using -M prepared in order to keep the numbers comparable). So once again, that's: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ It made a bigger, fairly noticeable difference, but not so big a difference as you describe here. Are you sure that you saw this kind of difference with only 64 clients, as you mentioned elsewhere [1] (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously max_connections is still 80 in the above. Should I have gone past 64 clients to see the problem? The best numbers I see with the [1] patch applied on master is only ~327809 for -S 10 64 clients. Perhaps I've misunderstood. [1] Misaligned BufferDescriptors causing major performance problems on AMD: http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
Hi, On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote: Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ That's interesting. The maximum number of what you see here (~293125) is markedly lower than what I can get. ... poke around ... Hm, that's partially because you're using pgbench without -M prepared if I see that correctly. The bottleneck in that case is primarily memory allocation. But even after that I am getting higher numbers: ~342497. Trying to nail down the differnce it oddly seems to be your max_connections=80 vs my 100. The profile in both cases is markedly different, way much more spinlock contention with 80. All in Pin/UnpinBuffer(). I think =80 has to lead to some data being badly aligned. I can reproduce that =91 has *much* better performance than =90. 170841.844938 vs 368490.268577 in a 10s test. Reproducable both with an without the test. That's certainly worth some investigation. This is *not* reproducable on the intel machine, so it might the associativity of the L1/L2 cache on the AMD. Perhaps I should have gone past 64 clients, because in the document Lock Scaling Analysis on Intel Xeon Processors [1], Intel write: This implies that with oversubscription (more threads running than available logical CPUs), the performance of spinlocks can depend heavily on the exact OS scheduler behavior, and may change drastically with operating system or VM updates. I haven't bothered with a higher client counts though, because Andres noted it's the same with 90 clients on this AMD system. Andres: Do you see big problems when # clients # logical cores on the affected Intel systems? There's some slowdown with the patch applied, but it's not big. Without it, the slowdown is much earlier. There is only a marginal improvement in performance on this big 4 socket system. Andres informs me privately that he has reproduced the problem on multiple new 4-socket Intel servers, so it seems reasonable to suppose that more or less an Intel thing. I've just poked around, it's not just 4 socket, but also 2 socket systems. Some background: The setups that triggered me into working on the patchset didn't really have a pgbench like workload, the individual queries were/are more complicated even though it's still an high throughput OLTP workload. And the contention was *much* higher than what I can reproduce with pgbench -S, there was often nearly all time spent in the lwlock's spinlock, and it was primarily the buffer mapping lwlocks, being locked in shared mode. The difference is that instead of locking very few buffers per query like pgbench does, they touched much more. If you look at a profile of a pgbench -S workload -cj64 it's pretty much all bottlenecked by GetSnapshotData(): unpatched: - 40.91% postgres_plainl postgres_plainlw[.] s_lock - s_lock - 51.34% LWLockAcquire GetSnapshotData - GetTransactionSnapshot + 55.23% PortalStart + 44.77% PostgresMain - 48.66% LWLockRelease GetSnapshotData - GetTransactionSnapshot + 53.64% PostgresMain + 46.36% PortalStart + 2.65% pgbench [kernel.kallsyms] [k] try_to_wake_up + 2.61% postgres_plainl postgres_plainlw[.] LWLockRelease + 1.36% postgres_plainl postgres_plainlw[.] LWLockAcquire + 1.25% postgres_plainl postgres_plainlw[.] GetSnapshotData patched: - 2.94% postgres postgres [.] LWLockAcquire - LWLockAcquire + 26.61% ReadBuffer_common + 17.08% GetSnapshotData + 10.71% _bt_relandgetbuf + 10.24% LockAcquireExtended + 10.11% VirtualXactLockTableInsert + 9.17% VirtualXactLockTableCleanup + 7.55% _bt_getbuf + 3.40% index_fetch_heap + 1.51% LockReleaseAll + 0.89% StartTransactionCommand + 0.83% _bt_next + 0.75% LockRelationOid + 0.52% ReadBufferExtended - 2.75% postgres postgres [.] _bt_compare - _bt_compare + 96.72% _bt_binsrch + 1.71% _bt_moveright + 1.29% _bt_search - 2.67% postgres postgres [.] GetSnapshotData - GetSnapshotData + 97.03% GetTransactionSnapshot + 2.16% PostgresMain + 0.81% PortalStart So now the profile looks much saner/less contended which immediately is visible in transaction rates: 192584.218530 vs 552834.002573. But if you try to attack the contention from the other end, by setting default_transaction_isolation='repeatable read' to reduce the number of snapshots taken, its suddenly 536789.807391 vs 566839.328922. A *much* smaller benefit. The reason the patch doesn't help that much with that setting is that there simply isn't as much actual contention there: + 2.77% postgres postgres[.] _bt_compare - 2.72% postgres postgres[.]
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
I thought I'd try out what I was in an immediate position to do without having access to dedicated multi-socket hardware: A benchmark on AWS. This was a c3.8xlarge instance, which are reportedly backed by Intel Xeon E5-2680 processors. Since the Intel ARK website reports that these CPUs have 16 threads (8 cores + hyperthreading), and AWS's marketing material indicates that this instance type has 32 vCPUs, I inferred that the underlying hardware had 2 sockets. However, reportedly that wasn't the case when procfs was consulted, no doubt due to Xen Hypervisor voodoo: ubuntu@ip-10-67-128-2:~$ lscpu Architecture: x86_64 CPU op-mode(s):32-bit, 64-bit Byte Order:Little Endian CPU(s):32 On-line CPU(s) list: 0-31 Thread(s) per core:32 Core(s) per socket:1 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family:6 Model: 62 Stepping: 4 CPU MHz: 2800.074 BogoMIPS: 5600.14 Hypervisor vendor: Xen Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 25600K NUMA node0 CPU(s): 0-31 I ran the benchmark on Ubuntu 13.10 server, because that seemed to be the only prominent enterprise x86_64 AMI (operating system image) that came with GCC 4.8 as part its standard toolchain. This exact setup is benchmarked here: http://www.phoronix.com/scan.php?page=articleitem=amazon_ec2_c3num=1 (Incidentally, some of the other benchmarks on that site use pgbench to benchmark the Linux kernel, filesystems, disks and so on. e.g.: http://www.phoronix.com/scan.php?page=news_itempx=NzI0NQ). I was hesitant to benchmark using a virtualized system. There is a lot of contradictory information about the overhead and/or noise added, which may vary from one workload or hypervisor to the next. But, needs must when the devil drives, and all that. Besides, this kind of setup is very commercially relevant these days, so it doesn't seem unreasonable to see how things work out on an AWS instance that generally performs well for this workload. Of course, I still want to replicate the big improvement you reported for multi-socket systems, but you might have to wait a while for that, unless some kindly benefactor that has a 4 socket server lying around would like to help me out. Results: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/ You can drill down and find the postgresql.conf settings from the report. There appears to be a modest improvement in transaction throughput. It's not as large as the improvement you reported for your 2 socket workstation, but it's there, just about. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-01-31 17:52:58 -0800, Peter Geoghegan wrote: On Fri, Jan 31, 2014 at 1:54 AM, Andres Freund and...@2ndquadrant.com wrote: I plan to split the atomics patch into smaller chunks before reposting. Imo the Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. is worth being applied independently from the rest of the series, it simplies code and it fixes a bug... For things that require a format-patch series, I personally find it easier to work off a feature branch on a remote under the control of the patch author. The only reason that I don't do so myself is that I know that that isn't everyone's preference. I do to, that's why I have a git branch for all but the most trivial branches. I have access to a large server for the purposes of benchmarking this. On the plus side, this is a box very much capable of exercising these bottlenecks: a 48 core AMD system, with 256GB of RAM. However, I had to instruct someone else on how to conduct the benchmark, since I didn't have SSH access, and the OS and toolchain were antiquated, particularly for this kind of thing. This is Linux kernel 2.6.18-274.3.1.el5 (RHEL5.7). The GCC version that Postgres was built with was 4.1.2. This is not what I'd hoped for; obviously I would have preferred to be able to act on your warning: Please also note that due to the current state of the atomics implementation this likely will only work on a somewhat recent gcc and that the performance might be slightly worse than in the previous version because the atomic add is implemented using the CAS fallback. Even still, it might be marginally useful to get a sense of that cost. I *think* it should actually work on gcc 4.1, I've since added the fallbacks I hadn't back when I wrote the above. I've added exactly those atomics that are needed for the scalable lwlock things (xadd, xsub (yes, that's essentially the same) and cmpxchg). I'm looking at alternative options, because this is not terribly helpful. With those big caveats in mind, consider the results of the benchmark, which show the patch performing somewhat worse than the master baseline at higher client counts: I think that's actually something else. I'd tried to make some definitions simpler, and that has, at least for the machine I have occasional access to, pessimized things. I can't always run the tests there, so I hadn't noticed before the repost. I've pushed a preliminary relief to the git repository, any chance you could retry? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund and...@2ndquadrant.com wrote: I'm looking at alternative options, because this is not terribly helpful. With those big caveats in mind, consider the results of the benchmark, which show the patch performing somewhat worse than the master baseline at higher client counts: I think that's actually something else. I'd tried to make some definitions simpler, and that has, at least for the machine I have occasional access to, pessimized things. I can't always run the tests there, so I hadn't noticed before the repost. I should have been clearer on one point: The pre-rebased patch (actual patch series) [1] was applied on top of a commit from around the same period, in order to work around the bit rot. However, I tested the most recent revision from your git remote on the AWS instance. [1] http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-02-01 13:40:20 -0800, Peter Geoghegan wrote: On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund and...@2ndquadrant.com wrote: I'm looking at alternative options, because this is not terribly helpful. With those big caveats in mind, consider the results of the benchmark, which show the patch performing somewhat worse than the master baseline at higher client counts: I think that's actually something else. I'd tried to make some definitions simpler, and that has, at least for the machine I have occasional access to, pessimized things. I can't always run the tests there, so I hadn't noticed before the repost. I should have been clearer on one point: The pre-rebased patch (actual patch series) [1] was applied on top of a commit from around the same period, in order to work around the bit rot. Ah. Then I indeed wouldn't expect improvements. However, I tested the most recent revision from your git remote on the AWS instance. [1] http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de But that was before my fix, right. Except you managed to timetravel :) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Feb 1, 2014 at 1:41 PM, Andres Freund and...@2ndquadrant.com wrote: However, I tested the most recent revision from your git remote on the AWS instance. But that was before my fix, right. Except you managed to timetravel :) Heh, okay. So Nathan Boley has generously made available a machine with 4 AMD Opteron 6272s. I've performed the same benchmark on that server. However, I thought it might be interesting to circle back and get some additional numbers for the AWS instance already tested - I'd like to see what it looks like after your recent tweaks to fix the regression. The single client performance of that instance seems to be markedly better than that of Nathan's server. Tip: AWS command line tools + S3 are a great way to easily publish bulky pgbench-tools results, once you figure out how to correctly set your S3 bucket's security manifest to allow public http. It has similar advantages to rsync, and just works with the minimum of fuss. Anyway, I don't think that the new, third c3.8xlarge-rwlocks testset tells us much of anything: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/ Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ Perhaps I should have gone past 64 clients, because in the document Lock Scaling Analysis on Intel Xeon Processors [1], Intel write: This implies that with oversubscription (more threads running than available logical CPUs), the performance of spinlocks can depend heavily on the exact OS scheduler behavior, and may change drastically with operating system or VM updates. I haven't bothered with a higher client counts though, because Andres noted it's the same with 90 clients on this AMD system. Andres: Do you see big problems when # clients # logical cores on the affected Intel systems? There is only a marginal improvement in performance on this big 4 socket system. Andres informs me privately that he has reproduced the problem on multiple new 4-socket Intel servers, so it seems reasonable to suppose that more or less an Intel thing. The Intel document [1] further notes: As the number of threads polling the status of a lock address increases, the time it takes to process those polling requests will increase. Initially, the latency to transfer data across socket boundaries will always be an order of magnitude longer than the on-chip cache-to-cache transfer latencies. Such cross-socket transfers, if they are not effectively minimized by software, will negatively impact the performance of any lock algorithm that depends on them. So, I think it's fair to say, given what we now know from Andres' numbers and the numbers I got from Nathan's server, that this patch is closer to being something that addresses a particularly unfortunate pathology on many-socket Intel system than it is to being a general performance optimization. Based on the above quoted passage, it isn't unreasonable to suppose other vendors or architectures could be affected, but that isn't in evidence. While I welcome the use of atomic operations in the context of LW_SHARED acquisition as general progress, I think that to the outside world my proposed messaging is more useful. It's not quite a bug-fix, but if you're using a many-socket Intel server, you're *definitely* going to want to use a PostgreSQL version that is unaffected. You may well not want to take on the burden of waiting for 9.4, or waiting for it to fully stabilize. I note that Andres has a feature branch of this backported to Postgres 9.2, no doubt because of a request from a 2ndQuadrant customer. I have to wonder if we should think about making this available with a configure switch in one or more back branches. I think that the complete back-porting of the fsync request queue issue's fix in commit 758728 could be considered a precedent - that too was a fix for a really bad worst-case that was encountered fairly infrequently in the wild. It's sort of horrifying to have red-hot spinlocks in production, so that seems like the kind of thing we should make an effort to address for those running multi-socket systems. Of those running Postgres on new multi-socket systems, the reality is that the majority are running on Intel hardware. Unfortunately, everyone knows that Intel will soon be the only game in town when it comes to high-end x86_64 servers, which contributes to my feeling that we need to target back branches. We should do something about the possible regression with older compilers using the fallback first, though. It would be useful to know more about the nature of the problem that made such an appreciable difference in Andres' original post. Again, through private e-mail, I saw perf profiles from affected servers and an unaffected though roughly comparable server (i.e. Nathan's 64 core AMD server). Andres observed that stalled-cycles-frontend and stalled-cycles-backend Linux
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
Hi, On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote: On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: 1) I've added an abstracted atomic ops implementation. Needs a fair amount of work, also submitted as a separate CF entry. (Patch 1 2) Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when applying 0002-Very-basic-atomic-ops-implementation.patch. Please rebase. I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. I plan to split the atomics patch into smaller chunks before reposting. Imo the Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. is worth being applied independently from the rest of the series, it simplies code and it fixes a bug... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Fri, Jan 31, 2014 at 1:54 AM, Andres Freund and...@2ndquadrant.com wrote: I plan to split the atomics patch into smaller chunks before reposting. Imo the Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. is worth being applied independently from the rest of the series, it simplies code and it fixes a bug... For things that require a format-patch series, I personally find it easier to work off a feature branch on a remote under the control of the patch author. The only reason that I don't do so myself is that I know that that isn't everyone's preference. I have access to a large server for the purposes of benchmarking this. On the plus side, this is a box very much capable of exercising these bottlenecks: a 48 core AMD system, with 256GB of RAM. However, I had to instruct someone else on how to conduct the benchmark, since I didn't have SSH access, and the OS and toolchain were antiquated, particularly for this kind of thing. This is Linux kernel 2.6.18-274.3.1.el5 (RHEL5.7). The GCC version that Postgres was built with was 4.1.2. This is not what I'd hoped for; obviously I would have preferred to be able to act on your warning: Please also note that due to the current state of the atomics implementation this likely will only work on a somewhat recent gcc and that the performance might be slightly worse than in the previous version because the atomic add is implemented using the CAS fallback. Even still, it might be marginally useful to get a sense of that cost. I'm looking at alternative options, because this is not terribly helpful. With those big caveats in mind, consider the results of the benchmark, which show the patch performing somewhat worse than the master baseline at higher client counts: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/rwlock-contention/ This is exactly what you said would happen, but at least now we have a sense of that cost. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: 1) I've added an abstracted atomic ops implementation. Needs a fair amount of work, also submitted as a separate CF entry. (Patch 1 2) Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when applying 0002-Very-basic-atomic-ops-implementation.patch. Please rebase. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
This patch didn't make it out of the 2013-11 commit fest. You should move it to the next commit fest (probably with an updated patch) before January 15th. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers