On Sun, Jul 24, 2016 at 1:10 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> One solution could be to provide a non-circular variant of the dlist
> interface that uses NULL list termination.  I've attached a quick
> sketch of something like that which seems to work correctly.  It is
> only lightly tested so far and probably buggy, but shows the general
> idea.

On reflection, it wouldn't make much sense to put "noncircular" in the
names of interfaces, as that is an internal detail.  Maybe
"relocatable" or "position independent" or something like that, since
that's a user-visible property of the dlist_head.  Here is a version
of the patch that uses _relocatable.
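
To make the idea concrete, here is a minimal standalone sketch of the NULL-terminated scheme (the names here are hypothetical, not the patch's actual interface): the head holds plain pointers instead of an embedded sentinel node, so no node ever points back into the head itself, and the head can be copied or mapped at a different address, e.g. inside a DSM segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only. */
typedef struct rnode
{
    struct rnode *prev;
    struct rnode *next;
} rnode;

typedef struct rhead
{
    rnode  *head;               /* NULL when the list is empty */
    rnode  *tail;
} rhead;

static void
rlist_init(rhead *h)
{
    h->head = h->tail = NULL;
}

static void
rlist_push_head(rhead *h, rnode *n)
{
    n->prev = NULL;
    n->next = h->head;
    if (h->head != NULL)        /* the branch the circular form avoids */
        h->head->prev = n;
    else
        h->tail = n;
    h->head = n;
}

static void
rlist_delete(rhead *h, rnode *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        h->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        h->tail = n->prev;
}
```

Because the nodes never store the address of the head, memcpy'ing the rhead to a new location leaves the list fully usable, which is exactly the property a circular dlist_head's embedded sentinel cannot provide.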

> Any thoughts on this approach, or better ways to solve this problem?
> How valuable is the branch-free property of the circular push and
> delete operations?

I investigated this a bit.  If I do this on my laptop (clang, no
asserts, -O2), it takes 3895 milliseconds, or 4.8ns per push/delete
operation:

        for (i = 0; i < 100000000; ++i)
        {
            dlist_push_head(&head, &nodes[0]);
            dlist_push_tail(&head, &nodes[1]);
            dlist_push_head(&head, &nodes[2]);
            dlist_push_tail(&head, &nodes[3]);
            dlist_delete(&nodes[2]);
            dlist_delete(&nodes[3]);
            dlist_delete(&nodes[0]);
            dlist_delete(&nodes[1]);
        }

The relocatable version takes 5907 milliseconds, or 7.4ns per
push/delete operation:

        for (i = 0; i < 100000000; ++i)
        {
            dlist_push_head_relocatable(&head, &nodes[0]);
            dlist_push_tail_relocatable(&head, &nodes[1]);
            dlist_push_head_relocatable(&head, &nodes[2]);
            dlist_push_tail_relocatable(&head, &nodes[3]);
            dlist_delete_relocatable(&head, &nodes[2]);
            dlist_delete_relocatable(&head, &nodes[3]);
            dlist_delete_relocatable(&head, &nodes[0]);
            dlist_delete_relocatable(&head, &nodes[1]);
        }

Those operations are ~50% slower.  So getting rid of dlist's clever
branch-free code generally seems like a bad idea.
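For contrast, here is a sketch of the circular scheme (hypothetical names again; the real interface is in src/include/lib/ilist.h) showing where the speed comes from: the head embeds a sentinel node, so the empty and non-empty cases are handled by the same unconditional stores — but that sentinel is precisely the self-referencing pointer that makes the head non-relocatable.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
typedef struct cnode
{
    struct cnode *prev;
    struct cnode *next;
} cnode;

typedef struct chead
{
    cnode   sentinel;           /* points to itself when the list is empty */
} chead;

static void
clist_init(chead *h)
{
    h->sentinel.prev = h->sentinel.next = &h->sentinel;
}

static void
clist_push_head(chead *h, cnode *n)
{
    /* Four unconditional stores: no empty-list branch needed. */
    n->next = h->sentinel.next;
    n->prev = &h->sentinel;
    n->next->prev = n;
    h->sentinel.next = n;
}

static void
clist_delete(cnode *n)
{
    /* Also branch-free: the sentinel absorbs the boundary cases. */
    n->prev->next = n->next;
    n->next->prev = n->prev;
}
```

The cost of this elegance is that sentinel.prev and sentinel.next point into the chead itself, so a copy of the head placed at a new address still refers to the old one — the very problem the relocatable variant solves.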

Next I wondered if it would be a bad idea to use slower relocatable
dlist heads for all LWLocks.  Obviously LWLocks are used all over the
place and it would be bad to slow them down, but I wondered if maybe
dlist manipulation might not be a very significant part of their work.
So I put a LWLock and a counter in shared memory, and had N background
workers run a tight loop that locks, increments and unlocks
concurrently until the counter reaches 1 billion.  This creates
different degrees of contention and wait list sizes.  The worker loop
looks like this:

    while (!finished)
    {
        LWLockAcquire(&control->lock, LW_EXCLUSIVE);
        if (control->counter >= control->goal)
            finished = true;
        else
            ++control->counter;
        LWLockRelease(&control->lock);
    }

I measured the following times for unpatched master on my 4-core laptop:

16 workers = 73.067s, 74.869s, 75.338s
8 workers  = 65.846s, 67.622s, 68.039s
4 workers  = 68.763s, 68.980s, 69.035s <-- curiously slower here
3 workers  = 59.701s, 59.991s, 60.133s
2 workers  = 53.620s, 55.300s, 55.790s
1 worker   = 21.578s, 21.535s, 21.598s

With the attached patch I got:

16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
8 workers  = 67.462s, 68.622s, 68.851s <- +1.4%
4 workers  = 64.983s, 65.021s, 65.496s <- -5.7%
3 workers  = 60.247s, 60.425s, 60.492s <- +0.7%
2 workers  = 57.544s, 57.626s, 58.133s <- +2.3%
1 worker   = 21.403s, 21.486s, 21.661s <- -0.2%

The data is somewhat noisy, and the effect differs with the number of workers.

I can post the source for those tests if anyone is interested.  If you
have any other ideas for access patterns to test, or clever ways to
keep push and delete branch-free while also avoiding internal pointers
back to dlist_head, I'm all ears.  Otherwise, if a change affecting
all LWLocks turns out to be unacceptable, maybe we would need to have
a different LWLock interface for relocatable LWLocks to make them
suitable for use in DSM segments.  Any thoughts?

Thomas Munro

Attachment: lwlocks-in-dsm-v2.patch
