Re: [HACKERS] kqueue

2020-02-04 Thread Thomas Munro
On Wed, Jan 29, 2020 at 11:54 AM Thomas Munro  wrote:
> If there are no further objections, I'm planning to commit this sooner
> rather than later, so that it gets plenty of air time on developer and
> build farm machines.  If problems are discovered on a particular
> platform, there's a pretty good escape hatch: you can define
> WAIT_USE_POLL, and if it turns out to be necessary, we could always do
> something in src/template similar to what we do for semaphores.

I updated the error messages to match the new "unified" style, adjusted
a couple of comments, and pushed.  Thanks to all the people who
tested.  I'll keep an eye on the build farm.




Re: [HACKERS] kqueue

2020-01-29 Thread Mark Wong
On Sat, Jan 25, 2020 at 11:29:11AM +1300, Thomas Munro wrote:
> On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa  wrote:
> > On Jan 22, 2020, at 2:19 PM, Tom Lane  wrote:
> >> It's certainly possible that to see any benefit you need stress
> >> levels above what I can manage on the small box I've got these
> >> OSes on.  Still, it'd be nice if a performance patch could show
> >> some improved performance, before we take any portability risks
> >> for it.
> 
> You might need more than one CPU socket, or at least lots more cores
> so that you can create enough contention.  That was needed to see the
> regression caused by commit ac1d794 on Linux[1].
> 
> > Here are two charts comparing a patched and an unpatched system.
> > These systems are very large and have just shy of a thousand
> > connections each, with an average of 20 to 30 active queries running
> > concurrently at times, including hundreds if not thousands of queries
> > hitting the database in rapid succession.  The effect is that the
> > unpatched system generates a lot of system load just handling idle
> > connections, whereas the patched version is not impacted by idle
> > sessions or sessions that have already received data.
> 
> Thanks.  I can reproduce something like this on an Azure 72-vCPU
> system, using pgbench -S -c800 -j32.  The point of those settings is
> to have many backends, but they're all alternating between work and
> sleep.  That creates a stream of poll() syscalls, and system time goes
> through the roof (all CPUs pegged, but it's ~half system).  Profiling
> the kernel with dtrace, I see the most common stack (by a long way) is
> in a poll-related lock, similar to a profile Rui sent me off-list from
> his production system.  Patched, there is very little system time and
> the TPS number goes from 539k to 781k.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com

Just to add some data...

I tried the kqueue v14 patch on an AWS EC2 m5a.24xlarge (96 vCPU) with
FreeBSD 12.1, driving it from an m5.8xlarge (32 vCPU) CentOS 7 system.

I also used pgbench with a scale factor of 1000, with -S -c800 -j32.

Comparing pg 12.1 vs 13-devel (30012a04):

* TPS increased from ~93,000 to ~140,000, roughly a 50% increase
* system time dropped from ~78% to ~70% of CPU, ~8 percentage points
* user time increased from ~16% to ~23% of CPU, ~7 percentage points

I don't have any profile data, but I've attached a couple of charts
showing the processor utilization over a 15-minute interval on the database
system.

Regards,
Mark
-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


Re: [HACKERS] kqueue

2020-01-28 Thread Thomas Munro
On Sat, Jan 25, 2020 at 11:29 AM Thomas Munro  wrote:
> On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa  wrote:
> > Here are two charts comparing a patched and an unpatched system.
> > These systems are very large and have just shy of a thousand
> > connections each, with an average of 20 to 30 active queries running
> > concurrently at times, including hundreds if not thousands of queries
> > hitting the database in rapid succession.  The effect is that the
> > unpatched system generates a lot of system load just handling idle
> > connections, whereas the patched version is not impacted by idle
> > sessions or sessions that have already received data.
>
> Thanks.  I can reproduce something like this on an Azure 72-vCPU
> system, using pgbench -S -c800 -j32.  The point of those settings is
> to have many backends, but they're all alternating between work and
> sleep.  That creates a stream of poll() syscalls, and system time goes
> through the roof (all CPUs pegged, but it's ~half system).  Profiling
> the kernel with dtrace, I see the most common stack (by a long way) is
> in a poll-related lock, similar to a profile Rui sent me off-list from
> his production system.  Patched, there is very little system time and
> the TPS number goes from 539k to 781k.

If there are no further objections, I'm planning to commit this sooner
rather than later, so that it gets plenty of air time on developer and
build farm machines.  If problems are discovered on a particular
platform, there's a pretty good escape hatch: you can define
WAIT_USE_POLL, and if it turns out to be necessary, we could always do
something in src/template similar to what we do for semaphores.




Re: [HACKERS] kqueue

2020-01-24 Thread Thomas Munro
On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa  wrote:
> On Jan 22, 2020, at 2:19 PM, Tom Lane  wrote:
>> It's certainly possible that to see any benefit you need stress
>> levels above what I can manage on the small box I've got these
>> OSes on.  Still, it'd be nice if a performance patch could show
>> some improved performance, before we take any portability risks
>> for it.

You might need more than one CPU socket, or at least lots more cores
so that you can create enough contention.  That was needed to see the
regression caused by commit ac1d794 on Linux[1].

> Here are two charts comparing a patched and an unpatched system.
> These systems are very large and have just shy of a thousand
> connections each, with an average of 20 to 30 active queries running
> concurrently at times, including hundreds if not thousands of queries
> hitting the database in rapid succession.  The effect is that the
> unpatched system generates a lot of system load just handling idle
> connections, whereas the patched version is not impacted by idle
> sessions or sessions that have already received data.

Thanks.  I can reproduce something like this on an Azure 72-vCPU
system, using pgbench -S -c800 -j32.  The point of those settings is
to have many backends, but they're all alternating between work and
sleep.  That creates a stream of poll() syscalls, and system time goes
through the roof (all CPUs pegged, but it's ~half system).  Profiling
the kernel with dtrace, I see the most common stack (by a long way) is
in a poll-related lock, similar to a profile Rui sent me off-list from
his production system.  Patched, there is very little system time and
the TPS number goes from 539k to 781k.

[1] 
https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com




Re: [HACKERS] kqueue

2020-01-22 Thread Tom Lane
I wrote:
> This just says it doesn't lock up, of course.  I've not attempted
> any performance-oriented tests.

I've now done some light performance testing -- just stuff like
pgbench -S -M prepared -c 20 -j 20 -T 60 bench

I cannot see any improvement on either FreeBSD 12 or NetBSD 8.1,
either as to net TPS or as to CPU load.  If anything, the TPS
rate is a bit lower with the patch, though I'm not sure that
that effect is above the noise level.

It's certainly possible that to see any benefit you need stress
levels above what I can manage on the small box I've got these
OSes on.  Still, it'd be nice if a performance patch could show
some improved performance, before we take any portability risks
for it.

regards, tom lane




Re: [HACKERS] kqueue

2020-01-22 Thread Tom Lane
Matteo Beccati  writes:
> On 22/01/2020 17:06, Tom Lane wrote:
>> Matteo Beccati  writes:
>>> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on the
>>> latest master.
>>> With the kqueue patch, a pgbench -c basically hangs the whole postgres
>>> instance. Not sure if it's a kernel issue, a Hyper-V issue or what, but
>>> when it hangs, I can't even kill -9 the postgres processes or get the VM
>>> to shut down properly. The same doesn't happen, of course, with vanilla
>>> postgres.

>> I'm a bit confused about what you are testing --- the kqueue patch
>> as per this thread, or that plus the WaitLatch refactorizations in
>> the other thread you point to above?

> my bad, I tested the v14 patch attached to the email.

Thanks for clarifying.

FWIW, I can't replicate the problem here using NetBSD 8.1 amd64
on bare metal.  I tried various pgbench parameters up to "-c 20 -j 20"
(on a 4-cores-plus-hyperthreading CPU), and it seems fine.

One theory is that NetBSD fixed something since 8.0, but I trawled
their 8.1 release notes [1], and the only items mentioning kqueue
or kevent are for fixes in the pty and tun drivers, neither of which
seem relevant.  (But wait ... could your VM setup be dependent on
a tunnel network interface for outside-the-VM connectivity?  Still
hard to see the connection though.)

My guess is that what you're seeing is a VM bug.

regards, tom lane

[1] https://cdn.netbsd.org/pub/NetBSD/NetBSD-8.1/CHANGES-8.1




Re: [HACKERS] kqueue

2020-01-22 Thread Matteo Beccati
On 22/01/2020 17:06, Tom Lane wrote:
> Matteo Beccati  writes:
>> On 21/01/2020 02:06, Thomas Munro wrote:
>>> [1] 
>>> https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
> 
>> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on the
>> latest master.
>> With the kqueue patch, a pgbench -c basically hangs the whole postgres
>> instance. Not sure if it's a kernel issue, a Hyper-V issue or what, but
>> when it hangs, I can't even kill -9 the postgres processes or get the VM
>> to shut down properly. The same doesn't happen, of course, with vanilla
>> postgres.
> 
> I'm a bit confused about what you are testing --- the kqueue patch
> as per this thread, or that plus the WaitLatch refactorizations in
> the other thread you point to above?

my bad, I tested the v14 patch attached to the email.

The quoted url was just above the patch name in the email client and
somehow my brain thought I was quoting the v14 patch name.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/




Re: [HACKERS] kqueue

2020-01-22 Thread Tom Lane
Matteo Beccati  writes:
> On 21/01/2020 02:06, Thomas Munro wrote:
>> [1] 
>> https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com

> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on the
> latest master.
> With the kqueue patch, a pgbench -c basically hangs the whole postgres
> instance. Not sure if it's a kernel issue, a Hyper-V issue or what, but
> when it hangs, I can't even kill -9 the postgres processes or get the VM
> to shut down properly. The same doesn't happen, of course, with vanilla
> postgres.

I'm a bit confused about what you are testing --- the kqueue patch
as per this thread, or that plus the WaitLatch refactorizations in
the other thread you point to above?

I've gotten through check-world successfully with the v14 kqueue patch
atop yesterday's HEAD on:

* macOS Catalina 10.15.2 (current release)
* FreeBSD/amd64 12.0-RELEASE-p12
* NetBSD/amd64 8.1
* NetBSD/arm 8.99.41
* OpenBSD/amd64 6.5

(These OSes are all on bare metal, no VMs involved)

This just says it doesn't lock up, of course.  I've not attempted
any performance-oriented tests.

regards, tom lane




Re: [HACKERS] kqueue

2020-01-21 Thread Matteo Beccati
Hi,

On 21/01/2020 02:06, Thomas Munro wrote:
> [1] 
> https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com

I had a NetBSD 8.0 VM lying around and I gave the patch a spin on the
latest master.

With the kqueue patch, a pgbench -c basically hangs the whole postgres
instance. Not sure if it's a kernel issue, a Hyper-V issue or what, but
when it hangs, I can't even kill -9 the postgres processes or get the VM
to shut down properly. The same doesn't happen, of course, with vanilla
postgres.

If the patch gets merged, I'd say it's safer not to enable it on NetBSD
and instead leave it up to the pkgsrc team.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/




Re: [HACKERS] kqueue

2020-01-20 Thread Thomas Munro
On Tue, Jan 21, 2020 at 8:03 AM Tom Lane  wrote:
> I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
> so it's not just macOS.

Thanks for testing.  Fixed by handling the new
exit_on_postmaster_death flag from commit cfdf4dc4.

On Tue, Jan 21, 2020 at 5:55 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]
>
> I haven't read this patch in any detail, but a couple quick notes:
>
> * It needs to be rebased over the removal of pg_config.h.win32
> --- it should be touching Solution.pm instead, I believe.

Done.

> * I'm disturbed by the addition of a hunk to the supposedly
> system-API-independent WaitEventSetWait() function.  Is that
> a generic bug fix?  If not, can we either get rid of it, or
> at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
> patch isn't inflicting a performance penalty on everyone else?

Here's a version that adds no new code to non-WAIT_USE_KQUEUE paths.
That code deals with the fact that we sometimes discover the
postmaster is gone before we're in a position to report an event, so
we need an inter-function memory of some kind.  The new coding also
handles a race case where someone reuses the postmaster's pid before
we notice it went away.  In theory, the need for that could be
entirely removed by collapsing the 'adjust' call into the 'wait' call
(a single kevent() invocation can do both things), but I'm not sure if
it's worth the complexity.  As for generally reducing syscall noise,
for both kqueue and epoll, I think that should be addressed separately
by better reuse of WaitEventSet objects[1].

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
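To illustrate the "collapse" idea: kevent() accepts a changelist and an
output event list in the same call, so registration and waiting can share
one syscall.  A minimal sketch (a hypothetical helper, not the patch's
actual code):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/*
 * Apply pending filter changes and wait for events in a single kevent()
 * call, collapsing the "adjust" step into the "wait" step.
 */
static int
kqueue_adjust_and_wait(int kqueue_fd,
                       struct kevent *changes, int nchanges,
                       struct kevent *occurred, int noccurred,
                       const struct timespec *timeout)
{
    return kevent(kqueue_fd, changes, nchanges,
                  occurred, noccurred, timeout);
}

A registration that fails in this combined form (for example ESRCH for an
EVFILT_PROC filter on a process that has already exited) is reported as an
EV_ERROR entry in the output array rather than as a failure of a separate
registration call, which is why collapsing the two calls could remove the
need for the inter-function memory described above.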


0001-Add-kqueue-2-support-for-WaitEventSet-v14.patch
Description: Binary data


Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I took this patch for a quick spin on macOS.  The result was that the 
>> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
>> see any mentions of this anywhere in the thread, but that test is newer 
>> than the beginning of this thread.  Can anyone confirm or deny this 
>> issue?  Is it specific to macOS perhaps?

> Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
> HEAD.  The core regression tests pass, as do the earlier recovery tests
> (I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
> things freeze up with four postmaster-child processes stuck in 100%-
> CPU-consuming loops.

I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
so it's not just macOS.

I now think that the autovac launcher isn't actually stuck in the way
that the other processes are.  The ones that are actually consuming
CPU are the checkpointer, bgwriter, and walwriter.  On the FreeBSD
box their stack traces are

(gdb) bt
#0  _close () at _close.S:3
#1  0x007b4dd1 in FreeWaitEventSet (set=) at latch.c:660
#2  WaitLatchOrSocket (latch=0x80a1477a8, wakeEvents=, sock=-1, timeout=, wait_event_info=83886084) at latch.c:432
#3  0x0074a1b0 in CheckpointerMain () at checkpointer.c:514
#4  0x005691e2 in AuxiliaryProcessMain (argc=2, argv=0x7fffce90) at bootstrap.c:461

(gdb) bt
#0  _fcntl () at _fcntl.S:3
#1  0x000800a6cd84 in fcntl (fd=4, cmd=2) at /usr/src/lib/libc/sys/fcntl.c:56
#2  0x007b4eb5 in CreateWaitEventSet (context=, nevents=) at latch.c:625
#3  0x007b4c82 in WaitLatchOrSocket (latch=0x80a147b00, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:389
#4  0x00749ecd in BackgroundWriterMain () at bgwriter.c:304
#5  0x005691dd in AuxiliaryProcessMain (argc=2, argv=0x7fffce90) at bootstrap.c:456

(gdb) bt
#0  _kevent () at _kevent.S:3
#1  0x007b58a1 in WaitEventAdjustKqueue (set=0x800e6a120, event=0x800e6a170, old_events=) at latch.c:1034
#2  0x007b4d87 in AddWaitEventToSet (set=, events=, fd=-1, latch=, user_data=) at latch.c:778
#3  WaitLatchOrSocket (latch=0x80a147e58, wakeEvents=41, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:410
#4  0x0075b349 in WalWriterMain () at walwriter.c:256
#5  0x005691ec in AuxiliaryProcessMain (argc=2, argv=0x7fffce90) at bootstrap.c:467

Note that these are just snapshots --- it looks like these processes
are repeatedly creating and destroying WaitEventSets, they're not
stuck inside the kernel.

regards, tom lane




Re: [HACKERS] kqueue

2020-01-20 Thread Thomas Munro
On Tue, Jan 21, 2020 at 2:34 AM Peter Eisentraut
 wrote:
> I took this patch for a quick spin on macOS.  The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread.  Can anyone confirm or deny this
> issue?  Is it specific to macOS perhaps?

Thanks for testing, and sorry I didn't run a full check-world after
that rebase.  What happened here is that after commit cfdf4dc4 landed
on master, every implementation now needs to check for
exit_on_postmaster_death, and this patch didn't get the message.
Those processes are stuck in their main loops having detected
postmaster death, but not having any handling for it.  Will fix.
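For context, the check each implementation needs looks roughly like this
(a simplified sketch with stand-in names, not the actual latch.c code):

#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the real WaitEventSet; only the relevant flag is shown. */
typedef struct MiniWaitEventSet
{
    bool exit_on_postmaster_death;   /* WL_EXIT_ON_PM_DEATH was requested */
} MiniWaitEventSet;

/* Called when the implementation notices the postmaster has died. */
static int
on_postmaster_death(const MiniWaitEventSet *set)
{
    if (set->exit_on_postmaster_death)
        exit(1);        /* the real code calls proc_exit(1) */
    return 1;           /* otherwise report a WL_POSTMASTER_DEATH event */
}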




Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
Thomas Munro  writes:
> [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]

I haven't read this patch in any detail, but a couple quick notes:

* It needs to be rebased over the removal of pg_config.h.win32
--- it should be touching Solution.pm instead, I believe.

* I'm disturbed by the addition of a hunk to the supposedly
system-API-independent WaitEventSetWait() function.  Is that
a generic bug fix?  If not, can we either get rid of it, or
at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
patch isn't inflicting a performance penalty on everyone else?

regards, tom lane




Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
Peter Eisentraut  writes:
> I took this patch for a quick spin on macOS.  The result was that the 
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
> see any mentions of this anywhere in the thread, but that test is newer 
> than the beginning of this thread.  Can anyone confirm or deny this 
> issue?  Is it specific to macOS perhaps?

Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
HEAD.  The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in 100%-
CPU-consuming loops.  I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x000105511533 postgres`CreateWaitEventSet(context=, nevents=) at latch.c:622:19 [opt]
    frame #2: 0x000105511305 postgres`WaitLatchOrSocket(latch=0x000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x0001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x0001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x7ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x000105511ddc postgres`WaitEventAdjustKqueue(set=0x7fc8e8805920, event=0x7fc8e8805958, old_events=) at latch.c:1034:7 [opt]
    frame #2: 0x000105511638 postgres`AddWaitEventToSet(set=, events=, fd=, latch=, user_data=) at latch.c:778:2 [opt]
    frame #3: 0x000105511342 postgres`WaitLatchOrSocket(latch=0x000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x0001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x0001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x7ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=) at latch.c:660:2 [opt]
    frame #2: 0x00010551145d postgres`WaitLatchOrSocket(latch=0x000112e03444, wakeEvents=, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x0001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x0001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x7ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x0001056a6191 postgres`pg_usleep(microsec=) at pgsleep.c:56:10 [opt]
    frame #2: 0x0001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x0001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=) at pgstat.c:2431:2 [opt]
    frame #4: 0x0001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x0001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x0001054a2634 postgres`AutoVacLauncherMain(argc=, argv=) at autovacuum.c:769 [opt]
    frame #7: 0x0001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.

regards, tom lane




Re: [HACKERS] kqueue

2020-01-20 Thread Peter Eisentraut

On 2019-12-20 01:26, Thomas Munro wrote:

It's still my intention to get this committed eventually, but I got a
bit frazzled by conflicting reports on several operating systems.  For
FreeBSD, performance was improved in many cases, but there were also
some regressions that seemed to be related to ongoing work in the
kernel that seemed worth waiting for.  I don't have the details
swapped into my brain right now, but there was something about a big
kernel lock for Unix domain sockets which possibly explained some
local pgbench problems, and there was also a problem relating to
wakeup priority with some test parameters, which I'd need to go and
dig up.  If you want to test this and let us know how you get on,
that'd be great!  Here's a rebase against PostgreSQL's master branch,


I took this patch for a quick spin on macOS.  The result was that the 
test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
see any mentions of this anywhere in the thread, but that test is newer 
than the beginning of this thread.  Can anyone confirm or deny this 
issue?  Is it specific to macOS perhaps?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] kqueue

2020-01-17 Thread Rui DeSousa
Thanks Thomas,

Just a quick update.

I just deployed this patch into a lower environment yesterday running FreeBSD
12.1 and PostgreSQL 11.6.  I see a significant reduction in CPU/system load,
from highs of 500+ down to the low 20s.  System CPU time has been reduced
to practically nothing.

I'm working with our support vendor to test the patch and will continue to
let it burn in.  Hopefully we can get the patch committed.  Thanks.

> On Dec 19, 2019, at 7:26 PM, Thomas Munro  wrote:
> 
> It's still my intention to get this committed eventually, but I got a
> bit frazzled by conflicting reports on several operating systems.  For
> FreeBSD, performance was improved in many cases, but there were also
> some regressions that seemed to be related to ongoing work in the
> kernel that seemed worth waiting for.  I don't have the details
> swapped into my brain right now, but there was something about a big
> kernel lock for Unix domain sockets which possibly explained some
> local pgbench problems, and there was also a problem relating to
> wakeup priority with some test parameters, which I'd need to go and
> dig up.  If you want to test this and let us know how you get on,
> that'd be great!  Here's a rebase against PostgreSQL's master branch,
> and since you mentioned PostgreSQL 11, here's a rebased version for
> REL_11_STABLE in case that's easier for you to test/build via ports or
> whatever and test with your production workload (eg on a throwaway
> copy of your production system).  You can see it's working by looking
> in top: instead of state "select" (which is how poll() is reported)
> you see "kqread", which on its own isn't exciting enough to get this
> committed :-)
> 





Re: [HACKERS] kqueue

2019-12-19 Thread Thomas Munro
On Fri, Dec 20, 2019 at 1:26 PM Thomas Munro  wrote:
> On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa  wrote:
> > PostgreSQL 11

BTW, PostgreSQL 12 has an improvement that may be relevant for your
case: it suppresses a bunch of high frequency reads on the "postmaster
death" pipe in some scenarios, mainly the streaming replica replay
loop (if you build on a system new enough to have PROC_PDEATHSIG_CTL,
namely FreeBSD 11.2+, it doesn't bother reading the pipe unless it's
received a signal).  That pipe is inherited by every process and
included in every poll() set.  The kqueue patch doesn't even bother to
add it to the wait event set, preferring to use an EVFILT_PROC event,
so in theory we could get rid of the death pipe completely on FreeBSD
and rely on EVFILT_PROC (sleeping) and PDEATHSIG (while awake), but I
wouldn't want to make the code diverge from the Linux code too much,
so I figured we should leave the pipe in place but just avoid
accessing it when possible, if that makes sense.
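For illustration, the EVFILT_PROC registration looks something like this
(a sketch only, with a made-up helper name, not the patch itself):

#include <sys/types.h>
#include <sys/event.h>

/*
 * Ask the kqueue to report when the given process (the postmaster, in
 * this case) exits, instead of watching a pipe for EOF.
 */
static int
watch_process_exit(int kqueue_fd, pid_t pid)
{
    struct kevent k_ev;

    EV_SET(&k_ev, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, 0);

    /* Returns -1 with errno set to ESRCH if the process is already gone. */
    return kevent(kqueue_fd, &k_ev, 1, NULL, 0, NULL);
}

In the scheme described above, a sleeping process is woken by the NOTE_EXIT
event, while an awake process relies on PDEATHSIG.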




Re: [HACKERS] kqueue

2019-12-19 Thread Thomas Munro
On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa  wrote:
> I'm interested in the kqueue patch and would like to know its current state
> and possible timeline for inclusion in the base code.  I have several large
> FreeBSD systems running PostgreSQL 11 that I believe currently display this
> issue.  The system has 88 vCPUs, 512GB RAM, and a very active application with
> over 1000 connections to the database.  The system exhibits high kernel CPU
> usage servicing poll() for connections that are idle.

Hi Rui,

It's still my intention to get this committed eventually, but I got a
bit frazzled by conflicting reports on several operating systems.  For
FreeBSD, performance was improved in many cases, but there were also
some regressions that seemed to be related to ongoing work in the
kernel that seemed worth waiting for.  I don't have the details
swapped into my brain right now, but there was something about a big
kernel lock for Unix domain sockets which possibly explained some
local pgbench problems, and there was also a problem relating to
wakeup priority with some test parameters, which I'd need to go and
dig up.  If you want to test this and let us know how you get on,
that'd be great!  Here's a rebase against PostgreSQL's master branch,
and since you mentioned PostgreSQL 11, here's a rebased version for
REL_11_STABLE in case that's easier for you to test/build via ports or
whatever and test with your production workload (eg on a throwaway
copy of your production system).  You can see it's working by looking
in top: instead of state "select" (which is how poll() is reported)
you see "kqread", which on its own isn't exciting enough to get this
committed :-)

PS Here's a list of slow burner PostgreSQL/FreeBSD projects:
https://wiki.postgresql.org/wiki/FreeBSD


0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch
Description: Binary data


0001-Add-kqueue-2-support-for-WaitEvent-v13-REL_11_STABLE.patch
Description: Binary data


Re: [HACKERS] kqueue

2019-12-19 Thread Rui DeSousa


> On Apr 10, 2018, at 9:05 PM, Thomas Munro  
> wrote:
> 
> On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro
>  wrote:
>> On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
>>  wrote:
>>> I don't plan to resubmit this patch myself, but I was doing some
>>> spring cleaning and rebasing today and I figured it might be worth
>>> quietly leaving a working patch here just in case anyone from the
>>> various BSD communities is interested in taking the idea further.
> 
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe.  I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.
> 
> -- 
> Thomas Munro
> http://www.enterprisedb.com
> 

Hi, 

I'm interested in the kqueue patch and would like to know its current state and
possible timeline for inclusion in the base code.  I have several large FreeBSD
systems running PostgreSQL 11 that I believe currently display this issue.
The system has 88 vCPUs, 512GB RAM, and a very active application with over 1000
connections to the database.  The system exhibits high kernel CPU usage
servicing poll() for connections that are idle.

I've been testing PgBouncer to reduce the number of connections and thus
system CPU usage; however, not all connections can go through PgBouncer.

Thanks,
Rui.



Re: [HACKERS] kqueue

2018-10-09 Thread Thomas Munro
On Tue, Oct 2, 2018 at 6:28 AM Andres Freund  wrote:
> On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote:
> > On 01/10/2018 01:09, Thomas Munro wrote:
> > > I don't know why the existence of the kqueue should make recvfrom()
> > > slower on the pgbench side.  That's probably something to look into
> > > off-line with some FreeBSD guru help.  Degraded performance for
> > > clients on the same machine does seem to be a show stopper for this
> > > patch for now.  Thanks for testing!
> >
> > Glad to be helpful!
> >
> > I've tried running pgbench from a separate VM and in fact kqueue
> > consistently takes the lead with 5-10% more tps on select/prepared pgbench
> > on NetBSD too.
> >
> > What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue,
> > while unpatched master averages at 55% (45% idle): relatively speaking
> > that's almost 25% less idle cpu available for a local pgbench to do its own
> > stuff.
>
> This suggests that either the wakeup logic between kqueue and poll,
> or the internal locking, could be at issue.  Is it possible that poll
> triggers a directed wakeup path, but kqueue doesn't?

I am following up with some kernel hackers.  In the meantime, here is
a rebase for the new split-line configure.in, to turn cfbot green.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-kqueue-2-support-for-WaitEventSet-v12.patch
Description: Binary data


Re: [HACKERS] kqueue

2018-10-01 Thread Andres Freund
On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote:
> On 01/10/2018 01:09, Thomas Munro wrote:
> > I don't know why the existence of the kqueue should make recvfrom()
> > slower on the pgbench side.  That's probably something to look into
> > off-line with some FreeBSD guru help.  Degraded performance for
> > clients on the same machine does seem to be a show stopper for this
> > patch for now.  Thanks for testing!
> 
> Glad to be helpful!
> 
> I've tried running pgbench from a separate VM and in fact kqueue
> consistently takes the lead with 5-10% more tps on select/prepared pgbench
> on NetBSD too.
> 
> What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue,
> while unpatched master averages at 55% (45% idle): relatively speaking
> that's almost 25% less idle cpu available for a local pgbench to do its own
> stuff.

This suggests that either the wakeup logic between kqueue and poll,
or the internal locking, could be at issue.  Is it possible that poll
triggers a directed wakeup path, but kqueue doesn't?

Greetings,

Andres Freund



Re: [HACKERS] kqueue

2018-09-30 Thread Thomas Munro
On Sun, Sep 30, 2018 at 9:49 PM Matteo Beccati  wrote:
> On 30/09/2018 04:36, Thomas Munro wrote:
> > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati  wrote:
> >> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
> >> unpatched master, so maybe there's something wrong in how I'm
> >> benchmarking. Could you please provide proper instructions?
> >
> > Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
> >  Not sure if it's relevant, but do you happen to see gettimeofday()
> > showing up as a syscall, if you truss a backend running pgbench?
>
> I downloaded 11.2 as a VHD file in order to run it on MS Hyper-V / Win10 Pro.
>
> Yes, I saw plenty of gettimeofday calls when running truss:
>
> > gettimeofday({ 1538297117.071344 },0x0)  = 0 (0x0)
> > gettimeofday({ 1538297117.071743 },0x0)  = 0 (0x0)
> > gettimeofday({ 1538297117.072021 },0x0)  = 0 (0x0)

Ok.  Those syscalls show up depending on your
kern.timecounter.hardware setting and virtualised hardware: just like
on Linux, gettimeofday() can be a cheap userspace operation (vDSO)
that avoids the syscall path, or not.  I'm not seeing any reason to
think that's relevant here.

> > getpid() = 766 (0x2fe)
> > __sysctl(0x7fffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> > gettimeofday({ 1538297117.072944 },0x0)  = 0 (0x0)
> > getpid() = 766 (0x2fe)
> > __sysctl(0x7fffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)

That's setproctitle().  Those syscalls go away if you use FreeBSD 12
(which has setproctitle_fast()).  If you fix both of those problems,
you are left with just:

> > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)

These are the only syscalls I see for each pgbench -S transaction on
my bare metal machine: just the network round trip.  The funny thing
is ... there are almost no kevent() calls.

I managed to reproduce the regression (~70k -> ~50k) using a prewarmed
scale 10 select-only pgbench with 2GB of shared_buffers (so it all
fits), with -j 96 -c 96 on an 8 vCPU AWS t2.2xlarge running FreeBSD 12
ALPHA8.  Here is what truss -c says, capturing data from one backend
for about 10 seconds:

syscall           seconds    calls  errors
sendto        0.396840146     3452       0
recvfrom      0.415802029     3443       6
kevent        0.000626393        6       0
gettimeofday  2.723923249    24053       0
              -----------  -------  ------
              3.537191817    30954       6

(There's no regression with -j 8 -c 8; the problem shows up when
significantly overloaded, the same circumstances under which Mateusz
reported a great improvement).  So... it's very rarely accessing the
kqueue directly... but its existence somehow slows things down.
Curiously, when using poll() it's actually calling poll() ~90/sec for
me:

syscall           seconds    calls  errors
sendto        0.352784808     3226       0
recvfrom      0.614855254     4125     916
poll          0.319396480      916       0
gettimeofday  2.659035352    22456       0
              -----------  -------  ------
              3.946071894    30723     916

I don't know what's going on here.  Based on the reports so far, we
know that kqueue gives a speedup when using bare metal with pgbench
running on a different machine, but a slowdown when using
virtualisation and pgbench running on the same machine (and I just
checked that that's observable with both Unix sockets and TCP
sockets).  That gave me the idea of looking at pgbench itself:

Unpatched:

syscall           seconds    calls  errors
ppoll         0.004869268        1       0
sendto       16.489416911     7033       0
recvfrom     21.137606238     7049       0
             ------------  -------  ------
             37.631892417    14083       0

Patched:

syscall           seconds    calls  errors
ppoll         0.002773195        1       0
sendto       16.597880468     7217       0
recvfrom     25.646406008     7238       0
             ------------  -------  ------
             42.247059671    14456       0

I don't know why the existence of the kqueue should make recvfrom()
slower on the pgbench side.  That's probably something to look into
off-line with some FreeBSD guru help.  Degraded performance for
clients on the same machine does seem to be a show stopper for this
patch for now.  Thanks for testing!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] kqueue

2018-09-30 Thread Matteo Beccati
Hi Thomas,

On 30/09/2018 04:36, Thomas Munro wrote:
> On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati  wrote:
>> Out of curiosity, I've installed FreeBSD on an identically specced VM,
>> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
>> unpatched master, so maybe there's something wrong in how I'm
>> benchmarking. Could you please provide proper instructions?
> 
> Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
>  Not sure if it's relevant, but do you happen to see gettimeofday()
> showing up as a syscall, if you truss a backend running pgbench?

I downloaded 11.2 as a VHD file in order to run it on MS Hyper-V / Win10 Pro.

Yes, I saw plenty of gettimeofday calls when running truss:

> gettimeofday({ 1538297117.071344 },0x0)  = 0 (0x0)
> gettimeofday({ 1538297117.071743 },0x0)  = 0 (0x0)
> gettimeofday({ 1538297117.072021 },0x0)  = 0 (0x0)
> getpid() = 766 (0x2fe)
> __sysctl(0x7fffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> gettimeofday({ 1538297117.072944 },0x0)  = 0 (0x0)
> getpid() = 766 (0x2fe)
> __sysctl(0x7fffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)
> gettimeofday({ 1538297117.073682 },0x0)  = 0 (0x0)
> sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)
> gettimeofday({ 1538297117.074955 },0x0)  = 0 (0x0)
> gettimeofday({ 1538297117.075308 },0x0)  = 0 (0x0)
> getpid() = 766 (0x2fe)
> __sysctl(0x7fffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)
> gettimeofday({ 1538297117.076252 },0x0)  = 0 (0x0)
> gettimeofday({ 1538297117.076431 },0x0)  = 0 (0x0)
> gettimeofday({ 1538297117.076678 },0x0^C)= 0 (0x0)



Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] kqueue

2018-09-29 Thread Thomas Munro
On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati  wrote:
> On 28/09/2018 14:19, Thomas Munro wrote:
> > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund  wrote:
> >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> >>> Matteo Beccati reported a 5-10% performance drop on a
> >>> low-end Celeron NetBSD box which we have no explanation for, and we
> >>> have no reports from server-class machines on that OS -- so perhaps we
> >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> >>> NetBSD until someone can figure out what needs to be fixed there
> >>> (possibly on the NetBSD side)?
> >>
> >> Yea, I'm not too worried about that. It'd be great to test that, but
> >> otherwise I'm also ok to just plonk that into the template.
> >
> > Thanks for the review!  Ok, if we don't get a better idea I'll put
> > this in src/template/netbsd:
> >
> > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"
>
> A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh
> install of NetBSD 8.0 again shows that kqueue is consistently slower
> than unpatched master when running tpc-b-like pgbench workloads:
>
> ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10
>
> while on select only benchmarks the difference is below the noise floor,
> with both doing roughly the same ~30k tps.
>
> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
> unpatched master, so maybe there's something wrong in how I'm
> benchmarking. Could you please provide proper instructions?

Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
 Not sure if it's relevant, but do you happen to see gettimeofday()
showing up as a syscall, if you truss a backend running pgbench?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] kqueue

2018-09-29 Thread Matteo Beccati
On 28/09/2018 14:19, Thomas Munro wrote:
> On Fri, Sep 28, 2018 at 11:09 AM Andres Freund  wrote:
>> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
>>> Matteo Beccati reported a 5-10% performance drop on a
>>> low-end Celeron NetBSD box which we have no explanation for, and we
>>> have no reports from server-class machines on that OS -- so perhaps we
>>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
>>> NetBSD until someone can figure out what needs to be fixed there
>>> (possibly on the NetBSD side)?
>>
>> Yea, I'm not too worried about that. It'd be great to test that, but
>> otherwise I'm also ok to just plonk that into the template.
> 
> Thanks for the review!  Ok, if we don't get a better idea I'll put
> this in src/template/netbsd:
> 
> CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"

A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh
install of NetBSD 8.0 again shows that kqueue is consistently slower
than unpatched master when running tpc-b-like pgbench workloads:

~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10

while on select only benchmarks the difference is below the noise floor,
with both doing roughly the same ~30k tps.

Out of curiosity, I've installed FreeBSD on an identically specced VM,
and the select benchmark was ~75k tps for kqueue vs ~90k tps on
unpatched master, so maybe there's something wrong in how I'm
benchmarking. Could you please provide proper instructions?


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] kqueue

2018-09-28 Thread Thomas Munro
On Fri, Sep 28, 2018 at 11:09 AM Andres Freund  wrote:
> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> > Matteo Beccati reported a 5-10% performance drop on a
> > low-end Celeron NetBSD box which we have no explanation for, and we
> > have no reports from server-class machines on that OS -- so perhaps we
> > (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> > NetBSD until someone can figure out what needs to be fixed there
> > (possibly on the NetBSD side)?
>
> Yea, I'm not too worried about that. It'd be great to test that, but
> otherwise I'm also ok to just plonk that into the template.

Thanks for the review!  Ok, if we don't get a better idea I'll put
this in src/template/netbsd:

CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"
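For reference, latch.c chooses its implementation with a preprocessor ladder
roughly like the following (a simplified sketch, not the exact source),
which is why an explicit -DWAIT_USE_POLL from the template takes precedence
over kqueue detection:

/* Simplified sketch of the selection logic. */
#if defined(WAIT_USE_EPOLL) || defined(WAIT_USE_KQUEUE) || \
    defined(WAIT_USE_POLL) || defined(WAIT_USE_WIN32)
/* an explicit choice, e.g. -DWAIT_USE_POLL, wins */
#elif defined(HAVE_SYS_EPOLL_H)
#define WAIT_USE_EPOLL
#elif defined(HAVE_KQUEUE)
#define WAIT_USE_KQUEUE
#elif defined(HAVE_POLL)
#define WAIT_USE_POLL
#elif WIN32
#define WAIT_USE_WIN32
#else
#error "no wait set implementation available"
#endif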

> > @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents)
> >   if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
> >   elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
> >  #endif   /* EPOLL_CLOEXEC */
> > +#elif defined(WAIT_USE_KQUEUE)
> > + set->kqueue_fd = kqueue();
> > + if (set->kqueue_fd < 0)
> > + elog(ERROR, "kqueue failed: %m");
> >  #elif defined(WAIT_USE_WIN32)
>
> Is this automatically opened with some FD_CLOEXEC equivalent?

No.  Hmm, I thought it wasn't necessary because kqueue descriptors are
not inherited and backends don't execve() directly without forking,
but I guess it can't hurt to add a fcntl() call.  Done.
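A minimal sketch of that arrangement (hypothetical helper name; the real
code reports failures with elog, as the epoll branch quoted above does):

#include <sys/event.h>
#include <fcntl.h>

/* Create a kqueue and mark it close-on-exec, mirroring the epoll case. */
static int
create_kqueue_cloexec(void)
{
    int kqueue_fd = kqueue();

    if (kqueue_fd >= 0 && fcntl(kqueue_fd, F_SETFD, FD_CLOEXEC) == -1)
    {
        /* error handling elided; see the elog() calls in the patch */
    }
    return kqueue_fd;
}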

> > + *((WaitEvent **)(&k_ev->udata)) = event;
>
> I'm mildly inclined to hide that behind a macro, so the other places
> have a reference, via the macro definition, to this too.

Done.
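Concretely, something along these lines (the macro name here is an
assumption for illustration, not necessarily what the patch calls it):

#include <sys/event.h>

typedef struct WaitEvent WaitEvent;    /* opaque here; defined in latch.h */

/* One definition of the unsightly lvalue cast for all call sites. */
#define AccessWaitEvent(k_ev)  (*((WaitEvent **) &(k_ev)->udata))

static void
remember_event(struct kevent *k_ev, WaitEvent *event)
{
    AccessWaitEvent(k_ev) = event;     /* at registration time */
}

static WaitEvent *
recall_event(struct kevent *k_ev)
{
    return AccessWaitEvent(k_ev);      /* at wait time */
}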

> > + if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH)
> > + {
> > + /*
> > +  * The postmaster is already dead.  Defer reporting this to the caller
> > +  * until wait time, for compatibility with the other implementations.
> > +  * To do that we will now add the regular alive pipe.
> > +  */
> > + WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event);
> > + rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
> > + }
>
> That's ... not particularly pretty. Kinda wonder if we shouldn't instead
> just add a 'pending_events' field that we can check at wait time.

Done.
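A cut-down illustration of that "pending event" idea (the struct and field
names are stand-ins, not necessarily the patch's):

#include <stdbool.h>
#include <errno.h>

typedef struct MiniWaitEventSet
{
    int  kqueue_fd;
    bool report_postmaster_not_running;   /* remembered at adjust time */
} MiniWaitEventSet;

/* At adjust time: if the postmaster is already dead (kevent() failed with
 * ESRCH), remember that instead of registering the alive pipe. */
static void
note_postmaster_already_dead(MiniWaitEventSet *set, int kevent_rc)
{
    if (kevent_rc < 0 && errno == ESRCH)
        set->report_postmaster_not_running = true;
}

/* At wait time: check the flag before sleeping and report the death. */
static bool
postmaster_death_pending(const MiniWaitEventSet *set)
{
    return set->report_postmaster_not_running;
}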

> > +/* Define to 1 if you have the `kqueue' function. */
> > +#undef HAVE_KQUEUE
> > +

> Should adjust pg_config.win32.h too.

Done.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-kqueue-2-support-for-WaitEventSet-v11.patch
Description: Binary data


Re: [HACKERS] kqueue

2018-09-28 Thread Matteo Beccati
Hi Thomas,

On 28/09/2018 00:55, Thomas Munro wrote:
> I would like to commit this patch for PostgreSQL 12, based on this
> report.  We know it helps performance on macOS developer machines and
> big FreeBSD servers, and it is the right kernel interface for the job
> on principle.  Matteo Beccati reported a 5-10% performance drop on a
> low-end Celeron NetBSD box which we have no explanation for, and we
> have no reports from server-class machines on that OS -- so perhaps we
> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> NetBSD until someone can figure out what needs to be fixed there
> (possibly on the NetBSD side)?

Thanks for keeping me in the loop.

Out of curiosity (and time permitting) I'll try to spin up a NetBSD 8 VM
and run some benchmarks, but I guess we should leave it up to the pkgsrc
people to eventually change the build flags.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] kqueue

2018-09-27 Thread Andres Freund
Hi,

On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> On Tue, May 22, 2018 at 12:07 PM Thomas Munro
>  wrote:
> > On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik  wrote:
> > > I have benchmarked the change on a FreeBSD box and found a big
> > > performance win once the number of clients goes beyond the number of
> > > hardware threads on the target machine. For smaller numbers of clients
> > > the win was very modest.
> >
> > So to summarise your results:
> >
> > 32 connections: ~445k -> ~450k = +1.2%
> > 64 connections: ~416k -> ~544k = +30.7%
> > 96 connections: ~331k -> ~508k = +53.6%
> 
> I would like to commit this patch for PostgreSQL 12, based on this
> report.  We know it helps performance on macOS developer machines and
> big FreeBSD servers, and it is the right kernel interface for the job
> on principle.

Seems reasonable.


> Matteo Beccati reported a 5-10% performance drop on a
> low-end Celeron NetBSD box which we have no explanation for, and we
> have no reports from server-class machines on that OS -- so perhaps we
> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> NetBSD until someone can figure out what needs to be fixed there
> (possibly on the NetBSD side)?

Yea, I'm not too worried about that. It'd be great to test that, but
otherwise I'm also ok to just plonk that into the template.

> @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents)
>   if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
>   elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
>  #endif   /* EPOLL_CLOEXEC */
> +#elif defined(WAIT_USE_KQUEUE)
> + set->kqueue_fd = kqueue();
> + if (set->kqueue_fd < 0)
> + elog(ERROR, "kqueue failed: %m");
>  #elif defined(WAIT_USE_WIN32)

Is this automatically opened with some FD_CLOEXEC equivalent?


> +static inline void
> +WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action,
> +  WaitEvent *event)
> +{
> + k_ev->ident = event->fd;
> + k_ev->filter = filter;
> + k_ev->flags = action | EV_CLEAR;
> + k_ev->fflags = 0;
> + k_ev->data = 0;
> +
> + /*
> +  * On most BSD family systems, udata is of type void * so we could simply
> +  * assign event to it without casting, or use the EV_SET macro instead of
> +  * filling in the struct manually.  Unfortunately, NetBSD and possibly
> +  * others have it as intptr_t, so here we wallpaper over that difference
> +  * with an unsightly lvalue cast.
> +  */
> + *((WaitEvent **)(&k_ev->udata)) = event;

I'm mildly inclined to hide that behind a macro, so the other places
have a reference, via the macro definition, to this too.

> + if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH)
> + {
> + /*
> +  * The postmaster is already dead.  Defer reporting this to the caller
> +  * until wait time, for compatibility with the other implementations.
> +  * To do that we will now add the regular alive pipe.
> +  */
> + WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event);
> + rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
> + }

That's ... not particularly pretty. Kinda wonder if we shouldn't instead
just add a 'pending_events' field that we can check at wait time.

> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 90dda8ea050..4bcabc3b381 100644
> --- a/src/include/pg_config.h.in
> +++ b/src/include/pg_config.h.in
> @@ -330,6 +330,9 @@
>  /* Define to 1 if you have isinf(). */
>  #undef HAVE_ISINF
>  
> +/* Define to 1 if you have the `kqueue' function. */
> +#undef HAVE_KQUEUE
> +
>  /* Define to 1 if you have the <langinfo.h> header file. */
>  #undef HAVE_LANGINFO_H
>  
> @@ -598,6 +601,9 @@
>  /* Define to 1 if you have the <sys/epoll.h> header file. */
>  #undef HAVE_SYS_EPOLL_H
>  
> +/* Define to 1 if you have the <sys/event.h> header file. */
> +#undef HAVE_SYS_EVENT_H
> +
>  /* Define to 1 if you have the <sys/ipc.h> header file. */
>  #undef HAVE_SYS_IPC_H

Should adjust pg_config.win32.h too.

Greetings,

Andres Freund



Re: [HACKERS] kqueue

2018-09-27 Thread Thomas Munro
On Tue, May 22, 2018 at 12:07 PM Thomas Munro
 wrote:
> On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik  wrote:
> > I have benchmarked the change on a FreeBSD box and found a big
> > performance win once the number of clients goes beyond the number of
> > hardware threads on the target machine. For smaller numbers of clients
> > the win was very modest.
>
> So to summarise your results:
>
> 32 connections: ~445k -> ~450k = +1.2%
> 64 connections: ~416k -> ~544k = +30.7%
> 96 connections: ~331k -> ~508k = +53.6%

I would like to commit this patch for PostgreSQL 12, based on this
report.  We know it helps performance on macOS developer machines and
big FreeBSD servers, and it is the right kernel interface for the job
on principle.  Matteo Beccati reported a 5-10% performance drop on a
low-end Celeron NetBSD box which we have no explanation for, and we
have no reports from server-class machines on that OS -- so perhaps we
(or the NetBSD port?) should consider building with WAIT_USE_POLL on
NetBSD until someone can figure out what needs to be fixed there
(possibly on the NetBSD side)?

Here's a rebased patch, which I'm adding to the to November CF to give
people time to retest, object, etc if they want to.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-kqueue-support-to-WaitEventSet-v10.patch
Description: Binary data


Re: [HACKERS] kqueue

2018-05-21 Thread Thomas Munro
On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik  wrote:
> I have benchmarked the change on a FreeBSD box and found a big
> performance win once the number of clients goes beyond the number of
> hardware threads on the target machine. For smaller numbers of clients
> the win was very modest.

Thanks for the report!  This is good news for the patch, if we can
explain a few mysteries.

> 3 variants were tested:
> - stock 10.3
> - stock 10.3 + pdeathsig
> - stock 10.3 + pdeathsig + kqueue

For the record, "pdeathsig" refers to another patch of mine[1] that is
not relevant to this test (it's a small change in the recovery loop,
important for replication but not even reached here).

> [a bunch of neat output from ministat]

So to summarise your results:

32 connections: ~445k -> ~450k = +1.2%
64 connections: ~416k -> ~544k = +30.7%
96 connections: ~331k -> ~508k = +53.6%

As you added more connections above your thread count, stock 10.3's
TPS number went down, but with the patch it went up.  So now we have
to explain why you see a huge performance boost but others reported a
modest gain or in some cases loss.  The main things that jump out:

1.  You used TCP sockets and ran pgbench on another machine, while
others used Unix domain sockets.
2.  You're running a newer/bleeding edge kernel.
3.  You used more CPUs than most reporters.

For the record, Mateusz and others discovered some fixable global lock
contention in the Unix domain socket layer that is now being hacked
on[2], though it's not clear if that'd affect the results reported
earlier or not.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D0w9AAHAH73-tkZ8VS2Lg6JzY4ii3TG7t-R%2B_MWyUAk9g%40mail.gmail.com
[2] https://reviews.freebsd.org/D15430

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] kqueue

2018-05-21 Thread Mateusz Guzik
On Mon, May 21, 2018 at 9:03 AM, Thomas Munro  wrote:

> On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro
>  wrote:
> > I heard through the grapevine of some people currently investigating
> > performance problems on busy FreeBSD systems, possibly related to the
> > postmaster pipe.  I suspect this patch might be a part of the solution
> > (other patches probably needed to get maximum value out of this patch:
> > reuse WaitEventSet objects in some key places, and get rid of high
> > frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> > last version bit-rotted so it seemed like a good time to post a
> > rebased patch.
>
>
Hi everyone,

I have benchmarked the change on a FreeBSD box and found a big
performance win once the number of clients goes beyond the number of
hardware threads on the target machine. For smaller numbers of clients
the win was very modest.

The test was performed few weeks ago.

For convenience PostgreSQL 10.3 as found in the ports tree was used.

3 variants were tested:
- stock 10.3
- stock 10.3 + pdeathsig
- stock 10.3 + pdeathsig + kqueue

Appropriate patches were provided by Thomas.

In order to keep this message PG-13 I'm not going to show the actual
script, but a mere outline:

for i in $(seq 1 10); do
    for t in vanilla pdeathsig pdeathsig_kqueue; do
        # start up the relevant version
        for c in 32 64 96; do
            pgbench -j 96 -c $c -T 120 -M prepared -S -U bench \
                -h 172.16.0.2 -P1 bench > ${t}-${c}-out-warmup 2>&1
            pgbench -j 96 -c $c -T 120 -M prepared -S -U bench \
                -h 172.16.0.2 -P1 bench > ${t}-${c}-out 2>&1
        done
        # shutdown the relevant version
    done
done

Data from the warmup is not used. All the data was pre-read prior to the
test.

PostgreSQL was configured with 32GB of shared buffers and 200 max
connections, otherwise it was the default.

The server is:
Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
2 package(s) x 8 core(s) x 2 hardware threads

i.e. 32 threads in total.

running FreeBSD -head with 'options NUMA' in kernel config and
sysctl net.inet.tcp.per_cpu_timers=1 on top of zfs.

The load was generated from a different box over a 100Gbit ethernet link.

x cumulative-tps-vanilla-32
+ cumulative-tps-pdeathsig-32
* cumulative-tps-pdeathsig_kqueue-32
[ministat distribution plot]
    N        Min        Max     Median        Avg     Stddev
x  10 442898.77 448476.81 444805.17 445062.08 1679.7169
+  10  442057.2 447835.46 443840.28 444235.01 1771.2254
No difference proven at 95.0% confidence
*  10 448138.07 452786.41 450274.56 450311.51 1387.2927
Difference at 95.0% confidence
5249.43 +/- 1447.41
1.17948% +/- 0.327501%
(Student's t, pooled s = 1540.46)
x cumulative-tps-vanilla-64
+ cumulative-tps-pdeathsig-64
* cumulative-tps-pdeathsig_kqueue-64
[ministat distribution plot]
    N        Min        Max     Median        Avg     Stddev
x  10 411849.26  422145.5 416043.77  416061.9 3763.2545
+  10 407123.74 425727.84 419908.73  417480.7 6817.5549
No difference proven at 95.0% confidence
*  10 542032.71 546106.93 543948.05 543874.06 1234.1788
Difference at 95.0% confidence
127812 +/- 2631.31
30.7195% +/- 0.809892%
(Student's t, pooled s = 2800.47)
x cumulative-tps-vanilla-96
+ cumulative-tps-pdeathsig-96
* cumulative-tps-pdeathsig_kqueue-96
[ministat distribution plot and statistics truncated]

Re: [HACKERS] kqueue

2018-05-21 Thread Thomas Munro
On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro
 wrote:
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe.  I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.

I used to know how to get a message resent to someone who wasn't
subscribed to our mailing list at the time it was sent[1] so they
could join an existing thread.  I don't know how to do that with the
new mailing list software, so I'm CC'ing Mateusz so he can share his
results on-thread.  Sorry for the noise.

[1] 
https://www.postgresql.org/message-id/CAEepm=0-KsV4Sj-0Qd4rMCg7UYdOQA=tujlkezox7h_qiqq...@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] kqueue

2018-04-10 Thread Thomas Munro
On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro
 wrote:
> On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
>  wrote:
>> I don't plan to resubmit this patch myself, but I was doing some
>> spring cleaning and rebasing today and I figured it might be worth
>> quietly leaving a working patch here just in case anyone from the
>> various BSD communities is interested in taking the idea further.

I heard through the grapevine of some people currently investigating
performance problems on busy FreeBSD systems, possibly related to the
postmaster pipe.  I suspect this patch might be a part of the solution
(other patches probably needed to get maximum value out of this patch:
reuse WaitEventSet objects in some key places, and get rid of high
frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
last version bit-rotted so it seemed like a good time to post a
rebased patch.

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v9.patch
Description: Binary data


Re: [HACKERS] kqueue

2017-12-05 Thread Thomas Munro
On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
 wrote:
> I don't plan to resubmit this patch myself, but I was doing some
> spring cleaning and rebasing today and I figured it might be worth
> quietly leaving a working patch here just in case anyone from the
> various BSD communities is interested in taking the idea further.

Since there was a mention of kqueue on -hackers today, here's another
rebase.  I got curious just now and ran a very quick test on an AWS 64
vCPU m4.16xlarge instance running image "FreeBSD
11.1-STABLE-amd64-2017-08-08 - ami-00608178".  I set shared_buffers =
10GB and ran pgbench approximately the same way Heikki and Keith did
upthread:

pgbench -i -s 200 postgres
pgbench -M prepared  -j 6 -c 6 -S postgres -T60 -P1
pgbench -M prepared  -j 12 -c 12 -S postgres -T60 -P1
pgbench -M prepared  -j 24 -c 24 -S postgres -T60 -P1
pgbench -M prepared  -j 36 -c 36 -S postgres -T60 -P1
pgbench -M prepared  -j 48 -c 48 -S postgres -T60 -P1

The TPS numbers I got (including connections establishing) were:

clients     master    patched
      6    146,215    147,535 (+0.9%)
     12    273,056    280,505 (+2.7%)
     24    360,751    369,965 (+2.5%)
     36    413,147    420,769 (+1.8%)
     48    416,189    444,537 (+6.8%)

The patch appears to be doing something positive on this particular
system and that effect was stable over a few runs.

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v8.patch
Description: Binary data