Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-05 Thread didier
hi On Thu, Sep 4, 2014 at 7:01 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Sep 4, 2014 at 3:09 AM, Ants Aasma a...@cybertec.at wrote: On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund and...@2ndquadrant.com wrote: It's imo quite clearly better to keep it allocated. For one after

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-05 Thread Robert Haas
On Fri, Sep 5, 2014 at 4:17 AM, didier did...@gmail.com wrote: It's not the size of the array that's the problem; it's the size of the detonation when the allocation fails. You can use a file backed memory array Or because it's only a hint and - keys are in buffers (BufferTag), right? -

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-05 Thread Claudio Freire
On Fri, Sep 5, 2014 at 3:09 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 5, 2014 at 4:17 AM, didier did...@gmail.com wrote: It's not the size of the array that's the problem; it's the size of the detonation when the allocation fails. You can use a file backed memory array Or

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-04 Thread Ants Aasma
On Sat, Aug 30, 2014 at 8:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote: A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write them out in order

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-04 Thread Ants Aasma
On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund and...@2ndquadrant.com wrote: It's imo quite clearly better to keep it allocated. For one after postmaster started the checkpointer successfully you don't need to be worried about later failures to allocate memory if you allocate it once (unless

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-04 Thread Robert Haas
On Thu, Sep 4, 2014 at 3:09 AM, Ants Aasma a...@cybertec.at wrote: On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund and...@2ndquadrant.com wrote: It's imo quite clearly better to keep it allocated. For one after postmaster started the checkpointer successfully you don't need to be worried about

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-03 Thread Robert Haas
On Sat, Aug 30, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: If the sort buffer is allocated when the checkpointer is started, not every time we sort, as you've done in your version of the patch, I think that risk is pretty manageable. If we really want to be sure nothing is

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-03 Thread Andres Freund
On 2014-09-03 17:08:12 -0400, Robert Haas wrote: On Sat, Aug 30, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: If the sort buffer is allocated when the checkpointer is started, not every time we sort, as you've done in your version of the patch, I think that risk is pretty

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-02 Thread Jeff Janes
On Tue, Aug 26, 2014 at 1:02 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello again, I have not found any means to force bgwriter to send writes when it can. (Well, I have: create a process which sends CHECKPOINT every 0.2 seconds... it works more or less, but this is not my point:-)

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-02 Thread Fabien COELHO
There is scan_whole_pool_milliseconds, which currently forces bgwriter to circle the buffer pool at least once every 2 minutes. It is currently fixed, but it should be trivial to turn it into an experimental guc that you could use to test your hypothesis. I recompiled with the variable

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-09-02 Thread Jeff Janes
On Tue, Sep 2, 2014 at 8:14 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: There is scan_whole_pool_milliseconds, which currently forces bgwriter to circle the buffer pool at least once every 2 minutes. It is currently fixed, but it should be trivial to turn it into an experimental guc that

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-31 Thread Fabien COELHO
Hello Heikki, For the kicks, I wrote a quick and dirty patch for interleaving the fsyncs, see attached. It works by repeatedly scanning the buffer pool, writing buffers belonging to a single relation segment at a time. I tried this patch on the same host I used with the same -R 25 -L 200 -T
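
For readers of the archive, the approach quoted above (scan the buffer pool repeatedly, write the dirty buffers of one relation segment, fsync it, move on) can be sketched roughly as follows. This is only an illustration and not the patch itself: the dict-based pool, the function name checkpoint_by_segment, and the write/fsync callbacks are all assumptions made up for the example.

```python
def checkpoint_by_segment(pool, write, fsync):
    """Write dirty buffers one relation segment at a time.

    pool  : dict mapping (segment, block) -> dirty flag (hypothetical model
            of the buffer pool, not PostgreSQL's real data structure)
    write : callback(segment, block), stands in for writing one buffer
    fsync : callback(segment), stands in for fsync on that segment's file
    """
    while True:
        target = None
        # One full scan of the pool per target segment.
        for (segment, block), dirty in pool.items():
            if not dirty:
                continue
            if target is None:
                target = segment          # first dirty buffer picks the target
            if segment == target:
                write(segment, block)
                pool[(segment, block)] = False
        if target is None:
            return                        # nothing dirty left
        fsync(target)                     # fsync is interleaved with the writes
```

Each pass costs a full scan of the pool, which is the price paid for not having to sort or allocate anything up front.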

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Andres Freund
On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote: A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write them out in order (http://www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp). The performance impact of that was inconclusive,

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote: A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write them out in order (http://www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp).

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Andres Freund
On 2014-08-30 13:50:40 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote: A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write them out in order

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-08-30 13:50:40 -0400, Tom Lane wrote: A possible compromise is to sort a limited number of buffers say, collect a few thousand dirty buffers then sort, dump and fsync them, repeat as needed. Yea, that's what I suggested nearby. But I
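
The compromise quoted above (collect a limited batch of dirty buffers, sort, dump and fsync them, repeat) might look something like the sketch below. It is a toy model under stated assumptions: buffers are plain (segment, block) tuples, and the names checkpoint_in_sorted_batches, write and fsync are hypothetical, not anything from the PostgreSQL source.

```python
def checkpoint_in_sorted_batches(dirty_buffers, batch_size, write, fsync):
    """Sort and flush dirty buffers in bounded batches.

    dirty_buffers : iterable of (segment, block) tuples (hypothetical tags)
    batch_size    : upper bound on how many buffers are sorted at once
    write, fsync  : callbacks standing in for the real I/O calls
    """
    batch = []
    for tag in dirty_buffers:
        batch.append(tag)
        if len(batch) == batch_size:
            _flush_batch(batch, write, fsync)
            batch = []
    if batch:                      # leftover partial batch
        _flush_batch(batch, write, fsync)


def _flush_batch(batch, write, fsync):
    batch.sort()                   # order by segment, then block number
    touched = []
    for segment, block in batch:
        write(segment, block)
        if segment not in touched:
            touched.append(segment)
    for segment in touched:        # fsync only the files this batch wrote to
        fsync(segment)
```

The sort buffer never holds more than batch_size entries, so its size is fixed in advance, at the cost of ordering writes only within a batch rather than across the whole checkpoint.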

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Andres Freund
On 2014-08-30 14:16:10 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-30 13:50:40 -0400, Tom Lane wrote: A possible compromise is to sort a limited number of buffers say, collect a few thousand dirty buffers then sort, dump and fsync them, repeat as

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Heikki Linnakangas
On 08/30/2014 09:45 PM, Andres Freund wrote: On 2014-08-30 14:16:10 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-30 13:50:40 -0400, Tom Lane wrote: A possible compromise is to sort a limited number of buffers say, collect a few thousand dirty buffers

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Andres Freund
On 2014-08-31 01:50:48 +0300, Heikki Linnakangas wrote: On 08/30/2014 09:45 PM, Andres Freund wrote: On 2014-08-30 14:16:10 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-30 13:50:40 -0400, Tom Lane wrote: A possible compromise is to sort a limited number of

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Mitsumasa KONDO
Hi, 2014-08-31 8:10 GMT+09:00 Andres Freund and...@2ndquadrant.com: On 2014-08-31 01:50:48 +0300, Heikki Linnakangas wrote: If we're going to fsync between each file, there's no need to sort all the buffers at once. It's enough to pick one file as the target - like in my crude patch - and

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Fabien COELHO
Hello Aidan, If all you want is to avoid the write storms when fsyncs start happening on slow storage, can you not just adjust the kernel vm.dirty* tunables to start making the kernel write out dirty buffers much sooner instead of letting them accumulate until fsyncs force them out all at

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Claudio Freire
On Thu, Aug 28, 2014 at 3:27 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Aidan, If all you want is to avoid the write storms when fsyncs start happening on slow storage, can you not just adjust the kernel vm.dirty* tunables to start making the kernel write out dirty buffers much

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Fabien COELHO
I tried that by setting: vm.dirty_expire_centisecs = 100 vm.dirty_writeback_centisecs = 100 So it should start writing returned buffers at most 2s after they are returned, if I understood the doc correctly, instead of at most 35s. The result is that with a 5000s 25tps pretty small load
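
The "at most 2s instead of at most 35s" figures above can be reproduced with a little arithmetic, assuming the worst case is roughly the expiry age plus one flusher wakeup interval, and assuming the usual Linux defaults of 3000 and 500 centiseconds (the defaults are not stated in the message, so treat them as an assumption). The function name is made up for the example.

```python
def worst_case_writeback_delay_s(expire_centisecs, writeback_centisecs):
    """Rough upper bound, in seconds, before a dirty page is written out.

    Simplified model (an assumption, not kernel documentation): a page
    becomes eligible once it is older than dirty_expire_centisecs, and the
    flusher only wakes up every dirty_writeback_centisecs to notice it.
    """
    return (expire_centisecs + writeback_centisecs) / 100.0


# Settings quoted in the message above.
tuned = worst_case_writeback_delay_s(100, 100)
# Assumed kernel defaults: vm.dirty_expire_centisecs = 3000,
# vm.dirty_writeback_centisecs = 500.
default = worst_case_writeback_delay_s(3000, 500)
print(tuned, default)
```

This only bounds when the kernel starts writing a page back; it says nothing about how long the writes or a later fsync take on slow storage.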

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Fabien COELHO
Hello Andres, [...] I think you're misunderstanding how spread checkpoints work. Yep, definitely:-) On the other hand I thought I was seeking something simple, namely correct latency under small load, that I would expect out of the box. What you describe is reasonable, and is more or less

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 09:32:16 +0200, Fabien COELHO wrote: Hello Andres, [...] I think you're misunderstanding how spread checkpoints work. Yep, definitely:-) On the other hand I thought I was seeking something simple, namely correct latency under small load, that I would expect out of the box.

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Fabien COELHO
[...] What's your evidence the pacing doesn't work? Afaik it's the fsync that causes the problem, not the writes themselves. Hmmm. My (poor) understanding is that fsync would work fine if everything was already written beforehand:-) that is it has nothing to do but assess that all is

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote: I can test a couple of patches. I already did one on someone's advice (make bgwriter round all stuff in 1s instead of 120s), without positive effect. I've quickly cobbled together the attached patch (which at least doesn't seem to crash and burn). It

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 11:14:46 +0200, Andres Freund wrote: On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote: I can test a couple of patches. I already did one on someone's advice (make bgwriter round all stuff in 1s instead of 120s), without positive effect. I've quickly cobbled together the

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 11:19:22 +0200, Andres Freund wrote: On 2014-08-27 11:14:46 +0200, Andres Freund wrote: On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote: I can test a couple of patches. I already did one on someone's advice (make bgwriter round all stuff in 1s instead of 120s), without

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Fabien COELHO
Hello Amit, I see there is some merit in your point which is to make bgwriter more useful than its current form. I could see 3 top level points to think about whether improvement in any of those can improve the current situation: a. Scanning of buffer pool to find the dirty buffers that

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: [...] What's your evidence the pacing doesn't work? Afaik it's the fsync that causes the problem, not the writes themselves. Hmmm. My (poor) understanding is that fsync would work fine if everything was already

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 10:10 AM, Claudio Freire klaussfre...@gmail.com wrote: On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: [...] What's your evidence the pacing doesn't work? Afaik it's the fsync that causes the problem, not the writes themselves. Hmmm. My

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 10:10:49 -0300, Claudio Freire wrote: On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: [...] What's your evidence the pacing doesn't work? Afaik it's the fsync that causes the problem, not the writes themselves. Hmmm. My (poor) understanding is

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-27 10:10:49 -0300, Claudio Freire wrote: On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: [...] What's your evidence the pacing doesn't work? Afaik it's the fsync that causes the

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 10:17:06 -0300, Claudio Freire wrote: I think a somewhat smarter version of the explicit flushes in the hack^Wpatch I posted nearby is going to be more likely to be successful. That path is dangerous (as in, may not work as intended) if the filesystem doesn't properly

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 10:20 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-27 10:17:06 -0300, Claudio Freire wrote: I think a somewhat smarter version of the explicit flushes in the hack^Wpatch I posted nearby is going to be more likely to be successful. That path is dangerous

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Aidan Van Dyk
On Wed, Aug 27, 2014 at 3:32 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, [...] I think you're misunderstanding how spread checkpoints work. Yep, definitely:-) On the other hand I thought I was seeking something simple, namely correct latency under small load, that I would

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 10:32:19 -0400, Aidan Van Dyk wrote: On Wed, Aug 27, 2014 at 3:32 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, [...] I think you're misunderstanding how spread checkpoints work. Yep, definitely:-) On the other hand I thought I was seeking something

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Fabien COELHO
Hello, If all you want is to avoid the write storms when fsyncs start happening on slow storage, can you not just adjust the kernel vm.dirty* tunables to start making the kernel write out dirty buffers much sooner instead of letting them accumulate until fsyncs force them out all at once? I

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Heikki Linnakangas
On 08/27/2014 04:20 PM, Andres Freund wrote: On 2014-08-27 10:17:06 -0300, Claudio Freire wrote: I think a somewhat smarter version of the explicit flushes in the hack^Wpatch I posted nearby is going to be more likely to be successful. That path is dangerous (as in, may not work as intended) if

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote: On 08/27/2014 04:20 PM, Andres Freund wrote: On 2014-08-27 10:17:06 -0300, Claudio Freire wrote: I think a somewhat smarter version of the explicit flushes in the hack^Wpatch I posted nearby is going to be more likely to be successful.

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Fabien COELHO
off: $ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200 number of skipped transactions: 1345 (6.246 %) on: $ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200 number of skipped transactions: 1 (0.005 %) That machine is far from idle

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Andres Freund
On 2014-08-27 19:00:12 +0200, Fabien COELHO wrote: off: $ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200 number of skipped transactions: 1345 (6.246 %) on: $ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200 number of

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Hello Josh, So I think that you're confusing the roles of bgwriter vs. spread checkpoint. What you're experiencing above is pretty common for nonspread checkpoints on slow storage (and RAID5 is slow for DB updates, no matter how fast the disks are), or for attempts to do spread checkpoint on

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Jeff Janes
On Monday, August 25, 2014, Fabien COELHO coe...@cri.ensmp.fr wrote: I have not found any means to force bgwriter to send writes when it can. (Well, I have: create a process which sends CHECKPOINT every 0.2 seconds... it works more or less, but this is not my point:-) There is

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
[oops, wrong from, resent...] Hello Jeff, The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that checkpoint would find a clean field and pass in a blink.

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Hello Amit, I think another thing to know here is why exactly checkpoint storm is causing tps to drop so steeply. Yep. Actually it is not strictly 0, but a few tps that I rounded to 0. progress: 63.0 s, 47.0 tps, lat 2.810 ms stddev 5.194, lag 0.354 ms progress: 64.1 s, 11.9 tps, lat

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Hello again, I have not found any means to force bgwriter to send writes when it can. (Well, I have: create a process which sends CHECKPOINT every 0.2 seconds... it works more or less, but this is not my point:-) There is scan_whole_pool_milliseconds, which currently forces bgwriter to circle

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Andres Freund
On 2014-08-26 08:12:48 +0200, Fabien COELHO wrote: As for checkpoint spreading, raising checkpoint_completion_target to 0.9 degrades the situation (20% of transactions are more than 200 ms late instead of 10%, bgwriter wrote less than 1 page per second, on a 500s run). So maybe there is a bug

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Hello Andres, checkpoint when the segments are full... the server is unresponsive about 10% of the time (one in ten transactions is late by more than 200 ms). That's ext4 I guess? Yes! Did you check whether xfs yields a, err, more predictable performance? No. I cannot test that easily

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Andres Freund
On 2014-08-26 10:25:29 +0200, Fabien COELHO wrote: Did you check whether xfs yields a, err, more predictable performance? No. I cannot test that easily without reinstalling the box. I did some quick tests with ZFS/FreeBSD which seemed to freeze the same, but not in the very same conditions.

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
What are the other settings here? checkpoint_segments, checkpoint_timeout, wal_buffers? They simply are the defaults: checkpoint_segments = 3 checkpoint_timeout = 5min wal_buffers = -1 I did some tests with checkpoint_segments = 1, the problem is just more frequent but shorter. I also

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Andres Freund
On 2014-08-26 10:49:31 +0200, Fabien COELHO wrote: What are the other settings here? checkpoint_segments, checkpoint_timeout, wal_buffers? They simply are the defaults: checkpoint_segments = 3 checkpoint_timeout = 5min wal_buffers = -1 I did some tests with checkpoint_segments = 1,

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Uh. I'm not surprised you're facing utterly horrible performance with this. Did you try using a *large* checkpoint_segments setting? To achieve high performance I do not seek high performance per se, I seek lower maximum latency. I think that the current settings and parameters are designed

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Andres Freund
On 2014-08-26 11:34:36 +0200, Fabien COELHO wrote: Uh. I'm not surprised you're facing utterly horrible performance with this. Did you try using a *large* checkpoint_segments setting? To achieve high performance I do not seek high performance per se, I seek lower maximum latency. So? I

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Fabien COELHO
Hello Jeff, The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that checkpoint would find a clean field and pass in a blink. Indeed, at the end of the 500

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-26 Thread Amit Kapila
On Tue, Aug 26, 2014 at 12:53 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Given the small flow of updates, I do not think that there should be reason to get that big a write contention between WAL and checkpoint. I tried with full_page_writes = off for 500 seconds: same overall behavior, 8.5% of

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Josh Berkus
On 08/25/2014 01:23 PM, Fabien COELHO wrote: Hello pgdevs, I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, to have a responsive server under small to medium load. On an old computer with a software RAID5 HDD attached, pgbench simple

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Andres Freund
Hi, On 2014-08-25 22:23:40 +0200, Fabien COELHO wrote: seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not. Now, given this setup, if pgbench is

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Jeff Janes
On Monday, August 25, 2014, Fabien COELHO coe...@cri.ensmp.fr wrote: The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that checkpoint would find a clean

Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Amit Kapila
On Tue, Aug 26, 2014 at 1:53 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello pgdevs, I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, to have a responsive server under small to medium load. On an old computer with a software RAID5 HDD