Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Fabien COELHO
Hello Andres, Thanks for your comments. Some answers and new patches included. + /* + * Array of buffer ids of all buffers to checkpoint. + */ +static int *CheckpointBufferIds = NULL; Should be at the beginning of the file. There's a bunch more cases of that. done. +/* Compare
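The truncated "+/* Compare" hunk introduces the comparator used to sort the checkpoint list. A minimal sketch of that kind of ordering follows. It assumes a hypothetical CkptItem key of (tablespace, relation, fork, block) plus the buffer id, which is illustrative and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort key for one buffer to checkpoint (not the patch's). */
typedef struct
{
    uint32_t tablespace;  /* tablespace OID */
    uint32_t relation;    /* relation file node */
    int      fork;        /* fork number: main, FSM, visibility map, ... */
    uint32_t block;       /* block number within the fork */
    int      buf_id;      /* index of the buffer in shared memory */
} CkptItem;

/* Three-way comparison that cannot overflow, unlike a - b. */
#define CMP3(a, b) (((a) > (b)) - ((a) < (b)))

/* qsort() comparator: tablespace, then relation, fork and block. */
static int
ckpt_item_cmp(const void *pa, const void *pb)
{
    const CkptItem *a = pa;
    const CkptItem *b = pb;
    int         c;

    if ((c = CMP3(a->tablespace, b->tablespace)) != 0)
        return c;
    if ((c = CMP3(a->relation, b->relation)) != 0)
        return c;
    if ((c = CMP3(a->fork, b->fork)) != 0)
        return c;
    return CMP3(a->block, b->block);
}

/* Order the items so that each relation fork's blocks come out in
 * ascending order, one file after another. */
void
sort_checkpoint_items(CkptItem *items, size_t n)
{
    if (n < 2)
        return;                 /* nothing to reorder */
    assert(items != NULL);
    qsort(items, n, sizeof(CkptItem), ckpt_item_cmp);
}
```

Comparing field by field rather than subtracting keeps the comparator correct for large OIDs and block numbers.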

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Fabien COELHO
Ok ok, I stop resisting... I'll have a look. Here is a v7 ab version which uses shared memory instead of palloc. -- Fabien. diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e900dcc..1cec243 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2454,6

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Andres Freund
On 2015-08-10 19:07:12 +0200, Fabien COELHO wrote: I think that there is no issue with the current shared_buffers limit. I could allocate and use 4 GB on my laptop without problem. I added a cast to ensure that unsigned ints are used for the size computation. You can't allocate 4GB with

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Fabien COELHO
Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. [...] Well, then every time the checkpointer is restarted. Hm... The point is that it's done at

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Andres Freund
Hi, On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: I ripped out the flushing part, keeping only the sorting. I refactored the logic in BufferSync() a bit. There's now a separate function, nextCheckpointBuffer(), that returns the next buffer ID from the sorted list. The
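Heikki's refactoring has nextCheckpointBuffer() hand out buffer IDs from the sorted list one at a time. A hypothetical cursor along those lines is sketched below. The struct and the still_dirty array are illustrative assumptions; still_dirty stands in for re-checking each buffer header, because a backend or the bgwriter may already have written a buffer after the list was built:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor over the buffer ids collected at checkpoint start. */
typedef struct
{
    const int  *sorted_ids;   /* buffer ids, already in file/block order */
    int         count;        /* number of entries in sorted_ids */
    int         pos;          /* next entry to examine */
} CheckpointCursor;

/*
 * Return the next buffer id that still needs writing, or -1 when the
 * list is exhausted.  Entries whose buffer is no longer dirty are skipped.
 */
int
next_checkpoint_buffer(CheckpointCursor *cur, const bool *still_dirty)
{
    assert(cur != NULL && still_dirty != NULL);

    while (cur->pos < cur->count)
    {
        int         id = cur->sorted_ids[cur->pos++];

        if (still_dirty[id])
            return id;
    }
    return -1;
}
```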

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Andres Freund
On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is, but there's some
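The 1GB ceiling is PostgreSQL's MaxAllocSize (0x3fffffff bytes); larger requests need the "huge" allocation path such as MemoryContextAllocHuge(). A small sketch, assuming one 4-byte int per shared buffer as in the CheckpointBufferIds array quoted at the top of this page, shows when a plain palloc() of that array would be refused and why the size should be computed in size_t rather than int:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors PostgreSQL's MaxAllocSize: 1 GB - 1, the plain palloc() limit. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Bytes needed for one int buffer id per shared buffer.  The count is
 * widened to size_t before multiplying so the product cannot overflow
 * a 32-bit int. */
size_t
buffer_id_array_size(int nbuffers)
{
    assert(nbuffers >= 0);
    return (size_t) nbuffers * sizeof(int);
}

/* Nonzero if a plain palloc() of the array would be rejected. */
int
exceeds_palloc_limit(int nbuffers)
{
    return buffer_id_array_size(nbuffers) > MAX_ALLOC_SIZE;
}
```

With the default 8kB block size, 4GB of shared_buffers is 524288 buffers, i.e. a 2MB array; the int array only crosses the limit at 268435456 buffers, which is 2TB of shared_buffers.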

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Michael Paquier
On Tue, Aug 11, 2015 at 4:28 AM, Andres Freund wrote: On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO wrote: You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is,

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Andres Freund
On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: * I think we should drop the flush part of this for now. It's not as clearly beneficial as the sorting part, and adds a great deal more code complexity. And it's orthogonal to the sorting patch, so we can deal with it separately. I don't

Re: [HACKERS] checkpointer continuous flushing

2015-08-09 Thread Fabien COELHO
Hello Heikki, Thanks for having a look at the patch. * I think we should drop the flush part of this for now. It's not as clearly beneficial as the sorting part, and adds a great deal more code complexity. And it's orthogonal to the sorting patch, so we can deal with it separately. I

Re: [HACKERS] checkpointer continuous flushing

2015-08-08 Thread Heikki Linnakangas
On 07/26/2015 06:01 PM, Fabien COELHO wrote: Attached is a very minor v5 update which does a rebase and completes the cleanup of doing a full sort instead of a chunked sort. Some thoughts on this: * I think we should drop the flush part of this for now. It's not as clearly beneficial as the

Re: [HACKERS] checkpointer continuous flushing

2015-07-26 Thread Fabien COELHO
Hello, Attached is a very minor v5 update which does a rebase and completes the cleanup of doing a full sort instead of a chunked sort. Attached is an updated version of the patch which turns the sort option into a boolean, and also includes the sort time in the checkpoint log. There is still

Re: [HACKERS] checkpointer continuous flushing

2015-06-26 Thread Andres Freund
On 2015-06-26 21:47:30 +0200, Fabien COELHO wrote: tps stddev full speed: HEAD OFF/OFF tiny 1 client 727 +- 227 221 +- 246 Huh?

Re: [HACKERS] checkpointer continuous flushing

2015-06-26 Thread Fabien COELHO
Note that I'm not comparing to HEAD in the above tests, but with the new options deactivated, which should be more or less comparable to current HEAD, i.e. neither sorting nor flushing is done, but this is not strictly speaking HEAD behavior. Probably I should get some figures with HEAD as

Re: [HACKERS] checkpointer continuous flushing

2015-06-26 Thread Fabien COELHO
Hello Andres, HEAD OFF/OFF tiny 1 client 727 +- 227 221 +- 246 Huh? Indeed, just to check that someone was reading this magnificent mail:-) Just a typo because I reformatted the figures for simpler comparison. 221 is really 721, quite

Re: [HACKERS] checkpointer continuous flushing

2015-06-24 Thread Amit Kapila
On Wed, Jun 24, 2015 at 9:50 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: flsh | full speed tps | percent of late tx, 4 clients /srt | 1 client | 4 clients | 100 | 200 | 400 | N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 | N/Y | 458 +- 327* | 743

Re: [HACKERS] checkpointer continuous flushing

2015-06-24 Thread Fabien COELHO
Hello Amit, [...] Ok, I misunderstood your question. I thought you meant a dip between 1 client and 4 clients. I meant that when flush is turned on tps goes down by 8% (743 to 681 tps) on this particular run. This 8% might matter if the dip is bigger with more clients and more aggressive
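The 8% quoted above is the relative drop in 4-client throughput:

$$\frac{743 - 681}{743} = \frac{62}{743} \approx 0.083 \approx 8\%$$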

Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Fabien COELHO
It'd be interesting to see numbers for tiny, without the overly small checkpoint timeout value. 30s is below the OS's writeback time. Here are some tests with longer timeout: tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min max_wal_size=1GB warmup=600 time=4000 flsh |

Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Jim Nasby
On 6/22/15 11:59 PM, Fabien COELHO wrote: which might not be helpful for cases when checkpoint could have flushed soon-to-be-recycled buffers. I think flushing the sorted buffers w.r.t tablespaces is a good idea, but not giving any preference to clock-sweep point seems to me that we would lose

Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Amit Kapila
On Tue, Jun 23, 2015 at 10:29 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Amit, medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min max_wal_size=4GB warmup=1200 time=7500 flsh | full speed tps | percent of late tx, 4 clients /srt | 1 client |

Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Fabien COELHO
flsh | full speed tps              | percent of late tx, 4 clients
/srt | 1 client     | 4 clients    | 100   | 200   | 400
N/N  | 173 +- 289*  | 198 +- 531*  | 27.61 | 43.92 | 61.16
N/Y  | 458 +- 327*  | 743 +- 920*  | 7.05  | 14.24 | 24.07
Y/N  | 169 +- 166*  | 187 +- 302*  | 4.01  |

Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO
sorry, resent stalled post, wrong from It'd be interesting to see numbers for tiny, without the overly small checkpoint timeout value. 30s is below the OS's writeback time. Here are some tests with longer timeout: tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min

Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO
Hello Jim, The small problem I see is that for a very large setting there could be several seconds or even minutes of sorting, which may or may not be desirable, so having some control on that seems a good idea. ISTM a more elegant way to handle that would be to start off with a very small

Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Amit Kapila
On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: sorry, resent stalled post, wrong from It'd be interesting to see numbers for tiny, without the overly small checkpoint timeout value. 30s is below the OS's writeback time. Here are some tests with longer timeout:

Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO
Hello Amit, medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min max_wal_size=4GB warmup=1200 time=7500 flsh | full speed tps | percent of late tx, 4 clients /srt | 1 client | 4 clients | 100 | 200 | 400 | N/N | 173 +- 289* | 198 +- 531* |

Re: [HACKERS] checkpointer continuous flushing

2015-06-21 Thread Andres Freund
Hi, On 2015-06-20 08:57:57 +0200, Fabien COELHO wrote: Actually I did, because as explained in another mail the fsync time when the other options are activated as reported in the logs is essentially null, so it would not bring significant improvements on these runs, and also the patch changes

Re: [HACKERS] checkpointer continuous flushing

2015-06-21 Thread Jim Nasby
On 6/20/15 2:57 AM, Fabien COELHO wrote: - as version 2: checkpoint buffer sorting based on a 2007 patch by Takahiro Itagaki but with a smaller and static buffer allocated once. Also, sorting is done by chunks of 131072 pages in the current version, with a guc to change this value. I
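Version 2 sorted the buffer list in chunks of 131072 pages rather than all at once (v5 later switched to a full sort). A sketch of chunked sorting follows, using plain ints as stand-ins for the real sort keys and a hypothetical chunked_sort() helper. It shows that each chunk is ordered on its own while the array as a whole is only piecewise sorted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sort keys[] in consecutive chunks of chunk_size entries.  Each chunk is
 * ordered independently; order across chunk boundaries is not touched. */
void
chunked_sort(int *keys, size_t n, size_t chunk_size)
{
    assert(chunk_size > 0);

    for (size_t start = 0; start < n; start += chunk_size)
    {
        size_t      len = n - start < chunk_size ? n - start : chunk_size;

        qsort(keys + start, len, sizeof(int), cmp_int);
    }
}
```

The chunk size bounds the cost of each qsort() call, at the price of leaving writes that straddle chunk boundaries unordered.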

Re: [HACKERS] checkpointer continuous flushing

2015-06-21 Thread Fabien COELHO
Hello Andres, So this is an evidence-based decision. Meh. You're testing on low concurrency. Well, I'm just testing on the available box. I do not see the link between high concurrency and whether moving fsync as early as possible would have a large performance impact. I think it might

Re: [HACKERS] checkpointer continuous flushing

2015-06-20 Thread Fabien COELHO
Hello Andres, - Move fsync as early as possible, suggested by Andres Freund? My opinion is that this should be left out for the nonce. for the nonce - what does that mean? Nonce \Nonce\ (n[o^]ns), n. [For the nonce, OE. for the nones, ... {for the nonce}, i. e. for the present time.

Re: [HACKERS] checkpointer continuous flushing

2015-06-19 Thread Andres Freund
Hi, On 2015-06-17 08:24:38 +0200, Fabien COELHO wrote: Here is version 3, including many performance tests with various settings, representing about 100 hours of pgbench run. This patch aims at improving checkpoint I/O behavior so that tps throughput is improved, late transactions are less

Re: [HACKERS] checkpointer continuous flushing

2015-06-17 Thread Fabien COELHO
Hello, Here is version 3, including many performance tests with various settings, representing about 100 hours of pgbench run. This patch aims at improving checkpoint I/O behavior so that tps throughput is improved, late transactions are less frequent, and overall performances are more

Re: [HACKERS] checkpointer continuous flushing

2015-06-08 Thread Cédric Villemain
On 07/06/2015 16:53, Fabien COELHO wrote:
+		/* Others: say that data should not be kept in memory...
+		 * This is not exactly what we want to say, because we want to write
+		 * the data for durability but we may need it later nevertheless.
+

Re: [HACKERS] checkpointer continuous flushing

2015-06-08 Thread Fabien COELHO
Hello Cédric, It looks a bit hazardous, do you have a benchmark for FreeBSD? No, I just consulted the FreeBSD man page for posix_fadvise. If someone can run tests on something with HDDs that is not Linux, that would be nice. The sources say: case POSIX_FADV_DONTNEED: /*
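The "Others" branch quoted above relies on posix_fadvise() with POSIX_FADV_DONTNEED. A minimal sketch of that pattern follows, with a hypothetical write_and_advise() helper that writes a range and then passes the hint. As Cédric's question suggests, what the kernel does with the hint (start writeback, drop clean cached pages, or nothing) varies by platform, so the sketch treats a failed hint as non-fatal:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write len bytes at offset, then advise the kernel that the range is not
 * needed in the page cache.  Returns -1 only if the write itself fails;
 * the advice is a hint, so its failure is just reported.
 */
int
write_and_advise(int fd, const void *buf, size_t len, off_t offset)
{
    ssize_t     written = pwrite(fd, buf, len, offset);

    if (written < 0 || (size_t) written != len)
        return -1;
#ifdef POSIX_FADV_DONTNEED
    {
        int         rc = posix_fadvise(fd, offset, (off_t) len,
                                       POSIX_FADV_DONTNEED);

        if (rc != 0)
            fprintf(stderr, "posix_fadvise failed: %d\n", rc);
    }
#endif
    return 0;
}
```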

Re: [HACKERS] checkpointer continuous flushing

2015-06-07 Thread Fabien COELHO
Hello Andres, They pretty much can't if you flush things frequently. That's why I think this won't be acceptable without the sorting in the checkpointer. * VERSION 2 WORK IN PROGRESS. The implementation is more a proof-of-concept for having feedback than clean code. What it does: - as

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Amit Langote
Hi Fabien, On 2015-06-01 PM 08:40, Fabien COELHO wrote: Turning checkpoint_flush_to_disk = on reduces significantly the number of late transactions. These late transactions are not uniformly distributed, but are rather clustered around times when pg is stalled, i.e. more or less

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Amit Kapila
On Tue, Jun 2, 2015 at 6:45 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Amit, [...] The objective is to help avoid PG stalling when fsyncing on checkpoints, and in general to get better latency-bound performance. Won't this lead to more-unsorted writes (random I/O) as the

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
That might be the case in a database with a single small table; i.e. where all the writes go to a single file. But as soon as you have large tables (i.e. many segments) or multiple tables, a significant part of the writes issued independently from checkpointing will be outside the processing

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
Hello Amit, Not that the GUC naming is the most pressing issue here, but do you think *_flush_on_write describes what the patch does? It is currently *_flush_to_disk. In Andres Freund's version the name is sync_on_checkpoint_flush, but I did not find it very clear. Using *_flush_on_write

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Andres Freund
Hi, It's nice to see the topic being picked up. If I see correctly you picked up the version without sorting during checkpoints. I think that's not going to work - there'll be too many situations where the new behaviour will be detrimental. Did you consider combining both approaches? Greetings,

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Amit Kapila
On Mon, Jun 1, 2015 at 5:10 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello pg-devs, This patch is a simplified and generalized version of Andres Freund's August 2014 patch for flushing while writing during checkpoints, with some documentation and configuration warnings added. For the

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Andres Freund
On 2015-06-02 15:42:14 +0200, Fabien COELHO wrote: This version seems already quite effective and very light. ISTM that adding a sort phase would mean reworking significantly how the checkpointer processes pages. Meh. The patch for that wasn't that big. Hmmm. I think it should be

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Andres Freund
On 2015-06-02 15:15:39 +0200, Fabien COELHO wrote: Won't this lead to more-unsorted writes (random I/O) as the FlushBuffer requests (by checkpointer or bgwriter) are not sorted as per files or order of blocks on disk? Yep, probably. Under moderate load this is not an issue. The io-scheduler
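One reason sorting pairs well with flushing: once writes are in file and block order, consecutive blocks can share a single flush request. A hypothetical counter is sketched below. It assumes each request may cover one contiguous block range within a single file, and it compares the same set of writes in sorted and unsorted order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (file, block) pair for one buffer write. */
typedef struct
{
    int         file;
    unsigned    block;
} FileBlock;

/*
 * Count the flush requests needed when each request covers a run of
 * consecutive blocks in one file.  A new request starts whenever the
 * file changes or the block does not follow the previous one.
 */
size_t
count_flush_ranges(const FileBlock *w, size_t n)
{
    size_t      ranges = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (i == 0 || w[i].file != w[i - 1].file ||
            w[i].block != w[i - 1].block + 1)
            ranges++;
    }
    return ranges;
}
```

For the five writes in the check below, the sorted order needs three requests while the unsorted order needs five, one per write.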

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
Hello Andres, If I see correctly you picked up the version without sorting during checkpoints. I think that's not going to work - there'll be too many situations where the new behaviour will be detrimental. Did you consider combining both approaches? Yes, I thought that it was a more complex

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
Hello Amit, [...] The objective is to help avoid PG stalling when fsyncing on checkpoints, and in general to get better latency-bound performance. Won't this lead to more-unsorted writes (random I/O) as the FlushBuffer requests (by checkpointer or bgwriter) are not sorted as per files or

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
Hello Andres, I would rather separate them, unless this is a blocker. I think it is a blocker. Hmmm. This is an argument... This version seems already quite effective and very light. ISTM that adding a sort phase would mean reworking significantly how the checkpointer processes pages.

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Andres Freund
On 2015-06-02 17:01:50 +0200, Fabien COELHO wrote: The actual problem is sorting and fsyncing in a way that deals efficiently with tablespaces, i.e. doesn't write to tablespaces one-by-one. Not impossible, but it requires some thought. Hmmm... I would have neglected this point in a first
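If tablespaces sit on separate devices, writing them one after another keeps only one device busy at a time. One simple way to interleave them is to always write next to the tablespace with the smallest fraction of its buffers done. This is offered only as an illustration, not as the approach the thread settled on; the TsProgress struct and pick_next_tablespace() are hypothetical:

```c
#include <assert.h>

/* Hypothetical per-tablespace state for one checkpoint. */
typedef struct
{
    int         total;    /* buffers queued for this tablespace */
    int         written;  /* buffers written so far */
} TsProgress;

/*
 * Return the tablespace that is furthest behind, i.e. with the smallest
 * written/total ratio, so all tablespaces advance at about the same
 * relative pace.  Ratios are compared by cross-multiplication to stay in
 * integer arithmetic.  Returns -1 once every tablespace is done.
 */
int
pick_next_tablespace(const TsProgress *ts, int nts)
{
    int         best = -1;

    assert(nts >= 0);
    for (int i = 0; i < nts; i++)
    {
        if (ts[i].written >= ts[i].total)
            continue;
        if (best < 0 ||
            (long long) ts[i].written * ts[best].total <
            (long long) ts[best].written * ts[i].total)
            best = i;
    }
    return best;
}
```

With 4 buffers queued in one tablespace and 2 in another, this yields the write order 0, 1, 0, 0, 1, 0. A binary heap keyed on the same ratio would avoid the linear scan when there are many tablespaces.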

Re: [HACKERS] checkpointer continuous flushing

2015-06-02 Thread Fabien COELHO
Hmmm. I think it should be implemented as Tom suggested, that is per chunks of shared buffers, in order to avoid allocating a large amount of memory. I don't necessarily agree. But that's really just a minor implementation detail. Probably. The actual problem is sorting and fsyncing in a way that deals

[HACKERS] checkpointer continuous flushing

2015-06-01 Thread Fabien COELHO
Hello pg-devs, This patch is a simplified and generalized version of Andres Freund's August 2014 patch for flushing while writing during checkpoints, with some documentation and configuration warnings added. For the initial patch, see:
