Hello Andres,
Thanks for your comments. Some answers and new patches included.
+ /*
+ * Array of buffer ids of all buffers to checkpoint.
+ */
+static int *CheckpointBufferIds = NULL;
Should be at the beginning of the file. There's a bunch more cases of that.
done.
+/* Compare
Ok ok, I stop resisting... I'll have a look.
Here is a v7 ab version which uses shared memory instead of palloc.
--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e900dcc..1cec243 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2454,6
On 2015-08-10 19:07:12 +0200, Fabien COELHO wrote:
I think that there is no issue with the current shared_buffers limit. I
could allocate and use 4 GB on my laptop without problem. I added a cast to
ensure that unsigned int are used for the size computation.
You can't allocate 4GB with
Hello Andres,
You can't allocate 4GB with palloc(), it has a builtin limit against
allocating more than 1GB.
Argh, too bad, I assumed very naively that palloc was malloc in disguise.
[...]
Well, then every time the checkpointer is restarted.
Hm...
The point is that it's done at
Hi,
On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote:
I ripped out the flushing part, keeping only the sorting. I refactored the
logic in BufferSync() a bit. There's now a separate function,
nextCheckpointBuffer(), that returns the next buffer ID from the sorted
list. The
On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO coe...@cri.ensmp.fr
wrote:
Hello Andres,
You can't allocate 4GB with palloc(), it has a builtin limit against
allocating more than 1GB.
Argh, too bad, I assumed very naively that palloc was malloc in
disguise.
It is, but there's some
On Tue, Aug 11, 2015 at 4:28 AM, Andres Freund wrote:
On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO wrote:
You can't allocate 4GB with palloc(), it has a builtin limit against
allocating more than 1GB.
Argh, too bad, I assumed very naively that palloc was malloc in
disguise.
It is,
On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote:
* I think we should drop the flush part of this for now. It's not as
clearly beneficial as the sorting part, and adds a great deal more code
complexity. And it's orthogonal to the sorting patch, so we can deal with it
separately.
I don't
Hello Heikki,
Thanks for having a look at the patch.
* I think we should drop the flush part of this for now. It's not as
clearly beneficial as the sorting part, and adds a great deal more code
complexity. And it's orthogonal to the sorting patch, so we can deal with it
separately.
I
On 07/26/2015 06:01 PM, Fabien COELHO wrote:
Attached is a very minor v5 update which does a rebase and completes the
cleanup of doing a full sort instead of a chunked sort.
Some thoughts on this:
* I think we should drop the flush part of this for now. It's not as
clearly beneficial as the
Hello,
Attached is a very minor v5 update which does a rebase and completes the
cleanup of doing a full sort instead of a chunked sort.
Attached is an updated version of the patch which turns the sort option
into a boolean, and also include the sort time in the checkpoint log.
There is still
On 2015-06-26 21:47:30 +0200, Fabien COELHO wrote:
tps stddev full speed:
HEAD OFF/OFF
tiny 1 client 727 +- 227 221 +- 246
Huh?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Note that I'm not comparing to HEAD in the above tests, but with the new
options deactivated, which should be more or less comparable to current
HEAD, i.e. there is no sorting nor flushing done, but this is not strictly
speaking HEAD behavior. Probably I should get some figures with HEAD as
Hello Andres,
HEAD OFF/OFF
tiny 1 client 727 +- 227 221 +- 246
Huh?
Indeed, just to check that someone was reading this magnificent mail:-)
Just a typo because I reformatted the figures for simpler comparison. 221
is really 721, quite
On Wed, Jun 24, 2015 at 9:50 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
flsh | full speed tps | percent of late tx, 4 clients
/srt | 1 client | 4 clients | 100 | 200 | 400 |
N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
N/Y | 458 +- 327* | 743
Hello Amit,
[...]
Ok, I misunderstood your question. I thought you meant a dip between 1
client and 4 clients. I meant that when flush is turned on tps goes down by
8% (743 to 681 tps) on this particular run.
This 8% might matter if the dip is bigger with more clients and
more aggressive
It'd be interesting to see numbers for tiny, without the overly small
checkpoint timeout value. 30s is below the OS's writeback time.
Here are some tests with longer timeout:
tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
max_wal_size=1GB warmup=600 time=4000
flsh |
On 6/22/15 11:59 PM, Fabien COELHO wrote:
which might not be helpful for cases when checkpoint could have
flushed soon-to-be-recycled buffers. I think flushing the sorted
buffers w.r.t tablespaces is a good idea, but not giving any
preference to clock-sweep point seems to me that we would lose
On Tue, Jun 23, 2015 at 10:29 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
Hello Amit,
medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
max_wal_size=4GB warmup=1200 time=7500
flsh | full speed tps | percent of late tx, 4 clients
/srt | 1 client |
flsh | full speed tps | percent of late tx, 4 clients
/srt | 1 client | 4 clients | 100 | 200 | 400 |
N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
N/Y | 458 +- 327* | 743 +- 920* | 7.05 | 14.24 | 24.07 |
Y/N | 169 +- 166* | 187 +- 302* | 4.01 |
sorry, resent stalled post, wrong from
It'd be interesting to see numbers for tiny, without the overly small
checkpoint timeout value. 30s is below the OS's writeback time.
Here are some tests with longer timeout:
tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
Hello Jim,
The small problem I see is that for a very large setting there could be
several seconds or even minutes of sorting, which may or may not be
desirable, so having some control on that seems a good idea.
ISTM a more elegant way to handle that would be to start off with a very
small
On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
sorry, resent stalled post, wrong from
It'd be interesting to see numbers for tiny, without the overly small
checkpoint timeout value. 30s is below the OS's writeback time.
Here are some tests with longer timeout:
Hello Amit,
medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
max_wal_size=4GB warmup=1200 time=7500
flsh | full speed tps | percent of late tx, 4 clients
/srt | 1 client | 4 clients | 100 | 200 | 400 |
N/N | 173 +- 289* | 198 +- 531* |
Hi,
On 2015-06-20 08:57:57 +0200, Fabien COELHO wrote:
Actually I did, because as explained in another mail the fsync time when the
other options are activated as reported in the logs is essentially null, so
it would not bring significant improvements on these runs,
and also the patch changes
On 6/20/15 2:57 AM, Fabien COELHO wrote:
- as version 2: checkpoint buffer sorting based on a 2007 patch by
Takahiro Itagaki but with a smaller and static buffer allocated once.
Also, sorting is done by chunks of 131072 pages in the current
version,
with a guc to change this value.
I
Hello Andres,
So this is an evidence-based decision.
Meh. You're testing on low concurrency.
Well, I'm just testing on the available box.
I do not see the link between high concurrency and whether moving fsync as
early as possible would have a large performance impact. I think it might
Hello Andres,
- Move fsync as early as possible, suggested by Andres Freund?
My opinion is that this should be left out for the nonce.
for the nonce - what does that mean?
Nonce \Nonce\ (n[o^]ns), n. [For the nonce, OE. for the nones, ...
{for the nonce}, i. e. for the present time.
Hi,
On 2015-06-17 08:24:38 +0200, Fabien COELHO wrote:
Here is version 3, including many performance tests with various settings,
representing about 100 hours of pgbench run. This patch aims at improving
checkpoint I/O behavior so that tps throughput is improved, late
transactions are less
Hello,
Here is version 3, including many performance tests with various settings,
representing about 100 hours of pgbench run. This patch aims at improving
checkpoint I/O behavior so that tps throughput is improved, late
transactions are less frequent, and overall performances are more
On 07/06/2015 16:53, Fabien COELHO wrote:
+		/* Others: say that data should not be kept in memory...
+		 * This is not exactly what we want to say, because we want to write
+		 * the data for durability but we may need it later nevertheless.
+
Hello Cédric,
It looks a bit hazardous, do you have a benchmark for freeBSD ?
No, I just consulted the FreeBSD man page for posix_fadvise. If someone can
run tests on something with HDDs that is not Linux, that would be nice.
Sources says:
case POSIX_FADV_DONTNEED:
/*
Hello Andres,
They pretty much can't if you flush things frequently. That's why I
think this won't be acceptable without the sorting in the checkpointer.
* VERSION 2 WORK IN PROGRESS.
The implementation is more a proof-of-concept for having feedback than
clean code. What it does:
- as
Hi Fabien,
On 2015-06-01 PM 08:40, Fabien COELHO wrote:
Turning checkpoint_flush_to_disk = on reduces significantly the number
of late transactions. These late transactions are not uniformly distributed,
but are rather clustered around times when pg is stalled, i.e. more or less
On Tue, Jun 2, 2015 at 6:45 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
Hello Amit,
[...]
The objective is to help avoid PG stalling when fsyncing on checkpoints,
and in general to get better latency-bound performance.
Won't this lead to more-unsorted writes (random I/O) as the
That might be the case in a database with a single small table; i.e.
where all the writes go to a single file. But as soon as you have
large tables (i.e. many segments) or multiple tables, a significant
part of the writes issued independently from checkpointing will be
outside the processing
Hello Amit,
Not that the GUC naming is the most pressing issue here, but do you think
*_flush_on_write describes what the patch does?
It is currently *_flush_to_disk. In Andres Freund version the name is
sync_on_checkpoint_flush, but I did not find it very clear. Using
*_flush_on_write
Hi,
It's nice to see the topic being picked up.
If I see correctly you picked up the version without sorting during
checkpoints. I think that's not going to work - there'll be too many
situations where the new behaviour will be detrimental. Did you
consider combining both approaches?
Greetings,
On Mon, Jun 1, 2015 at 5:10 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
Hello pg-devs,
This patch is a simplified and generalized version of Andres Freund's
August 2014 patch for flushing while writing during checkpoints, with some
documentation and configuration warnings added.
For the
On 2015-06-02 15:42:14 +0200, Fabien COELHO wrote:
This version seems already quite effective and very light. ISTM that
adding a sort phase would mean reworking significantly how the
checkpointer processes pages.
Meh. The patch for that wasn't that big.
Hmmm. I think it should be
On 2015-06-02 15:15:39 +0200, Fabien COELHO wrote:
Won't this lead to more-unsorted writes (random I/O) as the
FlushBuffer requests (by checkpointer or bgwriter) are not sorted as
per files or order of blocks on disk?
Yep, probably. Under moderate load this is not an issue. The io-scheduler
Hello Andres,
If I see correctly you picked up the version without sorting during
checkpoints. I think that's not going to work - there'll be too many
situations where the new behaviour will be detrimental. Did you
consider combining both approaches?
Yes, I thought that it was a more complex
Hello Amit,
[...]
The objective is to help avoid PG stalling when fsyncing on checkpoints,
and in general to get better latency-bound performance.
Won't this lead to more-unsorted writes (random I/O) as the
FlushBuffer requests (by checkpointer or bgwriter) are not sorted as
per files or
Hello Andres,
I would rather separate them, unless this is a blocker.
I think it is a blocker.
Hmmm. This is an argument...
This version seems already quite effective and very light. ISTM that
adding a sort phase would mean reworking significantly how the
checkpointer processes pages.
On 2015-06-02 17:01:50 +0200, Fabien COELHO wrote:
The actual problem is sorting fsyncing in a way that deals efficiently
with tablespaces, i.e. doesn't write to tablespaces one-by-one.
Not impossible, but it requires some thought.
Hmmm... I would have neglected this point in a first
Hmmm. I think it should be implemented as Tom suggested, that is per chunks
of shared buffers, in order to avoid allocating a large memory.
I don't necessarily agree. But that's really just a minor implementation
detail.
Probably.
The actual problem is sorting fsyncing in a way that deals
Hello pg-devs,
This patch is a simplified and generalized version of Andres Freund's
August 2014 patch for flushing while writing during checkpoints, with some
documentation and configuration warnings added.
For the initial patch, see: