Re: [HACKERS] Incremental checkpoints

2011-08-03 Thread Robert Haas
2011/7/29 Greg Smith g...@2ndquadrant.com:
 1) Postponing writes as long as possible always improves the resulting
 throughput of those writes.  Any incremental checkpoint approach will detune
 throughput by some amount.  If you make writes go out more often, they will
 be less efficient; that's just how things work if you benchmark anything
 that allows write combining.  Any incremental checkpoint approach is likely
 to improve latency in some cases if it works well, while decreasing
 throughput in most cases.

Agreed.  I came to the same conclusion a while back and then got
depressed.  That might mean we need a parameter to control the
behavior, unless we can find a change where the throughput drop is
sufficiently small that we don't really care, or make the optimization
apply only in cases where we determine that the latency problem will
be so severe that we'll certainly be willing to accept a drop in
throughput to avoid it.

 2) The incremental checkpoint approach used by other databases, such as the
 MySQL implementation, works by tracking what transaction IDs were associated
 with a buffer update.  The current way PostgreSQL saves buffer sync
 information for the checkpoint to process things doesn't store enough
 information to do that.  As you say, the main price there is some additional
 memory.

I think what we'd need to track is the LSN that first dirtied the page
(as opposed to the current field, which tracks the LSN that most
recently wrote the page).  If we write and flush all pages whose
first-dirtied LSN precedes some cutoff point, then we ought to be able
to advance the redo pointer to that point.
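
A minimal sketch of that idea, assuming a toy in-memory buffer pool. The names Buffer, BufferPool, first_dirty_lsn and incremental_checkpoint are hypothetical and nothing here is PostgreSQL code; locking, WAL flushing and real I/O are ignored:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Buffer:
    """Toy stand-in for a buffer header (hypothetical fields)."""
    page_lsn: int = 0                      # LSN of the latest change to the page
    first_dirty_lsn: Optional[int] = None  # LSN that first dirtied it; None if clean


class BufferPool:
    def __init__(self) -> None:
        self.buffers: Dict[int, Buffer] = {}
        self.redo_pointer = 0  # crash recovery would start replaying from here

    def modify(self, page_id: int, lsn: int) -> None:
        """Apply a WAL-logged change at `lsn` to page `page_id`."""
        buf = self.buffers.setdefault(page_id, Buffer())
        if buf.first_dirty_lsn is None:
            buf.first_dirty_lsn = lsn  # only the first dirtying LSN is remembered
        buf.page_lsn = lsn

    def flush(self, page_id: int) -> None:
        """Pretend to write and fsync the page; it is clean afterwards."""
        self.buffers[page_id].first_dirty_lsn = None

    def incremental_checkpoint(self, cutoff_lsn: int) -> List[int]:
        """Flush every page first dirtied before `cutoff_lsn`, then advance redo."""
        flushed = []
        for page_id, buf in sorted(self.buffers.items()):
            if buf.first_dirty_lsn is not None and buf.first_dirty_lsn < cutoff_lsn:
                self.flush(page_id)
                flushed.append(page_id)
        # No remaining dirty page depends on WAL older than the cutoff.
        self.redo_pointer = max(self.redo_pointer, cutoff_lsn)
        return flushed


pool = BufferPool()
pool.modify(1, lsn=100)
pool.modify(2, lsn=150)
pool.modify(1, lsn=300)  # page 1 changes again; first_dirty_lsn stays 100
pool.modify(3, lsn=250)
flushed_pages = pool.incremental_checkpoint(cutoff_lsn=200)
```

In this toy run page 1 has to be flushed even though its latest change (LSN 300) is past the cutoff, which is why the most recent LSN alone is not enough to decide what must be written.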

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incremental checkpoints

2011-08-03 Thread jordani
I did not explain well what I had in mind in my first message.

The main goal is for more buffers to stay dirty in memory for a longer time.
So checkpoint_segments would have to be 2, 3... times higher than in the
current approach, and a separate parameter would control how many buffers
are written at once. The DBA can tune:
- checkpoint_segments - a higher number means fewer writes but a longer
crash recovery;
- (how many buffers to write at once) - more favors throughput, fewer
favors latency.

 I think what we'd need to track is the LSN that first dirtied the page
 (as opposed to the current field, which tracks the LSN that most
 recently wrote the page).  If we write and flush all pages whose
 first-dirtied LSN precedes some cutoff point, then we ought to be able
 to advance the redo pointer to that point.

Also, if the page is written out by a backend, LSN_first_dirtied has to be
cleared.

I believe (but cannot prove) that this is worth testing and that the
advantage will be noticeable.

Jordan Ivanov




[HACKERS] Incremental checkpoints

2011-07-29 Thread jordani
Hi,
I have read all the information about checkpoints in PostgreSQL that I
could find. I think the current implementation of checkpoints is not a
good fit for a huge shared buffer cache combined with many WAL segments.
If there are more buffers and they are written less often, more updates to
each buffer can be combined, so the total number of disk writes will be
significantly lower. I think incremental checkpoints can achieve this goal
(maybe more), and the price is additional memory (about 1/1000 of the size
of the buffer cache).
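
One way to arrive at a figure of that order, as a rough sketch: PostgreSQL's default block size is 8192 bytes, and the 8 bytes of extra tracking data per buffer used here is purely an assumption for illustration, not something specified in this message.

```python
BLOCK_SIZE = 8192  # bytes per buffer (PostgreSQL's default block size)
ENTRY_SIZE = 8     # ASSUMED bytes of extra tracking data per buffer


def tracking_overhead(shared_buffers_bytes: int) -> int:
    """Extra memory needed to keep one tracking entry per buffer."""
    n_buffers = shared_buffers_bytes // BLOCK_SIZE
    return n_buffers * ENTRY_SIZE


cache = 8 * 1024**3                  # an 8 GiB buffer cache
overhead = tracking_overhead(cache)  # 8 MiB under these assumptions
ratio = overhead / cache             # 1/1024 of the cache size
```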

My main source of information is
http://wiki.postgresql.org/wiki/User:Gsmith#How_do_checkpoints_happen_inside_the_PostgreSQL_backend.3F
I see that some data has to be written to WAL in steps 3) and 6). I
will use CD to denote that data and P1, P2... to denote the pages that are
dirty and have to be written to disk in step 4).

In an incremental checkpoint, when a WAL segment has been written we do not
start writing pages; instead we add pages P1, P2 ... and CD to a queue. If
the background writer meanwhile has to clean some page, that page is removed
from the queue. When checkpoint_segments segments have been written to the
transaction log, the queue contains:
P1, P2 ... CD, Pi ... CD, Pj ... CD ...
At this point we have to perform a checkpoint in order to free the first WAL
segment. Only the pages before the first CD have to be written and fsync’d.
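
The queue described above could be sketched roughly as follows. CheckpointQueue and its method names are hypothetical, and the rule that a page already in the queue keeps its original position is an assumption added for the sketch:

```python
from collections import deque

CD = "CD"  # marker for the checkpoint data logged at a WAL segment boundary


class CheckpointQueue:
    def __init__(self) -> None:
        self.queue = deque()

    def segment_finished(self, dirty_pages) -> None:
        """A WAL segment filled up: queue its dirty pages, then a CD marker."""
        queued = set(self.queue)
        for page in dirty_pages:
            if page not in queued:  # ASSUMPTION: a queued page stays where it is
                self.queue.append(page)
                queued.add(page)
        self.queue.append(CD)

    def page_cleaned(self, page) -> None:
        """The background writer already wrote this page; drop it from the queue."""
        if page != CD and page in self.queue:
            self.queue.remove(page)

    def incremental_checkpoint(self):
        """Return the pages to write and fsync so the oldest segment can be freed."""
        to_write = []
        while self.queue:
            item = self.queue.popleft()
            if item == CD:
                break  # stop at the first CD marker
            to_write.append(item)
        return to_write


q = CheckpointQueue()
q.segment_finished(["P1", "P2"])
q.segment_finished(["P2", "P3"])    # P2 is already queued
q.page_cleaned("P1")                # written earlier by the background writer
first = q.incremental_checkpoint()  # only P2 is left before the first CD
remaining = list(q.queue)
```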

I suppose this task can be done in the background writer, which would make
some number of writes per round for both LRU cleaning and the checkpoint
queue. There is no deadline for each incremental checkpoint, but if the WAL
keeps growing, the total number of writes has to increase. It is also not
required to do a checkpoint for each WAL segment: it is possible to write N
pages from the queue and combine several potential checkpoints into one.
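
As a rough illustration of writing N pages per round and letting one round complete several pending checkpoints, here is a small sketch. writer_round is a hypothetical helper, and the fsync that would have to happen before a segment is really freed is not modeled:

```python
from collections import deque

CD = "CD"  # marker for the checkpoint data logged at a WAL segment boundary


def writer_round(queue, n_pages):
    """Write up to n_pages from the head of the queue.

    Returns (pages_written, checkpoints_completed). CD markers that sit
    directly behind the written pages are consumed in the same round.
    """
    written, completed = [], 0
    while queue and (len(written) < n_pages or queue[0] == CD):
        item = queue.popleft()
        if item == CD:
            completed += 1  # every page ahead of this marker has been written
        else:
            written.append(item)
    return written, completed


queue = deque(["P1", "P2", CD, "P3", CD, "P4", "P5", CD])
round1 = writer_round(queue, n_pages=3)  # passes two CD markers in one round
round2 = writer_round(queue, n_pages=3)
```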

I hope I have explained the general idea. I am not a C programmer, so it is
hard for me to give more details.

Jordan Ivanov




Re: [HACKERS] Incremental checkpoints

2011-07-29 Thread Greg Smith

On 07/29/2011 11:04 AM, jord...@go-link.net wrote:

I think the current implementation of checkpoints is not a good fit for a
huge shared buffer cache combined with many WAL segments. If there are more
buffers and they are written less often, more updates to each buffer can be
combined, so the total number of disk writes will be significantly lower. I
think incremental checkpoints can achieve this goal (maybe more), and the
price is additional memory (about 1/1000 of the size of the buffer cache).
   


The current code optimizes for buffers that are written frequently.
Those will sit in shared_buffers and, in the hoped-for case, only be
written once at checkpoint time.


There are two issues with adopting incremental checkpoints instead, one
fundamental, the other solvable but not started on yet:


1) Postponing writes as long as possible always improves the resulting 
throughput of those writes.  Any incremental checkpoint approach will 
detune throughput by some amount.  If you make writes go out more often, 
they will be less efficient; that's just how things work if you 
benchmark anything that allows write combining.  Any incremental 
checkpoint approach is likely to improve latency in some cases if it 
works well, while decreasing throughput in most cases.


2) The incremental checkpoint approach used by other databases, such as 
the MySQL implementation, works by tracking what transaction IDs were 
associated with a buffer update.  The current way PostgreSQL saves 
buffer sync information for the checkpoint to process things doesn't 
store enough information to do that.  As you say, the main price there 
is some additional memory.
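
The write-combining effect described in point 1) can be illustrated with a toy count of page writes. This is only a sketch under a made-up workload; count_page_writes and the update pattern are hypothetical and do not model real I/O costs:

```python
def count_page_writes(updates, flush_every):
    """Count page writes when all dirty pages are flushed every `flush_every` updates."""
    dirty = set()
    writes = 0
    for i, page in enumerate(updates, start=1):
        dirty.add(page)  # repeated updates to an already dirty page are combined
        if i % flush_every == 0:
            writes += len(dirty)
            dirty.clear()
    return writes + len(dirty)  # final flush of whatever is still dirty


# Hypothetical workload: 1000 updates cycling over 10 hot pages.
updates = [i % 10 for i in range(1000)]
frequent = count_page_writes(updates, flush_every=20)  # 500 page writes
rare = count_page_writes(updates, flush_every=500)     # 20 page writes
```

With this workload, flushing 25 times less often cuts the number of page writes by the same factor, because each hot page absorbs many more updates between writes.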


From my perspective, the main problem with plans to tweak the 
checkpoint code is that we don't have a really good benchmark that 
tracks both throughput and latency to test proposed changes against.  
Mark Wong has been working to get his TPC-E clone DBT-5 running
regularly for that purpose, and last I heard that was basically done at
this point--he's running daily tests now.  There's already a small pile
of patches adjusting checkpoint behavior that were postponed
from being included in 9.1 mainly because it was hard to prove they were
useful given the benchmark used to test them, pgbench.  I have higher
hopes for DBT-5 as being a test that gives informative data in this 
area.  I would want to go back and revisit the existing patches (sorted 
checkpoints, spread sync) before launching into this whole new area.  I 
don't think any of those has even been proven not to work, they just 
didn't help the slightly unrealistic pgbench write-heavy workload.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Incremental checkpoints

2011-07-29 Thread jordani
 If you make writes go out more often, they will be less efficient

I think fsync is more important. But many writes followed by an fsync are
not good either. Let us suppose that 30 WAL segments is a good amount to
write at once for performance. In the incremental approach we can have 60
segments and write 30 at once. There is no checkpoint_timeout, so more
buffers will stay dirty in memory for longer.

I cannot see any disadvantage.

Jordan Ivanov

