Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Samrat Revagade
On Sun, Jun 16, 2013 at 11:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 16 June 2013 17:25, Samrat Revagade revagade.sam...@gmail.com wrote:
 
 
  On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
 
 
  So I strongly object to calling this patch anything to do with
  failback safe. You simply don't have enough data to make such a bold
  claim. (Which is why we call it synchronous replication and not zero
  data loss, for example).
 
  But that's not the whole story. I can see some utility in a patch that
  makes all WAL transfer synchronous, rather than just commits. Some
  name like synchronous_transfer might be appropriate. e.g.
  synchronous_transfer = all | commit (default).
 
 
  I agree with you that, these days, the need for a fresh backup in crash
  recovery seems to be a major problem.
  We might need to change the name of the patch if there are other problems
  with crash recovery too.

 (Sorry don't understand)

  Sorry for the confusion. I will change the name of the patch.


  The idea of another slew of parameters that are very similar to
  synchronous replication but yet somehow different seems weird. I can't
  see a reason why we'd want a second lot of parameters. Why not just
  use the existing ones for sync rep? (I'm surprised the Parameter
  Police haven't visited you in the night...) Sure, we might want to
  expand the design for how we specify multi-node sync rep, but that is
  a different patch.
 
 
  A different set of parameters is needed to differentiate between the
  fail-safe standby and the synchronous standby; the fail-safe standby and
  the standby in synchronous replication can be two different servers.

 Why would they be different? What possible reason would you have for
 that config? There is no *need* for those parameters, the proposal
 could work perfectly well without them.

 Let's make this patch fulfill the stated objectives, not add in
 optional extras, especially ones that don't appear well thought
 through. If you wish to enhance the design for the specification of
 multi-node sync rep, make that a separate patch, later.

 I agree with you. I will remove the extra parameters in the next version of
the patch if they are not required.

-- 
Regards,

Samrat Revagade


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-06-17 Thread KONDO Mitsumasa

Thank you for the comments, and for reviewing my patch!

(2013/06/16 23:27), Heikki Linnakangas wrote:

On 10.06.2013 13:51, KONDO Mitsumasa wrote:

I create patch which is improvement of checkpoint IO scheduler for
stable transaction responses.

* Problem with checkpoint IO scheduling under heavy transaction load
When the database is under a heavy transaction load, I think the PostgreSQL
checkpoint scheduler has two problems, at the start and at the end of a
checkpoint. One problem is heavy IO when the checkpoint starts writing in
each checkpoint cycle. It is caused by full-page writes, which produce a
burst of WAL IO for pages modified soon after the checkpoint starts writing.
As a result, the WAL-based checkpoint scheduler wrongly judges that the
checkpoint is behind schedule because of the full-page writes, even though
it is not actually late. This hurts transaction response times. I think the
WAL-based checkpoint scheduler does not behave properly at the start of a
checkpoint.


Yeah, the checkpoint scheduling logic doesn't take into account the heavy WAL
activity caused by full page images. That's an interesting phenomenon, but did
you actually see that causing a problem in your tests?  I couldn't tell from the
results you posted what the impact of that was. Could you repeat the tests
separately with the two separate patches you posted later in this thread?

OK, I will try to test with the two separate patches. The results I sent
previously indicate high WAL throughput (write_size_per_sec) and a high
transaction rate during checkpoints. Please see the following HTML files,
in which I have set anchor links and a 'checkpoint highlight switch' button.


* With my patched PG
http://pgstatsinfo.projects.pgfoundry.org/dbt2_result/report/patchedPG-report.html#transaction_statistics
http://pgstatsinfo.projects.pgfoundry.org/dbt2_result/report/patchedPG-report.html#wal_statistics

* Plain PG
http://pgstatsinfo.projects.pgfoundry.org/dbt2_result/report/plainPG-report.html#transaction_statistics
http://pgstatsinfo.projects.pgfoundry.org/dbt2_result/report/plainPG-report.html#wal_statistics

In the WAL statistics results, I think the high WAL throughput at checkpoint
start indicates that checkpoint IO does not disturb the IO of other executing
transactions.



Rationalizing a bit, I could even argue to myself that it's a *good* thing. At
the beginning of a checkpoint, the OS write cache should be relatively empty, as
the checkpointer hasn't done any writes yet. So it might make sense to write a
burst of pages at the beginning, to partially fill the write cache first, before
starting to throttle. But this is just handwaving - I have no idea what the
effect is in real life.
Yes, I think so. If we want to change IO throttling, we change OS parameters
such as '/proc/sys/vm/dirty_background_ratio' or '/proc/sys/vm/dirty_ratio'.
But these parameters affect every application on the system, so they are
difficult to change and are not intuitive to set. I also think database
tuning should be done through database parameters rather than OS parameters;
that makes server tuning clearer.



Another thought is that rather than trying to compensate for that effect in the
checkpoint scheduler, could we avoid the sudden rush of full-page images in the
first place? The current rule for when to write a full page image is
conservative: you don't actually need to write a full page image when you modify
a buffer that's sitting in the buffer cache, if that buffer hasn't been flushed
to disk by the checkpointer yet, because the checkpointer will write and fsync
it later. I'm not sure how much it would smoothen WAL write I/O, but it would
be interesting to try.
That would be the right method in an ideal implementation, but I have no idea 
how to implement it. It seems very difficult...
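
As a standalone illustration of the decision rule described above (all names
here are hypothetical stand-ins, not PostgreSQL symbols; this is a sketch of
Heikki's proposed refinement, not the actual patch):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xlog_ptr_t;    /* stand-in for XLogRecPtr */

/* Current conservative rule: log a full page image for any buffer whose
 * last WAL touch is at or before the checkpoint redo pointer.  The proposed
 * refinement skips the image while the checkpointer has not yet flushed the
 * buffer this cycle, because the checkpointer will write and fsync it later. */
static bool
need_full_page_image(xlog_ptr_t page_lsn, xlog_ptr_t redo_ptr,
                     bool flushed_since_checkpoint_start)
{
    if (page_lsn > redo_ptr)
        return false;   /* a full image was already logged this cycle */
    return flushed_since_checkpoint_start;
}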




Second problem is fsync freeze problem in end of checkpoint.
Normally, checkpoint write is executed in background by OS's IO
scheduler. But when it does not correctly work, end of checkpoint
fsync was caused IO freeze and slower transactions. Unexpected slow
transaction will cause monitor error in HA-cluster and decrease
user-experience in application service. It is especially serious
problem in cloud and virtual server database system which does not
have IO performance. However we don't have solution in
postgresql.conf parameter very much. We prefer checkpoint time to
fast response transactions. In fact checkpoint time is short, and it
becomes little bit long that is not problem. You may think that
checkpoint_segments and checkpoint_timeout are set larger value,
however large checkpoint_segments affects file-cache which is not
read and is wasted, and large checkpoint_timeout was caused
long-time crash-recovery.


A long time ago, Itagaki wrote a patch to sort the checkpoint writes:
www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp.
He posted very promising performance numbers, but it was dropped because Tom
couldn't reproduce the numbers, and because sorting requires allocating a large
array, which has the risk of running out of memory, which would be bad when
you're trying to checkpoint.

Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Simon Riggs
On 17 June 2013 00:43, Kevin Grittner kgri...@ymail.com wrote:

 Especially when one is known to be better than the other already.

 What is the hypothetical technique you're arguing is inferior?  For
 my own part, I haven't gotten beyond the phase of knowing that to
 meet all requests for the feature, it would need to be available at
 about the same point that AFTER EACH STATEMENT triggers fire, but
 that it should not involve any user-written triggers.  Have you
 implemented something similar to what you think I might be
 considering?  Do you have benchmark results?  Can you share
 details?

Recording the changeset required by replication is known to be more
efficient using WAL based extraction than using triggers. WAL writes
are effectively free and using WAL concentrates the reads to avoid
random I/O in large databases. That would be the most suitable
approach for continuously updated matviews, or for frequent updates.

Extraction using multiple snapshots is also possible, using a
technique similar to the CONCURRENTLY mechanism. That would require
re-scanning the whole table which might be overkill depending upon the
number of changes. That would work for reasonably infrequent updates.

 Given that we also want to do concurrent CLUSTER and ALTER TABLE
 ... SET TABLESPACE using changeset extraction, I think it's time
 that discussion happened on hackers.

 No objections to that here; but please don't hijack this thread for
 that discussion.

There are multiple features all requiring efficient change set
extraction. It seems extremely relevant to begin discussing what that
mechanism might be in each case, so we don't develop 2 or even 3
different ones while everybody ignores each other. As you said, we
should be helping each other and working together, and I agree.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-06-17 Thread Pavan Deolasee
On Mon, Jun 17, 2013 at 2:18 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-06-16 17:27:56 +0300, Heikki Linnakangas wrote:

  A long time ago, Itagaki wrote a patch to sort the checkpoint writes:
 www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp
 .
  He posted very promising performance numbers, but it was dropped because
 Tom
  couldn't reproduce the numbers, and because sorting requires allocating a
  large array, which has the risk of running out of memory, which would be
 bad
  when you're trying to checkpoint.

 Hm. We could allocate the array early on since the number of buffers
 doesn't change. Sure that would be pessimistic, but that seems fine.

 Alternatively I can very well imagine that it would still be beneficial
 to sort the dirty buffers in shared buffers. I.e. scan till we found 50k
 dirty pages, sort them and only then write them out.


Without knowing that Itagaki had done something similar in the past, a couple
of months back I tried exactly the same thing, i.e. sort the shared buffers
in chunks and then write them out at once. But I did not get any
significant performance gain except when shared buffers were 3/4th (or
some such fraction) or more of the available RAM. I will see if I can pull
out the patch and the numbers. But if memory serves well, I concluded that
the kernel is already utilising its buffer cache to achieve the same thing
and it does not help beyond a point.
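
For illustration, a minimal standalone C sketch of the chunked sort-then-write
idea being discussed (simplified structs and a caller-supplied write callback,
not PostgreSQL's buffer manager; per Andres's suggestion, the array would be
allocated once up front so no allocation happens during the checkpoint):

#include <stdint.h>
#include <stdlib.h>

typedef struct DirtyPage
{
    uint32_t rel_id;            /* stand-in for a full buffer tag */
    uint32_t block_no;
} DirtyPage;

/* Order pages by relation, then by block number, so writes hit each
 * file sequentially. */
static int
dirty_page_cmp(const void *a, const void *b)
{
    const DirtyPage *pa = a;
    const DirtyPage *pb = b;

    if (pa->rel_id != pb->rel_id)
        return pa->rel_id < pb->rel_id ? -1 : 1;
    if (pa->block_no != pb->block_no)
        return pa->block_no < pb->block_no ? -1 : 1;
    return 0;
}

/* Sort one chunk of collected dirty pages and write them in file order. */
static void
write_chunk(DirtyPage *pages, size_t n, void (*write_page)(const DirtyPage *))
{
    qsort(pages, n, sizeof(DirtyPage), dirty_page_cmp);
    for (size_t i = 0; i < n; i++)
        write_page(&pages[i]);
}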

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-06-17 Thread KONDO Mitsumasa
(2013/06/17 5:48), Andres Freund wrote:
On 2013-06-16 17:27:56 +0300, Heikki Linnakangas wrote:

 If we don't mind scanning the buffer cache several times, we don't
 necessarily even need to sort the writes for that. Just scan the buffer
 cache for all buffers belonging to relation A, then fsync it. Then scan the
 buffer cache again, for all buffers belonging to relation B, then fsync
 that, and so forth.

 That would end up with quite a lot of scans on reasonably sized
 machines. Not to speak of those that have a million+ relations. That
 doesn't seem to be a good idea for bigger shared_buffers. C.f. the stuff
 we did for 9.3 to make it cheaper to drop a bunch of relations at once
 by only scanning shared_buffers once.
As I wrote in my reply to Heikki, I think an exact buffer sort, which is 
expensive, is unnecessary. To solve this problem, we only need a sort 
accurate enough to be optimized by the OS IO scheduler. We normally have 
two optimizing IO scheduler layers: the OS layer and the RAID controller 
layer. I think performance will improve if the sort is just accurate enough 
for those layers to optimize. The computational cost required is a single 
sequential scan of the buffer descriptors to sort the buffers roughly. I will 
try to study this implementation, too.
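
As a rough illustration of that "one scan, approximate sort" idea (all names
hypothetical, a sketch rather than an implementation): bucket each dirty page
by the high bits of its block number in a single pass, then write bucket by
bucket, leaving the fine-grained ordering to the OS and RAID-controller
elevators:

#include <stdint.h>

#define NBUCKETS     256
#define MAXPERBUCKET 4096

/* One bucket per range of block numbers; roughly 4 MB total, so it would
 * be allocated once, not per checkpoint.  A real implementation would have
 * to write or spill pages when a bucket fills rather than drop them. */
typedef struct RoughBuckets
{
    uint32_t blocks[NBUCKETS][MAXPERBUCKET];
    uint32_t nused[NBUCKETS];
} RoughBuckets;

/* Single pass over the buffer descriptors: drop each dirty block into the
 * bucket chosen by the high bits of its block number.  O(1) per page. */
static void
rough_add(RoughBuckets *rb, uint32_t block_no)
{
    unsigned b = (block_no >> 16) & (NBUCKETS - 1);

    if (rb->nused[b] < MAXPERBUCKET)
        rb->blocks[b][rb->nused[b]++] = block_no;
}

/* Write bucket by bucket: blocks come out approximately ordered, and the
 * lower IO scheduler layers finish the sorting within each bucket. */
static void
rough_write_all(const RoughBuckets *rb, void (*write_block)(uint32_t))
{
    for (unsigned b = 0; b < NBUCKETS; b++)
        for (uint32_t i = 0; i < rb->nused[b]; i++)
            write_block(rb->blocks[b][i]);
}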


Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Simon Riggs
On 17 June 2013 02:05, Josh Berkus j...@agliodbs.com wrote:

 I agree that the FSM behaviour shouldn't be linked to index existence.
 IMHO that should be a separate table parameter, WITH (fsm_mode = append)
 Index only scans would also benefit from that.

 -1 ... I cannot believe that such a parameter would ever get turned on
 in production by anyone.  If your table has a significant update rate,
 the resulting table bloat would make such behavior completely
 infeasible.  If you have few enough updates to make such a behavior
 practical, then you can live with the expensive index updates instead.

 I'm also thinking that if a table is really append-only, then there are
 never any middle-of-the-table pages in the FSM, no?

 Where this falls down is the table which is
 mostly-append-but-occasionally-needs-an-update-or-delete.  I think the
 answer there is to look for a way to make updating the index block range
 faster, not ways to modify how we append to the heap.  If we told users
 "tables with Minmax indexes will be very expensive to update" then I
 think they'd live with it; dropping and re-adding an index to enable fast
 updates is something which is already familiar.

This feature is using a similar technique to enhance SeqScans as we
already use on VACUUM. We don't really care about whether we have 100%
scan avoidance because we were likely to be using a WHERE clause that
doesn't give perfect constraint elimination anyway. So on a large
table we don't care about the fact that we still have to scan 1-5% of
the table - we are still 20 times faster than a full seqscan.

So there isn't a fall down thing here. We expect the recently
loaded/updated data to be scanned and that's OK.

Having the minmax index updated greedily is just adding extra work for
fast diminishing returns. We can always add that later if really
needed, but I doubt it will be needed - in just the same way as mat
views aren't greedily updated.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] SLRU

2013-06-17 Thread Soroosh Sardari
Hey

I was reading the multi-transaction log manager, multixact.c,
and I didn't get what SLRU does.

I want to understand the goal of this module and why we use it.
I'm kind of a newbie, so be patient with me ;)

Regards
Soroosh


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Pavan Deolasee
On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote:



 My perspective is that if the master crashed, assuming that you know
 everything about that and suddenly jumping back on seems like a recipe
 for disaster. Attempting that is currently blocked by the technical
 obstacles you've identified, but that doesn't mean they are the only
 ones - we don't yet understand what all the problems lurking might be.
 Personally, I won't be following you onto that minefield anytime soon.


Would it be fair to say that a user will be willing to trust her crashed
master in all scenarios where she would have done so in a single-instance
setup? IOW, without the replication setup, AFAIU users have traditionally
trusted the WAL recovery to recover from failed instances. This would
include some common failures such as power outages and hardware failures,
but may not include others such as on-disk corruption.


 So I strongly object to calling this patch anything to do with
 failback safe. You simply don't have enough data to make such a bold
 claim. (Which is why we call it synchronous replication and not zero
 data loss, for example).


I agree. We should probably find a better name for this. Any suggestions?


 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).


It's an interesting idea, but I think there is some difference here. For
example, the proposed feature allows a backend to wait at points other than
commit. Since commits are more foreground in nature, and this feature
does not require us to wait during common foreground activities, we want a
configuration where the master can wait for synchronous transfers at points
other than commits. Maybe we can solve that by adding more granular control
to the said parameter?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


How would we then distinguish between a synchronous standby and the new kind
of standby? I am told one of the most popular setups for DR is to have one
local sync standby and one async standby (possibly cascaded from the local
sync one). Since this new feature is more useful for DR, because taking a
fresh backup over a slower link is even more challenging, IMHO we should
support such setups.



 I'm worried to see that adding this feature and yet turning it off
 causes a measurable drop in performance. I don't think we want that
 at all. That clearly needs more work and thought.


I agree. We need to repeat those tests. I don't trust that merely adding the
feature, turned off, causes a 1-2% drop. In one of the tests, turning the
feature on shows a better number than having it turned off. That's
clearly noise, or it needs a concrete argument to be convincing.


 I also think your performance results are somewhat bogus. Fast
 transaction workloads were already mostly commit waits -


But not in the case of an async standby, right?


 measurements
 of what happens to large loads, index builds etc would likely reveal
 something quite different.


I agree. I also feel we need tests where FlushBuffer gets called more
often by normal backends, to see how much the added wait in that code path
hurts performance. Another important thing to test would be to see
how it works over slower/high-latency links.


 I'm tempted by the thought that we should put the WaitForLSN inside
 XLogFlush, rather than scatter additional calls everywhere and then
 have us inevitably miss one.


That indeed seems cleaner.
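
A minimal sketch of that suggestion, with hypothetical names
(XLogFlush_sketch, flush_wal_locally, synchronous_transfer_all), just to show
the single choke point it would create:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xlog_ptr_t;                    /* stand-in for XLogRecPtr */

extern void flush_wal_locally(xlog_ptr_t upto); /* the existing local flush */
extern void WaitForLSN(xlog_ptr_t upto);        /* wait for standby confirm */
extern bool synchronous_transfer_all;           /* hypothetical GUC */

void
XLogFlush_sketch(xlog_ptr_t record)
{
    flush_wal_locally(record);

    /* One choke point: every code path that flushes WAL also waits for the
     * standby, instead of scattering WaitForLSN calls and risking a miss. */
    if (synchronous_transfer_all)
        WaitForLSN(record);
}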

Thanks,
Pavan


Re: [HACKERS] SLRU

2013-06-17 Thread Pavan Deolasee
On Mon, Jun 17, 2013 at 1:22 PM, Soroosh Sardari
soroosh.sard...@gmail.com wrote:

 Hey

 I was reading the multi-transaction log manager, multixact.c,
 and I didn't get what SLRU does.

 I want to understand the goal of this module and why we use it.
 I'm kind of a newbie, so be patient with me ;)


Did you look at src/backend/access/transam/slru.c? The first paragraph in
that file is quite explanatory:

* We use a simple least-recently-used scheme to manage a pool of page
 * buffers.  Under ordinary circumstances we expect that write
 * traffic will occur mostly to the latest page (and to the just-prior
 * page, soon after a page transition).  Read traffic will probably touch
 * a larger span of pages, but in any case a fairly small number of page
 * buffers should be sufficient.  So, we just search the buffers using plain
 * linear search; there's no need for a hashtable or anything fancy.
 * The management algorithm is straight LRU except that we will never swap
 * out the latest page (since we know it's going to be hit again
eventually).
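
As a toy illustration of the scheme that comment describes (a small pool of
page buffers, plain linear search, straight LRU eviction), here is a
self-contained C sketch; it deliberately omits slru.c details such as never
evicting the latest page, and all names are illustrative:

#include <stdint.h>
#include <string.h>

#define POOL_SIZE  8
#define PAGE_BYTES 8192

typedef struct SlruPool
{
    int64_t  page_no[POOL_SIZE];    /* which page each slot holds; -1 = empty */
    uint64_t last_used[POOL_SIZE];  /* recency stamp for LRU eviction */
    uint64_t clock;
    char     data[POOL_SIZE][PAGE_BYTES];
} SlruPool;

static void
slru_init(SlruPool *p)
{
    memset(p, 0, sizeof(*p));
    for (int i = 0; i < POOL_SIZE; i++)
        p->page_no[i] = -1;
}

/* Return the slot holding page_no, reading the page in (after evicting the
 * least recently used slot) on a miss.  Plain linear search: the pool is
 * small, so nothing fancier is needed. */
static int
slru_lookup(SlruPool *p, int64_t page_no,
            void (*read_page)(int64_t, char *),
            void (*write_page)(int64_t, const char *))
{
    int victim = 0;

    for (int i = 0; i < POOL_SIZE; i++)
    {
        if (p->page_no[i] == page_no)
        {
            p->last_used[i] = ++p->clock;   /* hit: just bump recency */
            return i;
        }
        if (p->last_used[i] < p->last_used[victim])
            victim = i;                     /* remember the LRU slot */
    }

    if (p->page_no[victim] >= 0)            /* miss: write back, then replace */
        write_page(p->page_no[victim], p->data[victim]);
    read_page(page_no, p->data[victim]);
    p->page_no[victim] = page_no;
    p->last_used[victim] = ++p->clock;
    return victim;
}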

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-17 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 I suspect that there are actually only about 5 or 6 common ways to do
 archiving (say, local, NFS, scp, rsync, S3, ...).  There's no reason why
 we can't fully specify and/or script what to do in each of these cases.

And provide either fully reliable contrib scripts or internal archive
commands ready to use for those common cases. I can't think of other
common use cases, by the way.

+1

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-17 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 2. confuse users
 3. produce broken external extension modules that take contrib as an example

I agree that having both cases (sections) in the Makefile is a bad idea.
Still, why should we keep the in-tree build instructions?

Would it be possible instead to instruct PGXN to work with a non-installed
server source tree? And how much do we really need that?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Add more regression tests for CREATE OPERATOR

2013-06-17 Thread Szymon Guz
On 23 May 2013 00:34, Robins Tharakan thara...@gmail.com wrote:

 Hi,

 Please find attached a patch to take code-coverage of CREATE OPERATOR
 (src/backend/commands/operatorcmds.c) from 56% to 91%.

 Any and all feedback is welcome.
 --
 Robins Tharakan




Hi,
there is one commented-out test. I think it should either be run or deleted;
there is no use for commented-out SQL code that is never run.

What do you think?

regards,
Szymon


Re: [HACKERS] MD5 aggregate

2013-06-17 Thread Dean Rasheed
On 15 June 2013 10:22, Dean Rasheed dean.a.rash...@gmail.com wrote:
 There seem to be 2 separate directions that this could go, which
 really meet different requirements:

 1). Produce an unordered sum for SQL to compare 2 tables regardless of
 the order in which they are scanned. A possible approach to this might
 be something like an aggregate

 md5_total(text/bytea) returns text

 that returns the sum of the md5 values of each input value, treating
 each md5 value as an unsigned 128-bit integer, and then producing the
 hexadecimal representation of the final sum. This should out-perform a
 solution based on numeric addition, and in typical cases, the result
 wouldn't be much longer than a regular md5 sum, and so would be easy
 to eyeball for differences.


I've been playing around with the idea of an aggregate that computes
the sum of the md5 hashes of each of its inputs, which I've called
md5_total() for now, although I'm not particularly wedded to that
name. Comparing it with md5_agg() on a 100M row table (see attached
test script) produces interesting results:

SELECT md5_agg(foo.*::text)
  FROM (SELECT * FROM foo ORDER BY id) foo;

 50bc42127fb9b028c9708248f835ed8f

Time: 92960.021 ms

SELECT md5_total(foo.*::text) FROM foo;

 02faea7fafee4d253fc94cfae031afc43c03479c

Time: 96190.343 ms

Unlike md5_agg(), it is no longer a true MD5 sum (for one thing, its
result is longer) but it seems like it would be very useful for
quickly comparing data in SQL, since its value is not dependent on the
row order, making it easier to use and better performing if there is no
usable index for ordering.

Note, however, that if there is an index that can be used for
ordering, the performance is not necessarily better than md5_agg(), as
this example shows. There is a small additional overhead per row for
initialising the MD5 sums, and adding the results to the total, but I
think the biggest factor is that md5_total() is processing more data.
The reason is that MD5 works on 64-byte blocks, so the total amount of
data going through the core MD5 algorithm is each row's size is
rounded up to a multiple of 64. In this simple case it ends up
processing around 1.5 times as much data:

SELECT sum(length(foo.*::text)) AS md5_agg,
   sum(((length(foo.*::text)+63)/64)*64) AS md5_total FROM foo;

  md5_agg   |  md5_total
-------------+-------------
 8103815438 | 12799909248

although of course that overhead won't be as large on wider tables,
and even in this case the overall performance is still on a par with
md5_agg().
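
For illustration, here is a minimal C sketch of the order-independent summing
step described above: each 16-byte MD5 digest is treated as a big-endian
unsigned integer and added into a slightly wider running total, so carries
never overflow. The helper names and the 20-byte total width are assumptions
for illustration (the 40-hex-digit example output above happens to be 20
bytes), not the patch's actual code:

#include <stddef.h>
#include <stdint.h>

#define SUM_BYTES 20            /* 128-bit digests plus headroom for carries */

/* total and digest are big-endian byte arrays; add digest into total,
 * treating both as unsigned integers. */
static void
md5_total_add(uint8_t total[SUM_BYTES], const uint8_t digest[16])
{
    unsigned carry = 0;

    for (int i = 0; i < SUM_BYTES; i++)
    {
        unsigned d = (i < 16) ? digest[15 - i] : 0;  /* low-order bytes first */
        unsigned s = total[SUM_BYTES - 1 - i] + d + carry;

        total[SUM_BYTES - 1 - i] = (uint8_t) (s & 0xFF);
        carry = s >> 8;
    }
}

/* Bytes pushed through the core MD5 block function for an input of length
 * len, using the same rounding approximation as the SQL query above. */
static size_t
md5_padded_len(size_t len)
{
    return ((len + 63) / 64) * 64;
}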

ISTM that both aggregates are potentially useful in different
situations. I would probably typically use md5_total() because of its
simplicity/order-independence and consistent performance, but
md5_agg() might also be useful when comparing with external data.

Regards,
Dean


[Attachment: md5_agg_v2.patch]


[Attachment: md5-100m-row-test.sql]



Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Heikki Linnakangas

On 14.06.2013 19:05, Kevin Grittner wrote:

Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
9.4 CF1.  The goal of this patch is to allow a refresh without
interfering with concurrent reads, using transactional semantics.

It is my hope to get this committed during this CF to allow me to
focus on incremental maintenance for the rest of the release cycle.


I must say this seems a bit pointless on its own. But if it's a stepping 
stone to incremental maintenance, I have no objections.



I didn't need to touch very much outside of matview-specific files
for this.  My biggest concern is that I needed two small functions
which did *exactly* what some static functions in ri_triggers.c
were doing and couldn't see where the best place to share them from
was.  For the moment I just duplicated them, but my hope would be
that they could be put in a suitable location and called from both
places, rather than duplicating the 30-some lines of code.  The
function signatures are:

void quoteOneName(char *buffer, const char *name)
void quoteRelationName(char *buffer, Relation rel)


I'd just use quote_identifier and quote_qualified_identifier instead.

I didn't understand this error message:

+   if (!foundUniqueIndex)
+       ereport(ERROR,
+               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));

What does that mean?

- Heikki




[HACKERS] Review: Display number of changed rows since last analyze

2013-06-17 Thread Albe Laurenz
This is a review of the patch in 5192d7d2.8020...@catalyst.net.nz

The patch applies cleanly (with the exception of catversion.h of course),
compiles without warnings and passes the regression tests.

It contains enough documentation, though I'd prefer
"Estimated number of rows modified since the table was last analyzed"
to
"Estimated number of row changes (inserts + updates + deletes) since the last
analyze"

The patch works as it should, and I think that this is a
useful addition.  It only exposes a value that is already
available internally, so there shouldn't be any penalties.

I think that the column name is ok as it is, even if it
is a bit long - I cannot come up with a more succinct
idea.  Perhaps n_changed_since_analyze could be shortened
to n_mod_since_analyze, but that's not much of an improvement.

This is a very simple change, and I'll mark this patch ready for committer.

Yours,
Laurenz Albe



Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Simon Riggs
On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature, and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where the master can wait for synchronous transfers at points
 other than commits. Maybe we can solve that by adding more granular control
 to the said parameter?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between a synchronous standby and the new
 kind of standby?

That's not the point. The point is "Why would we have a new kind of
standby?" and therefore "Why do we need new parameters?"

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

...which still doesn't make sense to me. Let's look at that in detail.

Take 3 servers, A, B, C, with A and B linked by sync rep, and C
being a safety standby at a distance.

Either A or B is master, except in disaster. So if A is master, then B
would be the failover target. If A fails, then you want to failover to
B. Once B is the target, you want to failback to A as the master. C
needs to follow the new master, whichever it is.

Suppose you set up sync rep between A and B, and this new mode between A
and C. When B becomes the master, you want to failback from B to A, but
you can't, because the new mode applied between A and C only, so you
have to failback from C to A. So having the new mode not match with
sync rep means you are forcing people to failback using the slow link
in the common case.

You might observe that having the two modes match causes problems if A
and B fail, so you are forced to go to C as master and then eventually
failback to A or B across a slow link. That case is less common and
could be solved by extending sync transfer to more/multi nodes.

It definitely doesn't make sense to have sync rep on anything other
than a subset of sync transfer. So while it may be sensible in the
future to make sync transfer a superset of sync rep nodes, it makes
sense to make them the same config for now.

Phew

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MD5 aggregate

2013-06-17 Thread Marko Kreen
On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote:
 On 15 June 2013 10:22, Dean Rasheed dean.a.rash...@gmail.com wrote:
  There seem to be 2 separate directions that this could go, which
  really meet different requirements:
 
  1). Produce an unordered sum for SQL to compare 2 tables regardless of
  the order in which they are scanned. A possible approach to this might
  be something like an aggregate
 
  md5_total(text/bytea) returns text
 
  that returns the sum of the md5 values of each input value, treating
  each md5 value as an unsigned 128-bit integer, and then producing the
  hexadecimal representation of the final sum. This should out-perform a
  solution based on numeric addition, and in typical cases, the result
  wouldn't be much longer than a regular md5 sum, and so would be easy
  to eyeball for differences.
 
 
 I've been playing around with the idea of an aggregate that computes
 the sum of the md5 hashes of each of its inputs, which I've called
 md5_total() for now, although I'm not particularly wedded to that
 name. Comparing it with md5_agg() on a 100M row table (see attached
 test script) produces interesting results:
 
 SELECT md5_agg(foo.*::text)
   FROM (SELECT * FROM foo ORDER BY id) foo;
 
  50bc42127fb9b028c9708248f835ed8f
 
 Time: 92960.021 ms
 
 SELECT md5_total(foo.*::text) FROM foo;
 
  02faea7fafee4d253fc94cfae031afc43c03479c
 
 Time: 96190.343 ms
 
 Unlike md5_agg(), it is no longer a true MD5 sum (for one thing, its
 result is longer) but it seems like it would be very useful for
 quickly comparing data in SQL, since its value is not dependent on the
 row-order making it easier to use and better performing if there is no
 usable index for ordering.
 
 Note, however, that if there is an index that can be used for
 ordering, the performance is not necessarily better than md5_agg(), as
 this example shows. There is a small additional overhead per row for
 initialising the MD5 sums, and adding the results to the total, but I
 think the biggest factor is that md5_total() is processing more data.
 The reason is that MD5 works on 64-byte blocks, so the total amount of
  data going through the core MD5 algorithm is each row's size
 rounded up to a multiple of 64. In this simple case it ends up
 processing around 1.5 times as much data:
 
 SELECT sum(length(foo.*::text)) AS md5_agg,
sum(((length(foo.*::text)+63)/64)*64) AS md5_total FROM foo;
 
   md5_agg   |  md5_total
  -------------+-------------
  8103815438 | 12799909248
 
 although of course that overhead won't be as large on wider tables,
 and even in this case the overall performance is still on a par with
 md5_agg().
 
 ISTM that both aggregates are potentially useful in different
 situations. I would probably typically use md5_total() because of its
 simplicity/order-independence and consistent performance, but
 md5_agg() might also be useful when comparing with external data.

Few notes:

- Index-scan over the whole table is *very* bad for larger tables (a few
  times bigger than available RAM).  If you want to promote such use, you
  should also warn against using it on a loaded server.

- It's pointless to worry about overflow on 128-bit ints.  Just let it
  happen.  Adding complexity for that does not bring any advantage.

- Using some faster 128-bit hash may be useful - check out CityHash
  or SpookyHash.  You can get C implementations from pghashlib.

-- 
marko





Re: [HACKERS] Add regression tests for SET xxx

2013-06-17 Thread Szymon Guz
On 26 May 2013 19:56, Robins Tharakan thara...@gmail.com wrote:

 Hi,

 Please find attached a patch to take code-coverage of SET (SESSION / SEED
 / TRANSACTION / DATESTYLE / TIME ZONE) (src/backend/commands/variable.c)
 from 65% to 82%.

 Any and all feedback is welcome.
 --
 Robins Tharakan




Hi,
the patch applies cleanly on current trunk; however, there are failing
tests (diff attached).

regards
Szymon


[Attachment: regression.diffs]



Re: [HACKERS] MVCC catalog access

2013-06-17 Thread Andres Freund
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
 On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  +1.
 
 Here's a more serious patch for MVCC catalog access.  This one
 involves more data copying than the last one, I think, because the
 previous version did not register the snapshots it took, which I think
 is not safe.  So this needs to be re-tested for performance, which I
 have so far made no attempt to do.

Ok, I am starting to take a more serious look.

Minor issues I noticed:
* index.c:index_constraint_create() - comments need to be updated
* index.c:IndexCheckExclusion() - why do we still use a SnapshotNow? I'd
  rather not use *Now if it isn't necessary.
* the *CONCURRENTLY infrastructure should be simplified once this has
  been applied, but I think it makes sense to keep that separate.
* index.c:reindex_index() - SnapshotNow comment should be updated

I still think that renaming SnapshotNow to something like
SnapshotPerTuple to force everyone to reevaluate their usage would be
good.

So, the biggest issue with the patch seems to be performance worries. I
tried to create a worst case scenario:
postgres (patched and HEAD) running with:
-c shared_buffers=4GB \
-c max_connections=2000 \
-c maintenance_work_mem=2GB \
-c checkpoint_segments=300 \
-c wal_buffers=64MB \
-c synchronous_commit=off \
-c autovacuum=off \
-p 5440

With one background pgbench running:
pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
readonly-busy.sql:
BEGIN;
SELECT txid_current();
SELECT pg_sleep(0.0001);
COMMIT;

I measured the performance of one other pgbench:
pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql 
-C
simplequery.sql:
SELECT * FROM af1, af2 WHERE af1.x = af2.x;
tables:
create table af1 (x) as select g from generate_series(1,4) g;
create table af2 (x) as select g from generate_series(4,7) g;

With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):

master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)

dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)

Most of the time in both, patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, in mine they actually modify PGXACT which causes
noticeable cacheline bouncing, and b) I have a higher number of connections
(max_connections)

A quick test shows that even with max_connection=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)

Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.

 It strikes me as rather unfortunate that the snapshot interface is
 designed in such a way as to require so much data copying.  It seems
 we always take a snapshot by copying from PGXACT/PGPROC into
 CurrentSnapshotData or SecondarySnapshotData, and then copying data a
 second time from there to someplace more permanent.  It would be nice
 to avoid that, at least in common cases.

Sounds doable. But let's do one thing at a time ;). That copy wasn't
visible in the rather extreme workload from above btw...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:

 There are multiple features all requiring efficient change set
 extraction. It seems extremely relevant to begin discussing what
 that mechanism might be in each case

Changeset extraction has nothing to do with this patch, and cannot
possibly be useful for it.  Please keep discussion which is
completely unrelated to this patch off this thread.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Simon Riggs
On 17 June 2013 12:13, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 14.06.2013 19:05, Kevin Grittner wrote:

 Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
 9.4 CF1.  The goal of this patch is to allow a refresh without
 interfering with concurrent reads, using transactional semantics.

 It is my hope to get this committed during this CF to allow me to
 focus on incremental maintenance for the rest of the release cycle.


 I must say this seems a bit pointless on its own. But if it's a stepping
 stone to incremental maintenance, I have no objections.

There are generally 4 kinds of mat view

1. Transactionally updated
2. Incremental update, eventually consistent
3. Incremental update, regular refresh
4. Full refresh

At the moment we only have type 4 and it holds a full lock while it
runs. We definitely need a CONCURRENTLY option and this is it.

Implementing the other types won't invalidate what we currently have,
so this makes sense to me.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Michael Paquier
On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
  On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   Hi all,
  
   Please find attached the latest versions of REINDEX CONCURRENTLY for
 the 1st
   commit fest of 9.4:
   - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to
 allow
   a toast relation to have multiple indexes running in parallel (extra
 indexes
   could be created by a REINDEX CONCURRENTLY processed)
   - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
 and
   fixed a lock in index_concurrent_create on an index relation not
 released at
   the end of a transaction
 
  Could you let me know how this patch has something to do with MVCC
 catalog
  access patch? Should we wait for MVCC catalog access patch to be
 committed
  before starting to review this patch?

 I wondered the same. The MVCC catalog patch, if applied, would make it
 possible to make the actual relfilenode swap concurrently instead of
 requiring to take access exclusive locks which obviously is way nicer. On
 the other hand, that function is only a really small part of this patch,
 so it seems quite possible to make another pass at it before relying on
 mvcc catalog scans.

As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged. Currently
an AccessExclusiveLock is taken on the 2 relations being swapped; with MVCC
catalog access, I think we could lower that to a ShareUpdateExclusiveLock.

Also, with the MVCC catalog patch in, we could add some isolation tests for
REINDEX CONCURRENTLY (there were some tests in one of the previous versions),
which is currently not possible due to the exclusive lock taken at the swap
phase.

Btw, those are minor things in the patch, so I think it would be better not
to wait for the MVCC catalog patch. Even if you think it would be better to
wait for it, you could begin with the 1st patch, allowing a toast relation
to have multiple indexes (removal of reltoastidxid), which does not depend
on it at all.

Thanks,
-- 
Michael


Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division

2013-06-17 Thread Stephen Frost
* David Fetter (da...@fetter.org) wrote:
 On Mon, Feb 11, 2013 at 10:48:38AM -0800, David Fetter wrote:
  On Sun, Feb 10, 2013 at 10:09:19AM -0500, Tom Lane wrote:
   David Fetter da...@fetter.org writes:
Per suggestions and lots of help from Andrew Gierth, please find
attached a patch to clean up the call sites for FuncCall nodes, which
I'd like to expand centrally rather than in each of the 37 (or 38, but
I only redid 37) places where it's called.  The remaining one is in
src/backend/nodes/copyfuncs.c, which has to be modified for any
changes in that struct anyhow.
   
   TBH, I don't think this is an improvement.
   
   The problem with adding a new field to any struct is that you have to
   run around and examine (and, usually, modify) every place that
   manufactures that type of struct.  With a makeFuncCall defined like
   this, you'd still have to do that; it would just become a lot easier
   to forget to do so.

I don't really see how finding all callers of makeFuncCall is
particularly harder than finding the callers of makeNode(Func).  If
there were cases where we still wanted to use makeNode(Func), perhaps
that would be annoying since you'd have to look for both- but, iiuc,
this patch changes all of the callers to use makeFuncCall and it seems
reasonable for all callers to do so in the future as well.

It looks to me like the advantage of this patch is exactly that you
*don't* have to run around and modify things which are completely
unrelated to the new feature being added (eg: FILTER).  Add the new
field, set up the default/no-op case in makeFuncCall() and then only
change those places that actually need to worry about your new field.

   If the subroutine were defined like most other makeXXX subroutines,
   ie you have to supply *all* the fields, that argument would go away,
   but the notational advantage is then dubious.

Having to supply all the fields certainly wouldn't make things any
better.  Providing the base set of fields which are required to be set
for any FuncCall node does make sense though, which is what the patch
does.  The rest of the fields are all special cases for which a default
value works perfectly fine (when the field isn't involved in the
specific case being handled).
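
To make that concrete, here is a simplified sketch of such a constructor
(stand-in types, and calloc instead of the real makeNode/palloc machinery);
new fields default to zero, so unrelated call sites stay untouched:

#include <stdbool.h>
#include <stdlib.h>

typedef struct List List;       /* opaque stand-ins for the real parse types */
typedef struct WindowDef WindowDef;

typedef struct FuncCall
{
    List      *funcname;
    List      *args;
    List      *agg_order;
    bool       agg_star;
    bool       agg_distinct;
    bool       func_variadic;
    WindowDef *over;
    int        location;
} FuncCall;

/* Centralized constructor: set the fields every call site supplies and
 * zero everything else, so a newly added field only needs attention at
 * the sites that actually use it. */
static FuncCall *
makeFuncCall(List *name, List *args, int location)
{
    FuncCall *n = calloc(1, sizeof(FuncCall));

    n->funcname = name;
    n->args = args;
    n->location = location;
    return n;
}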

   The bigger-picture point is that you're proposing to make the coding
   conventions for building FuncCalls different from what they are for
   any other grammar node.  I don't think that's a great idea; it will
   mostly foster confusion.
  
  The major difference between FuncCalls and others is that while most
  raw-parsetree nodes are constructed only in their own syntax
  productions, FuncCall is constructed in many places unrelated to
  actual function call syntax.

Yeah, FuncCall's are already rather special and they're built all over
the place.

That's my 2c on it anyhow.  I don't see it as some kind of major
milestone, but it looks like an improvement to me and likely to make things
a bit easier on patch authors and reviewers, who otherwise have to ponder
a bunch of repeated 'x->q = false;' statements in areas which are
completely unrelated to the new feature itself.

Thanks,

Stephen




Re: [HACKERS] Add regression tests for DISCARD

2013-06-17 Thread Marko Kreen
On Mon, May 13, 2013 at 2:58 AM, Robins Tharakan thara...@gmail.com wrote:
 Please find attached a patch that adds basic regression tests for DISCARD
 command.

 Any and all feedback is obviously welcome.

Perhaps existing tests in guc.sql should be merged into it?

-- 
marko




[HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-06-17 Thread Josh Kupershmidt
On Fri, Mar 8, 2013 at 11:58 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 I'll see - please, stay tuned to 9.4 first commitfest

Hi Pavel,
Just a reminder, I didn't see this patch in the current commitfest. I
would be happy to spend some more time reviewing if you wish to pursue
the patch.

Josh




Re: [HACKERS] GIN improvements part 1: additional information

2013-06-17 Thread Alexander Korotkov
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov
aekorot...@gmail.com wrote:

 Revised version of patch for additional information storage in GIN is
 attached. Changes are mostly bug fixes.


New version of patch is attached with some more refactoring and bug fixes.

--
With best regards,
Alexander Korotkov.


[Attachment: ginaddinfo.5.patch.gz]



Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Simon Riggs
On 17 June 2013 13:15, Kevin Grittner kgri...@ymail.com wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 There are multiple features all requiring efficient change set
 extraction. It seems extremely relevant to begin discussing what
 that mechanism might be in each case

 Changeset extraction has nothing to do with this patch, and cannot
 possibly be useful for it.  Please keep discussion which is
 completely unrelated to this patch off this thread.

Kevin,

You mentioned incremental maintenance in your original post and I
have been discussing it. Had you not mentioned it, I doubt I would
have thought of it.

But since you did mention it, and it's clearly an important issue, it
seems relevant to have discussed it here and now.

I'm happy to wait for you to start the thread discussing it directly though.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GIN improvements part2: fast scan

2013-06-17 Thread Alexander Korotkov
On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 attached patch implementing fast scan technique for GIN. This is second
 patch of GIN improvements, see the 1st one here:

 http://www.postgresql.org/message-id/capphfduxv-il7aedwpw0w5fxrwgakfxijwm63_hzujacrxn...@mail.gmail.com
 This patch allows skipping parts of posting trees when scanning them is not
 necessary. In particular, it solves the frequent_term & rare_term problem of
 FTS.
 It introduces a new interface method, pre_consistent, which behaves like
 consistent, but:
 1) allows false positives on input (check[])
 2) is allowed to return false positives

 Some example: frequent_term & rare_term becomes pretty fast.

 create table test as (select to_tsvector('english', 'bbb') as v from
 generate_series(1,100));
 insert into test (select to_tsvector('english', 'ddd') from
 generate_series(1,10));
 create index test_idx on test using gin (v);

 postgres=# explain analyze select * from test where v @@
 to_tsquery('english', 'bbb & ddd');
   QUERY PLAN

 ---
  Bitmap Heap Scan on test  (cost=942.75..7280.63 rows=5000 width=17)
 (actual time=0.458..0.461 rows=10 loops=1)
    Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
    ->  Bitmap Index Scan on test_idx  (cost=0.00..941.50 rows=5000
 width=0) (actual time=0.449..0.449 rows=10 loops=1)
          Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
  Total runtime: 0.516 ms
 (5 rows)


Attached version of patch has some refactoring and bug fixes.

--
With best regards,
Alexander Korotkov.


[Attachment: gin_fast_scan.2.patch.gz]



Re: [HACKERS] GIN improvements part 3: ordering in index

2013-06-17 Thread Alexander Korotkov
On Sat, Jun 15, 2013 at 3:02 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 attached patch implementing ordering inside GIN index. This is third patch
 of GIN improvements, see previous two:

 http://www.postgresql.org/message-id/capphfduxv-il7aedwpw0w5fxrwgakfxijwm63_hzujacrxn...@mail.gmail.com

 http://www.postgresql.org/message-id/CAPpHfdvftaJq7www381naLw1=4u0h+qpxgwvnhceb9hmvyw...@mail.gmail.com

 This patch introduces new interface method of GIN which takes same
 arguments as consistent but returns float8.
 float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
 nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
 nullFlags[], Datum addInfo[], bool addInfoIsNull[])
 This patch implements the gingettuple method, which can return ordering data
 using the KNN infrastructure. It also introduces a >< operator for FTS which
 supports ordering in the GIN index. Some example:

 postgres=# explain analyze select * from dblp_titles2 where tsvector @@
 to_tsquery('english', 'statistics') order by tsvector ><
 to_tsquery('english', 'statistics') limit 10;
QUERY
 PLAN

 -
  Limit  (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120
 rows=10 loops=1)
 ->  Index Scan using dblp_titles2_idx on dblp_titles2
  (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115
 rows=10 loops=1)
  Index Cond: (tsvector @@ '''statist'''::tsquery)
  Order By: (tsvector >< '''statist'''::tsquery)
  Total runtime: 7.556 ms
 (5 rows)


Attached version of patch has some refactoring and bug fixes.

--
With best regards,
Alexander Korotkov.


[Attachment: gin_ordering.2.patch.gz]



Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Kevin Grittner
Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 14.06.2013 19:05, Kevin Grittner wrote:
 Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY
 for 9.4 CF1.  The goal of this patch is to allow a refresh
 without interfering with concurrent reads, using transactional
 semantics.

 It is my hope to get this committed during this CF to allow me
 to focus on incremental maintenance for the rest of the release
 cycle.

 I must say this seems a bit pointless on its own.

I completely disagree.  When I read what people were posting about
the materialized view creation that went into 9.3, there were many
comments by people that they can't use it until the materialized
views can be refreshed without blocking readers.  There is a clear
need for this.  It doesn't do much to advance incremental
maintenance, but it is a much smaller patch which will make
matviews usable by a lot of people who can't use the initial
feature set.

 I didn't understand this error message:

 + if (!foundUniqueIndex)
 +     ereport(ERROR,
 +             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 +              errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));

 What does that mean?

It means that the REFRESH MATERIALIZED VIEW CONCURRENTLY command
cannot be used on a materialized view unless it has at least one
UNIQUE index which is not partial (i.e., there is no WHERE clause)
and is not indexing on an expression (i.e., the index is entirely
on bare column names).  Set logic to do the diff is hard to get
right if the tables are not proper sets (i.e., they contain
duplicate rows).  I can see at least three ways it *could* be done,
but all of them are much more complex and significantly slower. 
With a UNIQUE index on some set of columns in all rows the correct
guarantees exist to use fast set logic.  It isn't that it's needed
for access; it is needed to provide a guarantee that there is no
row without NULLs that exactly duplicates another row.
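
For concreteness, a sketch (not the patch's actual internals) of the
kind of set logic involved, assuming a matview mv(id, val) with a
unique index on id, and its freshly computed contents in newdata(id, val):

SELECT n.id, n.val              -- rows to insert or update
  FROM newdata n
  LEFT JOIN mv m ON m.id = n.id
 WHERE m.id IS NULL OR m.val IS DISTINCT FROM n.val;

SELECT m.id                     -- rows to delete
  FROM mv m
  LEFT JOIN newdata n ON n.id = m.id
 WHERE n.id IS NULL;

Without the unique key, this kind of pairwise matching breaks down as
soon as the contents contain duplicate rows.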

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN improvements part2: fast scan

2013-06-17 Thread Heikki Linnakangas

On 17.06.2013 15:55, Alexander Korotkov wrote:

On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov aekorot...@gmail.com wrote:


attached patch implementing fast scan technique for GIN. This is second
patch of GIN improvements, see the 1st one here:

http://www.postgresql.org/message-id/capphfduxv-il7aedwpw0w5fxrwgakfxijwm63_hzujacrxn...@mail.gmail.com
This patch allows skipping parts of posting trees when their scan is not
necessary. In particular, it solves the frequent_term & rare_term problem of
FTS.
It introduces a new interface method pre_consistent which behaves like
consistent, but:
1) allows false positives on input (check[])
2) is allowed to return false positives

 Some example: frequent_term & rare_term becomes pretty fast.

create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,100));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);

postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
                                   QUERY PLAN
 ---------------------------------------------------------------------------
  Bitmap Heap Scan on test  (cost=942.75..7280.63 rows=5000 width=17)
(actual time=0.458..0.461 rows=10 loops=1)
    Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
    ->  Bitmap Index Scan on test_idx  (cost=0.00..941.50 rows=5000
 width=0) (actual time=0.449..0.449 rows=10 loops=1)
          Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
  Total runtime: 0.516 ms
(5 rows)



Attached version of patch has some refactoring and bug fixes.


Good timing, I just started looking at this.

I think you'll need to explain how this works. There are no docs, and 
almost no comments.


(and this shows how poorly I understand this, but) Why does this require 
the additional information patch? What extra information do you store 
on-disk, in the additional information?


The pre-consistent method is like the consistent method, but it allows 
false positives. I think that's because during the scan, before having 
scanned for all the keys, the gin AM doesn't yet know if the tuple 
contains all of the keys. So it passes the keys it doesn't yet know 
about as 'true' to pre-consistent. Could that be generalized, to pass a 
tri-state instead of a boolean for each key to the pre-consistent 
method? For each key, you would pass true, false, or don't know. I 
think you could then also speed up queries like !english & bbb.


- Heikki




Re: [HACKERS] Batch API for After Triggers

2013-06-17 Thread Simon Riggs
On 9 June 2013 12:58, Craig Ringer cr...@2ndquadrant.com wrote:
 On 06/09/2013 04:58 PM, Simon Riggs wrote:
 There are also difficulties in semantics, since when
 we have OLD and NEW at row level we know we are discussing the same
 row. With sets of OLD and NEW we'd need to be able to link the
 relations back together somehow, which couldn't be done by PK since
 that could change.

 We don't currently have OLD and NEW relations so we're free to define
 how this works pretty freely.

 Rather than having OLD and NEW as separate relations, we could just have
 one OLD_AND_NEW relation. In that relation we exploit Pg's composite
 types to nest the old and new tuples in a single outer change record.

 OLD_AND_NEW would look to PL/PgSQL as if it were:

 CREATE TEMPORARY TABLE OLD_AND_NEW (
 OLD tabletype NOT NULL,
 NEW tabletype NOT NULL
 );

 ...though presumably without the ability to create indexes on it and the
 other things you can do to a real temp table. Though I can see cases
 where that'd be awfully handy too.

 For DELETE and INSERT we'd either provide different relations named OLD
 and NEW respectively, or we'd use OLD_AND_NEW with one field or the
 other blank. I'm not sure which would be best.

 Alternately, we could break the usual rules for relations and define OLD
 and NEW as ordered, so lock-step iteration would always return matching
 pairs of rows. That's useless in SQL since there's no way to achieve
 lock-step iteration, but if we provide a
 for_each_changed_row('some_function'::regproc) that scans them in
 lock-step and invokes `some_function` for each one...? (I haven't yet
 done enough in the core to have any idea if this approach is completely
 and absurdly impossible, or just ugly. Figured I'd throw it out there
 anyway.)


I think the best way, if we did do this, would be to have a number of
different relations defined:

OLD
NEW
INSERTED
DELETED
all of which would be defined same as main table

and also one called
UPDATED
which would have two row vars called OLD and NEW
so you would access it like e.g. IF UPDATED.OLD.id = 7
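
For illustration only, a statement-level trigger body under that scheme
might look like this; the UPDATED relation and its paired OLD/NEW row
variables are purely hypothetical:

CREATE FUNCTION check_pk_stable() RETURNS trigger AS $$
BEGIN
    -- UPDATED with row variables OLD and NEW is the hypothetical part
    IF EXISTS (SELECT 1 FROM UPDATED u WHERE (u.OLD).id <> (u.NEW).id) THEN
        RAISE EXCEPTION 'primary key changed by update';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;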

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Peter Eisentraut
On 6/17/13 8:23 AM, Michael Paquier wrote:
 As mentioned by Andres, the only thing that the MVCC catalog patch can
 improve here is the index swap phase (index_concurrent_swap:index.c)
 where the relfilenodes of the old and new indexes are exchanged. Now an
 AccessExclusiveLock is taken on the 2 relations being swapped; we could
 downgrade that to ShareUpdateExclusiveLock with the MVCC catalog
 access, I think.

Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.





Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 Changeset extraction has nothing to do with this patch, and
 cannot possibly be useful for it.  Please keep discussion which
 is completely unrelated to this patch off this thread.

 You mentioned incremental maintenance in your original post and
 I have been discussing it. Had you not mentioned it, I doubt I
 would have thought of it.

 But since you did mention it, and it's clearly an important issue,
 it seems relevant to have discussed it here and now.

What I said was that I wanted to get this out of the way before I
started working on incremental maintenance.

 I'm happy to wait for you to start the thread discussing it
 directly though.

Cool.

-Kevin

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 09:12:12 -0400, Peter Eisentraut wrote:
 On 6/17/13 8:23 AM, Michael Paquier wrote:
  As mentioned by Andres, the only thing that the MVCC catalog patch can
  improve here is the index swap phase (index_concurrent_swap:index.c)
  where the relfilenodes of the old and new indexes are exchanged. Now an
  AccessExclusiveLock is taken on the 2 relations being swapped; we could
  downgrade that to ShareUpdateExclusiveLock with the MVCC catalog
  access, I think.
 
 Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
 not really concurrent, at least not concurrent to the standard set by
 CREATE and DROP INDEX CONCURRENTLY.

Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water. But anyway, the
argument was only whether we could continue reviewing before the mvcc
stuff goes in, not whether it can get committed before.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Nicolas Barbier
2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com:

 +errmsg(concurrent refresh requires a
 unique index on just columns for all rows of the materialized view)));

Maybe my english is failing me here, but I don’t understand the “just” part.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




[HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn

2013-06-17 Thread Andres Freund
Hi,

When postgres on linux receives connections at a high rate, client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

To reproduce, start something like this on a server with sufficiently high
max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql

Now that's strange since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-connect.c's PQconnectPoll:

if (connect(conn->sock, addr_cur->ai_addr,
            addr_cur->ai_addrlen) < 0)
{
if (SOCK_ERRNO == EINPROGRESS ||
SOCK_ERRNO == EWOULDBLOCK ||
SOCK_ERRNO == EINTR ||
SOCK_ERRNO == 0)
{
/*
 * This is fine - we're in non-blocking mode, and
 * the connection is in progress.  Tell caller to
 * wait for write-ready on socket.
 */
                conn->status = CONNECTION_STARTED;
return PGRES_POLLING_WRITING;
}
/* otherwise, trouble */
}

So, we're accepting EWOULDBLOCK as a valid return value for
connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
linux. Unfortunately POSIX allows those two to share the same value...

My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing
       cache.  For AF_INET see the description of
       /proc/sys/net/ipv4/ip_local_port_range ip(7) for information on
       how to increase the number of local ports.

So, the problem is that we took a failed connection as having been
initially successful but in progress.

Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
  Is the server running locally and accepting
  connections on Unix domain socket /tmp/.s.PGSQL.5440?

which makes more sense.

Trivial patch attached.

Now, the question is why we cannot complete connections on unix sockets.
Some code reading of net/unix/af_unix.c:unix_stream_connect()
shows:
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. The receive queue for unix sockets is defined
as
static inline int unix_recvq_full(struct sock const *sk)
{
return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.

Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
   If the backlog argument is greater than the value in
   /proc/sys/net/core/somaxconn, then it is silently truncated to that
   value; the default value in this file is 128.  In kernels before
   2.4.25, this limit was a hard coded value, SOMAXCONN, with the
   value 128.

Setting somaxconn to something higher indeed makes the problem go away.

I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that; it would explain some previous reports
on the lists.

TLDR: Increase /proc/sys/net/core/somaxconn

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From da5cfb7d237a4a07b146fb9d255f0de72207de10 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 17 Jun 2013 16:00:58 +0200
Subject: [PATCH] libpq: Handle connect(2) returning EAGAIN/EWOULDBLOCK
 correctly

libpq used to accept EWOULDBLOCK - which is allowed to have the same value as
EAGAIN by posix - as a valid return code to connect(2) indicating that a
connection is in progress. While posix doesn't specify either as a valid return
code, BSD based systems and linux use it to indicate temporary resource
exhaustion.
Accepting either as an in-progress connection attempt leads to hard-to-diagnose
errors when sending the startup packet:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

Treating it as an error results in:
could not connect to server: Resource temporarily unavailable
  Is the server running locally and accepting
  connections on Unix domain socket ...?
which is more accurate.
---
 src/interfaces/libpq/fe-connect.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 0d729c8..c17c303 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1780,7 +1780,6 @@ keep_going:		/* We will come back to here until there is
 				addr_cur->ai_addrlen) < 0)
 	{
 		if (SOCK_ERRNO == EINPROGRESS ||
-			SOCK_ERRNO == EWOULDBLOCK ||
 			SOCK_ERRNO == EINTR ||
 			SOCK_ERRNO == 0)
 		{
-- 

[HACKERS] matview incremental maintenance

2013-06-17 Thread Kevin Grittner
Since there seems to be interest in discussing incremental
maintenance of materialized views *now*, I'm starting this thread
to try to avoid polluting unrelated threads with the discussion.  I
don't intend to spend a lot of time on it until the CF in progress
completes, but at that point the work will start in earnest.  So
I'll say where I'm at, and welcome anyone who has time to spare
outside of the CF to comment or contribute ideas.

The paper at the core of the discussion can be found by searching
for "maintaining views incrementally gupta mumick subrahmanian" --
it's on both the ACM and CiteSeerX websites.  Of course, one
doesn't need to understand that paper to discuss techniques for
capturing the base deltas, but I'm hoping that's not what takes up
most of the discussion.  I expect the most important discussions to
be around how best to handle the count(t) (or count_t) column,
what form should be use for intermediate results, how to modify or
add execution nodes which know how to deal with the count, how to
generate set operations to use those nodes, and how to modify the
planner to choose the best plan for these operations.  Whether to
pull the deltas off the WAL stream or stuff them into a tuplestore
as they are written seems to me to be a relatively minor point.  If
properly abstracted, the performance and complexity of alternatives
can be compared.

The one thing that seems somewhat clear to me at the moment is that
the complex set algebra needed to use the counting algorithm for
incremental maintenance is not going to be something I want to
handle by dynamically building up execution nodes.  That way lies
madness. SPI or something very similar to it should be used,
probably with a layer or two above it to simplify working with the
algebra separately from diddling around with strings for the query
fragments.

At the developer meeting last month, we talked about the special
new count column for a bit, and everyone seemed to agree that
adding such an animal, and creating execution nodes which were
aware of it, would best be done on top of the patch Álvaro has been
working on to replace attnum with three columns: a logical ID
number for each column, the physical order of the attribute within
the tuple image, and the display order (for SELECT *, INSERT
without a column list, and similar cases).  We seemed to have
consensus that the count_t column would not display by default, but
could be explicitly called out by a query, similar to the current
handling of system columns.  Nobody wanted to have a negative
column number for the count or add it to the tuple header
structure.  Unfortunately I have heard from Álvaro that the patch
is not complete and is not on his list of things to work on in the
near future.

Long term, timings for incremental maintenance that people would
like to see (from most eager to least eager) are:

 - as part of completing each statement, so that the effect on the
matview is immediately visible to the transaction which modifies a
supporting table, and becomes visible at commit to other
transactions

 - at transaction commit time, so that other transactions see the
changes to the base tables and the referencing matviews at the same
point in time

 - from a FIFO queue which is processed by a background process
whenever data is present (possibly with pacing)

 - from a FIFO queue based on a schedule, so that matviews are
stable between applications and/or to avoid burdening the machine
during peak periods

 - incremental update, or even full refresh, on an attempt to query
a stale matview

 - explicit request to apply incremental updates or refresh

Incremental maintenance of a materialized view is a heuristic, to
refresh contents more quickly than might happen by re-running the
query which defines the matview.  There will always be cases where
the changes are so extensive that applying the delta will be slower
than a refresh.  At some point we should have a cost-based way to
recognize when we have crossed that threshold, and fall back to the
refresh technique.  That's not for this release, though.

In previous discussion there seemed to be a consensus that before
incremental maintenance for a materialized view could be turned on,
the matview would need to be populated and all referenced tables
would need to be flagged as generating delta information, through a
new ALTER TABLE option.
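
(Shape purely invented for illustration; no such option exists, and the
name below is made up:)

ALTER TABLE orders SET (generate_deltas = true);  -- hypothetical option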

While I have yet to look in detail at the mechanism for capturing
the initial delta on the base tables, the two fairly obvious
candidates are to stuff the before and after images into a
tuplestore or temp table as base table changes are written,
somewhere around the point that triggers would be fired, or to use
the WAL stream in some way.  The advantages of the former are that
it would be hard to find a lower overhead way to capture the data,
nor a more certain way to get exactly the right data.  The latter,
which Simon has been arguing is better than using triggers, would
have the advantage of not 

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Peter Eisentraut
On 6/17/13 9:19 AM, Andres Freund wrote:
 Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
 not really concurrent, at least not concurrent to the standard set by
 CREATE and DROP INDEX CONCURRENTLY.
 
 Well, it still does the main body of work in a concurrent fashion, so I
 still don't see how that argument holds that much water.

The reason we added DROP INDEX CONCURRENTLY is so that you don't get
stuck in a lock situation like

long-running-transaction -> DROP INDEX -> everything else

If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
have the same problem.

I don't think we should accept a REINDEX CONCURRENTLY implementation
that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
DROP INDEX CONCURRENTLY combination.
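
(With hypothetical names, that manual recipe for a plain, non-constraint
index is roughly:)

CREATE INDEX CONCURRENTLY idx_tmp ON t (col);
DROP INDEX CONCURRENTLY idx;
ALTER INDEX idx_tmp RENAME TO idx;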





Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 11:03:35 -0400, Peter Eisentraut wrote:
 On 6/17/13 9:19 AM, Andres Freund wrote:
  Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
  not really concurrent, at least not concurrent to the standard set by
  CREATE and DROP INDEX CONCURRENTLY.
  
  Well, it still does the main body of work in a concurrent fashion, so I
  still don't see how that argument holds that much water.
 
 The reason we added DROP INDEX CONCURRENTLY is so that you don't get
 stuck in a lock situation like
 
  long-running-transaction -> DROP INDEX -> everything else
 
 If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
 have the same problem.
 
 I don't think we should accept a REINDEX CONCURRENTLY implementation
 that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
 DROP INDEX CONCURRENTLY combination.

Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).
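
(For reference, usage as the patch proposes it; this syntax is not
committed anywhere:)

REINDEX INDEX CONCURRENTLY some_table_pkey;   -- a primary key
REINDEX TABLE CONCURRENTLY some_table;        -- includes its toast table
REINDEX DATABASE CONCURRENTLY some_database;  -- a whole database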

But anyway, as I said, the argument was only whether we could
continue reviewing before the mvcc stuff goes in, not whether it can get
committed before.

I don't think we have a need to decide whether REINDEX CONCURRENTLY can
go in with the short exclusive lock unless we find unresolvable
problems with the mvcc patch. Which I very, very much hope not to be the
case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] matview incremental maintenance

2013-06-17 Thread Stefan Drees

On 2013-06-17 16:41 +02:00, Kevin Grittner wrote:

Since there seems to be interest in discussing incremental
maintenance of materialized views *now*, I'm starting this thread
to try to avoid polluting unrelated threads with the discussion.  I
don't intend to spend a lot of time on it until the CF in progress
completes, but at that point the work will start in earnest.  So
I'll say where I'm at, and welcome anyone who has time to spare
outside of the CF to comment or contribute ideas.

The paper at the core of the discussion can be found by searching
for "maintaining views incrementally gupta mumick subrahmanian" --
it's on both the ACM and CiteSeerX websites.  Of course, one


i.e.
Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. 1993. 
Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD 
international conference on Management of data (SIGMOD '93), Peter 
Buneman and Sushil Jajodia (Eds.). ACM, New York, NY, USA, 157-166. 
DOI=10.1145/170035.170066 http://doi.acm.org/10.1145/170035.170066


just in case a direct reference might come in handy :-)


...


All the best,
Stefan.




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Kevin Grittner
Nicolas Barbier nicolas.barb...@gmail.com wrote:
 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com:


 +errmsg("concurrent refresh requires a
 unique index on just columns for all rows of the materialized view")));

 Maybe my english is failing me here, but I don’t understand the “just” part.

It means that the index must not use any expressions in the list of
what it's indexing on -- only column names.  Suggestions for better
wording would be welcome.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-06-17 Thread Robert Haas
On Sat, Jun 15, 2013 at 9:37 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Nicholas White escribió:

 For the parsing changes, it seems I can either make RESPECT and IGNORE
 reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS
 and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and
 IGNORE were just normal unreserved keywords due to ambiguities after a
 function definition (e.g. select abs(1) respect; - which is currently a
 valid statement).

 Well, making them reserved keywords is not that great, so maybe the
 lookahead thingy is better.  However, this patch introduces the third
 and fourth uses of the save the lookahead token pattern in the
 default switch cases.  Can we refactor that bit somehow, to avoid so
 many duplicates?  (For a minute I thought that Andrew Gierth's WITH
 ORDINALITY patch would add another one, but it seems not.)

Making things reserved keywords is painful and I don't like it, but
I've started to become skeptical of shifting the problem to the lexer,
too.  Sometimes special case logic in the lexer about token combining
can have surprising consequences in other parts of the grammar.  For
example, with a lexer hack, what will happen if someone has a column
named RESPECT and does SELECT ... ORDER BY respect NULLS LAST?  I
haven't studied the code in detail so maybe it's fine, but it's
something to think about.
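
To make the collision concrete (the window-function syntax below is the
patch's proposal, not accepted grammar):

CREATE TABLE grades (respect int, val int);

-- "respect" as a plain column reference:
SELECT val FROM grades ORDER BY respect NULLS LAST;

-- RESPECT as part of the proposed syntax:
SELECT lead(val) RESPECT NULLS OVER (ORDER BY val) FROM grades;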

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Robert Haas
On Mon, Jun 17, 2013 at 11:21 AM, Kevin Grittner kgri...@ymail.com wrote:
 Nicolas Barbier nicolas.barb...@gmail.com wrote:
 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com:


 +errmsg("concurrent refresh requires a
 unique index on just columns for all rows of the materialized view")));

 Maybe my english is failing me here, but I don’t understand the “just” part.

 It means that the index must not use any expressions in the list of
 what it's indexing on -- only column names.  Suggestions for better
 wording would be welcome.

Random idea:

ERROR: materialized view "%s" does not have a unique key

Perhaps augmented with:

HINT: Create a UNIQUE btree index with no WHERE clause on one or more
columns of the materialized view.
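
For example, assuming a matview mv(id, name), the first of these would
qualify and the other two would not:

CREATE UNIQUE INDEX mv_id ON mv (id);                 -- qualifies
CREATE UNIQUE INDEX mv_part ON mv (id) WHERE id > 0;  -- partial: no
CREATE UNIQUE INDEX mv_expr ON mv (lower(name));      -- expression: no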

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-17 Thread David E. Wheeler
On Jun 16, 2013, at 9:20 AM, Cédric Villemain ced...@2ndquadrant.com wrote:

 Then instead of the above you'd just be able to say something like
 
 MODULETEST = test
 
 or REGRESSDIR ?

Yeah, that sounds perfect.

 Also I suggest to remove the need to set REGRESS at all, and default to all 
 sql files in REGRESSDIR/sql (if REGRESSDIR is set)

Yeah, that would be nice. If one has different file names or something, then 
one should still be able to set REGRESS.

Best,

David





Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-17 Thread Alvaro Herrera
Joe Conway wrote:

 On 06/15/2013 11:28 AM, Alvaro Herrera wrote:

  This use case seems too narrow to me to justify the burden of
  keeping PGXS-enabled makefiles in contrib.
 
 What was the burden of it?

Per 
http://www.postgresql.org/message-id/1371093408.309.5.ca...@vanquo.pezone.net :

: 1. take up space
: 2. confuse users
: 3. produce broken external extension modules that take contrib as an example
: 4. break builds of PostgreSQL when users try to fix 3. by exporting USE_PGXS

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] matview incremental maintenance

2013-06-17 Thread Simon Riggs
On 17 June 2013 15:41, Kevin Grittner kgri...@ymail.com wrote:

 Since there seems to be interest in discussing incremental
 maintenance of materialized views *now*

Since your earlier complaint, I specifically said I was happy to wait
to discuss that. Why have you raised this now?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-17 Thread Cédric Villemain
On Monday 17 June 2013 at 18:41:32, Alvaro Herrera wrote:
 Joe Conway wrote:
  On 06/15/2013 11:28 AM, Alvaro Herrera wrote:
   This use case seems too narrow to me to justify the burden of
   keeping PGXS-enabled makefiles in contrib.
  
  What was the burden of it?
 
 Per http://www.postgresql.org/message-id/1371093408.309.5.ca...@vanquo.pezone.net :
 : 1. take up space
 : 2. confuse users
 : 3. produce broken external extension modules that take contrib as an example
 : 4. break builds of PostgreSQL when users try to fix 3. by exporting USE_PGXS

But:
4. can be fixed (see patches I sent), so it is not an excuse.

I agree with the other points.
My only grief is to lose the perfect regression tests for PGXS that
those contribs are.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] matview incremental maintenance

2013-06-17 Thread Simon Riggs
On 17 June 2013 15:41, Kevin Grittner kgri...@ymail.com wrote:

 While I have yet to look in detail at the mechanism for capturing
 the initial delta on the base tables, the two fairly obvious
 candidates are to stuff the before and after images into a
 tuplestore or temp table as base table changes are written,
 somewhere around the point that triggers would be fired, or to use
 the WAL stream in some way.  The advantages of the former are that
 it would be hard to find a lower overhead way to capture the data,
 nor a more certain way to get exactly the right data.  The latter,
 which Simon has been arguing is better than using triggers, would
 have the advantage of not directly slowing down a process writing
 to base tables, although for more eager modes transactions would
 need to block waiting for the data to flow through the walsender,
 be filtered and assembled as data of interest, and communicated
 back to the transaction somehow before it could proceed.  Assuming
 that it can provide the changeset prior to the commit, and that it
 can include before images, it could work, but the timing sure
 seems dubious for the more eager modes.

It isn't an unconditionally true statement to say it would be hard to
find a lower overhead way to capture the data, since there is strong
experimental evidence from work on replication that shows that the WAL
is a very effective mechanism for changeset extraction.

There is nothing to say the changeset must occur through the
WalSender. That is just where it currently occurs, but it could easily
occur elsewhere, if the requirement existed. Similarly, changeset
extraction doesn't currently allow access to uncommitted rows, but it
could do so, if required. Before images of change could be provided by
direct access to prior versions via their tid, just as they are with
triggers.

There are other advantages to using WAL that you don't mention, such
as the avoidance of the need for the trigger queue to spill to disk,
avoidance of memory overhead for large transactions and avoidance of
random I/O.

ISTM that using WAL has to be properly considered as a viable option
which is why open discussion makes sense.

The timing of that discussion doesn't need to be immediate but
certainly it should happen before any options are precluded because of
the progress of other events. Let me know when that's appropriate,
so we can discuss.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Batch API for After Triggers

2013-06-17 Thread Pavel Stehule
2013/6/17 Simon Riggs si...@2ndquadrant.com:
 On 9 June 2013 12:58, Craig Ringer cr...@2ndquadrant.com wrote:
 On 06/09/2013 04:58 PM, Simon Riggs wrote:
 There are also difficulties in semantics, since when
 we have OLD and NEW at row level we know we are discussing the same
 row. With sets of OLD and NEW we'd need to be able to link the
 relations back together somehow, which couldn't be done by PK since
 that could change.

 We don't currently have OLD and NEW relations so we're free to define
 how this works pretty freely.

 Rather than having OLD and NEW as separate relations, we could just have
 one OLD_AND_NEW relation. In that relation we exploit Pg's composite
 types to nest the old and new tuples in a single outer change record.

 OLD_AND_NEW would look to PL/PgSQL as if it were:

 CREATE TEMPORARY TABLE OLD_AND_NEW (
 OLD tabletype NOT NULL,
 NEW tabletype NOT NULL
 );

 ...though presumably without the ability to create indexes on it and the
 other things you can do to a real temp table. Though I can see cases
 where that'd be awfully handy too.

 For DELETE and INSERT we'd either provide different relations named OLD
 and NEW respectively, or we'd use OLD_AND_NEW with one field or the
 other blank. I'm not sure which would be best.

 Alternately, we could break the usual rules for relations and define OLD
 and NEW as ordered, so lock-step iteration would always return matching
 pairs of rows. That's useless in SQL since there's no way to achieve
 lock-step iteration, but if we provide a
 for_each_changed_row('some_function'::regproc) that scans them in
 lock-step and invokes `some_function` for each one...? (I haven't yet
 done enough in the core to have any idea if this approach is completely
 and absurdly impossible, or just ugly. Figured I'd throw it out there
 anyway.)


 I think the best way, if we did do this, would be to have a number of
 different relations defined:

 OLD
 NEW
 INSERTED
 DELETED
 all of which would be defined same as main table

 and also one called
 UPDATED
 which would have two row vars called OLD and NEW
 so you would access it like e.g. IF UPDATED.OLD.id = 7


nice idea

+1

Pavel

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services






Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Fujii Masao
On Mon, Jun 17, 2013 at 9:23 PM, Michael Paquier
michael.paqu...@gmail.com wrote:



 On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
  On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   Hi all,
  
   Please find attached the latest versions of REINDEX CONCURRENTLY for
   the 1st
   commit fest of 9.4:
   - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to
   allow
   a toast relation to have multiple indexes running in parallel (extra
   indexes
   could be created by a REINDEX CONCURRENTLY processed)
   - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
   and
   fixed a lock in index_concurrent_create on an index relation not
   released at
   the end of a transaction
 
  Could you let me know how this patch has something to do with MVCC
  catalog
  access patch? Should we wait for MVCC catalog access patch to be
  committed
  before starting to review this patch?

 I wondered the same. The MVCC catalog patch, if applied, would make it
 possible to make the actual relfilenode swap concurrently instead of
  requiring to take access exclusive locks, which obviously is way nicer. On
 the other hand, that function is only a really small part of this patch,
 so it seems quite possible to make another pass at it before relying on
 mvcc catalog scans.

 As mentioned by Andres, the only thing that the MVCC catalog patch can
 improve here is the index swap phase (index_concurrent_swap:index.c)
 where the relfilenodes of the old and new indexes are exchanged. Now an
 AccessExclusiveLock is taken on the 2 relations being swapped; we could
 downgrade that to ShareUpdateExclusiveLock with the MVCC catalog
 access, I think.

 Also, with the MVCC catalog patch in, we could add some isolation tests for
 REINDEX CONCURRENTLY (there were some tests in one of the previous
 versions),
 which is currently not possible due to the exclusive lock taken at the swap
 phase.

 Btw, those are minor things in the patch, so I think that it would be better
 to not wait
 for the MVCC catalog patch. Even if you think that it would be better to
 wait for it,
 you could even begin with the 1st patch allowing a toast relation to have
 multiple
 indexes (removal of reltoastidxid) which does not depend at all on it.

Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)

*** a/src/backend/catalog/system_views.sql
-pg_stat_get_blocks_fetched(X.oid) -
-pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+pg_stat_get_blocks_fetched(X.indrelid) -
+pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

You changed some SQL because of the removal of reltoastidxid.
Could you check again that the original SQL and the changed one return
the same value?

doc/src/sgml/diskusage.sgml
 There will be one index on the
 <acronym>TOAST</> table, if present.

I'm not sure if multiple indexes on TOAST table are viewable by a user.
If it's viewable, we need to correct the above description.
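
(One way to check, with a made-up table name:)

SELECT i.indexrelid::regclass, i.indisvalid
  FROM pg_class c
  JOIN pg_index i ON i.indrelid = c.reltoastrelid
 WHERE c.oid = 'my_table'::regclass;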

doc/src/sgml/monitoring.sgml
 <entry><structfield>tidx_blks_read</></entry>
 <entry><type>bigint</></entry>
 <entry>Number of disk blocks read from this table's TOAST table index (if
 any)</entry>
</row>
<row>
 <entry><structfield>tidx_blks_hit</></entry>
 <entry><type>bigint</></entry>
 <entry>Number of buffer hits in this table's TOAST table index (if
 any)</entry>

For the same reason as the above, don't we need to change "index" to
"indexes" in these descriptions?

*** a/src/bin/pg_dump/pg_dump.c
+ "SELECT c.reltoastrelid, t.indexrelid "
  "FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",

Is there the case where TOAST table has more than one *valid* indexes?
If yes, is it really okay to choose just one index by using LIMIT 1?
If no, i.e., TOAST table should have only one valid index, we should get rid
of LIMIT 1 and check that only one row is returned from this query.
Fortunately, ISTM this check has already been done by the subsequent
call of ExecuteSqlQueryForSingleRow(). Thoughts?

Regards,

-- 
Fujii Masao



Re: [HACKERS] GIN improvements part 3: ordering in index

2013-06-17 Thread Heikki Linnakangas

On 17.06.2013 15:56, Alexander Korotkov wrote:

On Sat, Jun 15, 2013 at 3:02 AM, Alexander Korotkov aekorot...@gmail.com wrote:


This patch introduces new interface method of GIN which takes same
arguments as consistent but returns float8.
float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
nullFlags[], Datum addInfo[], bool addInfoIsNull[])
This patch implements gingettuple method which can return ordering data
using KNN infrastructure. Also it introduces  operator for fts which
support ordering in GIN index. Some example:

postgres=# explain analyze select * from dblp_titles2 where tsvector @@
to_tsquery('english', 'statistics') order by tsvector <->
to_tsquery('english', 'statistics') limit 10;
                                    QUERY PLAN
-----------------------------------------------------------------------------
  Limit  (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120
rows=10 loops=1)
    ->  Index Scan using dblp_titles2_idx on dblp_titles2
  (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115
rows=10 loops=1)
  Index Cond: (tsvector @@ '''statist'''::tsquery)
          Order By: (tsvector <-> '''statist'''::tsquery)
  Total runtime: 7.556 ms
(5 rows)



Attached version of patch has some refactoring and bug fixes.


Thanks. There are no docs changes and not many comments; that needs to
be fixed, but I think I understand how it works:


On the first call to gingettuple, the index is first scanned for all the 
matches, which are collected in an array in memory. Then, the array is 
sorted with qsort(), and the matches are returned one by one from the 
sorted array.


That has some obvious limitations. First of all, you can run out of 
memory. Secondly, is that really any faster than the plan you get 
without this patch? Ie. scan all the matches with a bitmap index scan, 
and sort the results on the rank function. If it is faster, why?


What parts of the previous two patches does this rely on?

- Heikki




Re: [HACKERS] [9.4 CF 1] Commit Fest has started

2013-06-17 Thread Josh Berkus
Amit,

 I am interested in assisting you for this CF.
 Kindly let me know how can I add value for CommitFest management.

Thank you for the offer!  However, you're currently signed up to review
several patches, and I'd rather have you doing that than sending out
reminder emails.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Sawada Masahiko
On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
 On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
 On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking a fresh backup during the failback operation; here is the link for
 that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.


   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

 you mean that it is possible that the old master's data is ahead of the new
 master's data.

   What I mean to say is that the WAL of the old master can be ahead of the
 new master's. I understood that
   the data files of the old master can't be ahead, but I think the WAL can be ahead.

 so there is inconsistent data between those servers at failback, right?
 if so, that inconsistency is not possible, because if you use the GUC option
 as he proposes (i.e., failback_safe_standby_mode = remote_flush),
 while the old master is working fine, no file system level changes are
 done before the WAL is replicated.

 Would the proposed patch take care that the old master's WAL is also not
 ahead in some way?
 If yes, I think i am missing some point.

 yes, it can happen that the old master's WAL is ahead of the new master's
 WAL, as you said,
 but I think that we can solve that by deleting all WAL files when the old
 master starts as a new standby.

 I think ideally, it should reset WAL location at the point where new master 
 has forked off.
 In such a scenario it would be difficult for user who wants to get a dump of 
 some data in
 old master which hasn't gone to new master. I am not sure if such a need is 
 there for real users, but if it
 is there, then providing this solution will have some drawbacks.
I think that we can dump the data before deleting all the WAL files.  The
WAL files are deleted when the old master starts as a new standby.


Regards,

---
Sawada Masahiko




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Josh Berkus

 So there isn't a fall down thing here. We expect the recently
 loaded/updated data to be scanned and that's OK.
 
 Having the minmax index updated greedily is just adding extra work for
 fast diminishing returns. We can always add that later if really
 needed, but I doubt it will be needed - in just the same way as mat
 views aren't greedily updated.

Ok, in that case, can we add the patch without messing with the FSM
logic?  It'll work out-of-the-box for append-only tables, and that's a
pretty solid use case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] refresh materialized view concurrently

2013-06-17 Thread Josh Berkus
On 06/17/2013 04:13 AM, Heikki Linnakangas wrote:
 On 14.06.2013 19:05, Kevin Grittner wrote:
 Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
 9.4 CF1.  The goal of this patch is to allow a refresh without
 interfering with concurrent reads, using transactional semantics.

 It is my hope to get this committed during this CF to allow me to
 focus on incremental maintenance for the rest of the release cycle.
 
 I must say this seems a bit pointless on its own. But if it's a stepping
 stone to incremental maintenance, I have no objections.

Actually, CONCURRENTLY was cited as the #1 deficiency for the matview
feature for 9.3.  With it, the use-case for Postgres matviews is
broadened considerably.  So it's very valuable on its own, even if (for
example) INCREMENTAL didn't get done for 9.3.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Josh Berkus

 Well, it can do lots stuff that DROP/CREATE CONCURRENTLY can't:
 * reindex primary keys
 * reindex keys referenced by foreign keys
 * reindex exclusion constraints
 * reindex toast tables
 * do all that for a whole database
 so I don't think that comparison is fair. Having it would have made
 several previous point releases far less painful (e.g. 9.1.6/9.2.1).

FWIW, I have a client who needs this implementation enough that we're
backporting it to 9.1 for them.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Batch API for After Triggers

2013-06-17 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 On 9 June 2013 12:58, Craig Ringer cr...@2ndquadrant.com wrote:

 We don't currently have OLD and NEW relations so we're free to
 define how this works pretty freely.

 I think the best way, if we did do this, would be to have a
 number of different relations defined:

 OLD
 NEW
 INSERTED
 DELETED
 all of which would be defined same as main table

 and also one called
 UPDATED
 which would have two row vars called OLD and NEW
 so you would access it like e.g. IF UPDATED.OLD.id = 7

Well, there is the SQL standard, which has a couple paragraphs on
the topic which we might want to heed.  For a delete there is just
an old table; for an insert just a new one.  For an update you have
both, with the same cardinality.  The rows in the old and new
tables have a correspondence, but that is only visible to FOR EACH
ROW triggers.  For something like RI, why would you need to
establish correspondence?  A row with the referenced key either
exists after the statement completes, or it doesn't -- why would we
care whether it is an updated version of the same row?

Syntax for how to refer to these is defined by the standard.

As usual, I don't object to adding capabilities as long as the
standard syntax is also supported with standard semantics.
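
For reference, the standard's spelling uses a REFERENCING clause on the
trigger. A sketch (PostgreSQL does not accept this today; table and
function names are invented):

CREATE FUNCTION enforce_ref() RETURNS trigger AS $$
BEGIN
    -- new_rows is the standard's "new" transition table
    IF EXISTS (SELECT 1 FROM new_rows n
               WHERE NOT EXISTS (SELECT 1 FROM parent p
                                 WHERE p.id = n.parent_id)) THEN
        RAISE EXCEPTION 'missing referenced key';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER child_ref AFTER INSERT OR UPDATE ON child
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE enforce_ref();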

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 12:52:36 -0700, Josh Berkus wrote:
 
  Well, it can do lots stuff that DROP/CREATE CONCURRENTLY can't:
  * reindex primary keys
  * reindex keys referenced by foreign keys
  * reindex exclusion constraints
  * reindex toast tables
  * do all that for a whole database
  so I don't think that comparison is fair. Having it would have made
  several previous point releases far less painful (e.g. 9.1.6/9.2.1).
 
 FWIW, I have a client who needs this implementation enough that we're
 backporting it to 9.1 for them.

Wait. What? Unless you break catalog compatibility that's not safely
possible using this implementation.

Greetings,

Andres Freund

PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Alvaro Herrera
Greg Stark wrote:
 On Fri, Jun 14, 2013 at 11:28 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Re-summarization is relatively expensive, because the complete page range 
  has
  to be scanned.
 
 That doesn't sound too bad to me. It just means there's a downside to
 having larger page ranges. I would expect the page ranges to be
 something in the ballpark of 32 pages --  scanning 32 pages to
 resummarize doesn't sound that painful but sounds like it's large
 enough that the resulting index would be a reasonable size.

Actually, I'm thinking that a range is more like, say, 1280 pages, or 10
MB.  My goal is to consider tables in the 10 TB magnitude; if I store
one index tuple for every 32 pages, I would end up with too large an
index.  With 10 MBs per range I can index the whole table with an index
of 50 MB, which seems reasonable to scan.  But anyway my intention is
that page range size is configurable.

  But I don't understand why an insert would invalidate a tuple. An insert
 can just update the min and max incrementally. It's a delete that
 invalidates the range but as you note it doesn't really invalidate it,
 just mark it as needing a refresh -- and even then only if the value
 being deleted is equal to either the min or max.

No, I don't intend to update the index tuple with each heap insert.  I
think this will end up being too slow.  The validity map is there to
hold a single bit for each page saying whether the page range is known
valid or not; one insert into any page in the range invalidates the
range (and any scan of the table needs to scan that range as a whole).
This way I can wait until the storm of inserts has passed from a range,
and only then do the summarization for that range.  This avoids having
to summarize the range over and over.

Alternatively, I could consider the index tuple always valid, and update
it online as soon as we do an insert or update (i.e. examine the min/max
values in the index tuple, and update it to match if the new value is
out of bounds).  This seems too slow, so I won't try.

Also, a delete does not invalidate a range either.  As Simon said
elsewhere in a thread, if the range is not minimal, this is not a
problem; we might have to scan some more ranges than absolutely
necessary, but it's not a correctness problem.  The heap scan recheck
node will get rid of the unwanted tuples anyway.

  Same-size page ranges?
  Current related literature seems to consider that each index entry in a
  minmax index must cover the same number of pages.  There doesn't seem to be 
  a
 
 I assume the reason for this in the literature is the need to quickly
 find the summary for a given page when you're handling an insert or
 delete.

Yeah, that's one thing to keep in mind.  I haven't gotten too much into
this; I only added these two entries at the end for my own future
reference, because I will need to consider them at some point.

Right now my intention is to have each index have a fixed page range
size, which is defined at index creation time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Alvaro Herrera
Josh Berkus wrote:

  Value changes in columns that are part of a minmax index, and tuple 
  insertion
  in summarized pages, would invalidate the stored min/max values.  To support
  this, each minmax index has a validity map; a range can only be considered 
  in a
  scan if it hasn't been invalidated by such changes (A range not 
  considered in
  the scan needs to be returned in whole regardless of the stored min/max 
  values,
  that is, it cannot be pruned per query quals).  The validity map is very
  similar to the visibility map in terms of performance characteristics: quick
  enough that it's not contentious, allowing updates and insertions to proceed
  even when data values violate the minmax index conditions.  An invalidated
  range can be made valid by re-summarization (see below).
 
 This begins to sound like these indexes are only useful on append-only
 tables.  Not that there aren't plenty of those, but ...

But what?  This is a useful use-case; one that's not served by any other
type of index.  Sure, you can have btrees over append-only tables, but
they are really large and have huge maintenance costs.  A smaller lossy
index is useful if it's low-maintenance and if it keeps the cost of
scanning the table low enough.

These indexes can be considered a sort of partitioning of a large table.
Only instead of having to define CHECK (insert_date = 'a month') for
each partition, you just create the index on the insert_date column, and
the index will allow a seqscan of the table to skip pages that don't
match the query's WHERE clause on that column.
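
A sketch of the intended usage (the minmax access method and this syntax
are the proposal's, not an existing feature):

CREATE TABLE events (insert_date date, payload text);
CREATE INDEX events_mm ON events USING minmax (insert_date);

-- a scan can then skip any page range whose stored [min,max] for
-- insert_date cannot satisfy the qual:
SELECT * FROM events WHERE insert_date >= '2013-06-01';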

  Re-summarization is relatively expensive, because the complete page range 
  has
  to be scanned.
 
 Why?  Why can't we just update the affected pages in the index?

The page range has to be scanned in order to find out the min/max values
for the indexed columns on the range; and then, with these data, update
the index.

  To avoid this, a table having a minmax index would be configured so
  that inserts only go to the page(s) at the end of the table; this
  avoids frequent invalidation of ranges in the middle of the table.  We
  provide a table reloption that tweaks the FSM behavior, so that
  summarized pages are not candidates for insertion.
 
 We haven't had an index type which modifies table insertion behavior
 before, and I'm not keen to start now; imagine having two indexes on the
 same table each with their own, conflicting, requirements.

This is not a requirement.  It merely makes the index more effective.

 If we're going to start adding reloptions for specific table behavior,
 I'd rather think of all of the optimizations we might have for a
 prospective append-only table and bundle those, rather than tying it
 to whether a certain index exists or not.

Eh?  Sure, my intention for this reloption is for the user to be able to
state their intention for the table, and each feature that has
append-only table optimization does its thing.  I wasn't thinking in
anything automatic.

 Also, I hate the name ...

Feel free to propose other names; that way I can hate your proposals
too (or maybe not).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Andres Freund
On 2013-06-17 16:23:40 -0400, Alvaro Herrera wrote:
   Re-summarization is relatively expensive, because the complete page range 
   has
   to be scanned.
  
  Why?  Why can't we just update the affected pages in the index?
 
 The page range has to be scanned in order to find out the min/max values
 for the indexed columns on the range; and then, with these data, update
 the index.

Why? Assuming the initial summarization has been performed you can check
the current min/max value, check whether it's still smaller/bigger than
the value you're inserting and if not update the index accordingly with
the new value. You don't even need to wait for the new value to become
visible since the ranges don't need to be minimal.
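
In C-like terms the suggestion amounts to something like the sketch
below; the types are made up for illustration, and the real work would
of course happen on the index page under the appropriate lock:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical in-memory form of one range's summary. */
typedef struct RangeSummary
{
    int min;
    int max;
} RangeSummary;

/*
 * Insert path: compare the incoming value against the stored bounds and
 * widen them only when they are violated.  Widening does not need the
 * new tuple to be visible yet, because ranges may be non-minimal.
 */
static bool
maybe_widen(RangeSummary *s, int newval)
{
    if (newval >= s->min && newval <= s->max)
        return false;       /* common case: no index write at all */
    if (newval < s->min)
        s->min = newval;
    if (newval > s->max)
        s->max = newval;
    return true;            /* the only case needing an exclusive lock */
}

int
main(void)
{
    RangeSummary s = {10, 90};

    printf("widened: %d -> [%d, %d]\n", maybe_widen(&s, 95), s.min, s.max);
    return 0;
}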

I think the contention this possibly causes may be a better argument, but
I am not sure how much of a problem that really becomes if we find a
deadlock-free way to only lock the minmax pages exclusively if the range
is violated.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Alvaro Herrera
Tom Lane wrote:

 We've talked a lot about index-organized tables in the past.  How much
 of the use case for this would be subsumed by a feature like that?

IOTs are not flexible enough.  You can only have one index that you
index-organize the table on; and you can search only based on a prefix
of the index key.  If you want to change the key, ... um. I don't even
know what you'd do.

With minmax indexes, on the other hand, you can create one or several,
and they let you scan the table based on any of the indexed columns.  So
you can create a minmax index on creation_date, insertion_date, ship_date;
and have it serve queries that use any of these columns.  (You probably
don't add key column(s) to the minmax index because you already have
btrees on them.)

On the other hand, IOTs are expensive to insert into.  For each tuple to
insert you need to start from the root and descend the tree, insert your
tuple, then propagate splits upwards.  If you have a 10 TB table, you
cannot afford to have to do all that for each and every tuple.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Josh Berkus

 This begins to sound like these indexes are only useful on append-only
 tables.  Not that there aren't plenty of those, but ...
 
 But what?  

... but the other comments further down in my email.  Also, my
successive comments in other emails.

 Why?  Why can't we just update the affected pages in the index?
 
 The page range has to be scanned in order to find out the min/max values
 for the indexed columns on the range; and then, with these data, update
 the index.

Seems like you could incrementally update the range, at least for
inserts.  If you insert a row which doesn't decrease the min or increase
the max, you can ignore it, and if it does increase/decrease, you can
change the min/max.  No?

For updates, things are more complicated.  If the row you're updating
was the min/max, in theory you should update it to adjust that, but you
can't verify that it was the ONLY min/max row without doing a full scan.
 My suggestion would be to add a dirty flag which would indicate that
that block could use a rescan next VACUUM, and otherwise ignore changing
the min/max.  After all, the only defect of having min too low or max too
high for a block would be scanning too many blocks.  Which you'd do
anyway with it marked invalid.
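
A small sketch of that bookkeeping, with a hypothetical dirty flag added
to an invented summary struct (none of these names are from the patch):

#include <stdbool.h>

/* Hypothetical summary carrying the proposed dirty flag. */
typedef struct RangeSummary
{
    int     min;
    int     max;
    bool    dirty;      /* bounds may be wider than needed; rescan at VACUUM */
} RangeSummary;

/*
 * Update/delete path: removing a row that carries the current min or max
 * cannot tighten the bounds without rescanning the whole range, so only
 * remember that the next VACUUM may want to re-summarize.  Bounds that
 * are too wide merely cost extra block reads; they are never wrong.
 */
static void
note_removed_value(RangeSummary *s, int oldval)
{
    if (oldval == s->min || oldval == s->max)
        s->dirty = true;
}

int
main(void)
{
    RangeSummary s = {10, 90, false};

    note_removed_value(&s, 90);     /* a max-carrying row went away */
    return s.dirty ? 0 : 1;
}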

 This is not a requirement.  It merely makes the index more effective.

Right.  So I'm saying let's do this index without the FSM modifications,
and then consider those as their own, separate patch, if we even do them.

 Eh?  Sure, my intention for this reloption is for the user to be able to
 state their intention for the table, and each feature that has
 append-only table optimization does its thing.  I wasn't thinking in
 anything automatic.

99.7% of our users have no idea what to do with reloptions.  We'd have
to expose it with an ALTER TABLE SET append_only=true.

 Also, I hate the name ...
 
 Feel free to propose other names; that way I can hate your proposals
 too (or maybe not).

Well, my first thought was block-range indexing, which I think is the
best description, but that isn't exactly an exciting name for a feature
which will likely be worthy of short-listing for 9.4.  I'd prefer it
over minmax, which users will think only works on aggregates, but it's
still not a great name.  Summary Index also comes to mind, but really
isn't a lot more exciting.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Alvaro Herrera
Andres Freund wrote:

 PS: Josh, minor thing, but could you please not trim the CC list, at
 least when I am on it?

Yes, it's annoying.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Josh Berkus
On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
 Andres Freund wrote:
 
 PS: Josh, minor thing, but could you please not trim the CC list, at
 least when I am on it?
 
 Yes, it's annoying.

I also get private comments from people who don't want me to cc them
when they are already on the list.  I can't satisfy everyone.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 13:46:07 -0700, Josh Berkus wrote:
 On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
  Andres Freund wrote:
  
  PS: Josh, minor thing, but could you please not trim the CC list, at
  least when I am on it?
  
  Yes, it's annoying.
 
 I also get private comments from people who don't want me to cc them
 when they are already on the list.  I can't satisfy everyone.

Given that nobody but you trims the CC list I don't find that a
convincing argument.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Add transforms feature

2013-06-17 Thread Peter Eisentraut
On 6/14/13 11:48 PM, Craig Ringer wrote:
 I wonder if that should be extended to install headers for hstore,
 ltree, and while we're at it, intarray as well?

Sure, if someone wants to go through and check which headers are
independently usable, and do the necessary cleanups where necessary.





Re: [HACKERS] [PATCH] Add transforms feature

2013-06-17 Thread Alvaro Herrera
Peter Eisentraut wrote:
 A transform is an SQL object that supplies two functions for converting
 between data types and procedural languages.  For example, a transform
 could arrange that hstore is converted to an appropriate hash or
 dictionary object in PL/Perl or PL/Python.
 
 Externally visible changes:

This is a large patch.  Do you intend to push the whole thing as a
single commit, or split it?



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-17 Thread Tomas Vondra
Hi!

This sounds really interesting, so a few quick comments.

On 15.6.2013 00:28, Alvaro Herrera wrote:

 In each index tuple (corresponding to one page range), we store:
 - first block this tuple applies to
 - last block this tuple applies to
 - for each indexed column:
   * min() value across all tuples in the range
   * max() value across all tuples in the range
   * nulls present in any tuple?

What about adding a counter for how many times the min/max value is
present in the range? The other messages in this thread suggest that the
refresh after UPDATE/DELETE is one of the concerns - as Greg Stark
mentioned, the range invalidation may only happen when running DELETE on
a row matching the min/max value. I believe having a counter would be an
improvement - a refresh would be needed only if it reaches 0.
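
A standalone sketch of the counter idea (field names invented for the
example): a DELETE forces a rescan of the range only when the last row
carrying a boundary value disappears.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary extended with occurrence counters. */
typedef struct RangeSummary
{
    int min;
    int max;
    int min_count;      /* rows in the range equal to min */
    int max_count;      /* rows in the range equal to max */
} RangeSummary;

/* Returns true only when a rescan of the range is actually needed. */
static bool
note_delete(RangeSummary *s, int deleted)
{
    if (deleted == s->min && --s->min_count == 0)
        return true;
    if (deleted == s->max && --s->max_count == 0)
        return true;
    return false;
}

int
main(void)
{
    RangeSummary s = {10, 90, 2, 1};

    printf("rescan after deleting 10: %d\n", note_delete(&s, 10));  /* 0 */
    printf("rescan after deleting 90: %d\n", note_delete(&s, 90));  /* 1 */
    return 0;
}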

 Summarization
 -------------
 
 At index creation time, the whole table is scanned; for each page
 range the min() and max() values and nulls bitmap are collected, and
 stored in the index. The possibly-incomplete range at the end of the
 table is also included.

Would it make sense to do this using an existing (b-tree) index for very
large tables? Clearly it doesn't make sense to create a b-tree index and
then minmax on the same column, but for very large databases upgraded
using pg_upgrade this might be interesting.

 Once in a while, it is necessary to summarize a bunch of unsummarized
 pages (because the table has grown since the index was created), or
 re-summarize a range that has been marked invalid.  This is simple:
 scan the page range calculating the min() and max() for each indexed
 column, then insert the new index entry at the end of the index.  The
 main interesting questions are:
 
 a) when to do it The perfect time to do it is as soon as a complete
 page range of the configured range size has been filled (assuming
 page ranges are constant size).
 
 b) who does it (what process) It doesn't seem a good idea to have a
 client-connected process do it; it would incur unwanted latency.
 Three other options are (i) to spawn a specialized process to do it,
 which perhaps can be signalled by a client-connected process that
 executes a scan and notices the need to run summarization; or (ii) to
 let autovacuum do it, as a separate new maintenance task.  This seems
 simple enough to bolt on top of already existing autovacuum
 infrastructure.  The timing constraints of autovacuum might be
 undesirable, though.  (iii) wait for user command.

 
 The easiest way to go around this seems to have vacuum do it.  That
 way we can simply do re-summarization on the amvacuumcleanup routine.
 Other answers would mean we need a separate AM routine, which appears
 unwarranted at this stage.

I don't think this should be added to the autovacuum daemon. It's quite
tricky to tune autovacuum to be aggressive just enough, i.e. not to run
too frequently / not to lag. I'm afraid this would add a task consuming
unpredictable amounts of time, which would make the autovacuum tuning
even trickier.

I can live with non-summarized minmax indexes, but I can't live with
non-vacuumed tables.

 Open questions
 --------------
 
 * Same-size page ranges? Current related literature seems to consider
 that each index entry in a minmax index must cover the same number
 of pages.  There doesn't seem to be a hard reason for this to be so;
 it might make sense to allow the index to self-tune so that some
 index entries cover smaller page ranges, if this allows the
 min()/max() values to be more compact.  This would incur larger space
 overhead for the index itself, but might allow better pruning of page
 ranges during scan.  In the limit of one index tuple per page, the
 index itself would occupy too much space, even though we would be 
 able to skip reading the most heap pages, because the min()/max() 
 ranges are tight; in the opposite limit of a single tuple that 
 summarizes the whole table, we wouldn't be able to prune anything
 from the seqscan even though the index is very small.

I see no particular reason not to allow that, and having variable range
size would be a very nice feature IMHO.

Do you have an idea on how the self-tuning might work?

I was thinking about something like this:

(1) estimate the initial range size, trying to keep good selectivity
while not having very small ranges, using

   (a) average row length
   (b) number of distinct values in the column
   (c) MCV/histogram/correlation

(2) once the index is built, running a merge on the ranges - merging
the adjacent pages if the resulting selectivity is not significantly
worse (again, this might be evaluated using MCV, histogram)

But it should be possible to override this (initial range size, max
range size) or disable the heuristics completely, as having large ranges
probably makes the resummarizing more expensive.  Although having the
counter might fix that, as it makes it more likely there's another row
with the min/max value.
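
For what it's worth, a toy C sketch of what step (1) could look like;
the formula and the target of roughly 32 distinct values per range are
pure guesses for illustration, not derived from the proposal:

#include <stdio.h>

#define BLCKSZ 8192

static int
initial_pages_per_range(int avg_row_width, double ndistinct, double ntuples)
{
    double rows_per_page = (double) BLCKSZ / avg_row_width;
    double target_rows = 32.0 * (ntuples / ndistinct);  /* ~32 distinct/range */
    int pages = (int) (target_rows / rows_per_page);

    if (pages < 1)
        pages = 1;              /* never below one page */
    if (pages > 1024)
        pages = 1024;           /* cap so re-summarization stays cheap */
    return pages;
}

int
main(void)
{
    /* e.g. 100-byte rows, 1M distinct values, 1G rows */
    printf("%d pages per range\n", initial_pages_per_range(100, 1e6, 1e9));
    return 0;
}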

BTW could this be used to pre-sort the table, so that the 

Re: [HACKERS] Batch API for After Triggers

2013-06-17 Thread Craig Ringer
On 06/18/2013 01:25 AM, Pavel Stehule wrote:
  and also one called
  UPDATED
  which would have two row vars called OLD and NEW
  so you would access it like e.g. IF UPDATED.OLD.id = 7
 
 nice idea

 +1
Much better naming than OLD_AND_NEW.

I'm not so sure about

OLD
NEW
INSERTED
DELETED

in that I imagine we'd want to pick one pair and stick with it. Since
using INSERTED / DELETED makes UPDATED make sense, and since OLD
and NEW are already used to refer to the magic variables of those
names in FOR EACH ROW triggers, I think INSERTED / UPDATED / DELETED is
the way to go.

INSERTED and DELETED could just be views of the same data as UPDATED,
showing only the NEW or only the OLD composite type fields
respectively. That'd allow you to write a trigger without TG_OP tests in
many cases, as UPDATED would always contain what you wanted. It seems
slightly weird to have INSERTED and DELETED populated for an UPDATE, but
when an UPDATE is logically an INSERT+DELETE anyway...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] [PATCH] Add transforms feature

2013-06-17 Thread Craig Ringer
On 06/18/2013 04:58 AM, Peter Eisentraut wrote:
 On 6/14/13 11:48 PM, Craig Ringer wrote:
 I wonder if that should be extended to install headers for hstore,
 ltree, and while we're at it, intarray as well?
 Sure, if someone wants to go through and check which headers are
 independently usable, and do the necessary cleanups where necessary.
I can do that if there are no objections. It's only tangential to this
work really, so I'll post a separate thread when I get on to it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





[HACKERS] [9.4 CF 1] Added in missing patches

2013-06-17 Thread Josh Berkus
Folks,

At this stage, all of the patches which were not already added into CF1
should be there.  So look carefully and make sure *all* of your patches
are there.

Amusingly, it's not the new submitters who forgot to add their patch to
the CF, but rather experienced contributors, and even a committer.

In some cases, I may have accidentally added a patch whose status is
already resolved; if so, that's because I couldn't determine the
resolution status.  Better to add some unnecessary patch entries than to
miss a patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] dynamic background workers

2013-06-17 Thread Christopher Browne
BTW, one of the ideas that popped up in the unConference session on
replication was why couldn't we use a background worker as a replication
agent?

The main reason pointed out was 'because that means you have to restart the
postmaster to add a replication agent.'  (e.g. - like a Slony slon
process)

There may well be other better reasons not to do so, but it would be nice
to eliminate this reason.  It seems seriously limiting to the bg-worker
concept for them to be thus restricted.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?


[HACKERS] [9.4 CF 1] What the 5-day Deadline Means

2013-06-17 Thread Josh Berkus
Hackers,

I got a question on RRR which I thought should be addressed on this
list.  Basically, the questioner asked me: "I have a day job, I can't
promise to review all of these patches in 5 days."

The answer is: only put your name down on patches which you *can* review
in the next 5 days.  Don't claim every patch you might want to review
sometime before July 15th; instead, claim patch reviews as you get to
actually working on them.

Of course, sometimes life gets in the way.  At which point, I ask people
to *remove* their own names as reviewers as soon as possible when they
know they won't have time to review something.  It doesn't benefit the
CF process to have names down of people who aren't actually doing reviews.

Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [9.4 CF 1] Added in missing patches

2013-06-17 Thread Michael Paquier
On Tue, Jun 18, 2013 at 7:41 AM, Josh Berkus j...@agliodbs.com wrote:
 Folks,

 At this stage, all of the patches which were not already added into CF1
 should be there.  So look carefully and make sure *all* of your patches
 are there.

 Amusingly, it's not the new submitters who forgot to add their patch to
 the CF, but rather experienced contributors, and even a committer.

 In some cases, I may have accidentally added a patch whose status is
 already resolved; if so, that's because I couldn't determine the
 resolution status.  Better to add some unnecessary patch entries than to
 miss a patch.
This is great. Thanks!
Just wondering, how many patches did you add? 8? I saw a total of 98
patches a couple of days ago, now up to 106.
--
Michael




Re: [HACKERS] [9.4 CF 1] Added in missing patches

2013-06-17 Thread Josh Berkus

 Just wondering, how many patches did you add? 8? I saw a total of 98
 patches a couple of days ago, now up to 106.

Then it must be 8.  That sounds about right.  Mind you, I immediately
marked 2 as already committed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [9.4 CF 1] Added in missing patches

2013-06-17 Thread Michael Paquier
On Tue, Jun 18, 2013 at 9:00 AM, Josh Berkus j...@agliodbs.com wrote:

 Just wondering, how many patches did you add? 8? I saw a total of 98
 patches a couple of days ago, now up to 106.

 Then it must be 8.  That sounds a about right.  Mind you, I immediately
 marked 2 as already committed.
Just did the same with the patch about removal of pageinspect.sql; it
has already been committed by Heikki.
--
Michael




[HACKERS] How do we track backpatches?

2013-06-17 Thread Josh Berkus
Contributors,

While going through this mailing list looking for missing 9.4 patches, I
realized that we don't track backpatches (that is, fixes to prior
versions) at all anywhere.  Where backpatches are submitted by
committers this isn't an issue, but we had a couple major ones (like the
autovacuum fix) which were submitted by general contributors.  The same
goes for beta fixes.

Should we add a prior version category to the CF to make sure these
don't get dropped?  Or do something else?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Add regression tests for SET xxx

2013-06-17 Thread Robins Tharakan
Thanks !

PFA the updated patch.  Also removed a trailing whitespace at the end of
the SQL script.

--
Robins Tharakan


On 17 June 2013 17:29, Szymon Guz mabew...@gmail.com wrote:

 On 26 May 2013 19:56, Robins Tharakan thara...@gmail.com wrote:

 Hi,

 Please find attached a patch to take code-coverage of SET (SESSION / SEED
 / TRANSACTION / DATESTYLE / TIME ZONE) (src/backend/commands/variable.c)
 from 65% to 82%.

 Any and all feedback is welcome.
 --
 Robins Tharakan


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


 Hi,
 the patch applies cleanly on current trunk; however, there are failing
 tests, diff attached.

 regards
 Szymon



regress_variable_v3.patch
Description: Binary data



Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements

2013-06-17 Thread Peter Eisentraut
On Wed, 2013-06-12 at 16:31 -0300, Fabrízio de Royes Mello wrote:
  Btw., I also want REPLACE BUT DO NOT CREATE.
 
 Can you explain more about it?
 
Replace/alter the object if it already exists, but fail if it does not
exist.

The complete set of variants is:

- object does not exist:
  - proceed (normal CREATE)
  - error (my above description)

- object exists:
  - replace (CREATE OR REPLACE)
  - skip (CREATE IF NOT EXISTS)
  - error (normal CREATE)






Re: [HACKERS] dynamic background workers

2013-06-17 Thread Peter Eisentraut
On Fri, 2013-06-14 at 17:00 -0400, Robert Haas wrote:
 Alvaro's work on 9.3, we now have the ability to configure background
 workers via shared_preload_libraries.  But if you don't have the right
 library loaded at startup time, and subsequently wish to add a
 background worker while the server is running, you are out of luck.

We could tweak shared_preload_libraries so that it reacts sensibly to
reloads.  I basically gave up on that by writing
session_preload_libraries, but if there is more general use for that, we
could try.

(That doesn't invalidate your work, but it's a thought.)






Re: [HACKERS] How do we track backpatches?

2013-06-17 Thread Peter Eisentraut
On Mon, 2013-06-17 at 17:11 -0700, Josh Berkus wrote:
 Contributors,
 
 While going through this mailing list looking for missing 9.4 patches, I
 realized that we don't track backpatches (that is, fixes to prior
 versions) at all anywhere.  Where backpatches are submitted by
 committers this isn't an issue, but we had a couple major ones (like the
 autovacuum fix) which were submitted by general contributors.  The same
 goes for beta fixes.
 
 Should we add a prior version category to the CF to make sure these
 don't get dropped?  Or do something else?

A separate commit fest for tracking proposed backpatches might be
useful.






[HACKERS] spurious wrap-around shutdown

2013-06-17 Thread Jeff Janes
On Sun, Jun 16, 2013 at 11:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.


 postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM
 pg_database;

   datname  | datfrozenxid |    age
 -----------+--------------+-----------
  template1 |   2621759843 |         0
  template0 |   2621759843 |         0
  postgres  |   2571759843 |      5000
  jjanes    |   2437230921 | 184528922



While the behavior is weird, it is not a regression (also present in 9.2
with suitable changes in timing) and the shutdown is not spurious.

If I execute the above query immediately after the shutdown, I see what I
would expect: jjanes has an age of about 2^31.

The one table that is holding everything back is already getting autovac
for wraparound at that point, and eventually that vacuum finishes.  When
done, pg_class and pg_database are updated (I don't know how they get
updated without trying to assign another transaction), and then I get the
above query results.

I would think the database would re-allow new transactions at this point,
but it does not.  I don't know why.

Since this isn't a regression in 9.3, I probably won't pursue it any more
at this time, unless encouraged to.

Cheers,

Jeff


Re: [HACKERS] SET work_mem = '1TB';

2013-06-17 Thread Jeff Janes
On Tuesday, May 21, 2013, Simon Riggs wrote:

 I worked up a small patch to support Terabyte setting for memory.
 Which is OK, but it only works for 1TB, not for 2TB or above.


I've incorporated my review into a new version, attached.

Added TB to the docs, added the macro KB_PER_TB, and made SHOW print
1TB rather than 1024GB.

I tested several of the memory settings to see that they can be set and
retrieved.  I haven't tested actual execution as I don't have that kind
of RAM.

I don't see how it could have a performance impact; it passes make check
etc., and I don't think it warrants a new regression test.

I'll set it to ready for committer.

Cheers,

Jeff
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index c7d84b5..940ed6e
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 39,45 ****
   For convenience,
   a different unit can also be specified explicitly.  Valid memory units
   are <literal>kB</literal> (kilobytes), <literal>MB</literal>
!  (megabytes), and <literal>GB</literal> (gigabytes); valid time units
   are <literal>ms</literal> (milliseconds), <literal>s</literal>
   (seconds), <literal>min</literal> (minutes), <literal>h</literal>
   (hours), and <literal>d</literal> (days).  Note that the multiplier
--- 39,45 ----
   For convenience,
   a different unit can also be specified explicitly.  Valid memory units
   are <literal>kB</literal> (kilobytes), <literal>MB</literal>
!  (megabytes), <literal>GB</literal> (gigabytes), and <literal>TB</literal> (terabytes); valid time units
   are <literal>ms</literal> (milliseconds), <literal>s</literal>
   (seconds), <literal>min</literal> (minutes), <literal>h</literal>
   (hours), and <literal>d</literal> (days).  Note that the multiplier
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index ea16c64..0e5b0c9
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 105,110 ****
--- 105,111 ----
  
  #define KB_PER_MB (1024)
  #define KB_PER_GB (1024*1024)
+ #define KB_PER_TB (1024*1024*1024)
  
  #define MS_PER_S 1000
  #define S_PER_MIN 60
*************** parse_int(const char *value, int *result
*** 4837,4843 ****
  		{
  			/* Set hint for use if no match or trailing garbage */
  			if (hintmsg)
! 				*hintmsg = gettext_noop("Valid units for this parameter are \"kB\", \"MB\", and \"GB\".");
  
  #if BLCKSZ < 1024 || BLCKSZ > (1024*1024)
  #error BLCKSZ must be between 1KB and 1MB
--- 4838,4844 ----
  		{
  			/* Set hint for use if no match or trailing garbage */
  			if (hintmsg)
! 				*hintmsg = gettext_noop("Valid units for this parameter are \"kB\", \"MB\", \"GB\" and \"TB\".");
  
  #if BLCKSZ < 1024 || BLCKSZ > (1024*1024)
  #error BLCKSZ must be between 1KB and 1MB
*************** parse_int(const char *value, int *result
*** 4891,4896 ****
--- 4892,4913 ----
  					break;
  				}
  			}
+ 			else if (strncmp(endptr, "TB", 2) == 0)
+ 			{
+ 				endptr += 2;
+ 				switch (flags & GUC_UNIT_MEMORY)
+ 				{
+ 					case GUC_UNIT_KB:
+ 						val *= KB_PER_TB;
+ 						break;
+ 					case GUC_UNIT_BLOCKS:
+ 						val *= KB_PER_TB / (BLCKSZ / 1024);
+ 						break;
+ 					case GUC_UNIT_XBLOCKS:
+ 						val *= KB_PER_TB / (XLOG_BLCKSZ / 1024);
+ 						break;
+ 				}
+ 			}
  		}
  		else if (flags & GUC_UNIT_TIME)
  		{
*************** _ShowOption(struct config_generic * reco
*** 7384,7390 ****
  				break;
  		}
  
! 		if (result % KB_PER_GB == 0)
  		{
  			result /= KB_PER_GB;
  			unit = "GB";
--- 7401,7412 ----
  				break;
  		}
  
! 		if (result % KB_PER_TB == 0)
! 		{
! 			result /= KB_PER_TB;
! 			unit = "TB";
! 		}
! 		else if (result % KB_PER_GB == 0)
  		{
  			result /= KB_PER_GB;
  			unit = "GB";
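
For anyone who wants to sanity-check the arithmetic outside the server,
here is a small standalone model of the unit conversion; it is not the
guc.c code, and flag handling and overflow checks are simplified away:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define KB_PER_MB (INT64_C(1024))
#define KB_PER_GB (INT64_C(1024) * 1024)
#define KB_PER_TB (INT64_C(1024) * 1024 * 1024)

/* Simplified model of parse_int's memory-unit handling for a kB GUC. */
static int64_t
memory_setting_in_kb(int64_t val, const char *unit)
{
    if (strcmp(unit, "kB") == 0)
        return val;
    if (strcmp(unit, "MB") == 0)
        return val * KB_PER_MB;
    if (strcmp(unit, "GB") == 0)
        return val * KB_PER_GB;
    if (strcmp(unit, "TB") == 0)
        return val * KB_PER_TB;
    return -1;                  /* unknown unit */
}

int
main(void)
{
    printf("1TB = %lld kB\n", (long long) memory_setting_in_kb(1, "TB"));
    return 0;
}

Since the kB value of a memory GUC is stored in a 32-bit int, 2TB would
be 2^31 kB and overflow; that matches the observation upthread that the
setting works for 1TB but not for 2TB or above.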



Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Amit Kapila
On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote:
 On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com wrote:
  On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
   On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote:
    On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
     On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
      On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
       Hello,

       We have already started a discussion on pgsql-hackers for the
       problem of taking fresh backup during the failback operation;
       here is the link for that:

       http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com

       Let me again summarize the problem we are trying to address.

      How will you take care of extra WAL on the old master during
      recovery?  If it replays WAL which has not reached the new
      master, it can be a problem.

     You mean that it is possible for the old master's data to be
     ahead of the new master's data?

    I mean to say that the WAL of the old master can be ahead of the
    new master.  I understood that the data files of the old master
    can't be ahead, but I think the WAL can be.

     So there is inconsistent data between those servers when failing
     back, right?  If so, no inconsistency is possible, because if you
     use the GUC option as he proposes (i.e. failback_safe_standby_mode
     = remote_flush), then while the old master is working fine, no
     file-system-level changes are done before the WAL is replicated.

    Would the proposed patch also take care that the old master's WAL
    is not ahead in some way?  If yes, I think I am missing some point.

   Yes, it will happen that the old master's WAL is ahead of the new
   master's WAL, as you said.  But I think we can solve that by
   deleting all WAL files when the old master starts as the new
   standby.

  I think ideally it should reset the WAL location to the point where
  the new master has forked off.  In such a scenario it would be
  difficult for a user who wants to get a dump of some data on the old
  master which hasn't gone to the new master.  I am not sure whether
  such a need exists for real users, but if it does, this solution
  will have some drawbacks.

 I think that we can dump the data before deleting all WAL files.  All
 WAL file deletion is done when the old master starts as the new
 standby.

Can we dump data without starting the server?

With Regards,
Amit Kapila.  


