Fujii Masao wrote:
What makes the sender process a bottleneck?

The keyword here is "might". There are many possibilities, like:
- Slow network.
- Ridiculously fast disk, like a RAM disk. If you have a synchronous slave you can fail over to, putting WAL on a RAM disk isn't that crazy.
- Slower WAL disk on the slave.
etc.

Backends then wait
* not at all for asynch commit
* just for Write for local synch commit
* for both Write and Send for remote synch commit
(various additional options for what happens to confirm Send)

I'd like to introduce a new parameter "synchronous_replication" which specifies
whether backends wait for the response from the WAL sender process. By
combining synchronous_commit and synchronous_replication, users can
choose various options.
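
To make the combinations concrete, here is a minimal sketch of how the two settings could map onto the waits listed above. It assumes both settings are plain booleans and independent of each other; the names commit_wait and CommitWait are made up for illustration and are not part of any patch. The thread does not say what synchronous_commit = off combined with synchronous_replication = on should mean, so that case falls out of the independence assumption only.

```python
from typing import NamedTuple


class CommitWait(NamedTuple):
    """What a backend waits for at commit (hypothetical helper type)."""
    wait_for_write: bool  # wait for the local WAL write
    wait_for_send: bool   # wait for the WAL sender's response


def commit_wait(synchronous_commit: bool,
                synchronous_replication: bool) -> CommitWait:
    # Assumption: the two parameters are independent on/off switches.
    return CommitWait(wait_for_write=synchronous_commit,
                      wait_for_send=synchronous_replication)


if __name__ == "__main__":
    for sc in (False, True):
        for sr in (False, True):
            print(f"synchronous_commit={sc}, "
                  f"synchronous_replication={sr} -> {commit_wait(sc, sr)}")
```

With both off the backend waits for nothing (asynch commit), with only synchronous_commit on it waits for Write (local synch commit), and with both on it waits for Write and Send (remote synch commit).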

There's one thing I haven't figured out in this discussion. Does the write to the disk happen before or after the write to the slave? Can you guarantee that if a transaction is committed in the master, it's also committed in the slave, or vice versa?

Another thought occurs that we might measure the time a Send takes and
specify a limit on how long we are prepared to wait for confirmation.
Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
This would give better user behaviour across a highly variable network
connection.
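
As a rough illustration of the limit idea, the sketch below models the Send confirmation as a threading.Event and waits on it for at most the limit. The function name wait_for_send_confirmation and the use of an Event are assumptions made for the example only; a real backend would wait on whatever signalling the WAL sender provides.

```python
import threading


def wait_for_send_confirmation(confirmed: threading.Event,
                               limit_seconds: float) -> bool:
    """Return True if the Send was confirmed within the limit.

    limit_seconds == 0 means asynchronous: do not block, just report
    whether the confirmation has already arrived.
    limit_seconds > 0 means synchronous up to the limit.
    """
    if limit_seconds <= 0:
        return confirmed.is_set()
    # Event.wait returns True if the event was set before the timeout.
    return confirmed.wait(timeout=limit_seconds)


if __name__ == "__main__":
    confirmed = threading.Event()
    # Simulate the slave confirming after 50 ms.
    threading.Timer(0.05, confirmed.set).start()
    print(wait_for_send_confirmation(confirmed, limit_seconds=1.0))
```

What the backend should do when the function returns False is exactly the open question; the sketch only shows the bounded wait.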

From the viewpoint of detecting a network failure, this feature is necessary.
When the network goes down, the WAL sender can be blocked until it detects
the network failure, i.e. the WAL sender keeps waiting for a response which
never comes. A timeout notification is necessary in order to detect a
network failure promptly.
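
For illustration only, the blocking problem and the timeout can be shown with an ordinary socket: without a timeout, recv() on a dead connection can block for a very long time, while a socket-level timeout turns the silence into something the caller can act on. The helper name recv_reply_or_timeout is hypothetical, and the sketch says nothing about what the server should do once the timeout fires.

```python
import socket
from typing import Optional


def recv_reply_or_timeout(conn: socket.socket,
                          timeout_seconds: float) -> Optional[bytes]:
    """Wait for a reply on conn; return None if nothing arrives in time."""
    conn.settimeout(timeout_seconds)
    try:
        # Note: an empty bytes result means the peer closed the connection.
        return conn.recv(1024)
    except socket.timeout:
        # No response within the limit: treat as a suspected network failure.
        return None


if __name__ == "__main__":
    sender, slave = socket.socketpair()
    print(recv_reply_or_timeout(sender, 0.1))  # slave is silent
    slave.sendall(b"ack")
    print(recv_reply_or_timeout(sender, 0.1))  # slave has replied
    sender.close()
    slave.close()
```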

Agreed. But what happens if you hit that timeout? Should we enforce that timeout within the server, or should we leave that to the external heartbeat system?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
