Re: Syncrep and improving latency due to WAL throttling

2023-12-03 Thread Tomas Vondra
Hi, Since the last patch version I've done a number of experiments with this throttling idea, so let me share some of the ideas and results, and see where that gets us. The patch versions so far tied everything to syncrep - commit latency with sync replica was the original motivation, so this

Re: Syncrep and improving latency due to WAL throttling

2023-11-08 Thread Andres Freund
Hi, On 2023-11-08 19:29:38 +0100, Tomas Vondra wrote: > >>> I haven't checked, but I'd assume that 100bytes back and forth should > >>> easily > >>> fit a new message to update LSNs and the existing feedback response. Even > >>> just > >>> the difference between sending 100 bytes and sending

Re: Syncrep and improving latency due to WAL throttling

2023-11-08 Thread Tomas Vondra
On 11/8/23 18:11, Andres Freund wrote: > Hi, > > On 2023-11-08 13:59:55 +0100, Tomas Vondra wrote: >>> I used netperf's tcp_rr between my workstation and my laptop on a local >>> 10Gbit >>> network (albeit with a crappy external card for my laptop), to put some >>> numbers to this. I used -r

Re: Syncrep and improving latency due to WAL throttling

2023-11-08 Thread Andres Freund
Hi, On 2023-11-08 13:59:55 +0100, Tomas Vondra wrote: > > I used netperf's tcp_rr between my workstation and my laptop on a local > > 10Gbit > > network (albeit with a crappy external card for my laptop), to put some > > numbers to this. I used -r $s,100 to test sending a variable sized data to

Re: Syncrep and improving latency due to WAL throttling

2023-11-08 Thread Tomas Vondra
On 11/8/23 07:40, Andres Freund wrote: > Hi, > > On 2023-11-04 20:00:46 +0100, Tomas Vondra wrote: >> scope >> - >> Now, let's talk about scope - what the patch does not aim to do. The >> patch is explicitly intended for syncrep clusters, not async. There have >> been proposals to also

Re: Syncrep and improving latency due to WAL throttling

2023-11-07 Thread Andres Freund
Hi, On 2023-11-04 20:00:46 +0100, Tomas Vondra wrote: > scope > - > Now, let's talk about scope - what the patch does not aim to do. The > patch is explicitly intended for syncrep clusters, not async. There have > been proposals to also support throttling for async replicas, logical >

Re: Syncrep and improving latency due to WAL throttling

2023-11-04 Thread Tomas Vondra
Hi, I keep getting occasional complaints about the impact of large/bulk transactions on latency of small OLTP transactions, so I'd like to revive this thread a bit and move it forward. Attached is a rebased v3, followed by 0002 patch with some review comments, missing comments and minor tweaks.

Re: Syncrep and improving latency due to WAL throttling

2023-02-02 Thread Jakub Wartak
On Thu, Feb 2, 2023 at 11:03 AM Tomas Vondra wrote: > > I agree that some other concurrent backend's > > COMMIT could fsync it, but I was wondering if that's sensible > > optimization to perform (so that issue_fsync() would be called for > > only commit/rollback records). I can imagine a

Re: Syncrep and improving latency due to WAL throttling

2023-02-02 Thread Tomas Vondra
On 2/1/23 14:40, Jakub Wartak wrote: > On Wed, Feb 1, 2023 at 2:14 PM Tomas Vondra > wrote: > >>> Maybe we should avoid calling fsyncs for WAL throttling? (by teaching >>> HandleXLogDelayPending()->XLogFlush()->XLogWrite() NOT to sync when >>> we are flushing just because of WAL throttling?)

Re: Syncrep and improving latency due to WAL throttling

2023-02-01 Thread Jakub Wartak
On Wed, Feb 1, 2023 at 2:14 PM Tomas Vondra wrote: > > Maybe we should avoid calling fsyncs for WAL throttling? (by teaching > > HandleXLogDelayPending()->XLogFlush()->XLogWrite() NOT to sync when > > we are flushing just because of WAL throttling?) Would that still be > > safe? > > It's not

Re: Syncrep and improving latency due to WAL throttling

2023-02-01 Thread Tomas Vondra
On 2/1/23 11:04, Jakub Wartak wrote: > On Mon, Jan 30, 2023 at 9:16 AM Bharath Rupireddy > wrote: > > Hi Bharath, thanks for reviewing. > >> I think measuring the number of WAL flushes that postgres generates >> with and without this feature is a great way to learn this feature's >> effects on

Re: Syncrep and improving latency due to WAL throttling

2023-02-01 Thread Jakub Wartak
On Mon, Jan 30, 2023 at 9:16 AM Bharath Rupireddy wrote: Hi Bharath, thanks for reviewing. > I think measuring the number of WAL flushes that postgres generates > with and without this feature is a great way to learn this feature's > effects on IOPS. Probably it's even better with variations in >

Re: Syncrep and improving latency due to WAL throttling

2023-01-30 Thread Bharath Rupireddy
On Sat, Jan 28, 2023 at 6:06 AM Tomas Vondra wrote: > > > > > That's not the sole goal, from my end: I'd like to avoid writing out + > > flushing the WAL in too small chunks. Imagine a few concurrent vacuums or > > COPYs or such - if we're unlucky they'd each end up exceeding their > >

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Tomas Vondra
On 1/27/23 22:19, Andres Freund wrote: > Hi, > > On 2023-01-27 12:06:49 +0100, Jakub Wartak wrote: >> On Thu, Jan 26, 2023 at 4:49 PM Andres Freund wrote: >> >>> Huh? Why did you remove the GUC? >> >> After reading previous threads, my optimism level of getting it ever >> in shape of being

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Tomas Vondra
On 1/27/23 22:33, Andres Freund wrote: > Hi, > > On 2023-01-27 21:45:16 +0100, Tomas Vondra wrote: >> On 1/27/23 08:18, Bharath Rupireddy wrote: I think my idea of only forcing to flush/wait an LSN some distance in the past would automatically achieve that? >>> >>> I'm sorry, I

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Andres Freund
Hi, On 2023-01-27 21:45:16 +0100, Tomas Vondra wrote: > On 1/27/23 08:18, Bharath Rupireddy wrote: > >> I think my idea of only forcing to flush/wait an LSN some distance in the > >> past > >> would automatically achieve that? > > > > I'm sorry, I couldn't get your point, can you please explain

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Andres Freund
Hi, On 2023-01-27 12:48:43 +0530, Bharath Rupireddy wrote: > Looking at the patch, the feature, in its current shape, focuses on > improving replication lag (by throttling WAL on the primary) only when > synchronous replication is enabled. Why is that? Why can't we design > it for replication in

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Andres Freund
Hi, On 2023-01-27 12:06:49 +0100, Jakub Wartak wrote: > On Thu, Jan 26, 2023 at 4:49 PM Andres Freund wrote: > > > Huh? Why did you remove the GUC? > > After reading previous threads, my optimism level of getting it ever > in shape of being widely accepted degraded significantly (mainly due >

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Tomas Vondra
On 1/27/23 08:18, Bharath Rupireddy wrote: > On Thu, Jan 26, 2023 at 9:21 PM Andres Freund wrote: >> >>> 7. I think we need to not let backends throttle too frequently even >>> though they have crossed wal_throttle_threshold bytes. The best way is >>> to rely on replication lag, after all the

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Jakub Wartak
Hi Bharath, On Fri, Jan 27, 2023 at 12:04 PM Bharath Rupireddy wrote: > > On Fri, Jan 27, 2023 at 2:03 PM Alvaro Herrera > wrote: > > > > On 2023-Jan-27, Bharath Rupireddy wrote: > > > > > Looking at the patch, the feature, in its current shape, focuses on > > > improving replication lag (by

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Jakub Wartak
Hi, v2 is attached. On Thu, Jan 26, 2023 at 4:49 PM Andres Freund wrote: > Huh? Why did you remove the GUC? After reading previous threads, my optimism level of getting it ever in shape of being widely accepted degraded significantly (mainly due to the discussion of wider category of 'WAL I/O

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Bharath Rupireddy
On Fri, Jan 27, 2023 at 2:03 PM Alvaro Herrera wrote: > > On 2023-Jan-27, Bharath Rupireddy wrote: > > > Looking at the patch, the feature, in its current shape, focuses on > > improving replication lag (by throttling WAL on the primary) only when > > synchronous replication is enabled. Why is

Re: Syncrep and improving latency due to WAL throttling

2023-01-27 Thread Alvaro Herrera
On 2023-Jan-27, Bharath Rupireddy wrote: > Looking at the patch, the feature, in its current shape, focuses on > improving replication lag (by throttling WAL on the primary) only when > synchronous replication is enabled. Why is that? Why can't we design > it for replication in general (async,

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Bharath Rupireddy
On Thu, Jan 26, 2023 at 9:21 PM Andres Freund wrote: > > > 7. I think we need to not let backends throttle too frequently even > > though they have crossed wal_throttle_threshold bytes. The best way is > > to rely on replication lag, after all the goal of this feature is to > > keep replication

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Tomas Vondra
On 1/26/23 16:40, Andres Freund wrote: > Hi, > > On 2023-01-26 12:08:16 +0100, Tomas Vondra wrote: >> It's not clear to me how could it cause deadlocks, as we're not waiting >> for a lock/resource locked by someone else, but it's certainly an issue >> for uninterruptible hangs. > > Maybe not.

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Andres Freund
Hi, On 2023-01-26 13:33:27 +0530, Bharath Rupireddy wrote: > 6. Backends can ignore throttling for WAL records marked as unimportant, no? Why would that be a good idea? Not that it matters today, but those records still need to be flushed in case of a commit by another transaction. > 7. I

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Andres Freund
Hi, On 2023-01-26 14:40:56 +0100, Jakub Wartak wrote: > In summary: Attached is a slightly reworked version of this patch. > 1. Moved logic outside XLogInsertRecord() under ProcessInterrupts() > 2. Flushes up to the last page boundary, still uses SyncRepWaitForLSN() > 3. Removed GUC for now

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Andres Freund
Hi, On 2023-01-26 12:08:16 +0100, Tomas Vondra wrote: > It's not clear to me how could it cause deadlocks, as we're not waiting > for a lock/resource locked by someone else, but it's certainly an issue > for uninterruptible hangs. Maybe not. But I wouldn't want to bet on it. It's a violation of

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Jakub Wartak
> On 1/25/23 20:05, Andres Freund wrote: > > Hi, > > > > Such a feature could be useful - but I don't think the current place of > > throttling has any hope of working reliably: [..] > > You're blocking in the middle of an XLOG insertion. [..] > Yeah, I agree the sleep would have to happen

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Tomas Vondra
On 1/25/23 20:05, Andres Freund wrote: > Hi, > > On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote: >> In other words it allows slow down of any backend activity. Any feedback on >> such a feature is welcome, including better GUC name proposals ;) and >> conditions in which such feature should

Re: Syncrep and improving latency due to WAL throttling

2023-01-26 Thread Bharath Rupireddy
On Thu, Jan 26, 2023 at 12:35 AM Andres Freund wrote: > > Hi, > > On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote: > > In other words it allows slow down of any backend activity. Any feedback on > > such a feature is welcome, including better GUC name proposals ;) and > > conditions in which

Re: Syncrep and improving latency due to WAL throttling

2023-01-25 Thread Andres Freund
Hi, On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote: > In other words it allows slow down of any backend activity. Any feedback on > such a feature is welcome, including better GUC name proposals ;) and > conditions in which such feature should be disabled even if it would be > enabled globally

Syncrep and improving latency due to WAL throttling

2023-01-25 Thread Jakub Wartak
Hi, attached is a proposal by Tomas (in CC) for protecting and prioritizing OLTP latency on syncrep over other WAL-heavy sessions. This is the result of internal testing and research on syncrep behavior by Tomas, Alvaro and me. The main objective of this