On Tue, Apr 30, 2024 at 11:28 PM Christophe Pettus <x...@thebuild.com> wrote:
>
> Hi,
>
> I wanted to check my understanding of how control flows in a walsender
> doing logical replication.  My understanding is that the (single) thread in
> each walsender process, in the simplest case, loops on:
>
> 1. Pull a record out of the WAL.
> 2. Pass it to the reorder buffer code, which,
> 3. Sorts it out into the appropriate in-memory structure for that
> transaction (spilling to disk as required), and then continues with #1, or,
> 4. If it's a commit record, it iteratively passes the transaction data one
> change at a time to,
> 5. The logical decoding plugin, which returns the output format of that
> plugin, and then,
> 6. The walsender sends the output from the plugin to the client. It cycles
> on passing the data to the plugin and sending it to the client until it
> runs out of changes in that transaction, and then resumes reading the WAL
> in #1.
>

This is correct barring some details on master.

> In particular, I wanted to confirm that while it is pulling the reordered
> transaction and sending it to the plugin (and thence to the client), that
> particular walsender is *not* reading new WAL records or putting them in
> the reorder buffer.
>

This is correct.

> The specific issue I'm trying to track down is an enormous pileup of spill
> files.  This is in a non-supported version of PostgreSQL (v11), so an
> upgrade may fix it, but at the moment, I'm trying to find a cause and a
> mitigation.
>

Is there a large transaction which is failing to be replicated repeatedly -
timeouts, crashes on upstream or downstream?

--
Best Wishes,
Ashutosh Bapat
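
For reference, the loop described in the quoted steps 1-6 can be sketched as a toy Python model. This is only an illustration of the control flow, not PostgreSQL code: the function name, the record tuples, and the plugin/send callbacks are all hypothetical, and spilling to disk is left out.

```python
from collections import defaultdict


def toy_walsender(wal_records, plugin, send):
    """Toy model of the single-threaded loop; all names are hypothetical."""
    reorder_buffer = defaultdict(list)  # xid -> changes queued so far

    for xid, kind, payload in wal_records:       # step 1: read one WAL record
        if kind == "change":
            # steps 2-3: file the change under its transaction
            reorder_buffer[xid].append(payload)
        elif kind == "commit":
            # steps 4-6: replay the whole transaction through the plugin and
            # send each result; no new WAL is read until this loop finishes
            for change in reorder_buffer.pop(xid, []):
                send(plugin(xid, change))

    # transactions that never committed are still buffered
    return dict(reorder_buffer)


# Two interleaved transactions; xid 2 commits first.
wal = [
    (1, "change", "a1"),
    (2, "change", "b1"),
    (1, "change", "a2"),
    (2, "commit", None),
    (1, "commit", None),
]
sent = []
leftover = toy_walsender(wal, lambda xid, change: f"{xid}:{change}", sent.append)
```

In this model the changes go out grouped by transaction in commit order, and the inner loop over a committed transaction runs to completion before the next record is read, which matches the behaviour confirmed above.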