On Fri, Dec 19, 2025 at 7:24 AM Chao Li <[email protected]> wrote:
>
>
>
> > On Dec 18, 2025, at 20:52, Ashutosh Bapat <[email protected]>
> > wrote:
> >
> > On Thu, Dec 18, 2025 at 7:56 AM Chao Li <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Dec 17, 2025, at 13:55, Ashutosh Bapat <[email protected]>
> >>> wrote:
> >>>
> >>> Thanks for pointing this out. I have fixed it in my code. However, at
> >>> this point I am looking for a design review, especially to verify that
> >>> the new implementation addresses Andres's concern raised in [1] while
> >>> not introducing any design issues raised earlier, e.g. those raised in
> >>> threads [2], [3] and [4].
> >>>
> >>> [1]
> >>> https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
> >>> [2]
> >>> https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
> >>> [3]
> >>> https://www.postgresql.org/message-id/[email protected]
> >>> [4]
> >>> https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
> >>>
> >>> --
> >>> Best Wishes,
> >>> Ashutosh Bapat
> >>
> >>
> >> Hi Ashutosh,
> >>
> >> Yeah, I owe you a review. I committed to review this patch but I forgot,
> >> sorry about that.
> >>
> >> From a design perspective, I agree that incrementing the counters should
> >> belong to the core, and the plugin should return proper values following
> >> the contract. I also have some more comments:
> >>
> >> 1. I just feel a bool return value might not be clear enough. For example:
> >>
> >> ```
> >> - ctx->callbacks.change_cb(ctx, txn, relation, change);
> >> + if (!ctx->callbacks.change_cb(ctx, txn, relation, change))
> >> + cache->filteredBytes += ReorderBufferChangeSize(change);
> >> ```
> >>
> >> You increase filteredBytes when change_cb returns false. But if we look at
> >> pgoutput_change(), there are many reasons to return false. Counting all
> >> of those cases toward filteredBytes seems wrong.
> >
> > I am not able to understand this. Every "return false" from
> > pgoutput_change() indicates that the change was filtered out, and hence
> > the size of the corresponding change is added to filteredBytes by
> > the caller. Which "return false" does not indicate a filtered-out
> > change?
>
> I think the confusion comes from the counter name “filteredBytes”: what does
> “filtered” mean? There are 3 types of data that are not streamed out:
>
> a. WAL data of tables that don’t belong to the publication
> b. The table belongs to the publication, but the action doesn’t. For example,
> with FOR ALL TABLES (INSERT), updates/deletes will not be streamed out.
> c. Rows filtered out by the row filter (WHERE)
>
> I thought only (c) should be counted toward filteredBytes; thinking it over
> again, maybe (b) should also be counted. But I still don’t think (a) should
> be counted.
>
> IMO, sentBytes + filteredBytes == supposedToSendBytes. If a table doesn’t
> belong to a publication, then it should not be counted into
> supposedToSendBytes, so it should not be counted into filteredBytes.
>
> The other point is that, if we count (a) into filteredBytes, then we end up
> with totalBytes == sentBytes + filteredBytes; if that’s true, why not compute
> such a number as (totalBytes - sentBytes) on the client side?
>
> If we insist on counting (a), then maybe we need to consider a better counter
> name.
>
> >>
> >> 2.
> >> ```
> >> - ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations,
> >> change);
> >> + if (!ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations,
> >> change))
> >> + cache->filteredBytes += ReorderBufferChangeSize(change);
> >> ```
> >>
> >> The row filter doesn’t impact TRUNCATE, so why increase filteredBytes
> >> after truncate_cb()?
> >
> > A TRUNCATE of a relation which is not part of the publication will be
> > filtered out.
>
> Same as 1.
>
filtered_bytes is the amount of data filtered out of total_bytes.
Since total_bytes accounts for the changes from tables which are not
included in the publications, filtered_bytes should include those
changes as well, since they are "filtered" out of total_bytes. Hence
(a) is included in filtered_bytes. Quoting from the documentation:
--
Amount of changes, from
<structfield>total_wal_bytes</structfield>, filtered
out by the output plugin and not sent downstream. Please note that it
does not include the changes filtered before a change is sent to
the output plugin, e.g. the changes filtered by origin.
--
sent_bytes is a related but different metric. From the documentation:
--
Amount of transaction changes, in the output format, sent downstream for
this slot by the output plugin.
--
The assumption that sentBytes + filteredBytes == supposedToSendBytes is
wrong. Since the filtered bytes were never converted into the output
format, we don't know how many bytes would have been sent downstream
had those bytes not been filtered, so we will never know what
supposedToSendBytes would be. Also note that sent_bytes + filtered_bytes
is not the same as total_wal_bytes.
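
To make the contract concrete, here is a rough sketch of a change callback
under this design. It is only my illustration, not the actual pgoutput code:
relation_is_published(), action_is_published(), row_passes_filter() and
send_change() are hypothetical placeholders. All three of your cases (a),
(b) and (c) end in "return false", and the core then adds
ReorderBufferChangeSize(change) to filtered_bytes, while sent_bytes is
measured on the formatted output message.

```
#include "postgres.h"

#include "replication/logical.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

/*
 * Illustrative sketch only; NOT the actual pgoutput code.  The helpers
 * relation_is_published(), action_is_published(), row_passes_filter() and
 * send_change() are hypothetical placeholders.
 */
static bool
sketch_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
{
    /* (a) the table is not part of any publication served by this slot */
    if (!relation_is_published(relation))
        return false;

    /* (b) the table is published, but this action (e.g. UPDATE) is not */
    if (!action_is_published(relation, change->action))
        return false;

    /* (c) the row is rejected by the publication's row filter (WHERE) */
    if (!row_passes_filter(relation, change))
        return false;

    /*
     * Format and send the change.  sent_bytes would be measured on this
     * formatted message, not on ReorderBufferChangeSize().
     */
    send_change(ctx, txn, relation, change);
    return true;
}
```

In all three cases the caller does the same thing, namely
cache->filteredBytes += ReorderBufferChangeSize(change), so the plugin only
reports whether the change was sent and the core decides how to account for
it. A TRUNCATE of a relation that is not in any publication is just case (a)
again.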
> >
> >>
> >> 3.
> >> ```
> >> - ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
> >> + if (ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn))
> >> + cache->sentTxns++;
> >> ```
> >>
> >> For 2-phase commit, it increases sentTxns after prepare_cb, and
> >> ```
> >> + if (ctx->callbacks.stream_abort_cb(ctx, txn, abort_lsn))
> >> + cache->sentTxns++;
> >> ```
> >>
> >> If the transaction is aborted, sentTxns is increased again, which is
> >> confusing. Though some data (a notification) is streamed for the abort,
> >> I don’t think that should be counted as a transaction.
> >>
> >> After commit, sentTxns is also increased, so a 2-phase commit is
> >> counted as two transactions, which also feels confusing. IMO, a 2-phase
> >> commit should still be counted as one transaction.
> >
> > stream_commit/abort_cb is called after stream_prepare_cb, not after
> > prepare_cb.
>
> That’s my typo, but the problem is still there. Should we count a
> 2-phase commit as 2 transactions?
>
Can you please provide a repro where a prepared transaction gets
counted twice in sent_txns?
> >
> >>
> >> 4. You add sentBytes and filteredBytes. I am wondering whether it makes
> >> sense to also add sentRows and filteredRows. Because tables could be big
> >> or small, bytes + rows together could give users a clearer picture.
> >
> > We don't have corresponding total_rows and streamed_rows counts. I
> > think that's because we haven't come across a use case for them. Do
> > you have a use case in mind?
> >
>
> That’s still related to 1. totalBytes includes tables that don’t belong to
> the publication, thus totalRows doesn’t make much sense. But sentRows would
> only include those rows belonging to the publication. For filteredRows, if
> we exclude (a), then I believe filteredRows also makes sense.
>
> If you argue that the “rows” request should be treated in a separate thread,
> I’ll be okay with that.
I think so. It would be good to provide examples of how these statistics
will be used. A separate thread would be better.
--
Best Wishes,
Ashutosh Bapat