On Tuesday, December 21, 2021 3:03 PM, tanghy.f...@fujitsu.com 
<tanghy.f...@fujitsu.com> wrote:
> To: Amit Kapila <amit.kapil...@gmail.com>; Euler Taveira <eu...@eulerto.com>
> Cc: Dilip Kumar <dilipbal...@gmail.com>; Peter Smith <smithpb2...@gmail.com>;
> Greg Nancarrow <gregn4...@gmail.com>; Hou, Zhijie/侯 志杰
> <houzj.f...@fujitsu.com>; vignesh C <vignes...@gmail.com>; Ajin Cherian
> <itsa...@gmail.com>; Rahila Syed <rahilasye...@gmail.com>; Peter Eisentraut
> <peter.eisentr...@enterprisedb.com>; Önder Kalacı <onderkal...@gmail.com>;
> japin <japi...@hotmail.com>; Michael Paquier <mich...@paquier.xyz>; David
> Steele <da...@pgmasters.net>; Craig Ringer <cr...@2ndquadrant.com>; Amit
> Langote <amitlangot...@gmail.com>; PostgreSQL Hackers
> <pgsql-hackers@lists.postgresql.org>
> Subject: RE: row filtering for logical replication
> 
> On Monday, December 20, 2021 4:47 PM tanghy.f...@fujitsu.com
> <tanghy.f...@fujitsu.com> wrote:
> > On Monday, December 20, 2021 11:24 AM tanghy.f...@fujitsu.com
> > <tanghy.f...@fujitsu.com>
> > >
> > > On Wednesday, December 8, 2021 2:29 PM Amit Kapila
> > > <amit.kapil...@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira <eu...@eulerto.com> wrote:
> > > > >
> > > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > > > >
> > > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <eu...@eulerto.com> 
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > > > >
> > > > > > PS> I will update the commit message in the next version. I barely
> changed
> > > the
> > > > > > documentation to reflect the current behavior. I probably missed 
> > > > > > some
> > > > changes
> > > > > > but I will fix in the next version.
> > > > > >
> > > > > > I realized that I forgot to mention a few things about the UPDATE 
> > > > > > behavior.
> > > > > > Regardless of 0003, we need to define which tuple will be used to 
> > > > > > evaluate
> > > the
> > > > > > row filter for UPDATEs. We already discussed it circa [1]. This 
> > > > > > current
> > version
> > > > > > chooses *new* tuple. Is it the best choice?
> > > > >
> > > > > But with 0003, we are using both the tuple for evaluating the row
> > > > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > > > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > > > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > > > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > > > > there is an open issue regarding which row to select for the filter?
> > > > >
> > > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would
> like
> > to
> > > > > propose a different default based on the conclusion I came up. If we
> merged
> > > > > 0003, that's fine; this change will be useless. If we don't or it is
> optional,
> > > > > it still has its merit.
> > > > >
> > > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs?
> I'm
> > still
> > > > > processing if it is worth it. If you think that in general the row 
> > > > > filter
> > > > > contains the primary key and it is rare to change it, it will waste 
> > > > > cycles
> > > > > evaluating the same expression twice. It seems this behavior could be
> > > > > controlled by a parameter.
> > > > >
> > > >
> > > > I think the first thing we should do in this regard is to evaluate the
> > > > performance for both cases (when we apply a filter to both tuples vs.
> > > > to one of the tuples). In case the performance difference is
> > > > unacceptable, I think it would be better to still compare both tuples
> > > > as default to avoid data inconsistency issues and have an option to
> > > > allow comparing one of the tuples.
> > > >
> > >
> > > I did some performance tests to see if 0003 patch has much overhead.
> > > With which I compared applying first two patches and applying first three
> patches
> > > in four cases:
> > > 1) only old rows match the filter.
> > > 2) only new rows match the filter.
> > > 3) both old rows and new rows match the filter.
> > > 4) neither old rows nor new rows match the filter.
> > >
> > > 0003 patch checks both old rows and new rows, and without 0003 patch, it
> only
> > > checks either old or new rows. We want to know whether it would take more
> time
> > > if we check the old rows.
> > >
> > > I ran the tests in asynchronous mode and compared the SQL execution time.
> I
> > also
> > > tried some complex filters, to see if the difference could be more 
> > > obvious.
> > >
> > > The result and the script are attached.
> > > I didn’t see big difference between the result of applying 0003 patch and
> the
> > > one not in all cases. So I think 0003 patch doesn’t have much overhead.
> > >
> >
> > In previous test, I ran 3 times and took the average value, which may be 
> > affected
> > by
> > performance fluctuations.
> >
> > So, to make the results more accurate, I tested them more times (10 times)
> and
> > took the average value. The result is attached.
> >
> > In general, I can see the time difference is within 3.5%, which is in an 
> > reasonable
> > performance range, I think.
> >
> 
> Hi,
> 
> I ran tests for various percentages of rows being filtered (based on v49 
> patch).
> The result and the script are attached.
> 
> In synchronous mode, with row filter patch, the fewer rows match the row 
> filter,
> the less time it took.
> In the case that all rows match the filter, row filter patch took about the 
> same
> time as the one on HEAD code.
> 
> In asynchronous mode, I could see time is reduced when the percentage of rows
> sent is small (<25%), other cases took about the same time as the one on HEAD.
> 
> I think the above result is good. It shows that row filter patch doesn’t have
> much overhead.
> 

Here is the PNG picture for the performance results. Kindly take it as your 
reference.

Regards,
Tang

Reply via email to