On Fri, 9 Feb 2024 at 20:51, Masahiko Sawada <[email protected]> wrote:
>
> On Thu, Feb 8, 2024 at 6:33 PM Hayato Kuroda (Fujitsu)
> <[email protected]> wrote:
> >
> > Dear Sawada-san,
> >
> > Thanks for making the v3 patchset. I have also benchmarked the case [1].
> > The results below are the average of five runs; they are almost the same
> > even when the median is used for the comparison. On my environment, no
> > regression can be seen.
> >
> > HEAD (1e285a5) HEAD + v3 patches difference
> > 10910.722 ms 10714.540 ms around 1.8%
> >
>
> Thank you for doing the performance test!
>
> > Also, here are minor comments for the v3 set.
> >
> > 01.
> > bh_nodeidx_entry and ReorderBufferMemTrackState is missing in typedefs.list.
>
> Will add them.
>
> >
> > 02. ReorderBufferTXNSizeCompare
> > Should we assert {ta, tb} are not NULL?
>
> Not sure we really need it as other binaryheap users don't have such checks.
>
> On Tue, Feb 6, 2024 at 2:45 PM Hayato Kuroda (Fujitsu)
> <[email protected]> wrote:
> >
> > > I've run a benchmark test that I shared before[1]. Here are results of
> > > decoding a transaction that has 1M subtransactions, each of which has 1
> > > INSERT:
> > >
> > > HEAD:
> > > 1810.192 ms
> > >
> > > HEAD w/ patch:
> > > 2001.094 ms
> > >
> > > I set a large enough value for logical_decoding_work_mem so as not to
> > > evict any transactions. I can see about a 10% performance regression in
> > > this case.
> >
> > Thanks for running. I think this workload is the worst and an extreme
> > case that would not occur on a real system (such a system should be
> > fixed), so we can say that the regression is up to -10%. I felt it could
> > be negligible, but what do others think?
>
> I think this performance regression is not acceptable. In this
> workload, one transaction has 10k subtransactions and the logical
> decoding becomes quite slow if logical_decoding_work_mem is not big
> enough. Therefore, it's a legitimate and common approach to increase
> logical_decoding_work_mem to speed up the decoding. However, with this
> patch, the decoding becomes slower than today. It's a bad idea in
> general to optimize an extreme case while sacrificing the normal (or
> more common) cases.
>
Since this same function is also used by pg_dump's TopoSort function, we
should verify that this change has no performance impact when sorting a
large number of objects during a dump:
 binaryheap *
 binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
 {
-	int			sz;
 	binaryheap *heap;

-	sz = offsetof(binaryheap, bh_nodes) + sizeof(bh_node_type) * capacity;
-	heap = (binaryheap *) palloc(sz);
+	heap = (binaryheap *) palloc(sizeof(binaryheap));
 	heap->bh_space = capacity;
 	heap->bh_compare = compare;
 	heap->bh_arg = arg;

 	heap->bh_size = 0;
 	heap->bh_has_heap_property = true;
+	heap->bh_nodes = (bh_node_type *) palloc(sizeof(bh_node_type) * capacity);
Regards,
Vignesh