Re: Avoid stack frame setup in performance critical routines using tail calls

2024-03-04 Thread Andres Freund
Hi, On 2024-03-04 17:43:50 +1300, David Rowley wrote: > On Thu, 29 Feb 2024 at 00:29, David Rowley wrote: > > I switched over to working on doing what you did in 0002 for > > generation.c and slab.c. > > > > See the attached patch which runs the same test as in [1] (aset.c is > > just there for c

Re: Avoid stack frame setup in performance critical routines using tail calls

2024-03-03 Thread David Rowley
On Thu, 29 Feb 2024 at 00:29, David Rowley wrote: > I switched over to working on doing what you did in 0002 for > generation.c and slab.c. > > See the attached patch which runs the same test as in [1] (aset.c is > just there for comparisons between slab and generation) > > The attached includes s

Re: Avoid stack frame setup in performance critical routines using tail calls

2024-02-22 Thread Andres Freund
Hi, On 2024-02-23 00:46:26 +1300, David Rowley wrote: > I've rebased the 0001 patch and gone over it again and made a few > additional changes besides what I mentioned in my review. > > On Wed, 9 Aug 2023 at 20:44, David Rowley wrote: > > Here's a review of v2-0001: > > 2. Why do you need to add

Re: Avoid stack frame setup in performance critical routines using tail calls

2024-02-22 Thread David Rowley
I've rebased the 0001 patch and gone over it again and made a few additional changes besides what I mentioned in my review. On Wed, 9 Aug 2023 at 20:44, David Rowley wrote: > Here's a review of v2-0001: > 2. Why do you need to add the NULL check here? > > #ifdef USE_VALGRIND > - if (method != MC

Re: Avoid stack frame setup in performance critical routines using tail calls

2023-08-09 Thread David Rowley
On Fri, 21 Jul 2023 at 14:03, David Rowley wrote: > I'll reply back with a more detailed review next week. Here's a review of v2-0001: 1. /* * XXX: Should this also be moved into alloc()? We could possibly avoid * zeroing in some cases (e.g. if we used mmap() ourselves. */ MemSetAligned(ret, 0,

Re: Avoid stack frame setup in performance critical routines using tail calls

2023-08-08 Thread John Naylor
On Wed, Jul 19, 2023 at 3:53 PM Andres Freund wrote: > > Hi, > > David and I were chatting about this patch, in the context of his bump > allocator patch. Attached is a rebased version that is also split up into two > steps, and a bit more polished. Here is a quick test -- something similar was

Re: Avoid stack frame setup in performance critical routines using tail calls

2023-07-19 Thread Andres Freund
Hi, David and I were chatting about this patch, in the context of his bump allocator patch. Attached is a rebased version that is also split up into two steps, and a bit more polished. I wasn't sure what a good test was. I ended up measuring COPY pgbench_accounts TO '/dev/null' WITH (FORMAT 'b

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-20 Thread Andres Freund
Hi, On 2021-07-20 19:37:46 +1200, David Rowley wrote: > On Tue, 20 Jul 2021 at 19:04, Andres Freund wrote: > > > * AllocateSetAlloc.txt > > > * palloc.txt > > > * percent.txt > > > > Huh, that's interesting. You have some control flow enforcement stuff > > turned on (the endbr64). And it looks l

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-20 Thread David Rowley
On Tue, 20 Jul 2021 at 19:04, Andres Freund wrote: > > * AllocateSetAlloc.txt > > * palloc.txt > > * percent.txt > > Huh, that's interesting. You have some control flow enforcement stuff turned > on (the endbr64). And it looks like it has a non zero cost (or maybe it's > just skid). Did you enab

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-20 Thread Andres Freund
Hi, On Mon, Jul 19, 2021, at 23:53, David Rowley wrote: > On Tue, 20 Jul 2021 at 18:17, Andres Freund wrote: > > Any chance you could show a `perf annotate AllocSetAlloc` and `perf annotate > > palloc` from a patched run? And perhaps how high their percentages of the > > total work are. E.g. usin

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-19 Thread David Rowley
On Tue, 20 Jul 2021 at 18:17, Andres Freund wrote: > Any chance you could show a `perf annotate AllocSetAlloc` and `perf annotate > palloc` from a patched run? And perhaps how high their percentages of the > total work are. E.g. using something like > perf report -g none|grep -E 'AllocSetAlloc|pal

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-19 Thread Andres Freund
Hi, On 2021-07-20 16:50:09 +1200, David Rowley wrote: > I've not taken the time to study the patch but I was running some > other benchmarks today on a small scale pgbench readonly test and I > took this patch for a spin to see if I could see the same performance > gains. Thanks! > This is an A

Re: Avoid stack frame setup in performance critical routines using tail calls

2021-07-19 Thread David Rowley
On Tue, 20 Jul 2021 at 08:00, Andres Freund wrote: > I have *not* carefully benchmarked this, but a quick implementation of this > does seem to increase readonly pgbench tps at a small scale by 2-3% (both Interesting. I've not taken the time to study the patch but I was running some other benchm

Avoid stack frame setup in performance critical routines using tail calls

2021-07-19 Thread Andres Freund
Hi, We have a few routines that are taking up a meaningful share of nearly all workloads. They are worth micro-optimizing, even though they are rarely the most expensive parts of a profile. The memory allocation infrastructure is an example of that. When looking at a profile one can often see that a