On Tue, May 20, 2025 at 4:52 PM Paolo Bonzini <pbonz...@redhat.com> wrote:
> Il mar 20 mag 2025, 21:01 Stefan Hajnoczi <stefa...@gmail.com> ha scritto:
>>
>> On Mon, May 19, 2025 at 2:52 PM Tanish Desai <tanishdesa...@gmail.com> wrote:
>> >
>> > Remove hot paths from .c file and added it in .h file to keep it inline.
>>
>> Please include performance results in the commit description so it's
>> clear what impact this change has.
>
>
> Hi Stefan,
>
> I am replying because I take the blame for this :) and as an example of how 
> he could interact with the maintainer.
>
> For now we mostly looked at differences between the code that tracetool 
> generates for the various backends, and the observation that some of them put 
> code in the .h and some in the .c file. I explained to Tanish the concept of 
> separating hot vs cold code in theory, showed him some examples in QEMU where 
> performance measurements were done in the past, and suggested applying this 
> to various backends (starting with the one with the weirdest choice!). 
> However we didn't do any measurement yet.
>
> Some possibilities that come to mind:
>
> 1) maybe the coroutine benchmarks are enough to show a difference, with some 
> luck
>
> 2) a new microbenchmark (or a set of microbenchmarks that try various level 
> of register pressure around the tracepoint), which would be nice to have 
> anyway

2 is going above and beyond :). I'm not trying to make life hard.

Another option if 1 or 2 are difficult to quantify: I'd be happy with
just a "before" and "after" disassembly snippet from objdump showing
that the instructions at a call site changed as intended.

What I'm looking for is a commit description that explains the purpose
of the commit followed by, since this is a performance improvement,
some form of evidence that the change achieved its goal. At that point
it's easy for me to merge because it has a justification and the
nature of the commit will be clear to anyone looking at the git log in
the future.

> 3) perhaps we could try to check the code size for some object files in 
> block/ (for example libblock.a.p/*.c.o), as a proxy for how much instruction 
> cache space is saved when all tracepoints are disabled
>
> We can start from 3, but also try 1+3 and 2+3 if it fails if you think that 
> would be more persuasive.

Can you explain 3 more? Total code size on disk should stay about the
same because most trace events only have one call site. Moving code
between the caller and callee doesn't make a big difference in either
direction. At runtime there is a benefit from inlining the condition
check since the call site no longer needs to execute the callee's
function prologue/epilogue when the trace event is runtime-disabled,
but that CPU cycle and instruction cache effect won't be visible from
the on-disk code size.

Thanks,
Stefan

Reply via email to