Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Le keskiviikkona 19. heinäkuuta 2023, 0.32.26 EEST Lynne a écrit : > > User-space access to the cycle counter has been deemed a security threat > > due to the Cycle Drift attack, and is disabled as of OpenSBI 1.3. > > > > If FFmpeg does not support Linux perf, FFmpeg will get _no_ performance > > benchmarks on Linux. > > It does support linux perf. I've used it many times. Yes of course. This MR modifies the Linux perf code, so I can see that it's there. My point is that *if* Linux perf is *not* enabled (in ./configure), there will be no benchmarks anymore on newer Linux RISC-V systems. But if it is enabled it, there will be no benchmarks on existing systems. Hence the need for the run-time indirection, or some very major PITA for development and supporting new developers (and we've seen already 3 different people sending RISC-V patches besides me in the past few months). > >> This is a development tool first and foremost. > > > > A development tool is not a justification for leaving a security whole in > > the system. I don't make the rules, and you don't either. OpenSBI and > > Linux make them. > > You're not required to leave them on always. It's a development tool. > You don't disable frequency scaling on all the time on other platforms, > just when doing benchmarks, which is what the tool is meant for. There is no need to disable frequency scaling if you use the most _suited_ counter for microbencmarks, that is to say the cycle counter. And then, disabling frequency scaling is not always feasible. > For ARM, we routinely use an external kernel module to enable > the performance timers just so we avoid linux perf. I don't think an external ARM-specific kernel module is an excuse for breaking checkasm for _normal_ out-of-the-box kernel setups. Just because checkasm is a development tool doesn't mean that it only runs on custom development systems. > Specifically on arm, the linux perf noise is very high, which is where > my concerns are. I fail to see how this MR makes this any worse. You can still disable Linux perf at build time as you could before. Quite the contrary in fact: if the noise is so high then there are no ways that a single indirect function call would make any meaningful difference for the worse. > >> If anyone doesn't want to use rdcycle, they can use linux perf, it still > >> works, with or without the patch. > > > > It does not. > > Why not? It works on arm. The same exact code is used on risc-v > and on any other platform too. Currently it's disabled by default on RISC-V (really anything but Arm) in FFmpeg's configure, and hardly anybody has the necessary OpenSBI and Linux updates for Linux perf to work on RISC-V. We can't simply assume that Linux perf is supported. We're a few years too early for that. And by then, Android will have blocked Linux perf, so we will have the same problem anyway, only for different reasons. > >> Either way, I don't agree with this patch, not accepting it. > > > > The only vaguely valid reason you've given is that this should cache the > > function pointers locally, which version 2 does. > > I disagree with it doing an indirection at all. I can see that, and you're entitled to your opinion. But that's neither technical reasoning, nor actionable review feedback. > It's overhead and a potential source of inaccuracy. If Linux perf is enabled, it's blatantly obvious that whatever additional overhead this adds is negligible. Since you insist, on top of my head, Linux perf already now will do quite a few indirect calls: - from checkasm's PLT to libc's ioctl(), - from libc's syscall to kernel exception vector, - from exception vector to ioctl syscall handler, - from ioctl handler to PMU framework, - from PMU framework to PMU backend driver. And that's grossly over-simplified and glossing over all the other calls and stuff that happen in-between. As for the case that Linux perf is disabled at build time, then I can make a different version that calls AV_READ_TIME directly without indirection if that's what bothers Gramner and/or you. Though you've noted yourself that that path is also very noisy as it is so not like adding one indirect call will matter here :shrug: > It's up to you to prove it wouldn't affect major platforms. I think that I have supplied plenty of technical arguments that there shouldn't be a problem with this MR. And in the unlikely event that there would be an unexpected problem, I'm sure somebody would revert it; it's not an irreversible situation. Not to mention that, in the mean time, Martin did indeed verify that the overhead is small and stable (on Arm). > > Since you're giving zero valid reasons, I'm invoking the TC. > > What, you expect them to materialize an opinion out of thin air? I think that they already have plenty to materialise an opinion from, between the MR, this thread, Martin's benchmarks, and their experience. Otherwise, they always have the possibility to ask for
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Jul 17, 2023, 20:09 by r...@remlab.net: > Le maanantaina 17. heinäkuuta 2023, 20.48.40 EEST Lynne a écrit : > >> >> > But I still argue that that is, either way, completely negligible >> >> > compared >> >> > to the *existing* overhead. Each loop is making 4 system calls, and >> >> > each >> >> > of those system call requires a direct call (to PLT) and an indirect >> >> > branch (from GOT). If you have a problem with the two additional >> >> > function >> >> > calls, then you can't be using Linux perf in the first place. >> >> >> >> You don't want to ever use linux perf in the first place, it's second >> >> class.> >> > No it isn't. The interface is more involved than just reading a CSR; and >> > sure I'd prefer the simple interface that RDCYCLE is all other things >> > being equal. But other things are not equal. Linux perf is in fact *more* >> > accurate by virtue of not *wrongly* counting other things. And it does >> > not threaten the security of the entire system, so it will work inside a >> > rented VM or an unprivileged process. >> >> Threaten? >> > > User-space access to the cycle counter has been deemed a security threat due > to the Cycle Drift attack, and is disabled as of OpenSBI 1.3. > > If FFmpeg does not support Linux perf, FFmpeg will get _no_ performance > benchmarks on Linux. > It does support linux perf. I've used it many times. >> This is a development tool first and foremost. >> > > A development tool is not a justification for leaving a security whole in the > system. I don't make the rules, and you don't either. OpenSBI and Linux make > them. > You're not required to leave them on always. It's a development tool. You don't disable frequency scaling on all the time on other platforms, just when doing benchmarks, which is what the tool is meant for. For ARM, we routinely use an external kernel module to enable the performance timers just so we avoid linux perf. Specifically on arm, the linux perf noise is very high, which is where my concerns are. >> If anyone doesn't want to use rdcycle, they can use linux perf, it still >> works, with or without the patch. >> > > It does not. > Why not? It works on arm. The same exact code is used on risc-v and on any other platform too. >> >> I don't think it's worth changing the direct inlining we had before. >> >> You're >> >> not interested in whether or not the same exact code is ran between >> >> platforms, >> > >> > Err, I am definitely interested in doing exactly that. I don't want to >> > have to reconfigure and recompile the entire FFmpeg just to switch >> > between Linux perf and raw cycle counter. A contrario, I *do* want to >> > compare performance between vendors once the hardware is available. >> >> That's a weak reason to compromise the accuracy of a development tool. >> > > This is not compromising any accuracy of any development tool. Again, in a > scenario where both RDCYCLE and Linux perf work, Linux perf is more accurate > for reasons that I already outlined in previous messages. And on systems with > newer OpenSBI and kernel, RDCYCLE does not work at all. > >> >> just that the code that's measuring timing is as efficient and >> >> low overhead as possible. >> > >> > Of course not. Low overhead is irrelevant here. The measurement overhead >> > is know and is subtracted. What we need is stable/reproducible overhead, >> > and accurate measurements. >> >> Which is what TSC or the equivalent gets you. It's noisy, but that's because >> it's better and higher accuracy than having to roundtrip through the >> kernel. >> > > We _could_ use RDTIME. That's not blocked. > > And while that should be "low overhead", but measuring time is also obviously > way _less_ accurate than measuring cycles (RDCYCLE), which is in turn _less_ > accurate than measuring cycles only in user mode and only of the current > process (Linux perf). > >> Either way, I don't agree with this patch, not accepting it. >> > > The only vaguely valid reason you've given is that this should cache the > function pointers locally, which version 2 does. > I disagree with it doing an indirection at all. It's overhead and a potential source of inaccuracy. It's up to you to prove it wouldn't affect major platforms. > Since you're giving zero valid reasons, I'm invoking the TC. > What, you expect them to materialize an opinion out of thin air? You have to call for a vote. And bring up why it hasn't been changed in 3 years, even though in the duties listed, it's only supposed to last for a year before elections are called again. Which possibly involves doing elections again. You could've asked the actual maintainer, and one who knows the most about the code, Gramner. But you're focusing too hard on arguing about irrelevant issues, even bringing up security, rather than using reasoning or clearly describing the need for the patch. > All you've done is make it abundantly clear that you don't like Linux perf > and > don't care about the rationale for this patch
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Le maanantaina 17. heinäkuuta 2023, 20.48.40 EEST Lynne a écrit : > >> > But I still argue that that is, either way, completely negligible > >> > compared > >> > to the *existing* overhead. Each loop is making 4 system calls, and > >> > each > >> > of those system call requires a direct call (to PLT) and an indirect > >> > branch (from GOT). If you have a problem with the two additional > >> > function > >> > calls, then you can't be using Linux perf in the first place. > >> > >> You don't want to ever use linux perf in the first place, it's second > >> class.> > > No it isn't. The interface is more involved than just reading a CSR; and > > sure I'd prefer the simple interface that RDCYCLE is all other things > > being equal. But other things are not equal. Linux perf is in fact *more* > > accurate by virtue of not *wrongly* counting other things. And it does > > not threaten the security of the entire system, so it will work inside a > > rented VM or an unprivileged process. > > Threaten? User-space access to the cycle counter has been deemed a security threat due to the Cycle Drift attack, and is disabled as of OpenSBI 1.3. If FFmpeg does not support Linux perf, FFmpeg will get _no_ performance benchmarks on Linux. > This is a development tool first and foremost. A development tool is not a justification for leaving a security whole in the system. I don't make the rules, and you don't either. OpenSBI and Linux make them. > If anyone doesn't want to use rdcycle, they can use linux perf, it still > works, with or without the patch. It does not. > >> I don't think it's worth changing the direct inlining we had before. > >> You're > >> not interested in whether or not the same exact code is ran between > >> platforms, > > > > Err, I am definitely interested in doing exactly that. I don't want to > > have to reconfigure and recompile the entire FFmpeg just to switch > > between Linux perf and raw cycle counter. A contrario, I *do* want to > > compare performance between vendors once the hardware is available. > > That's a weak reason to compromise the accuracy of a development tool. This is not compromising any accuracy of any development tool. Again, in a scenario where both RDCYCLE and Linux perf work, Linux perf is more accurate for reasons that I already outlined in previous messages. And on systems with newer OpenSBI and kernel, RDCYCLE does not work at all. > >> just that the code that's measuring timing is as efficient and > >> low overhead as possible. > > > > Of course not. Low overhead is irrelevant here. The measurement overhead > > is know and is subtracted. What we need is stable/reproducible overhead, > > and accurate measurements. > > Which is what TSC or the equivalent gets you. It's noisy, but that's because > it's better and higher accuracy than having to roundtrip through the > kernel. We _could_ use RDTIME. That's not blocked. And while that should be "low overhead", but measuring time is also obviously way _less_ accurate than measuring cycles (RDCYCLE), which is in turn _less_ accurate than measuring cycles only in user mode and only of the current process (Linux perf). > Either way, I don't agree with this patch, not accepting it. The only vaguely valid reason you've given is that this should cache the function pointers locally, which version 2 does. All you've done is make it abundantly clear that you don't like Linux perf and don't care about the rationale for this patch because it doesn't suit your personal preferences, especially on IRC: 20:49 <@Lynne> "security" 20:49 <@Lynne> it's a fucking timer 20:49 <@Lynne> "insecure" 20:51 <@Lynne> computers are very good at knowing how much time has passed, to the point of the speed of light being an issue 20:52 <@Lynne> getting rid of this because some script kiddie could potentially figure out a bit by calling this a million times while the CPU is busy is paranoid 20:58 <@Lynne> it's a cache issue, so you fix the cache lifetime, not nerf a timer And it's not a cache issue. Cycle Drift is not a cache issue, it's an instruction timing issue. Since you're giving zero valid reasons, I'm invoking the TC. -- レミ・デニ-クールモン http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Jul 17, 2023, 07:18 by r...@remlab.net: > Le sunnuntaina 16. heinäkuuta 2023, 23.32.21 EEST Lynne a écrit : > >> Introducing additional overhead in the form of a dereference is a point >> where instability can creep in. Can you guarantee that a context will >> always remain in L1D cache, >> > > L1D is not involved here. In version 2, the pointers are cached locally. > >> as opposed to just reading the raw CPU timing >> directly where that's supported. >> > > Of course not. Raw CPU timing is subject to noise from interrupts (and > whatever those interrupts trigger). And that's not just theoretical. I've > experienced it and it sucks. Raw CPU timing is much noisier than Linux perf. > > And because it has also been proven vastly insecure, it's been disabled on > Arm > for a long time, and is being disabled on RISC-V too now. > >> > But I still argue that that is, either way, completely negligible compared >> > to the *existing* overhead. Each loop is making 4 system calls, and each >> > of those system call requires a direct call (to PLT) and an indirect >> > branch (from GOT). If you have a problem with the two additional function >> > calls, then you can't be using Linux perf in the first place. >> >> You don't want to ever use linux perf in the first place, it's second class. >> > > No it isn't. The interface is more involved than just reading a CSR; and sure > I'd prefer the simple interface that RDCYCLE is all other things being equal. > But other things are not equal. Linux perf is in fact *more* accurate by > virtue of not *wrongly* counting other things. And it does not threaten the > security of the entire system, so it will work inside a rented VM or an > unprivileged process. > Threaten? This is a development tool first and foremost. If anyone doesn't want to use rdcycle, they can use linux perf, it still works, with or without the patch. >> I don't think it's worth changing the direct inlining we had before. You're >> not interested in whether or not the same exact code is ran between >> platforms, >> > > Err, I am definitely interested in doing exactly that. I don't want to have > to > reconfigure and recompile the entire FFmpeg just to switch between Linux perf > and raw cycle counter. A contrario, I *do* want to compare performance > between > vendors once the hardware is available. > That's a weak reason to compromise the accuracy of a development tool. >> just that the code that's measuring timing is as efficient and >> low overhead as possible. >> > > Of course not. Low overhead is irrelevant here. The measurement overhead is > know and is subtracted. What we need is stable/reproducible overhead, and > accurate measurements. > Which is what TSC or the equivalent gets you. It's noisy, but that's because it's better and higher accuracy than having to roundtrip through the kernel. > And that's assuming the stuff works at all. You can argue that we should use > Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks, all you want. PMU > access will just throw a SIGILL and end the checkasm process with zero > measurements. The rest of the industry wants to use system calls for informed > reasons. I don't think you, or even the whole FFmpeg project, can win that > argument against OS and CPU vendors. > Either way, I don't agree with this patch, not accepting it. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Le sunnuntaina 16. heinäkuuta 2023, 23.32.21 EEST Lynne a écrit : > Introducing additional overhead in the form of a dereference is a point > where instability can creep in. Can you guarantee that a context will > always remain in L1D cache, L1D is not involved here. In version 2, the pointers are cached locally. > as opposed to just reading the raw CPU timing > directly where that's supported. Of course not. Raw CPU timing is subject to noise from interrupts (and whatever those interrupts trigger). And that's not just theoretical. I've experienced it and it sucks. Raw CPU timing is much noisier than Linux perf. And because it has also been proven vastly insecure, it's been disabled on Arm for a long time, and is being disabled on RISC-V too now. > > But I still argue that that is, either way, completely negligible compared > > to the *existing* overhead. Each loop is making 4 system calls, and each > > of those system call requires a direct call (to PLT) and an indirect > > branch (from GOT). If you have a problem with the two additional function > > calls, then you can't be using Linux perf in the first place. > > You don't want to ever use linux perf in the first place, it's second class. No it isn't. The interface is more involved than just reading a CSR; and sure I'd prefer the simple interface that RDCYCLE is all other things being equal. But other things are not equal. Linux perf is in fact *more* accurate by virtue of not *wrongly* counting other things. And it does not threaten the security of the entire system, so it will work inside a rented VM or an unprivileged process. > I don't think it's worth changing the direct inlining we had before. You're > not interested in whether or not the same exact code is ran between > platforms, Err, I am definitely interested in doing exactly that. I don't want to have to reconfigure and recompile the entire FFmpeg just to switch between Linux perf and raw cycle counter. A contrario, I *do* want to compare performance between vendors once the hardware is available. > just that the code that's measuring timing is as efficient and > low overhead as possible. Of course not. Low overhead is irrelevant here. The measurement overhead is know and is subtracted. What we need is stable/reproducible overhead, and accurate measurements. And that's assuming the stuff works at all. You can argue that we should use Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks, all you want. PMU access will just throw a SIGILL and end the checkasm process with zero measurements. The rest of the industry wants to use system calls for informed reasons. I don't think you, or even the whole FFmpeg project, can win that argument against OS and CPU vendors. -- Rémi Denis-Courmont http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Jul 15, 2023, 22:13 by r...@remlab.net: > Le lauantaina 15. heinäkuuta 2023, 20.43.26 EEST Lynne a écrit : > >> Jul 15, 2023, 10:26 by r...@remlab.net: >> > Le lauantaina 15. heinäkuuta 2023, 11.05.51 EEST Lynne a écrit : >> >> Jul 14, 2023, 20:29 by r...@remlab.net: >> >> > This makes all calls to the bench start and stop functions via >> >> > function pointers. While the primary goal is to support run-time >> >> > selection of the performance measurement back-end in later commits, >> >> > this has the side benefit of containing platform dependencies in to >> >> > checkasm.c and out of checkasm.h. >> >> > --- >> >> > >> >> > tests/checkasm/checkasm.c | 33 - >> >> > tests/checkasm/checkasm.h | 31 --- >> >> > 2 files changed, 32 insertions(+), 32 deletions(-) >> >> >> >> Not sure I agree with this commit, the overhead can be detectable, >> >> and we have a lot of small functions with runtime a few times that >> >> of a null function call. >> > >> > I don't think the function call is ever null. The pointers are left NULL >> > only if none of the backend initialise. But then, checkasm will bail out >> > and exit before we try to benchmark anything anyway. >> > >> > As for the real functions, they always do *something*. None of them "just >> > return 0". >> >> I meant a no-op function call to measure the overhead of function >> calls themselves, complete with all the ABI stuff. >> > > I > >> >> >> Can you store the function pointers out of the loop to reduce >> >> the derefs needed? >> > >> > Taking just the two loads is out of the loop should be feasible but it >> > seems a rather vain. You will still have the overhead of the indirect >> > function call, the function, and most importantly in the case of Linux >> > perf and MacOS kperf, the system calls. >> > >> > The only way to avoid the indirect function calls are to use IFUNC (tricky >> > and not portable), or to make horrible macros to spawn one bench loop for >> > each backend. >> > >> > In the end, I think we should rather aim for as constant time as possible, >> > rather than as fast as possible, so that the nop loop can estimate the >> > benchmarking overhead as well as possible. In this respect, I think it is >> > actually marginally better *not* to cache the function pointers in local >> > variables, which could end up spilled on the stack, or not, depending on >> > local compiler optimisations for any given test case. >> >> I disagree, uninlining the timer fetches adds another source of >> inconsistency. >> > > Err, outlining the timer makes sure that it's always the exact same code > that's run, and not differently optimised inlinings, at least if LTO is > absent. > (And even with LTO, it vastly reduces the compiler's ability to optimise and > vary the compilation.) Again, given how the calculations are made at the > moment, the stability of the overhead is important, so that we can *compare* > measurements. The absolute value of the overhead, not so much. > Introducing additional overhead in the form of a dereference is a point where instability can creep in. Can you guarantee that a context will always remain in L1D cache, as opposed to just reading the raw CPU timing directly where that's supported. > But I still argue that that is, either way, completely negligible compared to > the *existing* overhead. Each loop is making 4 system calls, and each of > those > system call requires a direct call (to PLT) and an indirect branch (from > GOT). > If you have a problem with the two additional function calls, then you can't > be using Linux perf in the first place. > You don't want to ever use linux perf in the first place, it's second class. I don't think it's worth changing the direct inlining we had before. You're not interested in whether or not the same exact code is ran between platforms, just that the code that's measuring timing is as efficient and low overhead as possible. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Le lauantaina 15. heinäkuuta 2023, 20.43.26 EEST Lynne a écrit : > Jul 15, 2023, 10:26 by r...@remlab.net: > > Le lauantaina 15. heinäkuuta 2023, 11.05.51 EEST Lynne a écrit : > >> Jul 14, 2023, 20:29 by r...@remlab.net: > >> > This makes all calls to the bench start and stop functions via > >> > function pointers. While the primary goal is to support run-time > >> > selection of the performance measurement back-end in later commits, > >> > this has the side benefit of containing platform dependencies in to > >> > checkasm.c and out of checkasm.h. > >> > --- > >> > > >> > tests/checkasm/checkasm.c | 33 - > >> > tests/checkasm/checkasm.h | 31 --- > >> > 2 files changed, 32 insertions(+), 32 deletions(-) > >> > >> Not sure I agree with this commit, the overhead can be detectable, > >> and we have a lot of small functions with runtime a few times that > >> of a null function call. > > > > I don't think the function call is ever null. The pointers are left NULL > > only if none of the backend initialise. But then, checkasm will bail out > > and exit before we try to benchmark anything anyway. > > > > As for the real functions, they always do *something*. None of them "just > > return 0". > > I meant a no-op function call to measure the overhead of function > calls themselves, complete with all the ABI stuff. I > > >> Can you store the function pointers out of the loop to reduce > >> the derefs needed? > > > > Taking just the two loads is out of the loop should be feasible but it > > seems a rather vain. You will still have the overhead of the indirect > > function call, the function, and most importantly in the case of Linux > > perf and MacOS kperf, the system calls. > > > > The only way to avoid the indirect function calls are to use IFUNC (tricky > > and not portable), or to make horrible macros to spawn one bench loop for > > each backend. > > > > In the end, I think we should rather aim for as constant time as possible, > > rather than as fast as possible, so that the nop loop can estimate the > > benchmarking overhead as well as possible. In this respect, I think it is > > actually marginally better *not* to cache the function pointers in local > > variables, which could end up spilled on the stack, or not, depending on > > local compiler optimisations for any given test case. > > I disagree, uninlining the timer fetches adds another source of > inconsistency. Err, outlining the timer makes sure that it's always the exact same code that's run, and not differently optimised inlinings, at least if LTO is absent. (And even with LTO, it vastly reduces the compiler's ability to optimise and vary the compilation.) Again, given how the calculations are made at the moment, the stability of the overhead is important, so that we can *compare* measurements. The absolute value of the overhead, not so much. But I still argue that that is, either way, completely negligible compared to the *existing* overhead. Each loop is making 4 system calls, and each of those system call requires a direct call (to PLT) and an indirect branch (from GOT). If you have a problem with the two additional function calls, then you can't be using Linux perf in the first place. -- レミ・デニ-クールモン http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Jul 15, 2023, 10:26 by r...@remlab.net: > Le lauantaina 15. heinäkuuta 2023, 11.05.51 EEST Lynne a écrit : > >> Jul 14, 2023, 20:29 by r...@remlab.net: >> > This makes all calls to the bench start and stop functions via >> > function pointers. While the primary goal is to support run-time >> > selection of the performance measurement back-end in later commits, >> > this has the side benefit of containing platform dependencies in to >> > checkasm.c and out of checkasm.h. >> > --- >> > >> > tests/checkasm/checkasm.c | 33 - >> > tests/checkasm/checkasm.h | 31 --- >> > 2 files changed, 32 insertions(+), 32 deletions(-) >> >> Not sure I agree with this commit, the overhead can be detectable, >> and we have a lot of small functions with runtime a few times that >> of a null function call. >> > > I don't think the function call is ever null. The pointers are left NULL only > if none of the backend initialise. But then, checkasm will bail out and exit > before we try to benchmark anything anyway. > > As for the real functions, they always do *something*. None of them "just > return 0". > I meant a no-op function call to measure the overhead of function calls themselves, complete with all the ABI stuff. >> Can you store the function pointers out of the loop to reduce >> the derefs needed? >> > > Taking just the two loads is out of the loop should be feasible but it seems > a > rather vain. You will still have the overhead of the indirect function call, > the function, and most importantly in the case of Linux perf and MacOS kperf, > the system calls. > > The only way to avoid the indirect function calls are to use IFUNC (tricky > and > not portable), or to make horrible macros to spawn one bench loop for each > backend. > > In the end, I think we should rather aim for as constant time as possible, > rather than as fast as possible, so that the nop loop can estimate the > benchmarking overhead as well as possible. In this respect, I think it is > actually marginally better *not* to cache the function pointers in local > variables, which could end up spilled on the stack, or not, depending on > local > compiler optimisations for any given test case. > I disagree, uninlining the timer fetches adds another source of inconsistency. It may be messy, but I think accuracy here is more important than cleanliness, especially as it's a development tool. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Le lauantaina 15. heinäkuuta 2023, 11.05.51 EEST Lynne a écrit : > Jul 14, 2023, 20:29 by r...@remlab.net: > > This makes all calls to the bench start and stop functions via > > function pointers. While the primary goal is to support run-time > > selection of the performance measurement back-end in later commits, > > this has the side benefit of containing platform dependencies in to > > checkasm.c and out of checkasm.h. > > --- > > > > tests/checkasm/checkasm.c | 33 - > > tests/checkasm/checkasm.h | 31 --- > > 2 files changed, 32 insertions(+), 32 deletions(-) > > Not sure I agree with this commit, the overhead can be detectable, > and we have a lot of small functions with runtime a few times that > of a null function call. I don't think the function call is ever null. The pointers are left NULL only if none of the backend initialise. But then, checkasm will bail out and exit before we try to benchmark anything anyway. As for the real functions, they always do *something*. None of them "just return 0". > Can you store the function pointers out of the loop to reduce > the derefs needed? Taking just the two loads is out of the loop should be feasible but it seems a rather vain. You will still have the overhead of the indirect function call, the function, and most importantly in the case of Linux perf and MacOS kperf, the system calls. The only way to avoid the indirect function calls are to use IFUNC (tricky and not portable), or to make horrible macros to spawn one bench loop for each backend. In the end, I think we should rather aim for as constant time as possible, rather than as fast as possible, so that the nop loop can estimate the benchmarking overhead as well as possible. In this respect, I think it is actually marginally better *not* to cache the function pointers in local variables, which could end up spilled on the stack, or not, depending on local compiler optimisations for any given test case. -- 雷米‧德尼-库尔蒙 http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Jul 14, 2023, 20:29 by r...@remlab.net: > This makes all calls to the bench start and stop functions via > function pointers. While the primary goal is to support run-time > selection of the performance measurement back-end in later commits, > this has the side benefit of containing platform dependencies in to > checkasm.c and out of checkasm.h. > --- > tests/checkasm/checkasm.c | 33 - > tests/checkasm/checkasm.h | 31 --- > 2 files changed, 32 insertions(+), 32 deletions(-) > Not sure I agree with this commit, the overhead can be detectable, and we have a lot of small functions with runtime a few times that of a null function call. Can you store the function pointers out of the loop to reduce the derefs needed? ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".