Re: -fno-omit-frame-pointer does not work as advertised

2023-01-17 Thread Daan De Meyer via devel
> What about the new SFrame unwind info?

We're closely following up on this new format and will compare it against frame 
pointers if a patch introducing a kernel unwinder for sframe is proposed and 
likely to be merged. It's still very early days for SFrame though so we'll have 
to see what happens first.

Cheers,

Daan


From: Demi Marie Obenour 
Sent: 16 January 2023 20:33
To: devel@lists.fedoraproject.org
Subject: Re: -fno-omit-frame-pointer does not work as advertised

On 1/16/23 08:40, Florian Weimer wrote:
> * Daniel Alley:
>
>> What has happened is that -O2 optimized away all of the stack
>> access for the function, so it uses no space on the stack and there is
>> no stack frame separate from the caller's.
>>
>> It is unlikely that the critical bottleneck of any applications will
>> be on such a function.
>
> Is it?  Plenty of math functions and cryptographic primitives are like
> that.  Anything that makes an inline system call, too.  Maybe you can
> infer from the caller's caller where the time is spent in these cases.
> People certainly seem to be concerned about this gap because they
> included -mno-omit-leaf-frame-pointer in the build flags.
>
> This is something that an upstream/ABI discussion could cover, with some
> sort of protocol that ensures the toolchain produces something the
> intended tools can consume.  For example, there could be a rule that
> only frames up to a certain size may lack a frame pointer, so that a
> fixed-size copy from the top of the stack can recover the caller address
> by looking at the DWARF unwinding data (out of context, for that frame
> alone).  Or it could be spelt out that LBR has to be used to recover the
> calling frame.  This isn't really something that Fedora can implement in
> a downstream change, though.
What about the new SFrame unwind info?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: FESCo revote on "Add -fno-omit-frame-pointer" Change proposal [was Re: Schedule for Tuesday's FESCo Meeting (2023-01-03)]

2023-01-09 Thread Daan De Meyer via devel
> I think it would save everyone a bit of time if we restricted the change
> to x86-64.  We do not have much experience with the -mbackchain flag
> that was added at the last minute on s390x.  The change owners have
> stated that they aren't interested in s390x.  IBM doesn't want this.
> Platform Tools does not want it.  I doubt the desktop team does GNOME
> performance analysis on s390x on Fedora.  I'm not even sure if the tools
> support backchain-based unwinding; it's not a frame pointer after all.
> Maybe -mbackchain won't cause any issues after all, but we just don't
> have the time to test this before the mass rebuild.

I had a look at the kernel unwinding code for s390 and it seems to use
the backchain if it's available and fall back to the frame pointer otherwise.
So from our end (change proposal authors) we're OK with dropping
-mbackchain for s390 and only using -fno-omit-frame-pointer there.
We'll open a PR to change this in the rpm macros.

> As Jakub and I have repeatedly explained, -fno-omit-frame-pointer on
> i686 is known to break certain packages (although I worked around this
> in glibc last year), simply because the reduced number of registers
> makes it impossible for GCC to compile certain functions with inline
> assembly in them.  As with s390x, the concrete impact is not known at
> this point, and we are out of time for test builds.

Given these issues should manifest as compilation failures, we should notice
very clearly once the mass rebuild starts if there's a bigger problem. If
there are only a few packages that run into issues, they can opt out. If there
are larger problems, we can remove frame pointers from i686.

> Using -mno-omit-leaf-frame-pointer for aarch64 seems to be another
> last-minute addition without any clear justification.  (On AArch64, the
> link register allows one to recover the address of the immediate caller
> even if a leaf function does not have a frame pointer.  That's not
> possible on x86-64, where the caller's address must be read from the
> stack, and that has to be based on the frame pointer.)  Just because the
> compiler option is there to enable doesn't mean it does anything useful
> in this context.

As I mentioned in the FESCo ticket, the kernel unwinder looks at the
frame pointer register (x29) first when starting an unwind on aarch64, before
looking at the link register. As such, it seems logical to require frame
pointers to be available in leaf functions so the frame pointer register is
available to start unwinding. I'm happy to be proven wrong here so we can
remove -mno-omit-leaf-frame-pointer for aarch64.

Cheers,

Daan De Meyer


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-11-01 Thread Daan De Meyer via devel
I've added a new section to the proposal with the results of benchmarks we ran
on a Fedora 37 system built with frame pointers and on a regular Fedora 37
system. The impact on most benchmarks seems limited, aside from the CPython
benchmark suite (pyperformance). See the proposal itself for the details.

Cheers,

Daan De Meyer


Re: limiting the (systemd) journal size

2022-09-29 Thread Daan De Meyer via devel
Heads up that I'm trying to get https://github.com/systemd/systemd/pull/22998
in before the next systemd release. It should reduce the journal size by
roughly 50% in a way that will be taken into account by journald's retention
logic (unlike the btrfs compression).

Also, as soon as there's a kernel API to query compressed file size I'll update 
journald's retention logic to use that so we can take the actual file size into 
account when making retention decisions.

Cheers,

Daan De Meyer


From: Chris Murphy 
Sent: 27 September 2022 17:12
To: fedora devel
Subject: limiting the (systemd) journal size

Hi,

Fedora uses systemd-journald for system logging. By default it is a persistent 
log kept on /var, and uses up to 4G disk space, although in certain 
circumstances it can go a bit higher. See 'man journald.conf' for details.

Example:
>Sep 27 07:26:05 fovo.local systemd-journald[602]: System Journal 
>(/var/log/journal/$machine_id) is 385.9M, max 4.0G, 3.6G free.

In this example Fedora 37 Workstation system, logging has been happening since
August 20 at about 10M/day of journal accumulation, which works out to about
1.12 years of journals before garbage collection begins.
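
As a rough check of the arithmetic above, here is a minimal Python sketch. It
assumes a constant accumulation rate, that "4.0G" means 4096M, and that the
journal covers August 20 to September 27, 2022; the variable names are
illustrative only.

```python
from datetime import date

# Figures taken from the journald log line quoted above.
current_size_mib = 385.9       # "is 385.9M"
max_size_mib = 4.0 * 1024      # "max 4.0G", assumed to mean 4096M

# Assumption: logging ran from August 20 to September 27, 2022.
days_logged = (date(2022, 9, 27) - date(2022, 8, 20)).days

# Average accumulation rate observed so far.
rate_mib_per_day = current_size_mib / days_logged

# Time to reach the cap using the rounded 10M/day figure from the text.
days_at_10 = max_size_mib / 10
years_at_10 = days_at_10 / 365

print(f"days logged: {days_logged}")
print(f"observed rate: {rate_mib_per_day:.1f} MiB/day")
print(f"years until the 4G cap at 10 MiB/day: {years_at_10:.2f}")
```

With these inputs the observed rate is a little over 10M/day, and 4096M at
10M/day is about 410 days, which matches the 1.12 years quoted above.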

Exactly what will trigger garbage collection depends on the system. There are 
quite a few knobs for adjusting various aspects of retention and how granular 
the garbage collection will be. e.g. it's common to see 64M system journal 
files that contain weeks of entries. It's also possible to limit the journal 
file size, thus improving granularity whether to retain a bit more or less than 
the ideal amount.

Some folks use services with verbose or debug logging. 4G might only be a few 
months of logs in such a case. Whereas other folks have a small root device in 
which even the smaller of 10% or 4G can be quite a lot and in certain cases is 
not a hard limit.

Also note that on Btrfs with compression enabled, the stored amount is quite a 
bit less. Like all of user space, systemd-journald sees the uncompressed file 
sizes, so its retention behavior hasn't changed as a result of btrfs 
compression. What has changed is we're only (physically) storing about 1/3 of 
whatever the max retention is on a given system.

The obvious bike-shedding questions are:
Is 4G too much or too little? If so, what amount should it be? Is size still 
the correct approach? Or should we consider a max retention time? And if so, 
what would it be and how granular should it be?

Also, what's the scope? Is a change needed Fedora-wide, in a manner that's 
upstreamable? That could prove difficult because any change will negatively 
impact other use cases, not least of which is what the upgrade behavior should 
be if it'll involve trimming journals. Are the current defaults optimal for 
most use cases most of the time? There will be a higher burden of persuasion to 
get a Fedora-wide change, rather than optimizing for just desktops.

But that isn't intended to limit the discussion to just the desktop case. Just 
to be aware that the broader and grander the change, the more consideration of 
the consequences there needs to be, i.e. less bike shedding.

More background and discussion can be found in the upstream and Workstation
working group issues [1].



[1]
https://pagure.io/fedora-workstation/issue/213
https://github.com/systemd/systemd/issues/17382

--
Chris Murphy


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-11 Thread Daan De Meyer via devel
> You still might have LBR buffers deep enough for your purposes, I think
> that's worth checking.  They have been around for much longer (on
> Intel).

We've been using LBR opportunistically for a while, when available, to augment
frame pointer based stacks. It turns out to be quite helpful at the lowest
levels of the stacktrace, where it can help to work around the lack of frame
pointers in base system libraries or libraries using inline assembly.

32 entries are not sufficient to capture all of our stacktraces though so using
only LBR is not sufficient for our profiling.

Currently we augment the frame pointer stacks with LBR data in userspace, but
I've asked Andrii to look into doing this augmentation directly in the kernel
so that everyone can benefit from it. I think it might help with cases such as
the glibc string functions as well (but you probably know more about that than
I do).

> Does your use case actually involve high-frequency time-based profiling,
> or is it more about being able to get the data at all, and process it
> further using BPF?

Yes, we're doing high-frequency sampling profiling using the perf 
subsystem (see the proposal for a detailed description). We attach
BPF programs to perf events for some minimal post processing before 
storing the data.

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-11 Thread Daan De Meyer via devel
> If we can get SHSTK to work, the value of the DWARF integration and
> performance work will diminish fairly quickly because most developers
> will soon have CPUs with fairly deep (32 entry) LBR buffers, SHSTK
> support, or both.

This seems like a fairly bold assumption. I also want to add that as discussed
in the proposal, we want to enable profiling not just on our laptops, but across
our entire fleet that's running various generations of hardware. We can't simply
replace all of our hardware just to get shadow stack support unfortunately. So
we can't rely on new hardware features to get stacktraces.

Of course, if shadow stack support lands upstream, is found to be reliable and
is fully supported by all hardware running on our fleet, we'd definitely look
into using it instead of frame pointers. But it's going to take many years
before we can rely on it across all our hardware.

Aside from our use case, I don't think developers are constantly replacing their
hardware either. I'd guess that with this approach we'd have many years of
developers debugging why they're not getting full stacktraces only to find out
their hardware doesn't support shadow stacks.

So to summarize, while we're eagerly awaiting one of the mentioned
alternatives to become viable, at the moment we think all of them result in a
degraded profiling experience compared to frame pointers, either due to being
slow, being a prototype and not available upstream, or due to requiring new
hardware support.

Cheers,

Daan


From: Florian Weimer 
Sent: 11 July 2022 07:12
To: Matthias Clasen
Cc: Development discussions related to Fedora
Subject: Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation 
flags (System-Wide Change proposal)

* Matthias Clasen:

> On Wed, Jul 6, 2022 at 3:06 PM Florian Weimer  wrote:
>
>> If the GNOME's sysprof does not work with Fedora, fix it or use
>> something else.  Do not change how Fedora is built.
>
> The result of that attitude is that performance work in the desktop
> space is happening on GNOME OS images, or in Flatpak runtimes instead
> of on Fedora. Which is a bit sad for Fedora as a supposedly
> developer-friendly environment.

My comment was specifically about sysprof.  I've been told that the
GNOME developers will not even consider anything else.  This means that
we need to fix sysprof.  If we do that, it will be possible to use GNOME
OS for profiling on older CPUs, and hardware-assisted backtraces on
newer CPUs on Fedora (at least Skylake and Zen 3, especially once we've
got userspace SHSTK support).

Even if this proposal is not accepted, I think we can collaborate on a
couple of things:

* Enhance sysprof with LBR and SHSTK support.

* Enable userspace backtrace generation from BPF without frame pointers
  (possibly by using LBR and SHSTK at first).

* Investigate use of the Systemtap and elfutils unwinders in these
  tools.

* Speed up decoding of DWARF data structures using the BMI instruction
  sets (which only operate on scalar registers and should therefore be
  usable even within the kernel).  According to
  
  that's a major source of DWARF processing overhead, and I don't think
  it has to be.

I'll try to get confirmation that it is technically feasible in principle
to use SHSTK to get arbitrarily deep backtraces from kernel space for
userspace applications.

If we can get SHSTK to work, the value of the DWARF integration and
performance work will diminish fairly quickly because most developers
will soon have CPUs with fairly deep (32 entry) LBR buffers, SHSTK
support, or both.

Thanks,
Florian


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-10 Thread Daan De Meyer via devel

> I strongly prefer the latter approach.  I believe the unwinder
> executes in NMI context, meaning that it must not block and must finish
> executing in a bounded amount of time.  Furthermore, any oops becomes
> an immediate kernel panic.  The eBPF verifier can trivially guarantee
> that the unwinder satisfies the properties needed here.  For security
> reasons, submitting eBPF programs is a privileged operation, but some
> programs could be compiled into the kernel and thus considered trusted.
> Such programs could be used without any special privileges.
>
> The key advantage of this approach is that privileged user-mode
> profiling tools, such as sysprof, can submit their own eBPF unwinders.
> This means that the kernel does not need to support whatever unwind
> info format userspace uses.  One could use DWARF, ORC, or any other
> format one wishes.

BPF programs do not have access to arbitrary ELF sections AFAIK. Every eBPF
unwinder that I've found is implemented via preprocessing the unwind format
in userspace and storing that in BPF maps so that it can be accessed from the
BPF program.

Effectively, this means that every program that wants to do unwinding
in BPF has to do this preprocessing and store all the required information
in BPF maps. When you don't know which program you're going to be
requesting a stacktrace for, this effectively means userspace has to provide
this information for every program that might run on the system. While this
might work for dedicated long-running system profiling daemons, it is not
an option for software such as perf or bpftrace since it would drastically
increase their startup time, as well as their overall resource usage.
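
To make the preprocessing approach more concrete, here is a simplified Python
simulation of a table-driven unwinder. It is only a sketch: the table layout,
the addresses and the function names are hypothetical, and real unwind formats
have more rule kinds than a single "CFA = SP + offset" entry. The point is
that userspace has to flatten the unwind info into a sorted table up front so
that the unwinder itself only needs a binary search per frame.

```python
from bisect import bisect_right

# Hypothetical preprocessed table, as userspace might store it in a BPF map:
# sorted (start_pc, cfa_offset) rows, where CFA = SP + cfa_offset for every
# PC from start_pc up to the next row.
UNWIND_TABLE = [
    (0x1000, 8),    # function a, before it adjusts the stack
    (0x1004, 24),   # function a, after reserving 16 bytes
    (0x2000, 8),    # function b, no stack adjustment
    (0x3000, 8),    # main
]
START_PCS = [row[0] for row in UNWIND_TABLE]


def lookup_cfa_offset(pc):
    """Binary search for the row covering pc; None if pc is below the table."""
    i = bisect_right(START_PCS, pc) - 1
    return UNWIND_TABLE[i][1] if i >= 0 else None


def unwind(pc, sp, stack, max_depth=16):
    """Walk return addresses using only the table and a stack snapshot."""
    frames = [pc]
    for _ in range(max_depth):
        offset = lookup_cfa_offset(pc)
        if offset is None:
            break
        cfa = sp + offset
        ret = stack.get(cfa - 8)   # x86-64 style: return address just below CFA
        if ret is None:
            break
        frames.append(ret)
        pc, sp = ret, cfa
    return frames


# Simulated stack snapshot: address -> 64-bit value.
stack = {0x7010: 0x2010, 0x7018: 0x3020}
frames = unwind(pc=0x1008, sp=0x7000, stack=stack)
print([hex(f) for f in frames])
```

Every binary that might show up in a stacktrace needs its own rows in such a
table before the first sample is taken, which is where the startup and memory
cost described above comes from.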

Cheers,

Daan


From: Demi Marie Obenour 
Sent: 09 July 2022 04:02
To: devel@lists.fedoraproject.org
Subject: Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation 
flags (System-Wide Change proposal)

On 7/8/22 20:18, Christian Hergert wrote:
>> That is the problem right here: .eh_frame-based unwinding is too slow, so it
>> has to be done offline in userspace.  What about instead adding ORC
>> information to userspace?  That would be much faster to use.
>
> I'm not familiar with ORC, but there are a few things that initially come to
> mind in looking towards such a solution.
>
> First, are there any examples of perf being able to reference ORC data coming
> from user-space or is it currently limited to PERF_CONTEXT_KERNEL? For
> system-wide profiling, we still require that the kernel can do high-velocity
> unwinding across address contexts.

Why does the unwinding need to happen in the kernel?  The kernel can
already asynchronously invoke userspace code in the form of signal
handlers.  Is the problem that it is necessary to collect profiling
information in the middle of a system call, where another syscall
would see inconsistent (and potentially exploitable) kernel state?

> My (limited) understanding of ORC is that the result produced by objtool gets
> you a series of unwind tables, but those tables require further processing by
> the kernel at boot.
>
> Again, I have limited understanding, but wouldn't something need to
> be processed as part of spawning and loading executable pages? There are both
> .orc_unwind and .orc_unwind_ip sections, both of which need to be sorted. I
> don't know what layer would be responsible for that, or how it adapts to
> dlopen(), double-mapping pages like libffi, etc... but I'm sure people will
> have opinions about it.

Ouch.  That is a serious problem for a number of reasons, not least
of which is security.  Having the kernel parse even more complex
untrusted input in C is a horrible idea.

I can think of at least two better options:

1. Wait for Rust support to be merged, and write the unwinder in Rust.
2. Implement the unwinder as an eBPF program.

I strongly prefer the latter approach.  I believe the unwinder
executes in NMI context, meaning that it must not block and must finish
executing in a bounded amount of time.  Furthermore, any oops becomes
an immediate kernel panic.  The eBPF verifier can trivially guarantee
that the unwinder satisfies the properties needed here.  For security
reasons, submitting eBPF programs is a privileged operation, but some
programs could be compiled into the kernel and thus considered trusted.
Such programs could be used without any special privileges.

The key advantage of this approach is that privileged user-mode
profiling tools, such as sysprof, can submit their own eBPF unwinders.
This means that the kernel does not need to support whatever unwind
info format userspace uses.  One could use DWARF, ORC, or any other
format one wishes.

Christian, would this be sufficient for your needs?
--
Sincerely,
Demi Marie Obenour (she/her/hers)

Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-06 Thread Daan De Meyer via devel
I've just updated the proposal with an extended description of the use cases
enabled by frame pointers. More specifically, on top of describing the
profiling use case in much more detail, I've also added a section on BPF
debugging tooling, such as bcc and bpftrace, which will also benefit from much
more reliable stacktraces if this proposal is implemented.

I've also clarified that we'll add -mno-omit-leaf-frame-pointer to the
compiler options. I think this is already implied by -fno-omit-frame-pointer
for GCC (maybe a GCC expert can correct me if I'm wrong), but it's better to be
explicit.

Finally, I've added a description of shadow stacks to the alternatives section.
They are a new hardware feature that might be an option for unwinding in the
far future, but at the moment they lack widespread support and aren't available
in the kernel, so they're not an option just yet.

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-05 Thread Daan De Meyer via devel
The proposed configuration is to add "-fno-omit-frame-pointer 
-mno-omit-leaf-frame-pointer" to the default compilation flags. Are you 
alluding to inline assembly that won't have frame pointers set up correctly 
even with these two options enabled?

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-05 Thread Daan De Meyer via devel
The goal was to try and reproduce the phoronix benchmark results so this is 
without any system dependencies rebuilt with frame pointers, same as the 
phoronix benchmark.

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-04 Thread Daan De Meyer via devel
> I have had to use frame pointers, but only for deeply embedded projects where
> the cost tradeoffs are different and a smaller constrained unwinder was
> needed.

As mentioned in the change proposal, when using sampling profilers that rely on
fast access to the stacktrace, there is currently no viable alternative to
frame pointers. DWARF unwinding in the absence of frame pointers is too slow
because of the complexity of the DWARF format and the need to copy the stack to
userspace and unwind there, given the lack of an in-kernel DWARF unwinder.
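
For comparison, this is roughly what a frame pointer walk looks like, written
as a small Python simulation. It is a sketch only, not the kernel's
implementation: the memory contents and addresses are made up, and the layout
assumed is the x86-64 one, where the saved frame pointer sits at the address
in the frame pointer register and the return address sits 8 bytes above it.

```python
def walk_frame_pointers(pc, fp, memory, max_depth=64):
    """Follow the saved-frame-pointer chain in a simulated 64-bit stack.

    Assumed layout (x86-64 style): memory[fp] holds the caller's frame
    pointer and memory[fp + 8] holds the return address into the caller.
    """
    frames = [pc]
    for _ in range(max_depth):
        if fp == 0 or fp not in memory:
            break
        ret = memory.get(fp + 8)
        if ret is None:
            break
        frames.append(ret)
        next_fp = memory[fp]
        if next_fp != 0 and next_fp <= fp:   # chain must move up the stack
            break
        fp = next_fp
    return frames


# Simulated stack memory: address -> 64-bit value (hypothetical addresses).
memory = {
    0x7000: 0x7020, 0x7008: 0x2010,   # leaf frame: saved fp, return into b
    0x7020: 0x7040, 0x7028: 0x3020,   # b's frame: saved fp, return into main
    0x7040: 0x0,    0x7048: 0x4000,   # main's frame: end of chain
}
frames = walk_frame_pointers(pc=0x1008, fp=0x7000, memory=memory)
print([hex(f) for f in frames])
```

Each step is two loads from the stack and needs no per-binary metadata, which
is why this kind of walk can be done in the kernel at sampling time.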

Looking to the future, we will be following up on the alternative approaches 
such as CTF Frame which will hopefully provide us with a sufficiently fast way 
to unwind the stack in the kernel itself without requiring frame pointers. 
Until such an alternative is available, we see no option but to use frame 
pointers in order to do reliable and fast profiling.

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-07-04 Thread Daan De Meyer via devel
Similarly, for the sysbench RAM test, which was the other test in the phoronix 
benchmark showing substantial regressions when compiled with frame pointers, we 
were unable to reproduce the results. Our results are as follows:

https://user-images.githubusercontent.com/9395011/177169145-d19bab77-cd97-44d0-9c0b-a0a76b16712e.png

While our results also show a difference in performance, the sysbench benchmark 
is also somewhat noisy as shown by the standard deviation.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-06-29 Thread Daan De Meyer via devel
Given the recent benchmarks from Phoronix
(https://www.phoronix.com/scan.php?page=article&item=fedora-frame-pointer&num=1)
on the proposal that showed some surprising results, we went and tried to
reproduce some of the benchmarks to make sure they actually made sense.

The first one we looked at is the Redis benchmark from
https://www.phoronix.com/scan.php?page=article&item=fedora-frame-pointer&num=5.
We were unable to reproduce the results from the Phoronix article.

Redis GET: 
https://user-images.githubusercontent.com/9395011/176536797-7424d40f-7140-46f8-89d3-7b555aa4cd13.png
Redis SET: 
https://user-images.githubusercontent.com/9395011/176536624-eeb5f85c-a63b-4987-8b3b-2b3607be0cf8.png

Instead, we saw differences of only 0% to 2% between Redis compiled with frame 
pointers and Redis compiled without frame pointers. These benchmarks were run 
using the phoronix-test-suite in exactly the same way as documented in the 
Phoronix article.

The other one we've looked at is the Botan AES-256 benchmark 
(https://www.phoronix.com/scan.php?page=article=fedora-frame-pointer=2). 
Initially, we were able to reproduce the results of this benchmark when 
setting CXXFLAGS="-fno-omit-frame-pointer". However, we found that due to the 
way Botan's custom build system works, when the CXXFLAGS environment variable 
is set to enable -fno-omit-frame-pointer, the botan binary is built in debug 
mode without optimizations, whereas when CXXFLAGS is unset, the botan binary 
is built in release mode (-O3). This explains the huge difference in 
performance in the Botan AES-256 benchmark.

When we make sure both binaries are built in release mode by setting 
CXXFLAGS="-O2" and CXXFLAGS="-O2 -fno-omit-frame-pointer" respectively, we get 
the following results:

Without frame pointers:

AES-256 encrypt buffer size 1024 bytes: 5410.085 MiB/sec 0.42 cycles/byte 
(2705.04 MiB in 500.00 ms)
AES-256 decrypt buffer size 1024 bytes: 5407.610 MiB/sec 0.42 cycles/byte 
(2703.81 MiB in 500.00 ms)

With frame pointers:

AES-256 encrypt buffer size 1024 bytes: 5359.241 MiB/sec 0.42 cycles/byte 
(2679.62 MiB in 500.00 ms)
AES-256 decrypt buffer size 1024 bytes: 5404.226 MiB/sec 0.42 cycles/byte 
(2702.11 MiB in 500.00 ms)

This shows a slowdown of less than 1% for the binary built with frame pointers 
relative to the binary built without frame pointers.
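
As a quick check of that figure, the relative slowdown can be recomputed from 
the throughput numbers quoted above. This is only a small sketch; the variable 
and function names are made up for illustration, and the only inputs are the 
MiB/sec values from the two runs.

```python
# Throughput in MiB/sec, copied from the Botan results quoted above.
without_fp = {"encrypt": 5410.085, "decrypt": 5407.610}
with_fp = {"encrypt": 5359.241, "decrypt": 5404.226}


def slowdown_percent(baseline, candidate):
    """Relative throughput loss of candidate versus baseline, in percent."""
    return (baseline - candidate) / baseline * 100.0


# One slowdown figure per operation, baseline = build without frame pointers.
slowdowns = {
    op: slowdown_percent(without_fp[op], with_fp[op]) for op in without_fp
}

for op, pct in slowdowns.items():
    print(f"AES-256 {op}: {pct:.2f}% slower with frame pointers")
```

This works out to roughly 0.94% for encryption and 0.06% for decryption.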

Supposedly, the Phoronix benchmark was also built with "-O2" in both 
configurations, but given that we saw results very similar to the Phoronix 
results when building Botan in debug mode, we assume that is what happened 
with the Botan AES benchmark.

We haven't yet dived deeper into the other benchmarks, but we expect that the 
benchmarks showing significant differences might suffer from similar issues, 
where the huge differences are caused not by the inclusion of frame pointers 
but by unrelated issues, as in the Botan case, where setting CXXFLAGS causes 
binaries to be built in debug mode unless an explicit optimization mode is set.
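
One way to guard against this kind of pitfall is to check, before comparing 
two builds, that both flag sets name an explicit optimization level. The 
following is a minimal sketch under that assumption; the helper names are 
hypothetical, and it only inspects the flag strings, so it says nothing about 
what a particular build system does with them.

```python
import shlex


def has_optimization_level(flags):
    """Return True if the flag string contains an explicit -O option."""
    # Matches -O, -O2, -O3, -Os, -Og, -Ofast; does not match -o (output file).
    return any(token.startswith("-O") for token in shlex.split(flags))


def check_comparable(baseline_flags, candidate_flags):
    """List the reasons two flag sets may not give a fair comparison."""
    problems = []
    for name, flags in (("baseline", baseline_flags),
                        ("candidate", candidate_flags)):
        if not has_optimization_level(flags):
            problems.append(
                f"{name} flags {flags!r} set no explicit optimization level"
            )
    return problems


# The flag sets from the two Botan runs described above.
print(check_comparable("", "-fno-omit-frame-pointer"))
print(check_comparable("-O2", "-O2 -fno-omit-frame-pointer"))
```

The first call reports both flag sets as lacking an optimization level, while 
the second returns an empty list.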

These benchmarks were done on an Amazon EC2 instance running Fedora 36 Cloud 
edition. The full details as reported by phoronix-test-suite can be found here: 
https://user-images.githubusercontent.com/9395011/176538700-c82974fa-fbb5-4146-be96-d6db1ce7dfb0.png


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-06-20 Thread Daan De Meyer via devel
That makes total sense. We're looking at doing exactly this. Are there any 
particular benchmarks the community would be interested in? Our idea was to 
take the Phoronix test suite, run some of the relevant suites from it with 
and without the updated packages, and compare the results. On a quick initial 
look, the compilation and compression suites look like relevant benchmarks for 
this use case. If there are any other benchmarks that would be interesting, I 
can take a look at running those as well.

Cheers,

Daan


Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

2022-06-17 Thread Daan De Meyer via devel
> Regressions of such magnitude can veto such changes, especially when
> they hit everyone, not just those who are highly dependent on the
> profiling tools the proposal is concerned about.

The kernel benchmarks were added as an example of openly available data we 
could find on the potential impact of frame pointers. Note that the email from 
Mel Gorman is all we have to go on. Unfortunately the original data from the 
benchmarks is gone so I can't try to reproduce them. I've emailed Mel to see if 
he still has the benchmarks stored somewhere so we can perhaps try to reproduce 
the results.

I've added a clarification to the change proposal that we don't intend to 
actually compile the kernel with frame pointers. The kernel is already built 
with ORC support, which works well, so there's nothing to be gained by 
building the kernel with frame pointers. That means we won't see the kernel 
regressions that were reported by the SUSE benchmarks.

Unfortunately, there are no readily available benchmarks that I've been able 
to find that would show the exact impact of frame pointers on common Fedora 
workflows. The Phoronix benchmark suite could be used, but that would imply 
doing a mass rebuild with frame pointers before we could actually run it and 
measure the impact.

Also, as mentioned in the proposal, all our internal services at Meta are built 
with frame pointers enabled. We did canaries a few years ago on some of our 
most CPU-intensive services to see if it would make sense to build them without 
frame pointers, and found that the wins were not significant enough to justify 
the loss in continuous profiling data caused by building without frame 
pointers.

> (Are you referring to a novel kernel-resident tool?)

Unfortunately, no, there's no in-kernel DWARF unwinder due to the complexity 
involved. Instead, the kernel uses ORC and has an unwinder for that. Adding ORC 
support to all of Linux userspace so that we can unwind it in the kernel isn't 
likely to happen, since all tooling would have to be changed to support ORC.


> The proposal doesn't characterize the "reasonably low overhead" that
> this operation targets.  That makes it hard to judge the tradeoffs.

Characterizing the impact would mean rebuilding most of the distro with frame 
pointers and running a comprehensive benchmark suite on it. Doing this will be 
a rather involved process. If you know of any other representative benchmark 
suites that we could run that wouldn't require rebuilding most of the distro, 
we could look into running these with and without frame pointers to measure the 
impact.

> If typing that option were a hardship, it could be made default on
> Fedora.  With broad debuginfod auto-downloading capability, maybe it's
> worth considering.

The issue with DWARF isn't that we have to add an extra option to perf, it's 
that without an in-kernel DWARF unwinder (which is very unlikely to ever 
happen, as discussed above), it's expensive to use DWARF for stack trace 
unwinding, as we have to copy the entire stack and unwind it in user space, 
which adds substantial overhead. This means we can't use it for continuous 
profiling.
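
To give a feel for the difference in cost, here is a toy model of the two 
approaches. It is only a sketch under stated assumptions: the stack is 
simulated as a Python dict, the frame layout (caller's frame pointer at [fp], 
return address at [fp + 8]) follows the usual x86-64 convention, and all 
function names, addresses and frame sizes are made up for illustration. It is 
not how perf or the kernel is implemented.

```python
WORD = 8  # bytes per stack slot on a 64-bit target
STACK_TOP = 0x7FFF0000  # arbitrary address for the base of the toy stack


def build_stack(return_addresses, frame_size=256):
    """Build a toy stack with one frame per return address.

    return_addresses is ordered innermost frame first. Returns the simulated
    memory and the innermost frame pointer.
    """
    n = len(return_addresses)
    # The innermost frame sits at the lowest address, the outermost near the top.
    fps = [STACK_TOP - (n - i) * frame_size for i in range(n)]
    memory = {}
    for i, fp in enumerate(fps):
        memory[fp] = fps[i + 1] if i + 1 < n else 0  # 0 terminates the chain
        memory[fp + WORD] = return_addresses[i]
    return memory, (fps[0] if fps else 0)


def unwind_with_frame_pointers(memory, fp):
    """Follow the frame pointer chain, reading two words per frame."""
    trace, bytes_read = [], 0
    while fp:
        trace.append(memory[fp + WORD])  # return address of this frame
        fp = memory[fp]                  # caller's frame pointer
        bytes_read += 2 * WORD
    return trace, bytes_read


def bytes_copied_for_user_space_unwind(innermost_fp):
    """Model the copy-then-unwind approach: the whole stack region is copied."""
    return STACK_TOP - innermost_fp


memory, fp = build_stack([0x401000, 0x402000, 0x403000, 0x404000, 0x405000])
trace, read = unwind_with_frame_pointers(memory, fp)
print([hex(addr) for addr in trace])
print(f"frame pointer walk read {read} bytes")
print(f"stack copy would move {bytes_copied_for_user_space_unwind(fp)} bytes")
```

In this model the frame pointer walk touches 16 bytes per frame regardless of 
how large the frames are, while the copy grows with the total stack depth in 
bytes, and the unwinding work itself still has to be done afterwards in user 
space.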