Re: qemu & LTO

2020-11-10 Thread Jeff Law

On 11/5/20 8:24 AM, Richard W.M. Jones wrote:
> We discovered a few days ago that LTO broke qemu on aarch64.
>
> The original bug reported was:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1893892
>
> But actually looking at the build.log[1] we see another assertion
> failure in the test suite.  (Unfortunately although we run the test
> suite, the spec file was ignoring the result so the broken build
> escaped into Rawhide.)
>
> Because qemu is a complicated piece of software we're not clear if the
> bugs found are general bugs in LTO, bugs which are specific to LTO on
> aarch64, or bugs in qemu which are exposed by optimizations made
> possible by LTO.
>
> One thing we do suspect is that this could be the tip of the iceberg
> since the qemu test suite only tests a tiny fraction of the code.
>
> LTO has been disabled across all arches for now.  See the list of
> latest commits that Dan has added:
>
> https://src.fedoraproject.org/rpms/qemu/commits/master

Right.  And it's on my list to investigate (probably sometime in early
2021 given other pressing commitments).  FWIW, there are ~150 LTO opt-outs
on that list to investigate.  While I don't think we need to fix &
enable everything, I do want to thoroughly *understand* every issue and
get appropriate bugs filed.  I also have a Fedora spec file scanner
which notes all the opted-out packages, so I know what needs investigation.


The execution/testsuite failures I've typically seen have mostly been
package issues, not LTO issues.  Probably the biggest factor is that
LTO can inline across translation units.  So things like poorly written
asm statements that under-specify their dataflow suddenly get inlined, and
that under-specification becomes critically important.  The other thing I've
seen a few times is ordering of static constructors for C++ -- LTO
can/will change that, and code which relies on a specific ordering (which
isn't defined across translation units) can fail.  On the LTO side, symbol
visibility issues have been the biggest headache, and they can be insanely
hard to track down.
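
To make the asm point concrete, here is a minimal sketch of what
"under-specified dataflow" can look like.  The helper name bump() is
hypothetical and not taken from qemu or any affected package; the x86
instruction is only an illustration, and other targets use a plain C++
fallback so the sketch still builds there.

```cpp
#include <cassert>

// Hypothetical helper (not from any real package): increments *p.
//
// An under-specified version on x86 would look like this:
//     asm("incl (%0)" : : "r"(p));
// It passes only the pointer value and never tells the compiler that the
// pointed-to memory is modified.  When the function is out of line in
// another translation unit, the call is opaque to the optimizer and the
// omission can go unnoticed.  Once it is inlined, the compiler is free to
// keep *p cached in a register across the asm.
inline void bump(int *p) {
    assert(p != nullptr);
#if defined(__x86_64__) || defined(__i386__)
    // "+m"(*p) declares the memory operand as both read and written,
    // so the dataflow is fully described to the compiler.
    asm("incl %0" : "+m"(*p));
#else
    // Portable fallback so the sketch builds on other targets.
    ++*p;
#endif
}

// Small caller whose result depends on the compiler seeing the update.
int count_two_bumps() {
    int counter = 0;
    bump(&counter);
    bump(&counter);
    return counter;
}
```

With the constraint spelled out, count_two_bumps() gives the same answer
whether or not bump() is inlined, which is the property that matters once
inlining can happen across translation units.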

I expect at least some of the opt-outs, particularly the target-dependent
ones, are actually latent codegen issues that LTO happens to expose, but
aren't issues in LTO itself.

There's little doubt we'll find more issues as we work through the
opt-outs.  But that's also why we pushed so hard to get the vast
majority of things up with LTO in F33.  Soak time in Fedora is critical
in my mind.


jeff




LTO can change the ordering of static constructors for C++, or
aggressively inline across translation units, exposing things like poorly
written asms.  We've seen a small number of (insanely annoying) issues
with symbol visibility in the LTO path itself.
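
For the static constructor ordering problem, one common way to make code
robust is the construct-on-first-use idiom.  The sketch below is only an
illustration under that assumption; registry(), Registrar and the
registered names are hypothetical and not taken from any affected package.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Construct-on-first-use: the registry is created the first time anyone
// calls registry(), so it no longer matters which translation unit's
// static constructors happen to run first.
std::vector<std::string>& registry() {
    static std::vector<std::string> names;  // initialized on first call
    return names;
}

// Hypothetical self-registering object.  If registry() were instead a
// plain global vector defined in another .cpp file, this constructor
// could run before that vector was constructed.
struct Registrar {
    explicit Registrar(const char* name) {
        assert(name != nullptr);
        registry().push_back(name);
    }
};

// In a real program these would typically live in different .cpp files.
static Registrar reg_block("block");
static Registrar reg_net("net");

// Number of entries registered so far.
std::size_t registered_count() { return registry().size(); }
```

The point of the idiom is that correctness no longer depends on the order
in which the link step arranges the constructors, so a change in that
order does not change behaviour.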



I thought we had LTO disabled for QEMU until I could sit down and dig
into it?!?


jeff

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


qemu & LTO

2020-11-05 Thread Richard W.M. Jones
We discovered a few days ago that LTO broke qemu on aarch64.

The original bug reported was:

  https://bugzilla.redhat.com/show_bug.cgi?id=1893892

But actually looking at the build.log[1] we see another assertion
failure in the test suite.  (Unfortunately although we run the test
suite, the spec file was ignoring the result so the broken build
escaped into Rawhide.)

Because qemu is a complicated piece of software we're not clear if the
bugs found are general bugs in LTO, bugs which are specific to LTO on
aarch64, or bugs in qemu which are exposed by optimizations made
possible by LTO.

One thing we do suspect is that this could be the tip of the iceberg
since the qemu test suite only tests a tiny fraction of the code.

LTO has been disabled across all arches for now.  See the list of
latest commits that Dan has added:

https://src.fedoraproject.org/rpms/qemu/commits/master

Rich.

[1] https://kojipkgs.fedoraproject.org//packages/qemu/5.1.0/6.fc34/data/logs/aarch64/build.log
    from https://koji.fedoraproject.org/koji/buildinfo?buildID=1634562

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/