Re: XDR related breakage in libvirt v6.6.0 when using libvirt-lxc

2020-08-25 Thread Daniel P . Berrangé
On Tue, Aug 25, 2020 at 04:26:01PM +0200, Christian Ehrhardt wrote:
> On Tue, Aug 25, 2020 at 4:07 PM Daniel P. Berrangé wrote:
> >
> > On Tue, Aug 25, 2020 at 03:16:50PM +0200, Christian Ehrhardt wrote:
> > > Hi,
> > > I expect that this falls under the "with meson now everything is
> > > different anyway" umbrella but wanted to let you know about this as it
> > > affects v6.6 in at least Ubuntu/Debian.
> > >
> > > The following recent patch has broken libvirt-lxc for us:
> > > commit d7147b3797380de2d159ce6324536f3e1f2d97e3
> > > Author: Pavel Hrdina 
> > > Date: Fri Jun 19 00:44:07 2020 +0200
> > > m4: virt-xdr: rewrite XDR check
> > >
> > > I was tracking that down for [1] since the tests [4] failed on me. [2]
> > > holds the backtrace.
> > > In Debian the tests are skipped which explains why they were not seen 
> > > there:
> > >   smoke-lxc SKIP Test requires machine-level isolation but testbed
> > > does not provide that
> > >
> > > What happens is that the libvirt_lxc segfaults when using XDR functions.
> > >
> > > dmesg shows:
> > > [582093.524644] libvirt_lxc[261446]: segfault at 0 ip 
> > > sp 7ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
> > > [582093.524650] Code: Bad RIP value.
> > >
> > > There are quite some uncertainties left, but on the surface it seems
> > > that it links with libtirpc but
> > > then instead of calling
> > > libtirpc: src/xdr.c:929:xdr_uint64_t(xdrs, ullp)
> > > it ends (gdb tells us in [2]) in glibc
> > > glibc: sunrpc/xdr_intXX_t.c:62:xdr_uint64_t (XDR *xdrs, uint64_t *uip)
> > >
> > > And the return from that function breaks it badly (instruction pointer
> > > at 0x0 -> segfault)
> >
> > Right so that's a serious problem with clashing symbols between tirpc
> > and glibc.
> >
> > In Fedora/RHEL it is impossible to build against glibc for the XDR
> > symbols for a long time now. Glibc maintainers want everyone to be
> > using tirpc.   The symbols are still exported from glibc, but they
> > should only be used by legacy apps built against older glibc.
> >
> > Symbol versioning should ensure libvirt_lxc always resolves to the
> > libtirpc library
> >
> > $ eu-readelf -a /usr/lib64/libc.so.6 | grep xdr_uint64 | grep GLOBAL
> >  2017: 001349c0  226 FUNC GLOBAL DEFAULT   15 xdr_uint64_t@GLIBC_2.2.5
> >
> >
> > $ eu-readelf -a /usr/lib64/libtirpc.so | grep xdr_uint64 | grep GLOBAL
> >   344: 0001ce20    9 FUNC GLOBAL DEFAULT   14 xdr_uint64_t@@TIRPC_0.3.0
> >
> > $ eu-readelf -a /usr/libexec/libvirt_lxc  | grep xdr_uint64
> >   0x00024a30  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
> >   149:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)
> 
> ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libc.so.6 | grep xdr_uint64 | grep GLOBAL
>  2019: 00159ed0  228 FUNC GLOBAL DEFAULT   16 xdr_uint64_t@@GLIBC_2.2.5
> ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libtirpc.so.3.0.0 | grep xdr_uint64 | grep GLOBAL
>   343: 0001ae20    9 FUNC GLOBAL DEFAULT   15 xdr_uint64_t@@TIRPC_0.3.0
> 
> Ubuntu v6.0 builds
> ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc  | grep xdr_uint64
>   0x00026820  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
>    99:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
>   [  1c02]  xdr_uint64_t
> 
> Ubuntu v6.6 builds
> ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc  | grep xdr_uint64
>   0x000268d0  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
>   104:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
>   [  1a81]  xdr_uint64_t
> 
> They miss the version 3.0 entry - interesting.
> 
> libvirt 6.6 build from git on the same system:
> $ eu-readelf -a libvirt/build/src/.libs/libvirt_lxc  | grep xdr_uint64
>   0x00028968  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
>    99:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (3)
>   598:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@@GLIBC_2.2.5
>   [  31df]  xdr_uint64_t@@GLIBC_2.2.5
>   [  18f4]  xdr_uint64_t
> 
> That is with
> configure:  xdr: yes (CFLAGS='-I/usr/include/tirpc'
> LIBS='-ltirpc')
> 
> So something is wrong at build time when glibc AND tirpc provide that symbol.

The contents of 'struct XDR' are public and defined differently in glibc
vs tirpc. So when libvirt has been built using libtirpc include files
but linked to glibc, it could end up accessing bad fields in the struct
if we use any of the inlined accessor functions.
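
To make the layout mismatch concrete, here is a minimal sketch using
Python's ctypes. The two ops tables below are simplified stand-ins written
from memory, not copies of the real headers, so treat every slot name and
its position as an assumption and check rpc/xdr.h on your system. The point
is only that a caller compiled against one layout can index past the end
of a table built for the other layout.

```python
import ctypes

# Slots assumed to be shared by both tables in this sketch.
COMMON = ["x_getlong", "x_putlong", "x_getbytes", "x_putbytes",
          "x_getpostn", "x_setpostn", "x_inline", "x_destroy"]


class GlibcLikeOps(ctypes.Structure):
    # Assumption: two extra int32 accessors follow the common slots.
    _fields_ = [(s, ctypes.c_void_p)
                for s in COMMON + ["x_getint32", "x_putint32"]]


class TirpcLikeOps(ctypes.Structure):
    # Assumption: a single control slot follows the common slots.
    _fields_ = [(s, ctypes.c_void_p) for s in COMMON + ["x_control"]]


def slot_index(table, slot):
    """Index of a slot in a table, derived from its byte offset."""
    return getattr(table, slot).offset // ctypes.sizeof(ctypes.c_void_p)


def reads_past_end(caller_view, slot, actual_table):
    """True if `slot`, as laid out in caller_view, lies outside actual_table."""
    return getattr(caller_view, slot).offset >= ctypes.sizeof(actual_table)


if __name__ == "__main__":
    for slot in ("x_getint32", "x_putint32"):
        print(slot, "index", slot_index(GlibcLikeOps, slot),
              "past end of tirpc-like table:",
              reads_past_end(GlibcLikeOps, slot, TirpcLikeOps))
```

Under these assumed layouts, one of the extra slots aliases a different
function and the other lies beyond the table entirely, which would be
consistent with a call through a garbage or NULL pointer.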


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-

Re: XDR related breakage in libvirt v6.6.0 when using libvirt-lxc

2020-08-25 Thread Christian Ehrhardt
On Tue, Aug 25, 2020 at 4:07 PM Daniel P. Berrangé  wrote:
>
> On Tue, Aug 25, 2020 at 03:16:50PM +0200, Christian Ehrhardt wrote:
> > Hi,
> > I expect that this falls under the "with meson now everything is
> > different anyway" umbrella but wanted to let you know about this as it
> > affects v6.6 in at least Ubuntu/Debian.
> >
> > The following recent patch has broken libvirt-lxc for us:
> > commit d7147b3797380de2d159ce6324536f3e1f2d97e3
> > Author: Pavel Hrdina 
> > Date: Fri Jun 19 00:44:07 2020 +0200
> > m4: virt-xdr: rewrite XDR check
> >
> > I was tracking that down for [1] since the tests [4] failed on me. [2]
> > holds the backtrace.
> > In Debian the tests are skipped which explains why they were not seen there:
> >   smoke-lxc SKIP Test requires machine-level isolation but testbed
> > does not provide that
> >
> > What happens is that the libvirt_lxc segfaults when using XDR functions.
> >
> > dmesg shows:
> > [582093.524644] libvirt_lxc[261446]: segfault at 0 ip 
> > sp 7ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
> > [582093.524650] Code: Bad RIP value.
> >
> > There are quite some uncertainties left, but on the surface it seems
> > that it links with libtirpc but
> > then instead of calling
> > libtirpc: src/xdr.c:929:xdr_uint64_t(xdrs, ullp)
> > it ends (gdb tells us in [2]) in glibc
> > glibc: sunrpc/xdr_intXX_t.c:62:xdr_uint64_t (XDR *xdrs, uint64_t *uip)
> >
> > And the return from that function breaks it badly (instruction pointer
> > at 0x0 -> segfault)
>
> Right so that's a serious problem with clashing symbols between tirpc
> and glibc.
>
> In Fedora/RHEL it is impossible to build against glibc for the XDR
> symbols for a long time now. Glibc maintainers want everyone to be
> using tirpc.   The symbols are still exported from glibc, but they
> should only be used by legacy apps built against older glibc.
>
> Symbol versioning should ensure libvirt_lxc always resolves to the
> libtirpc library
>
> $ eu-readelf -a /usr/lib64/libc.so.6 | grep xdr_uint64 | grep GLOBAL
>  2017: 001349c0  226 FUNC GLOBAL DEFAULT   15 xdr_uint64_t@GLIBC_2.2.5
>
>
> $ eu-readelf -a /usr/lib64/libtirpc.so | grep xdr_uint64 | grep GLOBAL
>   344: 0001ce20    9 FUNC GLOBAL DEFAULT   14 xdr_uint64_t@@TIRPC_0.3.0
>
> $ eu-readelf -a /usr/libexec/libvirt_lxc  | grep xdr_uint64
>   0x00024a30  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
>   149:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)

ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libc.so.6 | grep xdr_uint64 | grep GLOBAL
 2019: 00159ed0  228 FUNC GLOBAL DEFAULT   16 xdr_uint64_t@@GLIBC_2.2.5
ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libtirpc.so.3.0.0 | grep xdr_uint64 | grep GLOBAL
  343: 0001ae20    9 FUNC GLOBAL DEFAULT   15 xdr_uint64_t@@TIRPC_0.3.0

Ubuntu v6.0 builds
ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc  | grep xdr_uint64
  0x00026820  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
   99:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
  [  1c02]  xdr_uint64_t

Ubuntu v6.6 builds
ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc  | grep xdr_uint64
  0x000268d0  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
  104:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
  [  1a81]  xdr_uint64_t

Both builds lack the TIRPC_0.3.0 version entry - interesting.

libvirt 6.6 build from git on the same system:
$ eu-readelf -a libvirt/build/src/.libs/libvirt_lxc  | grep xdr_uint64
  0x00028968  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
   99:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (3)
  598:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@@GLIBC_2.2.5
  [  31df]  xdr_uint64_t@@GLIBC_2.2.5
  [  18f4]  xdr_uint64_t

That is with
configure: xdr: yes (CFLAGS='-I/usr/include/tirpc' LIBS='-ltirpc')

So something is wrong at build time when glibc AND tirpc provide that symbol.

>
> This shows libvirt_lxc will only resolve to libtirpc.
>
>
> I see the Ubuntu package for glibc is passing --enable-obsolete-rpc  which
> allows apps to continue to build against glibc for RPC :-(
>
> So I suspect somehow libvirt has ended up using tirpc headers, but the linker
> probably resolved symbols to glibc.

As I wrote above, my builds don't get the TIRPC_0.3.0 entry in libvirt_lxc,
which seems to be why the call then ends up in the wrong implementation.

> I don't know how the linker decides which library to resolve symbols to
> when multiple provided the same symbol with different versions. Possibly
> tries in order ? I do recall that there were lots of problems with having
> both glibc and libtirpc used in Fedora before glibc introduced the
> ability to disable RPC via --disable-obsolete-rpc to
>
> 

Re: XDR related breakage in libvirt v6.6.0 when using libvirt-lxc

2020-08-25 Thread Daniel P . Berrangé
On Tue, Aug 25, 2020 at 03:16:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> I expect that this falls under the "with meson now everything is
> different anyway" umbrella but wanted to let you know about this as it
> affects v6.6 in at least Ubuntu/Debian.
> 
> The following recent patch has broken libvirt-lxc for us:
> commit d7147b3797380de2d159ce6324536f3e1f2d97e3
> Author: Pavel Hrdina 
> Date: Fri Jun 19 00:44:07 2020 +0200
> m4: virt-xdr: rewrite XDR check
> 
> I was tracking that down for [1] since the tests [4] failed on me. [2]
> holds the backtrace.
> In Debian the tests are skipped which explains why they were not seen there:
>   smoke-lxc SKIP Test requires machine-level isolation but testbed
> does not provide that
> 
> What happens is that the libvirt_lxc segfaults when using XDR functions.
> 
> dmesg shows:
> [582093.524644] libvirt_lxc[261446]: segfault at 0 ip 
> sp 7ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
> [582093.524650] Code: Bad RIP value.
> 
> There are quite some uncertainties left, but on the surface it seems
> that it links with libtirpc but
> then instead of calling
> libtirpc: src/xdr.c:929:xdr_uint64_t(xdrs, ullp)
> it ends (gdb tells us in [2]) in glibc
> glibc: sunrpc/xdr_intXX_t.c:62:xdr_uint64_t (XDR *xdrs, uint64_t *uip)
> 
> And the return from that function breaks it badly (instruction pointer
> at 0x0 -> segfault)

Right so that's a serious problem with clashing symbols between tirpc
and glibc.

In Fedora/RHEL it has been impossible to build against glibc for the XDR
symbols for a long time now. The glibc maintainers want everyone to be
using tirpc. The symbols are still exported from glibc, but they
should only be used by legacy apps built against older glibc.

Symbol versioning should ensure libvirt_lxc always resolves to the
libtirpc library

$ eu-readelf -a /usr/lib64/libc.so.6 | grep xdr_uint64 | grep GLOBAL
 2017: 001349c0  226 FUNC GLOBAL DEFAULT   15 xdr_uint64_t@GLIBC_2.2.5


$ eu-readelf -a /usr/lib64/libtirpc.so | grep xdr_uint64 | grep GLOBAL
  344: 0001ce20    9 FUNC GLOBAL DEFAULT   14 xdr_uint64_t@@TIRPC_0.3.0

$ eu-readelf -a /usr/libexec/libvirt_lxc  | grep xdr_uint64
  0x00024a30  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t
  149:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)


This shows libvirt_lxc will only resolve to libtirpc.
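
For anyone wanting to script this check across several binaries, a rough
sketch in Python follows. It only scans text in the style of the readelf
output shown in this thread for 'symbol@VERSION' or 'symbol@@VERSION'
tokens; the function name bound_versions is made up for illustration, and
the sample lines in the usage part are retyped from this mail rather than
captured from a real run.

```python
import re


def bound_versions(readelf_text, symbol):
    """Return the set of version names attached to `symbol` in readelf text.

    Matches both 'sym@VER' and 'sym@@VER'. Bare occurrences of the symbol
    (for example in relocation lines) carry no version and are ignored.
    """
    pattern = re.compile(
        r"(?<![\w@])" + re.escape(symbol) + r"@{1,2}([\w.]+)")
    return set(pattern.findall(readelf_text))


if __name__ == "__main__":
    sample = (
        "  0x00024a30  X86_64_JUMP_SLOT 00  +0 xdr_uint64_t\n"
        "  149:   0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)\n"
    )
    print(bound_versions(sample, "xdr_uint64_t"))
```

A result containing only a TIRPC_* version is what I would expect from a
correctly linked binary; anything mentioning GLIBC_* for an XDR symbol
deserves a closer look.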


I see the Ubuntu package for glibc is passing --enable-obsolete-rpc  which
allows apps to continue to build against glibc for RPC :-(

So I suspect somehow libvirt has ended up using tirpc headers, but the linker
probably resolved symbols to glibc.  

I don't know how the linker decides which library to resolve symbols to
when multiple libraries provide the same symbol with different versions.
Possibly it tries them in order? I do recall that there were lots of
problems with having both glibc and libtirpc used in Fedora before glibc
introduced the ability to disable RPC via --disable-obsolete-rpc to

Did I mention that --enable-obsolete-rpc is a bad idea yet :-P

FWIW, you're going to be forced to stop using this arg because it has been
deleted entirely in glibc 2.32, so there's no way to compile against
glibc for XDR. Only already-built binaries will keep working.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



XDR related breakage in libvirt v6.6.0 when using libvirt-lxc

2020-08-25 Thread Christian Ehrhardt
Hi,
I expect that this falls under the "with meson now everything is
different anyway" umbrella but wanted to let you know about this as it
affects v6.6 in at least Ubuntu/Debian.

The following recent patch has broken libvirt-lxc for us:
commit d7147b3797380de2d159ce6324536f3e1f2d97e3
Author: Pavel Hrdina 
Date: Fri Jun 19 00:44:07 2020 +0200
m4: virt-xdr: rewrite XDR check

I was tracking that down for [1] since the tests [4] failed on me. [2]
holds the backtrace.
In Debian the tests are skipped, which explains why the failures were not seen there:
  smoke-lxc SKIP Test requires machine-level isolation but testbed
does not provide that

What happens is that libvirt_lxc segfaults when using XDR functions.

dmesg shows:
[582093.524644] libvirt_lxc[261446]: segfault at 0 ip 
sp 7ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
[582093.524650] Code: Bad RIP value.

There are still quite a few uncertainties, but on the surface it seems
that it links with libtirpc, but then instead of calling
libtirpc: src/xdr.c:929:xdr_uint64_t(xdrs, ullp)
it ends up (gdb tells us in [2]) in glibc
glibc: sunrpc/xdr_intXX_t.c:62:xdr_uint64_t (XDR *xdrs, uint64_t *uip)

And the return from that function breaks it badly (instruction pointer
at 0x0 -> segfault)

Bisecting pointed to the referenced commit, which brings libtirpc into the mix.
The former builds had xdr detected, but without libtirpc:
configure: xdr: yes (CFLAGS='' LIBS='')
The new config now reports:
configure: xdr: yes (CFLAGS='-I/usr/include/tirpc' LIBS='-ltirpc')

And the resulting libvirt_lxc reflects that

v6.0.0
$ lddtree /usr/lib/libvirt/libvirt_lxc | grep tirpc
v6.6.0
$ lddtree /usr/lib/libvirt/libvirt_lxc | grep tirpc
libtirpc.so.3 => /lib/x86_64-linux-gnu/libtirpc.so.3

This seems to eventually lead to the bad jump and the crash.
Meanwhile, reverting d7147b37 "m4: virt-xdr: rewrite XDR check" on top
of v6.6.0 restores the former, working state.
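
In case it helps with triage, the suspicious combination can be expressed
as a tiny Python check. This is only a sketch under my current
understanding of the failure: the helper name and its two inputs (a list
of library names the binary depends on, and the set of symbol versions its
XDR symbols are bound to) are made up for illustration and would have to
be collected by hand from lddtree and readelf output.

```python
def looks_like_mixed_xdr_build(needed_libs, symbol_versions):
    """Flag a binary that depends on libtirpc yet binds XDR symbols to glibc.

    needed_libs: iterable of library names, e.g. ["libtirpc.so.3"].
    symbol_versions: iterable of version names, e.g. {"GLIBC_2.2.5"}.
    """
    links_tirpc = any(lib.startswith("libtirpc.so") for lib in needed_libs)
    binds_glibc = any(ver.startswith("GLIBC_") for ver in symbol_versions)
    return links_tirpc and binds_glibc


if __name__ == "__main__":
    # The state I believe the v6.6.0 build is in: tirpc is a dependency,
    # but the XDR symbol is bound to the glibc version.
    print(looks_like_mixed_xdr_build(["libtirpc.so.3"], {"GLIBC_2.2.5"}))
```

A build that neither links libtirpc nor uses its headers (like v6.0.0
here) is not flagged, since it consistently uses glibc for both.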

For anyone that wants to recreate this, I also attached a bisect
script [3] which includes the test case you'd need.

[1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1892826
[2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1892826/comments/4
[3]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1892826/+attachment/5404392/+files/bisect-libvirt.sh
[4]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy/groovy/amd64/libv/libvirt/20200825_005918_44b74@/log.gz

-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd