Bug#983379: linux uml segfault

2021-03-07 Thread Hajime Tazaki


Sorry, this email is going to be long.  In summary, Johannes is
right: objcopy alone is not sufficient, whereas ld transforms the
binary as we expected.

Details below.

On Sat, 06 Mar 2021 05:22:19 +0900,
Johannes Berg wrote:
> 
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).
> 
> This doesn't seem to be sufficient.
> 
> > It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Even that doesn't seem to actually work/help? I still get libcom_err
> trying to call UML's sem_init, even after doing
>  objcopy --redefine-sym sem_init=uml_sem_init
> 
> 
> > How does UML handle symbol conflicts between userspace code and Linux
> > kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> > Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I think like I said it just doesn't but since you don't have much
> userspace code linked with UML it never really mattered?
> 
> We only link a 'linux' binary, after all. How does LKL handle this
> though? It should be far more affected?
> 
> 
> Despite the objcopy *not* fixing it, this does seem to:

Tested with a slightly older version:
 - objcopy/ld version 2.29.1-23.fc28

I confirmed that objcopy (both --redefine-sym and --localize-symbol)
only changes symbols in the .symtab table.  But there is another
table, .dynsym, which is the one used for dynamic symbol resolution
at runtime.  The original file looks like this:


1) before objcopy (vmlinux)
% readelf -s obj-x86-um/vmlinux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
   129: 60011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
 28515: 60011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
 37798: 601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns
 
The resulting binary looks like this:

2) after objcopy (linux)
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
   129: 60011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
 28455: 60011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37798: 601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns

Only the .symtab entry is changed to LOCAL; the .dynsym entry is left
untouched.  So the sem_init call from libcom_err.so can still resolve
to the Linux symbol.
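
To make the .symtab/.dynsym distinction concrete, here is a minimal
sketch of what the readelf checks above look at: it reports the
binding of a named symbol separately for each symbol table.  It is an
illustration only, assuming a little-endian ELF64 file (as on an
x86_64 UML build); the helper name symbol_bindings is hypothetical
and not part of binutils.

```python
import struct

# ELF symbol binding values (the high nibble of st_info).
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
SHT_SYMTAB, SHT_DYNSYM = 2, 11


def symbol_bindings(path, name):
    """Report the binding of every entry called `name`, per symbol table.

    Returns e.g. {".symtab": ["LOCAL"], ".dynsym": ["GLOBAL"]}.
    Assumption: only little-endian ELF64 files are handled.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError(f"{path}: not a little-endian ELF64 file")

    # ELF64 header: section header table offset, entry size,
    # number of entries, and index of the section-name string table.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def shdr(i):
        # Elf64_Shdr: name, type, flags, addr, offset, size,
        #             link, info, addralign, entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    def cstr(off):
        # NUL-terminated string starting at file offset `off`.
        return data[off:data.index(b"\0", off)].decode()

    shstrtab_off = shdr(e_shstrndx)[4]
    result = {}
    for i in range(e_shnum):
        sh_name, sh_type, _, _, sh_off, sh_size, sh_link, _, _, sh_entsize = shdr(i)
        if sh_type not in (SHT_SYMTAB, SHT_DYNSYM):
            continue
        strtab_off = shdr(sh_link)[4]  # sh_link points at the string table
        found = []
        for sym in range(sh_off, sh_off + sh_size, sh_entsize):
            # Elf64_Sym starts with st_name (u32) and st_info (u8).
            st_name, st_info = struct.unpack_from("<IB", data, sym)
            if cstr(strtab_off + st_name) == name:
                found.append(BINDINGS.get(st_info >> 4, str(st_info >> 4)))
        result[cstr(shstrtab_off + sh_name)] = found
    return result
```

Going by the readelf output above, calling
symbol_bindings("obj-x86-um/linux", "sem_init") on the objcopy'd
binary would be expected to show LOCAL under .symtab but still GLOBAL
under .dynsym, which is the entry the dynamic linker actually uses.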


On the other hand, the ld --version-script solution does what we want:

3) localized with ld
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 142 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
Symbol table '.symtab' contains 38474 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
 28512: 60011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37669: 601e2b45    62 FUNC    LOCAL  DEFAULT   13 sem_init_ns

Now sem_init appears only in .symtab, where it is LOCAL; it is no
longer exported through .dynsym.
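
The version script actually used is not included in this thread, so
the following is only a hedged sketch of the format: a small helper
(the name version_script and the symbol lists are hypothetical) that
emits a GNU ld version script with one anonymous version node.
Defined symbols matched under "local:" are not exported through
.dynsym.

```python
def version_script(local_syms, global_syms=()):
    """Return the text of a GNU ld version script with one anonymous node.

    Symbols in `global_syms` stay exported; symbols (or patterns such
    as "*") in `local_syms` are kept out of the dynamic symbol table.
    """
    if not local_syms:
        raise ValueError("need at least one local symbol or pattern")
    lines = ["{"]
    if global_syms:
        lines.append("  global:")
        lines += [f"    {s};" for s in global_syms]
    lines.append("  local:")
    lines += [f"    {s};" for s in local_syms]
    lines.append("};")
    return "\n".join(lines) + "\n"
```

For example, version_script(["sem_init"]) produces
"{\n  local:\n    sem_init;\n};\n".  Written to a file (uml.ver is a
made-up name), it would be passed at link time with
-Wl,--version-script=uml.ver.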


From a quick check, LKL (and UML binaries) do not have this issue,
because LKL is built differently from the way UML currently is.

LKL applies objcopy before generating the intermediate file
(linux.o), so the symbols in the final binary (linux) are already
localized and have no .dynsym entries; hence no issue in this case.

refs:
https://stackoverflow.com/questions/54332797/binding-failure-with-objcopy-redefine-syms
https://sourceware.org/legacy-ml/binutils/2019-01/msg00254.html


-- Hajime




2021-03-05 Thread Hajime Tazaki


This might be late, but I'll give it a try with your dlopen reproducer.

-- Hajime





2021-03-03 Thread Hajime Tazaki


On Thu, 04 Mar 2021 07:40:00 +0900,
Johannes Berg wrote:
> 
> I think the problem is here:
> 
> > #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119
> > #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254
> > #26 0x60015b5d in sem_init () at ipc/sem.c:268
> > #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
> 
> You're in the init of libcom_err.so.2, which is loaded by
> 
> > "libnss_nis.so.2"
> 
> which is loaded by normal NSS code (getgrnam):
> 
> > #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359
> > #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> >     fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
> > #42 0x7f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
> > #43 init_nss_interface () at nss_compat/compat-grp.c:79
> > #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty",
> >     grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> >     errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> > #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty",
> >     resbuf=resbuf@entry=0x7ffe3e7a2910,
> >     buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> >     result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315
> 
> 
> You have a strange nsswitch configuration that causes all of this
> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
> 
> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.
> 
> Now, I don't know how to fix it (short of changing your nsswitch
> configuration) - maybe we could somehow rename sem_init()? Or maybe we
> can somehow give the kernel binary a lower symbol resolution than the
> libc/libpthread.
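
The behavior described above comes down to lookup order: by default
the dynamic linker searches the global scope starting with the main
executable, so an exported sem_init in the UML binary is found before
libpthread's.  The following is a toy model of that rule, not glibc's
implementation; the function name resolve and the symbol sets are
illustrative assumptions, and symbol versioning, RTLD_DEEPBIND,
DT_SYMBOLIC and protected visibility are ignored.

```python
def resolve(symbol, global_scope):
    """Return the first object in `global_scope` that exports `symbol`.

    `global_scope` is a list of (object_name, exported_symbols) pairs in
    search order: the main executable first, then libraries in load order.
    Toy model only: versioning and special lookup flags are ignored.
    """
    for obj, exported in global_scope:
        if symbol in exported:
            return obj
    return None


# Illustrative scopes (symbol sets are made up for the example).
# Before the fix, the UML executable exports sem_init via .dynsym.
BEFORE = [
    ("linux", {"sem_init"}),
    ("libpthread.so.0", {"sem_init", "sem_post"}),
    ("libcom_err.so.2", {"com_err"}),
]
# After localizing, the executable no longer exports sem_init.
AFTER = [("linux", set())] + BEFORE[1:]
```

With BEFORE, libcom_err.so.2's reference to sem_init lands in linux,
matching the backtrace; with AFTER it falls through to libpthread.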

objcopy (from binutils) can localize symbols (e.g., objcopy -L
sem_init $orig_file $new_file).  It can also rename symbols, but I'm
not sure this is the ideal solution.

How does UML handle symbol conflicts between userspace code and the
Linux kernel (like sem_init in this case)?  AFAIK, libnl has a symbol
with the same name as one in the Linux kernel (genlmsg_put), and other
libraries may as well.


-- Hajime