Public bug reported:

Description

We're not able to unwind the stack from within __kernel_clock_gettime in
the Linux vDSO on Summit. This affects both DDT and MAP (via GDB and
libunwind). The issue is more serious than it may first appear: the
function appears to be called fairly often by the CUDA runtime, and it
can fall back to a syscall, which makes it relatively time-consuming and
so more likely that a debugger or profiler stops inside it.
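
For anyone without a CUDA installation, a rough way to exercise the same
function is to call clock_gettime in a tight loop. The script below is only a
sketch and has not been run on Summit. It assumes that the vDSO on the affected
kernel serves only some clock IDs on its fast path, and that
CLOCK_THREAD_CPUTIME_ID is one of the IDs that falls back to the syscall. The
helper name spin_clock_gettime is made up for illustration.

```python
import time


def spin_clock_gettime(clock_id, seconds):
    """Call clock_gettime(clock_id) repeatedly for roughly `seconds`.

    Returns the number of calls made, so the caller can see how busy
    the loop was.
    """
    deadline = time.monotonic() + seconds
    calls = 0
    while time.monotonic() < deadline:
        time.clock_gettime(clock_id)
        calls += 1
    return calls


if __name__ == "__main__":
    # Assumption: this clock ID is not handled by the vDSO fast path on the
    # affected kernel, so each call ends up at the "sc" instruction.
    count = spin_clock_gettime(time.CLOCK_THREAD_CPUTIME_ID, 0.2)
    print(f"made {count} clock_gettime calls")
```

If the assumption holds, running this under "gdb --args python3" with the
script name, and setting the same breakpoint as in the session below, should
stop at the syscall path without involving the CUDA runtime.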

To reproduce:

Compile $CUDA_DIR/samples/0_Simple/matrixMul (attached is a small patch
to modify the Makefile to compile outside of the samples directory)

Run the following GDB commands:
user@deb3qwsp1:/usr/local/cuda-10.0/samples/0_Simple/matrixMul$ gdb ./matrixMul 
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./matrixMul...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x8284
(gdb) run
Starting program: /usr/local/cuda-10.0/samples/0_Simple/matrixMul/matrixMul 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x0000000100008284 in main ()
(gdb) break *(__kernel_clock_gettime+144)
Breakpoint 2 at 0x7ffff7f805e4: file 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S, 
line 127.
(gdb) continue
Continuing.
[Matrix Multiply Using CUDA] - Starting...

Breakpoint 2, __kernel_clock_gettime () at 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:127
127     
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S: No 
such file or directory.
(gdb) bt
#0  __kernel_clock_gettime () at 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:127
#1  0x00007ffff7b8f530 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#2  0x00007ffff6b81118 in ?? () from /usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#3  0x00007ffff6a69c70 in ?? () from /usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#4  0x00007ffff6bf0ba0 in cuInit () from 
/usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#5  0x000000010003ca50 in cudart::__loadDriverInternalUtil() ()
#6  0x00007ffff7f05274 in __pthread_once_slow (
    once_control=0x1000c00f0 
<cudart::globalState::loadDriver()::loadDriverControl>, 
    init_routine=0x10003c950 <cudart::__loadDriverInternalUtil()>) at 
pthread_once.c:116
#7  0x000000010008ea88 in cudart::cuosOnce(int*, void (*)()) ()
#8  0x00000001000410a8 in cudart::globalState::initializeDriver() ()
#9  0x000000010005ec90 in cudaGetDeviceCount ()
#10 0x0000000100009930 in gpuGetMaxGflopsDeviceId() ()
#11 0x0000000100009bf4 in findCudaDevice(int, char const**) ()
#12 0x000000010000836c in main ()
(gdb) step
128     in 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S
(gdb) bt
#0  __kernel_clock_gettime () at 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:128
#1  0x0000000000000000 in ?? ()
(gdb) 


Note: __kernel_clock_gettime+144 is currently the point in the function at
which the syscall is made, and the offset is liable to change if the code is
updated. It corresponds to the "sc" instruction here:
https://gitlab.com/TeeFirefly/linux-kernel/blob/7408b38cfdf9b0c6c3bda97402c75bd27ef69a85/arch/powerpc/kernel/vdso64/gettimeofday.S#L127
and can be rediscovered if needed by disassembling the function.
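
If the offset does change, one way to rediscover it is to save the output of
"disassemble __kernel_clock_gettime" from GDB and search it for the "sc"
mnemonic. The helper below is a minimal sketch of that search. It assumes GDB's
default disassembly layout (address, then <+offset>:, then the instruction),
and the function name find_insn_offsets is hypothetical. The single sample line
is reconstructed from the address and offset shown in the session above rather
than copied from real disassembler output.

```python
import re

# One line of default GDB "disassemble" output looks like:
#    0x00007ffff7f805e4 <+144>:<TAB>sc
# An optional "=>" marks the current instruction. Capture the decimal
# offset and the mnemonic.
_INSN_RE = re.compile(r"^\s*(?:=>\s*)?0x[0-9a-fA-F]+\s+<\+(\d+)>:\s+(\S+)")


def find_insn_offsets(disassembly, mnemonic="sc"):
    """Return the function-relative offsets of every `mnemonic` instruction."""
    offsets = []
    for line in disassembly.splitlines():
        match = _INSN_RE.match(line)
        if match and match.group(2) == mnemonic:
            offsets.append(int(match.group(1)))
    return offsets


if __name__ == "__main__":
    # Sample line built from the values in this report, not real GDB output.
    sample = "   0x00007ffff7f805e4 <+144>:\tsc\n"
    for offset in find_insn_offsets(sample):
        print(f"break *(__kernel_clock_gettime+{offset})")
```

Each printed line is a breakpoint command of the same form as the one used in
the session above.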

Note that a backtrace can be collected before entering the syscall, but
not during. The inability to unwind also prevents GDB from being able to
"finish" (step out of) the function:

(gdb) finish
Run till exit from #0  __kernel_clock_gettime ()
    at 
/build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:128
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x0

Command aborted.
(gdb)


The cause of the issue is a lack of Call Frame Information (CFI) in the syscall 
code path, and so a potential fix here could be to save the link register and 
add the corresponding CFI directive for the syscall code path (as is done for 
the alternative code path).

The fix has now been accepted upstream in the powerpc tree as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=56d20861c027498b5a1112b4f9f05b56d906fdda
("powerpc/vdso: Correct call frame information")

** Affects: ubuntu-power-systems
     Importance: High
     Assignee: Canonical Kernel Team (canonical-kernel-team)
         Status: New

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-172349 severity-high 
targetmilestone-inin1804

** Tags added: architecture-ppc64le bugnameltc-172349 severity-high
targetmilestone-inin1804

** Changed in: ubuntu
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage 
(ubuntu-power-triage)

** Package changed: ubuntu => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1797963

Title:
  not able to unwind the stack from within __kernel_clock_gettime in the
  Linux vDSO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1797963/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
