Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Bakul Shah



> On Nov 27, 2020, at 1:47 PM, Bakul Shah  wrote:
> 
> 
> 
>> On Nov 27, 2020, at 9:09 AM, Rebecca Cran  wrote:
>> 
>> On 11/27/20 4:29 AM, Hans Petter Selasky wrote:
>>> 
>>> Is the problem always triggered by hald? If you disable hald in rc.conf, 
>>> does the system run for a longer period of time?
>> 
>> It turns out that disabling ntpd let the system run for a longer period of 
>> time - until I ran "sysctl sys" at which point I got a panic.
>> 
>> And this time the panic actually implicates amdgpu.ko, which is an 
>> improvement:
>> 
>> 
>> #9  0x in ?? ()
>> #10 0x82a14c4e in amdgpu_device_get_pcie_replay_count ()
>>   from /boot/modules/amdgpu.ko
>> #11 0x82a14b80 in sysctl_handle_attr () from /boot/modules/amdgpu.ko
>> 
>> #12 0x80c06cc1 in sysctl_root_handler_locked (oid=0xfe02133ff000,
>>     arg1=0xfe016e360980, arg2=-8724518803888, req=0xfe016e360980,
>>     tracker=0xf81099af6280) at /usr/src/sys/kern/kern_sysctl.c:184
>> #13 0x80c0610c in sysctl_root (oidp=,
>>     arg1=0xf810aa27e650, arg2=-2100190360, req=0xfe016e360980)
>>     at /usr/src/sys/kern/kern_sysctl.c:2211
>> 
>> 
>> Since it _is_ a problem in amdgpu, I'll stop this thread and re-post on 
>> freebsd-x11.
> 
> FWIW, I am using amdgpu on a Ryzen 5 3500U system on a couple days old
> -current (r368025). "sysctl sys" complains about "unknown oid 'sys'".
> I am running hald & ntpd.  I had a few amdgpu-related panics initially
> but they vanished once I added
>   PORTS_MODULES=graphics/drm-devel-kmod
> to /etc/src.conf to compile it along with the kernel. I am running
> GENERIC-NODEBUG. The machine gets rebooted when I install a new kernel
> (usually once a week).
> 
> My guess is some weird interaction rather than something in amdgpu.

To get sysctl sys working I compiled a GENERIC kernel from today's
368108 revision and so far there are no problems.

$ sysctl sys.device.drmn0.pcie_replay_count
sys.device.drmn0.pcie_replay_count: 0

sysctl -a also works.

Last commit log on drm-devel-kmod (the last item may be what you're
running into):
Author: manu 
Date:   Mon Nov 9 13:37:12 2020 +

drm-current-kmod/drm-devel-kmod: Update to latest version

- Use acpi code from base (thanks to wulf@)
- Add radeon/i386 patches (thanks to tilj@)
- Translate O_ flags for linuxulator (thanks to Greg V)
- Lot of linuxkpi cleanup
- Hack for amdgpu when the IP isn't init properly, this happens
  on one of my laptop with a dGPU. We still don't support it but
  we don't panic when we load amdgpu


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Bakul Shah



> On Nov 27, 2020, at 9:09 AM, Rebecca Cran  wrote:
> 
> On 11/27/20 4:29 AM, Hans Petter Selasky wrote:
>> 
>> Is the problem always triggered by hald? If you disable hald in rc.conf, 
>> does the system run for a longer period of time?
> 
> It turns out that disabling ntpd let the system run for a longer period of 
> time - until I ran "sysctl sys" at which point I got a panic.
> 
> And this time the panic actually implicates amdgpu.ko, which is an 
> improvement:
> 
> 
> #9  0x in ?? ()
> #10 0x82a14c4e in amdgpu_device_get_pcie_replay_count ()
>    from /boot/modules/amdgpu.ko
> #11 0x82a14b80 in sysctl_handle_attr () from /boot/modules/amdgpu.ko
> 
> #12 0x80c06cc1 in sysctl_root_handler_locked (oid=0xfe02133ff000,
> arg1=0xfe016e360980, arg2=-8724518803888, req=0xfe016e360980,
> tracker=0xf81099af6280) at /usr/src/sys/kern/kern_sysctl.c:184
> #13 0x80c0610c in sysctl_root (oidp=,
> arg1=0xf810aa27e650, arg2=-2100190360, req=0xfe016e360980)
> at /usr/src/sys/kern/kern_sysctl.c:2211
> 
> 
> Since it _is_ a problem in amdgpu, I'll stop this thread and re-post on 
> freebsd-x11.

FWIW, I am using amdgpu on a Ryzen 5 3500U system on a couple days old
-current (r368025). "sysctl sys" complains about "unknown oid 'sys'".
I am running hald & ntpd.  I had a few amdgpu-related panics initially
but they vanished once I added
PORTS_MODULES=graphics/drm-devel-kmod
to /etc/src.conf to compile it along with the kernel. I am running
GENERIC-NODEBUG. The machine gets rebooted when I install a new kernel
(usually once a week).

My guess is some weird interaction rather than something in amdgpu.





Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Rebecca Cran

On 11/27/20 11:10 AM, Konstantin Belousov wrote:

> And what is the instruction at 0x81002dcf ?


I got a much clearer panic by running "sysctl sys" which shows it's more 
likely a problem for the amdgpu folks and not an underlying FreeBSD problem.



#7  0x810295cd in trap (frame=0xfe016e360760)
    at /usr/src/sys/amd64/amd64/trap.c:398
#8  
#9  0x in ?? ()
#10 0x82a14c4e in amdgpu_device_get_pcie_replay_count ()
   from /boot/modules/amdgpu.ko
#11 0x82a14b80 in sysctl_handle_attr () from /boot/modules/amdgpu.ko
#12 0x80c06cc1 in sysctl_root_handler_locked (oid=0xfe02133ff000,
    arg1=0xfe016e360980, arg2=-8724518803888, req=0xfe016e360980,
    tracker=0xf81099af6280) at /usr/src/sys/kern/kern_sysctl.c:184
#13 0x80c0610c in sysctl_root (oidp=,
    arg1=0xf810aa27e650, arg2=-2100190360, req=0xfe016e360980)
    at /usr/src/sys/kern/kern_sysctl.c:2211
#14 0x80c06783 in userland_sysctl (td=0xfe00f00b6100,
    name=0xfe016e360a40, namelen=4, old=,
    oldlenp=, inkernel=, new=0x0, newlen=0,
    retval=0xfe016e360aa8, flags=0)
    at /usr/src/sys/kern/kern_sysctl.c:2368
#15 0x80c065cf in sys___sysctl (td=0xfe00f00b6100,
    uap=0xfe00f00b64e8) at /usr/src/sys/kern/kern_sysctl.c:2241
#16 0x8102a81c in syscallenter (td=0xfe00f00b6100)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#17 amd64_syscall (td=0xfe00f00b6100, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1156
#18 
#19 0x0008003819ca in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffb618
(kgdb)




Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Konstantin Belousov
On Thu, Nov 26, 2020 at 10:09:24PM -0700, Rebecca Cran wrote:
> I have a Threadripper 2990WX system that I recently installed an AMD Radeon
> Pro W5700 into. It runs fine unless I load the amdgpu driver, at which point
> it panics several seconds after boot: I have enough time to login and run a
> few commands, but even if I just leave it it'll panic.  I'm running:
> 
> 
> FreeBSD photon.int.bluestop.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0
> 6db1a3e8098-c273171(master): Thu Nov 26 01:26:17 MST 2020 
> bc...@photon.int.bluestop.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG
> amd64
> 
> 
> I rebuilt the drm-current-kmod-5.4.62.g20201109_1 port today.
> 
> 
> The panic is:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 24; apic id = 18
> instruction pointer   = 0x20:0x81002dcf
> stack pointer = 0x0:0xfe016e6ffaa0
> frame pointer = 0x0:0xfe016e6ffaa0
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 4372 (hald)
> trap number   = 9
> panic: general protection fault
> cpuid = 24
> time = 1606450595
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe016e6ff7b0
> vpanic() at vpanic+0x181/frame 0xfe016e6ff800
> panic() at panic+0x43/frame 0xfe016e6ff860
> trap_fatal() at trap_fatal+0x387/frame 0xfe016e6ff8c0
> trap() at trap+0x8e/frame 0xfe016e6ff9d0
> calltrap() at calltrap+0x8/frame 0xfe016e6ff9d0
> --- trap 0x9, rip = 0x81002dcf, rsp = 0xfe016e6ffaa0, rbp = 0xfe016e6ffaa0 ---
> fpurestore_xrstor3264() at fpurestore_xrstor3264+0x2f/frame 0xfe016e6ffaa0
> restore_fpu_curthread() at restore_fpu_curthread+0x85/frame 0xfe016e6ffac0
> fpudna() at fpudna+0x3a/frame 0xfe016e6ffae0
> trap() at trap+0x246/frame 0xfe016e6ffbf0
> calltrap() at calltrap+0x8/frame 0xfe016e6ffbf0
> --- trap 0x16, rip = 0x80067137f, rsp = 0x7fffd8b0, rbp = 0x7fffd8f0 ---
> Uptime: 1m4s
> Dumping 4193 out of 130894 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> 
> I've uploaded details (core.txt, dmesg.txt etc.) to
> https://bsdio.com/freebsd/crashes/2020-11-26-amdgpu/ and the vmcore file is
> available on request.

And what is the instruction at 0x81002dcf ?


Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Rebecca Cran

On 11/27/20 4:29 AM, Hans Petter Selasky wrote:


> Is the problem always triggered by hald? If you disable hald in
> rc.conf, does the system run for a longer period of time?


It turns out that disabling ntpd let the system run for a longer period 
of time - until I ran "sysctl sys" at which point I got a panic.


And this time the panic actually implicates amdgpu.ko, which is an 
improvement:



#9  0x in ?? ()
#10 0x82a14c4e in amdgpu_device_get_pcie_replay_count ()
   from /boot/modules/amdgpu.ko
#11 0x82a14b80 in sysctl_handle_attr () from /boot/modules/amdgpu.ko

#12 0x80c06cc1 in sysctl_root_handler_locked (oid=0xfe02133ff000,
    arg1=0xfe016e360980, arg2=-8724518803888, req=0xfe016e360980,
    tracker=0xf81099af6280) at /usr/src/sys/kern/kern_sysctl.c:184
#13 0x80c0610c in sysctl_root (oidp=,
    arg1=0xf810aa27e650, arg2=-2100190360, req=0xfe016e360980)
    at /usr/src/sys/kern/kern_sysctl.c:2211


Since it _is_ a problem in amdgpu, I'll stop this thread and re-post on 
freebsd-x11.



--
Rebecca Cran




Re: panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-27 Thread Hans Petter Selasky

On 11/27/20 6:09 AM, Rebecca Cran wrote:
I have a Threadripper 2990WX system that I recently installed an AMD 
Radeon Pro W5700 into. It runs fine unless I load the amdgpu driver, at 
which point it panics several seconds after boot: I have enough time to 
login and run a few commands, but even if I just leave it it'll panic. 
I'm running:



FreeBSD photon.int.bluestop.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 
6db1a3e8098-c273171(master): Thu Nov 26 01:26:17 MST 2020 
bc...@photon.int.bluestop.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG 
amd64



I rebuilt the drm-current-kmod-5.4.62.g20201109_1 port today.


The panic is:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 24; apic id = 18
instruction pointer    = 0x20:0x81002dcf
stack pointer    = 0x0:0xfe016e6ffaa0
frame pointer    = 0x0:0xfe016e6ffaa0
code segment    = base 0x0, limit 0xf, type 0x1b
     = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process    = 4372 (hald)
trap number    = 9
panic: general protection fault
cpuid = 24
time = 1606450595
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe016e6ff7b0
vpanic() at vpanic+0x181/frame 0xfe016e6ff800
panic() at panic+0x43/frame 0xfe016e6ff860
trap_fatal() at trap_fatal+0x387/frame 0xfe016e6ff8c0
trap() at trap+0x8e/frame 0xfe016e6ff9d0
calltrap() at calltrap+0x8/frame 0xfe016e6ff9d0
--- trap 0x9, rip = 0x81002dcf, rsp = 0xfe016e6ffaa0, rbp = 0xfe016e6ffaa0 ---
fpurestore_xrstor3264() at fpurestore_xrstor3264+0x2f/frame 0xfe016e6ffaa0
restore_fpu_curthread() at restore_fpu_curthread+0x85/frame 0xfe016e6ffac0
fpudna() at fpudna+0x3a/frame 0xfe016e6ffae0
trap() at trap+0x246/frame 0xfe016e6ffbf0
calltrap() at calltrap+0x8/frame 0xfe016e6ffbf0
--- trap 0x16, rip = 0x80067137f, rsp = 0x7fffd8b0, rbp = 0x7fffd8f0 ---
Uptime: 1m4s
Dumping 4193 out of 130894 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%



I've uploaded details (core.txt, dmesg.txt etc.) to 
https://bsdio.com/freebsd/crashes/2020-11-26-amdgpu/ and the vmcore file 
is available on request.




Hi,

Is the problem always triggered by hald? If you disable hald in rc.conf, 
does the system run for a longer period of time?


--HPS



panic shortly after boot when amdgpu.ko is loaded (fpu?)

2020-11-26 Thread Rebecca Cran
I have a Threadripper 2990WX system that I recently installed an AMD 
Radeon Pro W5700 into. It runs fine unless I load the amdgpu driver, at 
which point it panics several seconds after boot: I have enough time to 
login and run a few commands, but even if I just leave it it'll panic.  
I'm running:



FreeBSD photon.int.bluestop.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 
6db1a3e8098-c273171(master): Thu Nov 26 01:26:17 MST 2020 
bc...@photon.int.bluestop.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG 
amd64



I rebuilt the drm-current-kmod-5.4.62.g20201109_1 port today.


The panic is:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 24; apic id = 18
instruction pointer     = 0x20:0x81002dcf
stack pointer           = 0x0:0xfe016e6ffaa0
frame pointer           = 0x0:0xfe016e6ffaa0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4372 (hald)
trap number             = 9
panic: general protection fault
cpuid = 24
time = 1606450595
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe016e6ff7b0
vpanic() at vpanic+0x181/frame 0xfe016e6ff800
panic() at panic+0x43/frame 0xfe016e6ff860
trap_fatal() at trap_fatal+0x387/frame 0xfe016e6ff8c0
trap() at trap+0x8e/frame 0xfe016e6ff9d0
calltrap() at calltrap+0x8/frame 0xfe016e6ff9d0
--- trap 0x9, rip = 0x81002dcf, rsp = 0xfe016e6ffaa0, rbp = 0xfe016e6ffaa0 ---
fpurestore_xrstor3264() at fpurestore_xrstor3264+0x2f/frame 0xfe016e6ffaa0
restore_fpu_curthread() at restore_fpu_curthread+0x85/frame 0xfe016e6ffac0
fpudna() at fpudna+0x3a/frame 0xfe016e6ffae0
trap() at trap+0x246/frame 0xfe016e6ffbf0
calltrap() at calltrap+0x8/frame 0xfe016e6ffbf0
--- trap 0x16, rip = 0x80067137f, rsp = 0x7fffd8b0, rbp = 0x7fffd8f0 ---
Uptime: 1m4s
Dumping 4193 out of 130894 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


I've uploaded details (core.txt, dmesg.txt etc.) to 
https://bsdio.com/freebsd/crashes/2020-11-26-amdgpu/ and the vmcore file 
is available on request.



--
Rebecca Cran

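
For comparing backtraces like the ones posted in this thread, a minimal
Python sketch that splits KDB stack backtrace text into frames and trap
markers may help. The function name parse_kdb_backtrace and both regular
expressions are assumptions made for illustration; they are not part of
any FreeBSD tool, and they only cover the line shapes seen in the panic
output above.

```python
import re

# Frame lines look like:
#   vpanic() at vpanic+0x181/frame 0xfe016e6ff800
# The backreference requires the same symbol name on both sides of " at ".
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<offset>0x[0-9a-f]+)/frame"
)

# Trap marker lines look like:
#   --- trap 0x9, rip = 0x81002dcf, rsp = ..., rbp = ... ---
TRAP_RE = re.compile(r"^--- trap (?P<num>0x[0-9a-f]+),")


def parse_kdb_backtrace(text):
    """Return ("frame", func, offset) and ("trap", number) tuples in order.

    Lines that match neither pattern (blank lines, "Uptime:", and so on)
    are skipped. Leading "> " mail-quote prefixes are tolerated.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()
        frame = FRAME_RE.match(line)
        if frame:
            entries.append(("frame", frame["func"], int(frame["offset"], 16)))
            continue
        trap = TRAP_RE.match(line)
        if trap:
            entries.append(("trap", int(trap["num"], 16)))
    return entries


# Sample lines copied from the panic output in this thread.
SAMPLE = """\
trap_fatal() at trap_fatal+0x387/frame 0xfe016e6ff8c0
trap() at trap+0x8e/frame 0xfe016e6ff9d0
calltrap() at calltrap+0x8/frame 0xfe016e6ff9d0
--- trap 0x9, rip = 0x81002dcf, rsp = 0xfe016e6ffaa0, rbp = 0xfe016e6ffaa0 ---
fpurestore_xrstor3264() at fpurestore_xrstor3264+0x2f/frame 0xfe016e6ffaa0
"""

if __name__ == "__main__":
    for entry in parse_kdb_backtrace(SAMPLE):
        print(entry)
```

Reducing each report to (function, offset) pairs makes it easier to see
whether two panics went through the same code path, for example whether
fpurestore_xrstor3264+0x2f shows up in both.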