Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Tue, Jun 16, 2020 at 10:34:44AM +0200, Janne Johansson wrote:
> On Mon, 15 June 2020 at 20:18, Kenneth R Westerback <kwesterb...@gmail.com> wrote:
>
> > > If it would help, I could screenshot one page at a time, that seems to be
> > > the best I can do today.
> >
> > Works for me, though I don't recommend sending all those pics to
> > bugs@.
>
> http://c66.it.su.se:8080/obsd/amd-dmesg-jpegs/index.html
>
> Painstakingly screenshot:ed one at a time, but hopefully readable as a
> series of images in a row.
>
> --
> May the most significant bit of your life be positive.

I see

nvme0: KXG60ZNV256G Toshiba, firmware AGGA4103, serial
scsibus2 at nvme0: 2 targets, initiator 0
sd0 at scsibus2 targ 1 lun 0:
sd0: 244198MB, 512 bytes/sector, 500118192 sectors

nvme1: Micro_9300_MTFDHAL3T8DP, firmware 11300DG0, serial
scsibus3 at nvme1: 33 targets, initiator 0
sd1 at scsibus3 targ 1 lun 0:
sd1: 3662830MB, 512 bytes/sector, 7501476528 sectors
sd32 at scsibus3 targ 2 -> 32 lun 0:

nvme2: Micro_9300_MTFDHAL3T8DP, firmware 11300DG0, serial
scsibus4 at nvme2: 33 targets, initiator 0
sd33 at scsibus4 targ 1 lun 0:
sd33: 3662830MB, 512 bytes/sector, 7501476528 sectors
sd64 at scsibus4 targ 2 -> 32 lun 0:

nvme3: Micro_9300_MTFDHAL3T8DP, firmware 11300DG0, serial
scsibus5 at nvme3: 33 targets, initiator 0
sd65 at scsibus5 targ 1 lun 0:
sd65: 3662830MB, 512 bytes/sector, 7501476528 sectors
sd96 at scsibus6 targ 2 -> 32 lun 0:

nvme4: INTEL SSDPED1K375GA, firmware E2010435, serial
scsibus6 at nvme4: 2 targets, initiator 0
sd97 at scsibus6 targ 1 lun 0
sd97: 357707MB, 512 bytes/sector, 732585168 sectors

So FIVE physical drives?

It looks to me like the Micron devices are reporting/configured with 33
namespaces. Each namespace is treated as a separate disk. I don't know
if the device is actually configured this way, or is reporting a max
number where other drives are reporting actual configured namespaces.

I am *guessing* that all the namespaces other than the first one have 0
sectors allocated. But we print out the "sdNN at scsibusXX ..." line
before the size is determined via INQUIRY. Dunno if that is avoidable.

A SCSIDEBUG kernel will print a lot of interesting information that
would shed more light.

Ken
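
To make the guess above concrete: if a drive advertises many namespace slots but only the first has any blocks allocated, a driver could count (or attach) only the slots with a non-zero size. The following is a minimal sketch of that idea only. The `ns_info` struct and `count_usable_namespaces()` are hypothetical names made up for illustration and are not taken from OpenBSD's nvme(4) driver; the assumption that an unallocated namespace reports a size of zero blocks is exactly the guess made in the message, not something confirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of one namespace slot as reported by a
 * drive. Not the structure used by the real driver.
 */
struct ns_info {
	uint32_t nsid;	/* namespace ID, 1-based */
	uint64_t nsze;	/* size in logical blocks; 0 = nothing allocated (assumed) */
};

/* Count the namespace slots that actually have blocks behind them. */
size_t
count_usable_namespaces(const struct ns_info *ns, size_t n)
{
	size_t i, usable = 0;

	for (i = 0; i < n; i++) {
		if (ns[i].nsze != 0)
			usable++;
	}
	return usable;
}
```

Under that assumption, a drive with 32 slots of which only slot 1 is sized at 7501476528 blocks would yield one usable namespace, which is what one would expect to see as a single sd(4) disk per Micron device.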
Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Mon, 15 June 2020 at 20:18, Kenneth R Westerback <kwesterb...@gmail.com> wrote:

> > If it would help, I could screenshot one page at a time, that seems to be
> > the best I can do today.
>
> Works for me, though I don't recommend sending all those pics to
> bugs@.

http://c66.it.su.se:8080/obsd/amd-dmesg-jpegs/index.html

Painstakingly screenshot:ed one at a time, but hopefully readable as a
series of images in a row.

--
May the most significant bit of your life be positive.
Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Mon, Jun 15, 2020 at 08:03:40PM +0200, Janne Johansson wrote:
> On Mon, 15 June 2020 at 19:59, Kenneth R Westerback <kwesterb...@gmail.com> wrote:
>
> > On Mon, Jun 15, 2020 at 07:15:36PM +0200, Janne Johansson wrote:
> > > Recent AMD box with a bunch of nvme drives, never booted anything,
> > > crashes
> >
> > The line
> >
> > scsibus2 at nvme1: 33 targets, initiator 0
> >
> > is also weird. I have never seen anything but 1 or 2 targets on nvme.
>
> My bad. It has four or five nvmes and there is some .. reflection going on.
> I seem to recall similar things in the old scsi1 days if you had bad
> termination, long since I saw mirrored devices like this on the bus.
>
> > Running a kernel with SCSIDEBUG will produce more information on the
> > negotiation/discovery interactions.
> > Clarification of "bunch of nvme drives" and a complete dmesg would
> > also help.
>
> If it would help, I could screenshot one page at a time, that seems to be
> the best I can do today.

Works for me, though I don't recommend sending all those pics to bugs@.

> --
> May the most significant bit of your life be positive.

The other random thing to try is to find the line

	sc->sc_link.adapter_buswidth = sc->sc_nn + 1;

in /usr/src/sys/dev/ic/nvme.c and replace the "sc->sc_nn + 1" with 1 or
2. Perhaps the nvme controller is returning interesting values for the
namespace count in the identify message. I see there is already some
weird code to deal with Apple oddities. :-)

Ken
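
The arithmetic behind the quoted line can be sketched as follows. This is an illustration, not driver code: `nvme_buswidth()` simply mirrors the `sc_nn + 1` expression quoted above, and `first_sd_unit()` is a hypothetical helper that assumes every namespace slot attaches as its own sd(4) unit in controller order. The namespace counts used in the note below are inferred from the "N targets" lines in the thread, not read from the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Bus width as computed by the quoted line: namespace count plus one. */
unsigned int
nvme_buswidth(uint32_t nn)
{
	return nn + 1;
}

/*
 * Hypothetical helper: the first sd unit number used by controller
 * `ctrl`, assuming each earlier controller attached one sd unit per
 * namespace slot. nn[i] is the namespace count of controller i.
 */
unsigned int
first_sd_unit(const uint32_t *nn, size_t ctrl)
{
	unsigned int unit = 0;
	size_t i;

	for (i = 0; i < ctrl; i++)
		unit += nn[i];
	return unit;
}
```

A namespace count of 32 gives a bus width of 33, and a count of 1 gives 2, which lines up with the "33 targets" and "2 targets" lines. With inferred counts of 1, 32, 32, 32 and 1 for nvme0 through nvme4, the first units come out as sd0, sd1, sd33, sd65 and sd97, which matches the numbering seen on the console.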
Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Mon, 15 June 2020 at 19:59, Kenneth R Westerback <kwesterb...@gmail.com> wrote:

> On Mon, Jun 15, 2020 at 07:15:36PM +0200, Janne Johansson wrote:
> > Recent AMD box with a bunch of nvme drives, never booted anything,
> > crashes
>
> The line
>
> scsibus2 at nvme1: 33 targets, initiator 0
>
> is also weird. I have never seen anything but 1 or 2 targets on nvme.

My bad. It has four or five nvmes and there is some .. reflection going on.
I seem to recall similar things in the old scsi1 days if you had bad
termination, long since I saw mirrored devices like this on the bus.

> Running a kernel with SCSIDEBUG will produce more information on the
> negotiation/discovery interactions.
> Clarification of "bunch of nvme drives" and a complete dmesg would
> also help.

If it would help, I could screenshot one page at a time, that seems to be
the best I can do today.

--
May the most significant bit of your life be positive.
Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Mon, Jun 15, 2020 at 07:15:36PM +0200, Janne Johansson wrote:
> Recent AMD box with a bunch of nvme drives, never booted anything, crashes
> with
> panic: pr_find_pagehead: dma256: page header missing
> after listing some sd(4) drives on 14-Jun snapshot installation.
>
> That is the only odd output in the dmesg as far as I can see, picture
> included.
>
> 6.7 release has the same issue.
>
> 6.6 installer works and I can boot the installed OS but I have no net on
> the box (mcx 40/56/100GE card in it) yet.
>
> --
> May the most significant bit of your life be positive.

The line

scsibus2 at nvme1: 33 targets, initiator 0

is also weird. I have never seen anything but 1 or 2 targets on nvme.

Running a kernel with SCSIDEBUG will produce more information on the
negotiation/discovery interactions.

Clarification of "bunch of nvme drives" and a complete dmesg would
also help.

Ken
Re: AMD EPYC 7551 box panic: pr_find_pagehead
On Mon, 15 June 2020 at 19:55, Janne Johansson wrote:

> I currently only have remote-console access to it, but the three of the
> four nvmes all get tons of ghosts.

and I am sorry in advance for the pictures, but with no network configured
on the switches the box is connected to, all I can do is screenshot the
console window I used for installation (along with remote-cd-iso use for
install66/67.iso)

--
May the most significant bit of your life be positive.
Re: AMD EPYC 7551 box panic: pr_find_pagehead
> From: Janne Johansson
> Date: Mon, 15 Jun 2020 19:15:36 +0200
>
> Recent AMD box with a bunch of nvme drives, never booted anything, crashes
> with
> panic: pr_find_pagehead: dma256: page header missing
> after listing some sd(4) drives on 14-Jun snapshot installation.
>
> That is the only odd output in the dmesg as far as I can see, picture
> included.

That Micron_9300_MTFD disk showing up twice is suspicious. Does the
machine boot with nvme(4) disabled and/or that particular NVMe disk
removed?