Re: nvme timeout issues with hardware and bhyve vm's
On 10/12/23 8:45 PM, Warner Losh wrote:
> What version is that kernel?

oh dang, i sent this to the wrong list - i'm not running current. the hypervisor and vm are both 13.2, and my workstation is a recent 14.0 pre-release build. i'll do more homework tomorrow and post to questions or a more appropriate list.

-pete

--
Pete Wright
p...@nomadlogic.org
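For anyone following along, a quick sketch of how to confirm what a host or VM is actually running before filing to a list (plain `uname` everywhere; `freebsd-version` on FreeBSD 10.0 and later):

```shell
# Full kernel ident string, including version and build date
uname -a
# On FreeBSD, print kernel and userland versions separately -
# they can differ after a partial upgrade, which matters when
# reporting driver bugs
[ "$(uname -s)" = "FreeBSD" ] && freebsd-version -ku || true
```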
Re: nvme timeout issues with hardware and bhyve vm's
What version is that kernel?

Warner

On Thu, Oct 12, 2023, 9:41 PM Pete Wright wrote:
> hey there - i was curious if anyone has had issues with nvme devices
> recently. i'm chasing down similar issues on my workstation which has a
> physical NVMe zroot, and on a bhyve VM which has a large pool exposed as
> a NVMe device (and is backed by a zvol).
>
> on the most recent bhyve issue the VM reported this:
>
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
> Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
> Oct 13 02:52:52 emby kernel: nvme1: resetting controller
> Oct 13 02:52:53 emby kernel: nvme1: waiting
> Oct 13 02:53:23 emby syslogd: last message repeated 114 times
> Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 within 30500 ms
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 lba:5241952432 len:32
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 lba:5242495888 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 lba:4934226784 len:96
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442449936 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442450448 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:5 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:6 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvd1: detached
>
> I had similar issues on my workstation as well. Scrubbing the NVMe
> device on my real-hardware workstation hasn't turned up any issues, but
> the system has locked up a handful of times.
>
> Just curious if others have seen the same, or if someone could point me
> in the right direction...
>
> thanks!
> -pete
>
> --
> Pete Wright
> p...@nomadlogic.org
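When triaging a log like the one quoted above, a quick tally helps separate a one-off controller reset from a flood of aborted I/O. A small sketch (the log path is an assumption; adjust for your syslog setup):

```shell
# Count controller resets and aborted commands for nvme1 in a saved log
LOG=/var/log/messages   # assumed path; adjust as needed
grep 'nvme1:' "$LOG" | awk '
  /Resetting controller/ { resets++ }
  /ABORTED/              { aborts++ }
  END { printf "%d resets, %d aborted commands\n", resets+0, aborts+0 }'
```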
nvme timeout issues with hardware and bhyve vm's
hey there - i was curious if anyone has had issues with nvme devices recently. i'm chasing down similar issues on my workstation which has a physical NVMe zroot, and on a bhyve VM which has a large pool exposed as a NVMe device (and is backed by a zvol).

on the most recent bhyve issue the VM reported this:

Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
Oct 13 02:52:52 emby kernel: nvme1: resetting controller
Oct 13 02:52:53 emby kernel: nvme1: waiting
Oct 13 02:53:23 emby syslogd: last message repeated 114 times
Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 within 30500 ms
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 lba:5241952432 len:32
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 lba:5242495888 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 lba:4934226784 len:96
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442449936 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442450448 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:5 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:6 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvd1: detached

I had similar issues on my workstation as well. Scrubbing the NVMe device on my real-hardware workstation hasn't turned up any issues, but the system has locked up a handful of times.

Just curious if others have seen the same, or if someone could point me in the right direction...

thanks!
-pete

--
Pete Wright
p...@nomadlogic.org
Re: how to set vfs.zfs.arc.max in 15-current ?
On Thu, Oct 12, 2023 at 11:27:49AM -0700, Cy Schubert wrote:
> In message , void writes:
> > Is there a new way to set arc.max in 15-current?
> >
> > It's no longer settable (except to "0") in main-n265801 (Oct 7th)
> > while multiuser.
> >
> > # sysctl vfs.zfs.arc.max=8589934592
> > vfs.zfs.arc.max: 0
> > sysctl: vfs.zfs.arc.max=8589934592: Invalid argument
>
> Try reducing your arc.max by an order of 10. This suggests that it's
> probably failing in param_set_arc_max() in the val >= arc_all_memory()
> comparison.

Hi, thanks for replying. Sadly, your suggestion doesn't work in this case:

root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589934592
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934592: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=858993459
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=858993459: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=85899345
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=85899345: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589934
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=858993
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=858993: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=85899
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=85899: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589: Invalid argument

--
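As a workaround while the runtime sysctl path is rejecting everything, the limit can still be applied as a boot-time tunable. A sketch for /boot/loader.conf, using the 8 GiB value from the transcript above:

```
# /boot/loader.conf
# Applied before ZFS initializes, so the multiuser sysctl
# handler that returns EINVAL is not involved
vfs.zfs.arc.max="8589934592"
```

This does require a reboot, so it is a stopgap rather than a fix for the runtime handler.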
Re: how to set vfs.zfs.arc.max in 15-current ?
In message , void writes:
> Is there a new way to set arc.max in 15-current?
>
> It's no longer settable (except to "0") in main-n265801 (Oct 7th)
> while multiuser.
>
> # sysctl vfs.zfs.arc.max=8589934592
> vfs.zfs.arc.max: 0
> sysctl: vfs.zfs.arc.max=8589934592: Invalid argument

Try reducing your arc.max by an order of 10. This suggests that it's probably failing in param_set_arc_max() in the val >= arc_all_memory() comparison.

--
Cheers,
Cy Schubert
FreeBSD UNIX:  Web: https://FreeBSD.org
NTP:           Web: https://nwtime.org
e^(i*pi)+1=0
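Cy's hypothesis can be sanity-checked from the shell by comparing the requested value against total physical memory. A sketch of that comparison (hw.physmem is the FreeBSD sysctl; the script only illustrates the inequality, it is not the kernel's actual validation code):

```shell
# Would the requested arc.max exceed physical memory?
req=8589934592                              # 8 GiB, the value being rejected
phys=$(sysctl -n hw.physmem 2>/dev/null || echo 0)
if [ "$phys" -gt 0 ] && [ "$req" -ge "$phys" ]; then
  echo "request >= physical memory: EINVAL would be expected"
else
  echo "request below physical memory: rejection comes from another check"
fi
```

Since void's machine rejects values down to 8589 bytes, the failure clearly is not this bound alone.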
how to set vfs.zfs.arc.max in 15-current ?
Is there a new way to set arc.max in 15-current?

It's no longer settable (except to "0") in main-n265801 (Oct 7th) while multiuser.

# sysctl vfs.zfs.arc.max=8589934592
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934592: Invalid argument

--
Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]
On 2023-10-12 07:08, Mark Millard wrote:
> I use the likes of:
>
> BE                        Active  Mountpoint  Space  Created
> build_area_for-main-CA72  -       -           1.99G  2023-09-20 10:19
> main-CA72                 NR      /           4.50G  2023-09-21 10:10
>
> NAME                                 CANMOUNT  MOUNTPOINT
> zopt0                                on        /zopt0
> . . .
> zopt0/ROOT                           on        none
> zopt0/ROOT/build_area_for-main-CA72  noauto    none
> zopt0/ROOT/main-CA72                 noauto    none
> zopt0/poudriere                      on        /usr/local/poudriere
> zopt0/poudriere/data                 on        /usr/local/poudriere/data
> zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
> zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
> zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
> zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
> zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
> zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
> zopt0/poudriere/jails                on        /usr/local/poudriere/jails
> zopt0/poudriere/ports                on        /usr/local/poudriere/ports
> zopt0/tmp                            on        /tmp
> zopt0/usr                            off       /usr
> zopt0/usr/13_0R-src                  on        /usr/13_0R-src
> zopt0/usr/alt-main-src               on        /usr/alt-main-src
> zopt0/usr/home                       on        /usr/home
> zopt0/usr/local                      on        /usr/local
> [...]
>
> If such ends up as unsupportable, it will effectively eliminate my
> reason for using bectl (and, so, zfs): the sharing is important to my
> use.

Additionally/complementary to what Kyle said... The -r option is about

zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2

A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a -r clone also clones subDS1 and subDS2. So, as Kyle said, your (and my) use case is not affected by this.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org   : PGP 0x8F31830F9F2772BF
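The shallow-vs-deep distinction above can be sketched as a prefix check over the dataset names from the example (the loop is illustrative only, not bectl's actual implementation; subDS1/subDS2 are the hypothetical child datasets named above):

```shell
# Which datasets would a deep (-r) clone of the BE touch,
# and which stay shared across boot environments?
be="zopt0/ROOT/main-CA72"
datasets="zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/poudriere
zopt0/usr/local"
echo "$datasets" | while read -r ds; do
  case "$ds" in
    "$be"|"$be"/*) echo "cloned by -r: $ds" ;;   # BE root or a child of it
    *)             echo "left shared:  $ds" ;;   # outside zopt0/ROOT/<BE>
  esac
done
```

The datasets mounted under /usr/local and friends sit outside zopt0/ROOT, so neither a shallow nor a -r clone touches them, which is why the sharing setup keeps working.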