Re: nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Pete Wright




On 10/12/23 8:45 PM, Warner Losh wrote:

What version is that kernel?


oh dang i sent this to the wrong list, i'm not running current.  the 
hypervisor and vm are both 13.2 and my workstation is a recent 14.0 
pre-release build.  i'll do more homework tomorrow and post to questions 
or a more appropriate list.
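
(when i follow up i'll include the exact strings from the usual base-system
commands, e.g.:)

freebsd-version -kru   # installed kernel, running kernel, and userland versions
uname -a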


-pete

--
Pete Wright
p...@nomadlogic.org



Re: nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Warner Losh
What version is that kernel?

Warner

On Thu, Oct 12, 2023, 9:41 PM Pete Wright  wrote:

> hey there - i was curious if anyone has had issues with nvme devices
> recently.  i'm chasing down similar issues on my workstation which has a
> physical NVMe zroot, and on a bhyve VM which has a large pool exposed as
> an NVMe device (and is backed by a zvol).
>
> on the most recent bhyve issue the VM reported this:
>
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs
> 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs
> 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete,
> resetting
> Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
> Oct 13 02:52:52 emby kernel: nvme1: resetting controller
> Oct 13 02:52:53 emby kernel: nvme1: waiting
> Oct 13 02:53:23 emby syslogd: last message repeated 114 times
> Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1
> within 30500 ms
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1
> lba:4968850592 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:1 cid:119 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1
> lba:5241952432 len:32
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1
> lba:4968850336 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:3 cid:123 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1
> lba:5242495888 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1
> lba:4934226784 len:96
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1
> lba:6442449936 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1
> lba:6442450448 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:5 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:6 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvd1: detached
>
>
>
> I had similar issues on my workstation as well.  Scrubbing the NVMe
> device on my real-hardware workstation hasn't turned up any issues, but
> the system has locked up a handful of times.
>
> Just curious if others have seen the same, or if someone could point me
> in the right direction...
>
> thanks!
> -pete
>
> --
> Pete Wright
> p...@nomadlogic.org
>
>


nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Pete Wright
hey there - i was curious if anyone has had issues with nvme devices 
recently.  i'm chasing down similar issues on my workstation which has a 
physical NVMe zroot, and on a bhyve VM which has a large pool exposed as 
an NVMe device (and is backed by a zvol).
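
(for reference, the zvol is wired into the guest roughly like this; the pool
and zvol names, sizes and slot numbers are placeholders rather than my actual
command line, and the UEFI firmware path assumes the edk2-bhyve package:)

# create the backing zvol and expose it to the guest as an emulated NVMe controller
zfs create -V 4T -o volmode=dev tank/vm/emby-disk1
bhyve -c 4 -m 8G -H \
    -s 0,hostbridge \
    -s 2,virtio-net,tap0 \
    -s 3,nvme,/dev/zvol/tank/vm/emby-disk1 \
    -s 31,lpc -l com1,stdio \
    -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    emby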


on the most recent bhyve issue the VM reported this:

Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 
13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 
13737432371683671

Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
Oct 13 02:52:52 emby kernel: nvme1: resetting controller
Oct 13 02:52:53 emby kernel: nvme1: waiting
Oct 13 02:53:23 emby syslogd: last message repeated 114 times
Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 
within 30500 ms

Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 
lba:4968850592 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:1 sqid:1 cid:119 cdw0:0

Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 
lba:5241952432 len:32
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 
lba:4968850336 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:1 sqid:3 cid:123 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 
lba:5242495888 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:3 cid:0 cdw0:0

Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 
lba:4934226784 len:96
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 
lba:6442449936 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 
lba:6442450448 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:5 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 
m:0 dnr:0 sqid:6 cid:0 cdw0:0

Oct 13 02:53:25 emby kernel: nvd1: detached



I had similar issues on my workstation as well.  Scrubbing the NVMe 
device on my real-hardware workstation hasn't turned up any issues, but 
the system has locked up a handful of times.
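
(roughly what i ran for the scrub, plus a controller-level health check that
is probably worth a look; zroot and nvme0 are placeholders for your own pool
and controller names:)

zpool scrub zroot
zpool status -v zroot            # after the scrub completes, look for read/write/checksum errors
nvmecontrol logpage -p 2 nvme0   # SMART / health information log straight from the controller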


Just curious if others have seen the same, or if someone could point me 
in the right direction...


thanks!
-pete

--
Pete Wright
p...@nomadlogic.org



Re: how to set vfs.zfs.arc.max in 15-current ?

2023-10-12 Thread void

On Thu, Oct 12, 2023 at 11:27:49AM -0700, Cy Schubert wrote:

In message , void writes:

Is there a new way to set arc.max in 15-current?

It's no longer settable (except to "0") in main-n265801 (Oct 7th)
while multiuser.

# sysctl vfs.zfs.arc.max=8589934592
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934592: Invalid argument


Try reducing your arc.max by a factor of 10. This suggests that it's
probably failing in param_set_arc_max() in the val >= arc_all_memory()
comparison.


Hi, thanks for replying. Sadly, your suggestion doesn't work in this case:

root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589934592
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934592: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=858993459
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=858993459: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=85899345
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=85899345: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589934
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=858993
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=858993: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=85899
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=85899: Invalid argument
root@beer:/usr/src# sysctl vfs.zfs.arc.max=8589
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589: Invalid argument

--



Re: how to set vfs.zfs.arc.max in 15-current ?

2023-10-12 Thread Cy Schubert
In message , void writes:
> Is there a new way to set arc.max in 15-current?
>
> It's no longer settable (except to "0") in main-n265801 (Oct 7th)
> while multiuser.
>
> # sysctl vfs.zfs.arc.max=8589934592
> vfs.zfs.arc.max: 0
> sysctl: vfs.zfs.arc.max=8589934592: Invalid argument

Try reducing your arc.max by a factor of 10. This suggests that it's 
probably failing in param_set_arc_max() in the val >= arc_all_memory()
comparison.
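
A quick userland sanity check of that comparison (this assumes
arc_all_memory() roughly tracks hw.physmem, which is my approximation, not
something verified against the source):

sysctl -n hw.physmem       # total physical memory in bytes
sysctl -n vfs.zfs.arc.max  # current setting; 0 means "auto"

If the value being written is at or above hw.physmem, EINVAL would be the
expected outcome.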


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0






how to set vfs.zfs.arc.max in 15-current ?

2023-10-12 Thread void

Is there a new way to set arc.max in 15-current?

It's no longer settable (except to "0") in main-n265801 (Oct 7th)
while multiuser.

# sysctl vfs.zfs.arc.max=8589934592
vfs.zfs.arc.max: 0
sysctl: vfs.zfs.arc.max=8589934592: Invalid argument
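
(setting it at boot via a loader tunable still looks like the obvious
fallback; the sketch below is unverified on this particular build, and what
i'm really after is changing it while multiuser:)

# /boot/loader.conf -- applied at boot; value is in bytes (8 GiB here)
vfs.zfs.arc.max="8589934592"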

--



Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]

2023-10-12 Thread Alexander Leidinger

On 2023-10-12 07:08, Mark Millard wrote:


I use the likes of:

BE                        Active Mountpoint Space Created
build_area_for-main-CA72  -      -          1.99G 2023-09-20 10:19
main-CA72                 NR     /          4.50G 2023-09-21 10:10

NAME                                 CANMOUNT  MOUNTPOINT
zopt0                                on        /zopt0
. . .
zopt0/ROOT                           on        none
zopt0/ROOT/build_area_for-main-CA72  noauto    none
zopt0/ROOT/main-CA72                 noauto    none
zopt0/poudriere                      on        /usr/local/poudriere
zopt0/poudriere/data                 on        /usr/local/poudriere/data
zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
zopt0/poudriere/jails                on        /usr/local/poudriere/jails
zopt0/poudriere/ports                on        /usr/local/poudriere/ports

zopt0/tmp                            on        /tmp
zopt0/usr                            off       /usr
zopt0/usr/13_0R-src                  on        /usr/13_0R-src
zopt0/usr/alt-main-src               on        /usr/alt-main-src
zopt0/usr/home                       on        /usr/home
zopt0/usr/local                      on        /usr/local


[...]


If such ends up as unsupportable, it will effectively eliminate my
reason for using bectl (and, so, zfs): the sharing is important to
my use.


Additionally/complementary to what Kyle said...

The -r option is about a layout like
zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2

A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a
-r clone also clones subDS1 and subDS2.
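
In commands, a minimal sketch (the BE names are made up, and subDS1/subDS2
are the hypothetical child datasets from above):

bectl create shallow-be      # clones only zopt0/ROOT/main-CA72 itself
bectl create -r deep-be      # also clones .../main-CA72/subDS1 and .../subDS2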


So as Kyle said, your (and my) use cases are not affected by this.

Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org    : PGP 0x8F31830F9F2772BF

