Re: Chasing OOM Issues - good sysctl metrics to use?
Pete Wright wrote on Date: Thu, 21 Apr 2022 19:16:42 -0700 :

> on my workstation running CURRENT (amd64/32g of ram) i've been running
> into a scenario where after 4 or 5 days of daily use I get an OOM event
> and both chromium and firefox are killed. then in the next day or so
> the system will become very unresponsive in the morning when i unlock
> my screensaver, forcing a manual power cycle.
>
> one thing i've noticed is growing swap usage but plenty of free and
> inactive memory as well as a GB or so of memory in the Laundry state
> according to top. my understanding is that seeing swap usage grow over
> time is expected and doesn't necessarily indicate a problem. but what
> concerns me is the system locking up while seeing quite a bit of disk
> i/o (maybe from paging back in?).
>
> in order to help chase this down i've set up
> prometheus_sysctl_exporter(8) to send data to a local prometheus
> instance. the goal is to examine memory utilization over time to help
> detect any issues. so my question is this:
>
> what OIDs would be useful for diagnosing weird memory issues like this?
>
> i'm currently looking at:
> sysctl_vm_domain_0_stats_laundry
> sysctl_vm_domain_0_stats_active
> sysctl_vm_domain_0_stats_free_count
> sysctl_vm_domain_0_stats_inactive_pps
>
> thanks in advance - and i'd be happy to share my data if anyone is
> interested :)

Messages in the console output would be appropriate to report. Messages
might also be available via the following at appropriate times:

# dmesg -a
. . .

or:

# more /var/log/messages
. . .

Generally messages from after the boot is complete are more relevant.

Messages like the following are some examples that would be of interest:

pid . . . (c++), jid . . ., uid . . ., was killed: failed to reclaim memory
pid . . . (c++), jid . . ., uid . . ., was killed: a thread waited too long to allocate a page
pid . . . (c++), jid . . ., uid . . ., was killed: out of swap space

(That last is somewhat of a misnomer for the internal issue that leads
to it.) I'm hoping you got message(s) of one or more of the above kinds.

But others are also relevant:

. . . kernel: swap_pager: out of swap space
. . . kernel: swp_pager_getswapspace(7): failed
. . . kernel: swap_pager: indefinite wait buffer: bufobj: . . ., blkno: . . ., size: . . .

(Those messages do not announce a process kill but give some evidence
about context.) Some of the messages with partially matching text
actually identify somewhat different contexts -- so each message type
is relevant. There may be other types of messages that are relevant.
The sequencing of the messages could also be relevant.

Do you have any swap partitions set up and in use? The details could be
relevant. Do you have swap set up some other way than via swap
partition use? No swap? If 1+ swap partitions are in use, things that
suggest the speed/latency characteristics of the I/O to the drive could
be relevant.

ZFS (so with ARC)? UFS? Both?

The first block of lines from a top display could be relevant,
particularly when it is clearly progressing towards having the problem.
(After the problem is too late.) (I just picked top as a way to get a
bunch of the information all together automatically.)

These sorts of things might help folks help you.

===
Mark Millard
marklmi at yahoo.com
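A quick way to scan for the messages Mark lists is a plain grep over the
kernel message buffer and the persistent log; the patterns below simply
follow the example texts quoted above (a minimal sketch):

    # scan the current boot's kernel messages for OOM-related events
    dmesg -a | egrep 'was killed:|swap_pager|swp_pager_getswapspace'
    # and the persistent log
    egrep 'was killed:|swap_pager|swp_pager_getswapspace' /var/log/messages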
Chasing OOM Issues - good sysctl metrics to use?
hello - on my workstation running CURRENT (amd64/32g of ram) i've been
running into a scenario where after 4 or 5 days of daily use I get an
OOM event and both chromium and firefox are killed. then in the next
day or so the system will become very unresponsive in the morning when
i unlock my screensaver, forcing a manual power cycle.

one thing i've noticed is growing swap usage but plenty of free and
inactive memory as well as a GB or so of memory in the Laundry state
according to top. my understanding is that seeing swap usage grow over
time is expected and doesn't necessarily indicate a problem. but what
concerns me is the system locking up while seeing quite a bit of disk
i/o (maybe from paging back in?).

in order to help chase this down i've set up
prometheus_sysctl_exporter(8) to send data to a local prometheus
instance. the goal is to examine memory utilization over time to help
detect any issues. so my question is this:

what OIDs would be useful for diagnosing weird memory issues like this?

i'm currently looking at:
sysctl_vm_domain_0_stats_laundry
sysctl_vm_domain_0_stats_active
sysctl_vm_domain_0_stats_free_count
sysctl_vm_domain_0_stats_inactive_pps

thanks in advance - and i'd be happy to share my data if anyone is
interested :)

-pete

--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA
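For reference, prometheus_sysctl_exporter(8) derives its metric names
from the sysctl tree, so the exporter names above should correspond to
the vm.domain.0.stats.* OIDs, which can also be sampled directly with
sysctl(8) (a minimal sketch; a single memory domain is assumed):

    # the page-queue counters behind the exporter names above
    sysctl vm.domain.0.stats.active vm.domain.0.stats.inactive \
        vm.domain.0.stats.laundry vm.domain.0.stats.free_count
    # swap usage over time is also worth graphing
    swapinfo -h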
Re: nullfs and ZFS issues
On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
| Quoting Mateusz Guzik (from Thu, 21 Apr 2022 14:50:42 +0200):
|
| > On 4/21/22, Alexander Leidinger wrote:
| >> I tried nocache on a system with a lot of jails which use nullfs,
| >> which showed very slow behavior in the daily periodic runs (12h runs
| >> in the night after boot, 24h or more in subsequent nights). Now the
| >> first nightly run after boot was finished after 4h.
| >>
| >> What is the benefit of not disabling the cache in nullfs? I would
| >> expect zfs (or ufs) to cache the (meta)data anyway.
| >>
| >
| > does the poor performance show up with
| > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
|
| I would like to have all the 22 jails run the periodic scripts a
| second night in a row before trying this.
|
| > if the long runs are still there, can you get some profiling from it?
| > sysctl -a before and after would be a start.
| >
| > My guess is that you are at the vnode limit and bumping into the
| > 1 second sleep.
|
| That would explain the behavior I see since I added the last jail,
| which seems to have crossed a threshold which triggers the slow
| behavior.
|
| Current status (with the 112 nullfs mounts with nocache):
| kern.maxvnodes:              10485760
| kern.numvnodes:               3791064
| kern.freevnodes:              3613694
| kern.cache.stats.heldvnodes:   151707
| kern.vnodes_created:        260288639
|
| The maxvnodes value is already increased by 10 times compared to the
| default value on this system.

With the patch, you shouldn't mount with nocache! However, you might
want to tune:
	vfs.zfs.arc.meta_prune
	vfs.zfs.arc.meta_adjust_restarts

On each restart the code increments the prune amount by
vfs.zfs.arc.meta_prune and submits that amount to the vnode reclaim
code, so it can end up reclaiming a lot of vnodes. The defaults of
1 * 4096, submitted on each loop iteration, can cause most of the cache
to be freed. With relatively small values for these, the cache didn't
shrink too much.

Doug A.
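As a concrete illustration, the tunables Doug names are plain sysctls
and can be inspected or lowered at runtime (the values below are
arbitrary examples, not recommendations):

    # inspect the current settings
    sysctl vfs.zfs.arc.meta_prune vfs.zfs.arc.meta_adjust_restarts
    # example: make each reclaim pass less aggressive
    sysctl vfs.zfs.arc.meta_prune=512
    sysctl vfs.zfs.arc.meta_adjust_restarts=16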
Re: nullfs and ZFS issues
On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
| Quoting Mateusz Guzik (from Thu, 21 Apr 2022 14:50:42 +0200):
|
| > On 4/21/22, Alexander Leidinger wrote:
| >> I tried nocache on a system with a lot of jails which use nullfs,
| >> which showed very slow behavior in the daily periodic runs (12h runs
| >> in the night after boot, 24h or more in subsequent nights). Now the
| >> first nightly run after boot was finished after 4h.
| >>
| >> What is the benefit of not disabling the cache in nullfs? I would
| >> expect zfs (or ufs) to cache the (meta)data anyway.
| >>
| >
| > does the poor performance show up with
| > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
|
| I would like to have all the 22 jails run the periodic scripts a
| second night in a row before trying this.
|
| > if the long runs are still there, can you get some profiling from it?
| > sysctl -a before and after would be a start.
| >
| > My guess is that you are at the vnode limit and bumping into the
| > 1 second sleep.
|
| That would explain the behavior I see since I added the last jail,
| which seems to have crossed a threshold which triggers the slow
| behavior.
|
| Current status (with the 112 nullfs mounts with nocache):
| kern.maxvnodes:              10485760
| kern.numvnodes:               3791064
| kern.freevnodes:              3613694
| kern.cache.stats.heldvnodes:   151707
| kern.vnodes_created:        260288639
|
| The maxvnodes value is already increased by 10 times compared to the
| default value on this system.

I've attached mount.patch; with it, doing mount -v should show the
vnode usage per filesystem. Note that the problem I was running into
was that, after some operations, arc_prune and arc_evict would consume
100% of 2 cores and make ZFS really slow. If you are not running into
that issue then nocache etc. shouldn't be needed.

On my laptop I set ARC to 1G since I don't use swap, and in the past
ARC would consume too much memory and things would die. When nullfs
holds a bunch of vnodes, ZFS can't release them.

FYI, on my laptop with nocache and limited vnodes I haven't run into
this problem. I haven't tried the patch that lets ZFS free its own and
nullfs vnodes on my laptop; I have only tried it in a bhyve test. I use
bhyve and an md drive to avoid wearing out my SSD, and it's faster for
testing. I have found that git, tar, make world etc. could trigger the
issue before, but I haven't had any issues with nocache and capped
vnodes.

Thanks,

Doug A.
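The two mitigations Doug mentions are both one-line settings; a minimal
sketch with illustrative values (not recommendations):

    # /boot/loader.conf: cap ARC at 1G, as on Doug's laptop
    vfs.zfs.arc_max="1073741824"    # 1G in bytes

    # at runtime: cap the vnode table to a low number
    sysctl kern.maxvnodes=100000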
diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
index 79d9d6cb0ca..00eefb3a5e0 100644
--- a/sbin/mount/mount.c
+++ b/sbin/mount/mount.c
@@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
 			xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidbuf);
 			free(fsidbuf);
 		}
+		if (sfp->f_nvnodelistsize != 0 || sfp->f_lazyvnodelistsize != 0) {
+			xo_open_container("vnodes");
+			xo_emit("{D:, }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:lazy}{:lazy/%ju}",
+			    (uintmax_t)sfp->f_nvnodelistsize,
+			    (uintmax_t)sfp->f_lazyvnodelistsize);
+			xo_close_container("vnodes");
+		}
 	}
 	xo_emit("{D:)}\n");
 }
diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index a495ad86ac4..3648ef8d080 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -2625,6 +2626,8 @@ __vfs_statfs(struct mount *mp, struct statfs *sbp)
 	sbp->f_version = STATFS_VERSION;
 	sbp->f_namemax = NAME_MAX;
 	sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
+	sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
+	sbp->f_lazyvnodelistsize = mp->mnt_lazyvnodelistsize;
 
 	return (mp->mnt_op->vfs_statfs(mp, sbp));
 }
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index 3383bfe8f43..95dd3c76ae5 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -91,7 +91,9 @@ struct statfs {
 	uint64_t f_asyncwrites;		/* count of async writes since mount */
 	uint64_t f_syncreads;		/* count of sync reads since mount */
 	uint64_t f_asyncreads;		/* count of async reads since mount */
-	uint64_t f_spare[10];		/* unused spare */
+	uint32_t f_nvnodelistsize;	/* (i) # of vnodes */
+	uint32_t f_lazyvnodelistsize;	/* (l) # of lazy vnodes */
+	uint64_t f_spare[9];		/* unused spare */
 	uint32_t f_namemax;		/* maximum filename length */
 	uid_t	  f_owner;		/* user that mounted the filesystem */
 	fsid_t	  f_fsid;		/* filesystem id */
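Going by the xo_emit format string in the patch, the patched mount -v
would append a per-filesystem field roughly like this (hypothetical
output; the mount point and counts are invented for illustration):

    # mount -v | grep vnodes
    zroot/jails/j1 on /jails/j1 (zfs, local, noatime, vnodes: count 151707 lazy 3)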
Re: Daily black screen of death
On Thu, Apr 21, 2022 at 09:44:04AM +0200, Emmanuel Vadot wrote:
> Hello Steve,
>
> On Tue, 19 Apr 2022 11:32:32 -0700
> Steve Kargl wrote:
>
> > FYI,
> >
> > I'm experiencing an almost daily black screen of death panic.
> > Kernel, world, drm-current-kmod, and gpu-firmware-kmod were
> > all rebuilt and installed at the same time. Uname shows
> >
> > FreeBSD 14.0-CURRENT #0 main-n254360-eb9d205fa69: Tue Apr 5 13:49:47 PDT 2022
> >
> > So, April 5th sources.
> >
> > The panic results in a keyboard lock and no dump. The system
> > does not have a serial console. Only recourse is a hard reset.
> >
> > Hand transcribed from photo
> >
> > _sleep() at _sleep+0x38a/frame 0xfe012b7c0680
> > buf_daemon_shutdown() at buf_daemon_shutdown+0x6b/frame 0xfe012b7c06a0
> > kern_reboot() at kern_reboot+0x2ae/frame 0xfe012b7c06e0
> > vpanic() at vpanic+0x1ee/frame 0xfe012b7c0730
> > panic() at panic+0x43/frame 0xfe012b7c0790
> >
> > Above repeats 100s of times scrolling off the screen with ever
> > increasing frame pointer.
> >
> > Final message,
> >
> > mi_switch() at mi_switch+0x18e/frame 0xfe012b7c14b0
> > __mtx_lock_sleep() at __mtx_lock_sleep+0x173/frame 0xfe012b7c1510
> > __mtx_lock_flags() at __mtx_lock_flags+0xc0/frame 0xfe012b7c1550
> > linux_wake_up() at linux_wake_up+0x38/frame 0xfe012b7c15a0
> > radeon_fence_is_signaled() at radeon_fence_is_signaled+0x99/frame 0xfe012b7c15f0
> > dma_resv_add_shared_fence() at dma_resv_add_shared_fence+0x99/frame 0xfe012b7c1640
> > ttm_eu_fence_buffer_objects() at ttm_eu_fence_buffer_objects+0x79/frame 0xfe012b7c1680
> > radeon_cs_parser_fini() at radeon_cs_parser_fini+0x53/frame 0xfe012b7c16b0
> > radeon_cs_ioctl() at radeon_cs_ioctl+0x75e/frame 0xfe012b7c1b30
> > drm_ioctl_kernel() at drm_ioctl_kernel+0xc7/frame 0xfe012b7c1b80
> > drm_ioctl() at drm_ioctl+0x2c3/frame 0xfe012b7c1c70
> > linux_file_ioctl() at linux_file_ioctl+0x309/frame 0xfe012b7c1cd0
> > kern_ioctl() at kern_ioctl+0x1dc/frame 0xfe012b7c1d40
> > sys_ioctl() at sys_ioctl+0x121/frame 0xfe012b7c1e10
> > amd64_syscall() at amd64_syscall+0x108/frame 0xfe012b7c1f30
> > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe012b7c1f30
> > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x36a096c34ea, rsp = 0x3fa11e623eb8, \
> >     rbp = 0x3fa11e623ee0 ---
> > panic: _sleep: curthread not running
> > cpuid = 4
> > time = 1650389478
> > KDB: stack backtrace:
> >
> > One common trigger appears to be the use of firefox-99.0,2 from
> > the ports collection.
> >
> > --
> > Steve
>
> What version of drm are you using ?
> Since when do you experience this ?
> drm has not changed much for a long time now except adapting a few
> files for new linuxkpi additions.

drm-current-kmod-5.4.144.g20220223
gpu-firmware-kmod-g20210330

I upgraded a Jan 2022 kernel+world+drm+gpu 2 to 3 weeks ago. The
Jan 2022 system just worked. I've had the problem since the upgrade.
I've also rebuilt firefox, libdrm, the X server, and X11 libraries.
Still see the panic.

As the panic messages scroll off the screen, I'm not sure the above
last bit is the actual cause or simply a side effect. Some additional
info from a dmesg after the reboot.

WARNING: / was not properly dismounted
[drm] radeon kernel modesetting enabled.
drmn0: on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
[drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x1092:0x6450 0x00).
[drm ERROR :radeon_atombios_init] Unable to find PCI I/O BAR; using MMIO for ATOM IIO
ATOM BIOS: C26401
drmn0: VRAM: 1024M 0x - 0x3FFF (1024M used)
drmn0: GTT: 1024M 0x4000 - 0x7FFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 64bits DDR
[TTM] Zone  kernel: Available graphics memory: 8359708 KiB
[TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[TTM] Initializing pool allocator
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading CAICOS Microcode
drmn0: successfully loaded firmware image 'radeon/CAICOS_pfp.bin'
drmn0: successfully loaded firmware image 'radeon/CAICOS_me.bin'
drmn0: successfully loaded firmware image 'radeon/BTC_rlc.bin'
drmn0: successfully loaded firmware image 'radeon/CAICOS_mc.bin'
drmn0: successfully loaded firmware image 'radeon/CAICOS_smc.bin'
[drm] Internal thermal controller with fan control
[drm] radeon: dpm initialized
drmn0: successfully loaded firmware image 'radeon/SUMO_uvd.bin'
[drm] GART: num cpu pages 262144, num gpu pages 262144
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] PCIE GART of 1024M enabled (table at 0x00162000).
drmn0: WB enabled
drmn0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x0xf8000be96c00
drmn0: fence driver
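Unrelated to the radeon messages themselves: since the panic currently
leaves no dump, it may be worth confirming a dump device is configured
so a later panic can be captured (a generic sketch, not advice from
this thread; the partition name is hypothetical):

    # /etc/rc.conf: let savecore(8) use the configured swap device
    dumpdev="AUTO"

    # or arm it immediately on the running system
    dumpon /dev/ada0p3    # hypothetical swap partition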
Re: 'set but unused' breaks drm-*-kmod
On Thu, 21 Apr 2022 08:51:26 -0400
Michael Butler wrote:

> On 4/21/22 03:42, Emmanuel Vadot wrote:
> > Hello Michael,
> >
> > On Wed, 20 Apr 2022 23:39:12 -0400
> > Michael Butler wrote:
> >
> >> Seems this new requirement breaks kmod builds too ..
> >>
> >> The first of many errors was (I stopped chasing them all for lack of
> >> time) ..
> >>
> >> --- amdgpu_cs.o ---
> >> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
> >> error: variable 'priority' set but not used
> >> [-Werror,-Wunused-but-set-variable]
> >>         enum drm_sched_priority priority;
> >>                                 ^
> >> 1 error generated.
> >> *** [amdgpu_cs.o] Error code 1
> >
> > How are you building the port, directly or with PORTS_MODULES ?
> > I do make passes on the warnings for drm and I did for the
> > set-but-not-used case, but unfortunately this option doesn't exist in
> > 13.0 so I couldn't apply those in every branch.
>
> I build this directly on -current. I'm guessing that these are what
> triggered this behaviour:
>
> commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1
> Author: John Baldwin
> Date:   Mon Apr 18 16:06:27 2022 -0700
>
>     Make -Wunused-but-set-variable a fatal error for clang 13+ for
>     kernel builds.
>
>     Reviewed by:    imp, emaste
>     Differential Revision:  https://reviews.freebsd.org/D34949
>
> commit 615d289ffefe2b175f80caa9b1e113c975576472
> Author: John Baldwin
> Date:   Mon Apr 18 16:06:14 2022 -0700
>
>     Re-enable set but not used warnings for kernel builds.
>
>     make tinderbox now passes with this warning enabled as a fatal
>     error, so revert the change to hide it in preparation for making
>     it fatal.
>
>     This reverts commit e8e691983bb75e80153b802f47733f1531615fa2.
>
>     Reviewed by:    imp, emaste
>     Differential Revision:  https://reviews.freebsd.org/D34948

Ok I see. I won't have time until Monday (maybe Tuesday) to fix this,
but if someone wants to beat me to it: we should add some new
CWARNFLAGS for each problematic file in the 5.4-lts and 5.7-stable
branches of drm-kmod (master, which is following 5.10, is already
good), only if ${COMPILER_VERSION} >= 13.

Cheers,

--
Emmanuel Vadot
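A per-file warning override of the kind Emmanuel describes might look
like this in a kmod Makefile (a sketch only; the file name and the
exact version test are placeholders, not the actual drm-kmod fix --
bsd.compiler.mk exposes COMPILER_VERSION as a number like 130000 for
clang 13):

    .include <bsd.compiler.mk>

    .if ${COMPILER_TYPE} == "clang" && ${COMPILER_VERSION} >= 130000
    # quiet the now-fatal warning for files not yet cleaned up
    CWARNFLAGS.amdgpu_cs.c+= -Wno-unused-but-set-variable
    .endif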
Re: nullfs and ZFS issues
Quoting Mateusz Guzik (from Thu, 21 Apr 2022 14:50:42 +0200):

> On 4/21/22, Alexander Leidinger wrote:
>> I tried nocache on a system with a lot of jails which use nullfs,
>> which showed very slow behavior in the daily periodic runs (12h runs
>> in the night after boot, 24h or more in subsequent nights). Now the
>> first nightly run after boot was finished after 4h.
>>
>> What is the benefit of not disabling the cache in nullfs? I would
>> expect zfs (or ufs) to cache the (meta)data anyway.
>
> does the poor performance show up with
> https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?

I would like to have all the 22 jails run the periodic scripts a
second night in a row before trying this.

> if the long runs are still there, can you get some profiling from it?
> sysctl -a before and after would be a start.
>
> My guess is that you are at the vnode limit and bumping into the
> 1 second sleep.

That would explain the behavior I see since I added the last jail,
which seems to have crossed a threshold which triggers the slow
behavior.

Current status (with the 112 nullfs mounts with nocache):
kern.maxvnodes:              10485760
kern.numvnodes:               3791064
kern.freevnodes:              3613694
kern.cache.stats.heldvnodes:   151707
kern.vnodes_created:        260288639

The maxvnodes value is already increased by 10 times compared to the
default value on this system.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
Re: 'set but unused' breaks drm-*-kmod
On 4/21/22 03:42, Emmanuel Vadot wrote:
> Hello Michael,
>
> On Wed, 20 Apr 2022 23:39:12 -0400
> Michael Butler wrote:
>
>> Seems this new requirement breaks kmod builds too ..
>>
>> The first of many errors was (I stopped chasing them all for lack of
>> time) ..
>>
>> --- amdgpu_cs.o ---
>> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
>> error: variable 'priority' set but not used
>> [-Werror,-Wunused-but-set-variable]
>>         enum drm_sched_priority priority;
>>                                 ^
>> 1 error generated.
>> *** [amdgpu_cs.o] Error code 1
>
> How are you building the port, directly or with PORTS_MODULES ?
> I do make passes on the warnings for drm and I did for the
> set-but-not-used case, but unfortunately this option doesn't exist in
> 13.0 so I couldn't apply those in every branch.

I build this directly on -current. I'm guessing that these are what
triggered this behaviour:

commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1
Author: John Baldwin
Date:   Mon Apr 18 16:06:27 2022 -0700

    Make -Wunused-but-set-variable a fatal error for clang 13+ for
    kernel builds.

    Reviewed by:    imp, emaste
    Differential Revision:  https://reviews.freebsd.org/D34949

commit 615d289ffefe2b175f80caa9b1e113c975576472
Author: John Baldwin
Date:   Mon Apr 18 16:06:14 2022 -0700

    Re-enable set but not used warnings for kernel builds.

    make tinderbox now passes with this warning enabled as a fatal error,
    so revert the change to hide it in preparation for making it fatal.

    This reverts commit e8e691983bb75e80153b802f47733f1531615fa2.

    Reviewed by:    imp, emaste
    Differential Revision:  https://reviews.freebsd.org/D34948
Re: nullfs and ZFS issues
On 4/21/22, Alexander Leidinger wrote:
> Quoting Doug Ambrisko (from Wed, 20 Apr 2022 09:20:33 -0700):
>
>> On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
>> | Quoting Doug Ambrisko (from Mon, 18 Apr 2022
>> | 16:32:38 -0700):
>> |
>> | > With nullfs, nocache and setting max vnodes to a low number I can
>> |
>> | Where is nocache documented? I don't see it in mount_nullfs(8),
>> | mount(8) or nullfs(5).
>>
>> I didn't find it but it is in:
>> src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||
>>
>> Also some file systems disable it via MNTK_NULL_NOCACHE
>
> Does the attached diff look ok?
>
>> | I tried a nullfs mount with nocache and it doesn't show up in the
>> | output of "mount".
>>
>> Yep, I saw that as well. I could tell by dropping into ddb and then
>> do a show mount on the FS and look at the count. That is why I added
>> the vnode count to mount -v so I could see the usage without dropping
>> into ddb.
>
> I tried nocache on a system with a lot of jails which use nullfs,
> which showed very slow behavior in the daily periodic runs (12h runs
> in the night after boot, 24h or more in subsequent nights). Now the
> first nightly run after boot was finished after 4h.
>
> What is the benefit of not disabling the cache in nullfs? I would
> expect zfs (or ufs) to cache the (meta)data anyway.
>

does the poor performance show up with
https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?

if the long runs are still there, can you get some profiling from it?
sysctl -a before and after would be a start.

My guess is that you are at the vnode limit and bumping into the
1 second sleep.

--
Mateusz Guzik
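To confirm that a system really is bumping into the vnode limit, the
counters Alexander posts elsewhere in this thread can be watched
directly (a minimal sketch; the raised value is illustrative only):

    # watch vnode usage against the limit
    sysctl kern.maxvnodes kern.numvnodes kern.freevnodes
    # if numvnodes sits at maxvnodes, the limit can be raised at runtime
    sysctl kern.maxvnodes=10485760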
Re: 'set but unused' breaks drm-*-kmod
On Thu., Apr. 21 2022, Emmanuel Vadot wrote:

> Hello Michael,
>
> On Wed, 20 Apr 2022 23:39:12 -0400
> Michael Butler wrote:
>
>> Seems this new requirement breaks kmod builds too ..
>>
>> The first of many errors was (I stopped chasing them all for lack of
>> time) ..
>>
>> --- amdgpu_cs.o ---
>> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
>> error: variable 'priority' set but not used
>> [-Werror,-Wunused-but-set-variable]
>>         enum drm_sched_priority priority;
>>                                 ^
>> 1 error generated.
>> *** [amdgpu_cs.o] Error code 1
>
> How are you building the port, directly or with PORTS_MODULES ?
> I do make passes on the warnings for drm and I did for the
> set-but-not-used case, but unfortunately this option doesn't exist in
> 13.0 so I couldn't apply those in every branch.
>
> Cheers,

Can confirm the breakage on 14-CURRENT building graphics/drm-devel-kmod
in poudriere with matching sources and kernel.

Probably due to 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1

--
Evilham
Re: Daily black screen of death
Hello Steve,

On Tue, 19 Apr 2022 11:32:32 -0700
Steve Kargl wrote:

> FYI,
>
> I'm experiencing an almost daily black screen of death panic.
> Kernel, world, drm-current-kmod, and gpu-firmware-kmod were
> all rebuilt and installed at the same time. Uname shows
>
> FreeBSD 14.0-CURRENT #0 main-n254360-eb9d205fa69: Tue Apr 5 13:49:47 PDT 2022
>
> So, April 5th sources.
>
> The panic results in a keyboard lock and no dump. The system
> does not have a serial console. Only recourse is a hard reset.
>
> Hand transcribed from photo
>
> _sleep() at _sleep+0x38a/frame 0xfe012b7c0680
> buf_daemon_shutdown() at buf_daemon_shutdown+0x6b/frame 0xfe012b7c06a0
> kern_reboot() at kern_reboot+0x2ae/frame 0xfe012b7c06e0
> vpanic() at vpanic+0x1ee/frame 0xfe012b7c0730
> panic() at panic+0x43/frame 0xfe012b7c0790
>
> Above repeats 100s of times scrolling off the screen with ever
> increasing frame pointer.
>
> Final message,
>
> mi_switch() at mi_switch+0x18e/frame 0xfe012b7c14b0
> __mtx_lock_sleep() at __mtx_lock_sleep+0x173/frame 0xfe012b7c1510
> __mtx_lock_flags() at __mtx_lock_flags+0xc0/frame 0xfe012b7c1550
> linux_wake_up() at linux_wake_up+0x38/frame 0xfe012b7c15a0
> radeon_fence_is_signaled() at radeon_fence_is_signaled+0x99/frame 0xfe012b7c15f0
> dma_resv_add_shared_fence() at dma_resv_add_shared_fence+0x99/frame 0xfe012b7c1640
> ttm_eu_fence_buffer_objects() at ttm_eu_fence_buffer_objects+0x79/frame 0xfe012b7c1680
> radeon_cs_parser_fini() at radeon_cs_parser_fini+0x53/frame 0xfe012b7c16b0
> radeon_cs_ioctl() at radeon_cs_ioctl+0x75e/frame 0xfe012b7c1b30
> drm_ioctl_kernel() at drm_ioctl_kernel+0xc7/frame 0xfe012b7c1b80
> drm_ioctl() at drm_ioctl+0x2c3/frame 0xfe012b7c1c70
> linux_file_ioctl() at linux_file_ioctl+0x309/frame 0xfe012b7c1cd0
> kern_ioctl() at kern_ioctl+0x1dc/frame 0xfe012b7c1d40
> sys_ioctl() at sys_ioctl+0x121/frame 0xfe012b7c1e10
> amd64_syscall() at amd64_syscall+0x108/frame 0xfe012b7c1f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe012b7c1f30
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x36a096c34ea, rsp = 0x3fa11e623eb8, \
>     rbp = 0x3fa11e623ee0 ---
> panic: _sleep: curthread not running
> cpuid = 4
> time = 1650389478
> KDB: stack backtrace:
>
> One common trigger appears to be the use of firefox-99.0,2 from
> the ports collection.
>
> --
> Steve

What version of drm are you using ?
Since when do you experience this ?
drm has not changed much for a long time now except adapting a few
files for new linuxkpi additions.

Cheers,

--
Emmanuel Vadot
Re: 'set but unused' breaks drm-*-kmod
Hello Michael,

On Wed, 20 Apr 2022 23:39:12 -0400
Michael Butler wrote:

> Seems this new requirement breaks kmod builds too ..
>
> The first of many errors was (I stopped chasing them all for lack of
> time) ..
>
> --- amdgpu_cs.o ---
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
> error: variable 'priority' set but not used
> [-Werror,-Wunused-but-set-variable]
>         enum drm_sched_priority priority;
>                                 ^
> 1 error generated.
> *** [amdgpu_cs.o] Error code 1

How are you building the port, directly or with PORTS_MODULES ?
I do make passes on the warnings for drm and I did for the
set-but-not-used case, but unfortunately this option doesn't exist in
13.0 so I couldn't apply those in every branch.

Cheers,

--
Emmanuel Vadot
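For reference, PORTS_MODULES is the build knob (documented in build(7))
that rebuilds a port's kernel module together with the kernel; a
minimal sketch of the usual setup (the port origin is just an example):

    # /etc/make.conf
    PORTS_MODULES=graphics/drm-devel-kmod

    # then rebuilding the kernel also rebuilds and reinstalls the module
    cd /usr/src && make kernel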
Re: nullfs and ZFS issues
Quoting Doug Ambrisko (from Wed, 20 Apr 2022 09:20:33 -0700):

> On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
> | Quoting Doug Ambrisko (from Mon, 18 Apr 2022
> | 16:32:38 -0700):
> |
> | > With nullfs, nocache and setting max vnodes to a low number I can
> |
> | Where is nocache documented? I don't see it in mount_nullfs(8),
> | mount(8) or nullfs(5).
>
> I didn't find it but it is in:
> src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||
>
> Also some file systems disable it via MNTK_NULL_NOCACHE

Does the attached diff look ok?

> | I tried a nullfs mount with nocache and it doesn't show up in the
> | output of "mount".
>
> Yep, I saw that as well. I could tell by dropping into ddb and then
> do a show mount on the FS and look at the count. That is why I added
> the vnode count to mount -v so I could see the usage without dropping
> into ddb.

I tried nocache on a system with a lot of jails which use nullfs,
which showed very slow behavior in the daily periodic runs (12h runs
in the night after boot, 24h or more in subsequent nights). Now the
first nightly run after boot was finished after 4h.

What is the benefit of not disabling the cache in nullfs? I would
expect zfs (or ufs) to cache the (meta)data anyway.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF

diff --git a/sbin/mount/mount.8 b/sbin/mount/mount.8
index 2a877c04c07..823df63953d 100644
--- a/sbin/mount/mount.8
+++ b/sbin/mount/mount.8
@@ -28,7 +28,7 @@
 .\" @(#)mount.8	8.8 (Berkeley) 6/16/94
 .\" $FreeBSD$
 .\"
-.Dd March 17, 2022
+.Dd April 21, 2022
 .Dt MOUNT 8
 .Os
 .Sh NAME
@@ -245,6 +245,9 @@ This file system should be skipped when
 is run with the
 .Fl a
 flag.
+.It Cm nocache
+Disable caching.
+Some filesystems may not support this.
 .It Cm noclusterr
 Disable read clustering.
 .It Cm noclusterw
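For anyone wanting to try the setup under discussion, nocache is passed
like any other nullfs mount option (a sketch; the paths are
hypothetical):

    # nullfs mount with caching disabled
    mount -t nullfs -o ro,nocache /jails/base /jails/j1/base
    # or in /etc/fstab:
    # /jails/base  /jails/j1/base  nullfs  ro,nocache  0  0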