Re: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)
On 7/23/23 17:27, PHO wrote:
> On 7/22/23 22:41, Taylor R Campbell wrote:
>> Date: Sat, 22 Jul 2023 21:52:40 +0900
>> From: PHO
>>
>>> Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161] vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc
>>
>> Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in gdb?
>>
>> And, if you can get to the frame in gdb, what does gdb say is in the
>> vmw_fence_wait frame, and what cv is in the cv_destroy frame?  Let's
>> confirm it is the cv you think it is -- I suspect it might be a
>> different one.
>
> I just encountered the crash and could obtain a crash dump. It is indeed
> the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait() but the contents
> of cb do not make sense to me:
> ...
> CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?

I realized the cause of this:

    static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout)
    {
        ...
        if (likely(vmw_fence_obj_signaled(fence)))
            return timeout;
        ...
        spin_lock(f->lock);

        if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
            goto out; // <-- THIS ONE

        if (intr && signal_pending(current)) {
            ret = -ERESTARTSYS;
            goto out; // <-- OR THIS
        }

    #ifdef __NetBSD__
        DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
    #else
        cb.task = current;
    #endif
        ...
    out:
        spin_unlock(f->lock);
    #ifdef __NetBSD__
        DRM_DESTROY_WAITQUEUE(&cb.wq);
    #endif
        ...
    }

There were cases where the function was destroying a condvar that it
didn't initialize! Ugh, this is the very reason why I dislike C...
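The bug class described above -- an early goto jumping past the initialization into a cleanup path that unconditionally destroys the object -- can be sketched in a few lines of ordinary C. Everything here is illustrative (`wait_sketch` and its flag are made-up names, not the vmwgfx code); the point is the guard that fixes the pattern: only destroy what was actually initialized.

```c
#include <stdbool.h>
#include <pthread.h>

/*
 * Illustrative sketch, not the vmwgfx code: an early "goto out" jumps
 * over the initialization, yet the cleanup label runs the destroy.
 * Guarding the destroy with an "initialized" flag (or returning early
 * before the cleanup label) avoids destroying what was never created.
 */
static int
wait_sketch(bool already_signaled)
{
	pthread_cond_t cv;
	bool cv_initialized = false;

	if (already_signaled)
		goto out;	/* skips the init below, like the bug above */

	pthread_cond_init(&cv, NULL);
	cv_initialized = true;
	/* ... sleep on cv here ... */

out:
	if (cv_initialized)	/* only destroy what we initialized */
		pthread_cond_destroy(&cv);
	return 0;
}
```

The alternative fix of making the early exits `return` before the condvar exists works too; the flag is simply the smallest change when the cleanup label is shared.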
Re: DRM/KMS: vmwgfx driver is now available
On 7/24/23 11:34, PHO wrote:
> if (!list_empty(&to_put)) {
>     spin_unlock(&fman->lock);
>     list_for_each_entry_safe(fence, next_fence, &to_put, head) {
>         list_del_init(&fence->head);
>         dma_fence_put(&fence->base);
>     }
>     spin_lock(&fman->lock);
> }

I pasted a wrong version. Actually this:

    while ((fence = list_first_entry_or_null(
                &to_put, typeof(*fence), head)) != NULL) {
        list_del_init(&fence->head);
        spin_unlock(&fman->lock);
        dma_fence_put(&fence->base);
        spin_lock(&fman->lock);
    }
Re: DRM/KMS: vmwgfx driver is now available
On 7/24/23 00:03, Taylor R Campbell wrote:
> Date: Sun, 23 Jul 2023 22:57:14 +0900
> From: PHO
>
>> On 7/23/23 21:24, Taylor R Campbell wrote:
>>> Why can't the loop just use dma_fence_get_rcu and dma_fence_put?
>>>
>>> Might need to release and reacquire the lock around dma_fence_put.
>>
>> I tried but it didn't work, apparently because someone was holding a
>> pointer to a dying fence and waiting for it to be signaled. Using
>> dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
>> hung and eventually got killed by heartbeat().
>
> What exactly did you try and what was the heartbeat panic stack trace?
>
> Nothing here should wait for signalling in a way that trips the kernel
> heartbeat monitor -- that should only happen if you hold onto a spin
> lock for too long.

In __vmw_fences_update():

    ...
    list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
        if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
            list_del_init(&fence->head);

            struct dma_fence *f = dma_fence_get_rcu(&fence->base);
            if (f) {
                dma_fence_signal_locked(f);
                spin_unlock(&fman->lock);
                dma_fence_put(f);
                spin_lock(&fman->lock);
            }

            INIT_LIST_HEAD(&action_list);
    ...

And the stack trace was this:

    __kernel_end() at 0
    sys_reboot() at sys_reboot+0x30
    vpanic() at vpanic+0x1ad
    printf_nostamp() at printf_nostamp+0x30
    heartbeat() at heartbeat+0x21c
    hardclock() at hardclock+0xbb
    Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
    --- interrupt ---
    mutex_vector_enter() at mutex_vector_enter+0x36f
    vmw_fifo_send_fence() at vmw_fifo_send_fence+0xea
    vmw_execbuf_fence_commands() at vmw_execbuf_fence_commands+0x3b
    vmw_execbuf_process() at vmw_execbuf_process+0x91e
    vmw_execbuf_ioctl() at vmw_execbuf_ioctl+0x97
    drm_ioctl() at drm_ioctl+0x26d
    drm_read() at drm_read+0x15
    sys_ioctl() at sys_ioctl+0x59d
    syscall() at syscall+0x196
    --- syscall (number 54) ---
    syscall+0x196:

vmw_fifo_send_fence+0xea was at vmwgfx_fifo.c:622:

    spin_lock(&dev_priv->fence_lock);

But let's not worry about this too much, as it doesn't occur in the
latest code (read below).
> (That said: yay, the heartbeat monitor is catching what would
> otherwise be a silent nonresponsive hang!)

Yup, it has helped me so many times.

> 1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can
>    safely do dma_fence_put under the lock.

I can't see how to do it without interfering with dma_fence_free(), so

> 2. Create a list on the stack of signalled fences to put in
>    __vmw_fences_update and put them after the loop, outside the lock:

I tried this. Your code didn't work as-is because it could release the
same fence twice, but in the end I managed to get it to work! Yay, now
I can withdraw my changes to linux_dma_fence.c:

    static void __vmw_fences_update(struct vmw_fence_manager *fman)
    {
        ...
        struct list_head to_put = LIST_HEAD_INIT(to_put);
        ...
        list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
            if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
                list_del_init(&fence->head);
                if (dma_fence_get_rcu(&fence->base) != NULL) {
                    list_add(&fence->head, &to_put);
                    dma_fence_signal_locked(&fence->base);
                }
                INIT_LIST_HEAD(&action_list);
                list_splice_init(&fence->seq_passed_actions, &action_list);
                vmw_fences_perform_actions(fman, &action_list);
            } else
                break;
        }
        ...
        if (!list_empty(&to_put)) {
            spin_unlock(&fman->lock);
            list_for_each_entry_safe(fence, next_fence, &to_put, head) {
                list_del_init(&fence->head);
                dma_fence_put(&fence->base);
            }
            spin_lock(&fman->lock);
        }
    }
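The approach that finally worked -- while holding the lock, move the entries to release onto a stack-local list, then do the actual puts after dropping the lock -- can be sketched generically. All names below are illustrative: a plain singly linked list stands in for Linux's list_head, and setting a flag stands in for dma_fence_put(), whose release callback may itself need the same lock.

```c
#include <stddef.h>
#include <pthread.h>

struct node {
	struct node *next;
	int released;
};

/*
 * Illustrative sketch of the "collect under the lock, put outside it"
 * pattern: entries are unlinked onto a local to_put list while the
 * lock is held; the release (which may need the lock) runs after the
 * lock is dropped.
 */
static void
update_sketch(pthread_mutex_t *lock, struct node **active)
{
	struct node *to_put = NULL, *n;

	pthread_mutex_lock(lock);
	while ((n = *active) != NULL) {	/* "signal" every entry */
		*active = n->next;
		n->next = to_put;	/* defer the put... */
		to_put = n;
	}
	pthread_mutex_unlock(lock);

	while ((n = to_put) != NULL) {	/* ...and put outside the lock */
		to_put = n->next;
		n->released = 1;	/* stands in for dma_fence_put() */
	}
}
```

The key property is that each node is unlinked exactly once before the lock is dropped, which is what prevents the double-release that bit the first attempt.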
Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues
On 23-07-22 04:34, matthew green wrote:
| > (gdb) info locals
| > No symbol table info available.
| > (gdb)
| >
| > I don't understand why gdb complains about debugging symbols.
|
| i think it's because our build has a bug.  i was recently
| trying to debug something in a shared library and had the
| same problem and i traced it down to this code:
|
| share/mk/bsd.README:MKSTRIPSYM  If "yes", strip all local symbols from shared libraries;
|
| share/mk/bsd.lib.mk:.if ${MKSTRIPSYM:Uyes} == "yes"
| share/mk/bsd.lib.mk:_LIBLDOPTS+=        -Wl,-x
| share/mk/bsd.lib.mk:.else
| share/mk/bsd.lib.mk:_LIBLDOPTS+=        -Wl,-X
|
| and putting "MKSTRIPSYM=no" in my mk.conf fixed it.
|
| i believe this is a bug and should default to "no", if
| MKDEBUG has also been enabled.

I've changed bsd.own.mk so that if MKDEBUG != no (i.e. MKDEBUG=yes),
it forces MKSTRIPSYM=no.

Luke.
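For anyone hitting the same "No symbol table info available" problem on an older tree, the workaround mrg describes amounts to a two-line mk.conf fragment (the variable names come from share/mk/bsd.README as quoted above):

```make
# /etc/mk.conf -- keep local symbols in shared libraries when building
# with debug info, so gdb can see them (workaround until bsd.own.mk
# forces this automatically when MKDEBUG is enabled)
MKDEBUG=	yes
MKSTRIPSYM=	no
```

With MKSTRIPSYM=no, bsd.lib.mk links with -Wl,-X instead of -Wl,-x, so local symbols survive in the installed shared libraries.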
Re: DRM/KMS: vmwgfx driver is now available
> Date: Sun, 23 Jul 2023 22:57:14 +0900
> From: PHO
>
> On 7/23/23 21:24, Taylor R Campbell wrote:
> > Why can't the loop just use dma_fence_get_rcu and dma_fence_put?
> >
> > Might need to release and reacquire the lock around dma_fence_put.
>
> I tried but it didn't work, apparently because someone was holding a
> pointer to a dying fence and waiting for it to be signaled. Using
> dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
> hung and eventually got killed by heartbeat().

What exactly did you try and what was the heartbeat panic stack trace?

Nothing here should wait for signalling in a way that trips the kernel
heartbeat monitor -- that should only happen if you hold onto a spin
lock for too long.

(That said: yay, the heartbeat monitor is catching what would otherwise
be a silent nonresponsive hang!)

Thinking about it some more, I see there is a snag with the loop and
dma_fence_put, since (a) we can only use list_next(&fence->head) if we
read it and then use it under the lock, but (b) we can only
dma_fence_put outside the lock because the release callback,
vmw_fence_obj_destroy, takes the lock, so there's no way to do the
iteration safely as is with an extra reference count acquire/release
cycle.

Two options come to mind:

1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can
   safely do dma_fence_put under the lock.

2. Create a list on the stack of signalled fences to put in
   __vmw_fences_update and put them after the loop, outside the lock:

	struct list_head to_put = LIST_HEAD_INIT(to_put);
	...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (dma_fence_get_rcu(fence) != 0) {	/* (*) */
			list_del_init(&fence->head);
			continue;
		}
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			list_add(&fence->head, &to_put);	/* (*) */
			dma_fence_signal_locked(&fence->base);
			...
		}
		break;
	}
	...
	if (fence != NULL) {
		spin_unlock(&fman->lock);
		dma_fence_put(fence);
		spin_lock(&fman->lock);
	}
	while ((fence = list_first(&to_put)) != NULL) {
		list_del_init(&fence->head);
		spin_unlock(&fman->lock);
		dma_fence_put(fence);
		spin_lock(&fman->lock);
	}

> > If that doesn't work for some reason and we do have to add a special
> > case, I'd strongly prefer to add a new front end like
> > dma_fence_signal_vmwgfxspecialhack or something (other examples in
> > the tasklet API) that has the relaxed assertion instead of relaxing
> > the assertions for every other driver which isn't bonkers.
>
> Alright. I'll do that.

This should be a last resort (but not as bad as relaxing the
assertions) if we can't figure out a better way to do it.

I'm not sure the patch above is going to be best (if it works), but I'd
like to see if there's a way to do this without abusing the API
semantics too much before giving in to the abuse.
Re: DRM/KMS: vmwgfx driver is now available
On 7/23/23 21:24, Taylor R Campbell wrote:
> Date: Sun, 23 Jul 2023 12:36:10 +0900
> From: PHO
>
>> This:
>> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L45
>>
>> and this:
>> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L527
>>
>> vmw_fence_manager::fence_list holds pointers to all the fence objects
>> in use but the list doesn't retain their refcount. I tried to change
>> the way it works by simply incrementing the refcount in
>> vmw_fence_obj_init() and decrementing it in __vmw_fences_update(). But
>> it didn't work well because fences could also be released by
>> dma_resv_add_excl_fence() and dma_resv_add_shared_fence(). I cannot
>> recall all the details but sometimes refcounts went down to zero even
>> if they were retained by fence_list, and that's when I gave up
>> fighting against it. I admit I don't fully understand the code.
>
> Why can't the loop just use dma_fence_get_rcu and dma_fence_put?
>
> Might need to release and reacquire the lock around dma_fence_put.

I tried but it didn't work, apparently because someone was holding a
pointer to a dying fence and waiting for it to be signaled. Using
dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
hung and eventually got killed by heartbeat().

> If that doesn't work for some reason and we do have to add a special
> case, I'd strongly prefer to add a new front end like
> dma_fence_signal_vmwgfxspecialhack or something (other examples in the
> tasklet API) that has the relaxed assertion instead of relaxing the
> assertions for every other driver which isn't bonkers.

Alright. I'll do that.
Re: DRM/KMS: vmwgfx driver is now available
> Date: Sun, 23 Jul 2023 12:36:10 +0900
> From: PHO
>
> This:
> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L45
>
> and this:
> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L527
>
> vmw_fence_manager::fence_list holds pointers to all the fence objects in
> use but the list doesn't retain their refcount. I tried to change the
> way it works by simply incrementing the refcount in
> vmw_fence_obj_init() and decrementing it in __vmw_fences_update(). But
> it didn't work well because fences could also be released by
> dma_resv_add_excl_fence() and dma_resv_add_shared_fence(). I cannot
> recall all the details but sometimes refcounts went down to zero even if
> they were retained by fence_list, and that's when I gave up fighting
> against it. I admit I don't fully understand the code.

Why can't the loop just use dma_fence_get_rcu and dma_fence_put?

Might need to release and reacquire the lock around dma_fence_put.

If that doesn't work for some reason and we do have to add a special
case, I'd strongly prefer to add a new front end like
dma_fence_signal_vmwgfxspecialhack or something (other examples in the
tasklet API) that has the relaxed assertion instead of relaxing the
assertions for every other driver which isn't bonkers.

It may add some maintenance burden, but the diagnostic burden for all
the drivers is enormous too, so I'm extremely disinclined to just relax
assertions here.
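The reason dma_fence_get_rcu can work on a list that doesn't hold references is its get-unless-zero semantics: the acquire fails, rather than resurrecting the object, once the refcount has already dropped to zero. A sketch of that primitive in portable C11 follows; the names are illustrative, not the actual Linux or NetBSD implementation, and the real thing additionally relies on RCU to keep the memory itself alive during the attempt.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct fence_sketch {
	atomic_uint refcnt;
};

/*
 * Take a reference only if the count has not already reached zero.
 * A caller iterating a weak-pointer list (like fman->fence_list)
 * skips any object for which this fails, since that object is
 * concurrently being destroyed.
 */
static bool
fence_get_unless_zero(struct fence_sketch *f)
{
	unsigned old = atomic_load(&f->refcnt);

	while (old != 0) {
		/* on CAS failure, "old" is reloaded with the current value */
		if (atomic_compare_exchange_weak(&f->refcnt, &old, old + 1))
			return true;	/* reference acquired */
	}
	return false;	/* already dying: caller must not touch it */
}
```

This also shows why PHO's observation ("refcounts went down to zero even if they were retained by fence_list") is expected with weak pointers: the list never contributes to the count, so zero just means the entry must be skipped.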
Re: [PATCH] Driver for Elantech I2C touchpad
On Tue, Jul 18, 2023, 14:35, Martin Husemann wrote:
> On Sat, Jul 15, 2023 at 03:48:54AM +0200, Vladimir 'phcoder' Serbinenko
> wrote:
> > I've submitted a similar patch for OpenBSD recently and it got
> > merged. It adds support for the Elantech I2C touchpad used on many
> > laptops. Tested on my Chromebook Elemi.
>
> Do you know where the compat list comes from? My notebook has an
> ELAN0501, worth trying to add that to the compat array?

The list comes from FreeBSD. Does your touchpad have a CID for PNP0C50
or ACPI0C50? If it does, then the I2C HID driver might be a better
match.
Re: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)
On 7/22/23 22:41, Taylor R Campbell wrote:
> Date: Sat, 22 Jul 2023 21:52:40 +0900
> From: PHO
>
>> Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161] vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc
>
> Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in gdb?
>
> And, if you can get to the frame in gdb, what does gdb say is in the
> vmw_fence_wait frame, and what cv is in the cv_destroy frame?  Let's
> confirm it is the cv you think it is -- I suspect it might be a
> different one.

I just encountered the crash and could obtain a crash dump. It is indeed
the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait() but the contents
of cb do not make sense to me:

    (gdb) fr 6
    #6  vmw_fence_wait (f=0xaa67d3170608, intr=<optimized out>,
        timeout=<optimized out>)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c:289
    289         DRM_DESTROY_WAITQUEUE(&cb.wq);
    (gdb) p cb
    $11 = {base = {func = 0x814fbb80, fcb_entry = {
          tqe_next = 0x80df6a0b, tqe_prev = 0xaa675e629238},
        fcb_onqueue = 90}, wq = {cv_opaque = {0x1, 0x80fa0189}}}
    (gdb) fr 4
    #4  0x80dcdd20 in cv_destroy (cv=cv@entry=0xbf8138b44630)
        at /home/pho/sandbox/_netbsd/src/sys/kern/kern_condvar.c:553
    553         return !LIST_EMPTY(CV_SLEEPQ(cv));
    (gdb) p *cv
    $12 = {cv_opaque = {0x1, 0x80fa0189}}

CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?
Here's the stack trace:

    #0  0x80239c85 in cpu_reboot (howto=howto@entry=260,
        bootstr=bootstr@entry=0x0)
        at /home/pho/sandbox/_netbsd/src/sys/arch/amd64/amd64/machdep.c:717
    #1  0x80e03cee in kern_reboot (howto=howto@entry=260,
        bootstr=bootstr@entry=0x0)
        at /home/pho/sandbox/_netbsd/src/sys/kern/kern_reboot.c:73
    #2  0x80e4c698 in vpanic (
        fmt=0x815a5cf8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d\n",
        ap=ap@entry=0xbf8138b44578)
        at /home/pho/sandbox/_netbsd/src/sys/kern/subr_prf.c:292
    #3  0x81012b8f in kern_assert (
        fmt=fmt@entry=0x815a5cf8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d\n")
        at /home/pho/sandbox/_netbsd/src/sys/lib/libkern/kern_assert.c:51
    #4  0x80dcdd20 in cv_destroy (cv=cv@entry=0xbf8138b44630)
        at /home/pho/sandbox/_netbsd/src/sys/kern/kern_condvar.c:553
    #5  0x80a72824 in DRM_DESTROY_WAITQUEUE (q=0xbf8138b44630)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h:59
    #6  vmw_fence_wait (f=0xaa67d3170608, intr=<optimized out>,
        timeout=<optimized out>)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c:289
    #7  0x80fa2184 in linux_dma_fence_wait_timeout (fence=0xaa67d3170608,
        intr=intr@entry=false, timeout=timeout@entry=1500)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/linux/linux_dma_fence.c:1005
    #8  0x80fa4c96 in linux_dma_resv_wait_timeout_rcu (robj=0xaa675e629238,
        shared=shared@entry=true, intr=intr@entry=false,
        timeout=<optimized out>)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/linux/linux_dma_resv.c:1213
    #9  0x80fb5671 in ttm_bo_wait (bo=bo@entry=0xaa675e629170,
        interruptible=interruptible@entry=false, no_wait=<optimized out>)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1896
    #10 0x80a815d0 in vmw_resource_unbind_list (vbo=vbo@entry=0xaa675e629170)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_resource.c:860
    #11 0x80a6712c in vmw_bo_move_notify (bo=bo@entry=0xaa675e629170,
        mem=mem@entry=0xbf8138b44928)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_bo.c:1205
    #12 0x80a8a6c7 in vmw_move_notify (bo=0xaa675e629170,
        evict=<optimized out>, mem=0xbf8138b44928)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_ttm_buffer.c:933
    #13 0x80fb3505 in ttm_bo_handle_move_mem (bo=bo@entry=0xaa675e629170,
        mem=mem@entry=0xbf8138b44928, evict=evict@entry=true,
        ctx=ctx@entry=0xbf8138b44b00)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:380
    #14 0x80fb4382 in ttm_bo_evict (ctx=0xbf8138b44b00, bo=0xaa675e629170)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:757
    #15 ttm_mem_evict_first (bdev=bdev@entry=0xaa687f031008,
        mem_type=<optimized out>, place=place@entry=0x814fea78,
        ctx=ctx@entry=0xbf8138b44b00, ticket=ticket@entry=0xbf8138b44c98)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:913
    #16 0x80fb4850 in ttm_bo_mem_force_space (ctx=0xbf8138b44b00,
        mem=0xbf8138b44a58, place=0x814fea78, bo=0xaa6752dd4b30)
        at /home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:985
    #17 ttm_bo_mem_space (bo=bo@entry=0xaa6752dd4b30,