Re: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)

2023-07-23 Thread PHO

On 7/23/23 17:27, PHO wrote:

On 7/22/23 22:41, Taylor R Campbell wrote:

Date: Sat, 22 Jul 2023 21:52:40 +0900
From: PHO 

Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161]
vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc


Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
gdb?

And, if you can get to the frame in gdb, what does gdb say &cb is
in the vmw_fence_wait frame, and what cv is in the cv_destroy frame?

Let's confirm it is the cv you think it is -- I suspect it might be a
different one.


I just encountered the crash and could obtain a crash dump. It is indeed
the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait(), but the contents
of cb do not make sense to me:



...


CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string? 



I realized the cause of this:

static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout)
{
	...
	if (likely(vmw_fence_obj_signaled(fence)))
		return timeout;
	...
	spin_lock(f->lock);

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
		goto out; // <-- THIS ONE

	if (intr && signal_pending(current)) {
		ret = -ERESTARTSYS;
		goto out; // <-- OR THIS
	}

#ifdef __NetBSD__
	DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
	cb.task = current;
#endif
	...
out:
	spin_unlock(f->lock);
#ifdef __NetBSD__
	DRM_DESTROY_WAITQUEUE(&cb.wq);
#endif
	...
}

There were cases where the function was destroying a condvar that it 
didn't initialize! Ugh, this is the very reason why I dislike C...
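
For what it's worth, one minimal way to avoid the mismatch is to give the
two early exits their own label below the destroy, so that
DRM_DESTROY_WAITQUEUE() can only be reached after DRM_INIT_WAITQUEUE() has
actually run. This is just an illustrative sketch (the "out_early" label,
the elisions, and the return handling are mine; the fix that actually went
in may be shaped differently):

	spin_lock(f->lock);

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
		goto out_early;		/* cb.wq was never initialized */

	if (intr && signal_pending(current)) {
		ret = -ERESTARTSYS;
		goto out_early;		/* likewise */
	}

#ifdef __NetBSD__
	DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
	cb.task = current;
#endif
	...
out:
	spin_unlock(f->lock);
#ifdef __NetBSD__
	DRM_DESTROY_WAITQUEUE(&cb.wq);	/* only reached after INIT */
#endif
	return ret;

out_early:
	spin_unlock(f->lock);
	return ret;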


Re: DRM/KMS: vmwgfx driver is now available

2023-07-23 Thread PHO

On 7/24/23 11:34, PHO wrote:


 if (!list_empty(&to_put)) {
     spin_unlock(&fman->lock);
     list_for_each_entry_safe(fence, next_fence, &to_put, head) {
         list_del_init(&fence->head);
         dma_fence_put(&fence->base);
     }
     spin_lock(&fman->lock);
 }


I pasted the wrong version. It should actually be this:

while ((fence = list_first_entry_or_null(
	    &to_put, typeof(*fence), head)) != NULL) {
	list_del_init(&fence->head);
	spin_unlock(&fman->lock);
	dma_fence_put(&fence->base);
	spin_lock(&fman->lock);
}


Re: DRM/KMS: vmwgfx driver is now available

2023-07-23 Thread PHO

On 7/24/23 00:03, Taylor R Campbell wrote:

Date: Sun, 23 Jul 2023 22:57:14 +0900
From: PHO 

On 7/23/23 21:24, Taylor R Campbell wrote:

Why can't the loop just use dma_fence_get_rcu and dma_fence_put?

Might need to release and reacquire the lock around dma_fence_put.


I tried but it didn't work, apparently because someone was holding a
pointer to a dying fence and waiting for it to be signaled. Using
dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
hung and eventually got killed by heartbeat().


What exactly did you try and what was the heartbeat panic stack trace?
Nothing here should wait for signalling in a way that trips the kernel
heartbeat monitor -- that should only happen if you hold onto a spin
lock for too long.


In __vmw_fences_update():

...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			struct dma_fence *f = dma_fence_get_rcu(&fence->base);

			if (f) {
				dma_fence_signal_locked(f);
				spin_unlock(&fman->lock);
				dma_fence_put(f);
				spin_lock(&fman->lock);
			}
			INIT_LIST_HEAD(&action_list);
	...

And the stack trace was this:

__kernel_end() at 0
sys_reboot() at sys_reboot+0x30
vpanic() at vpanic+0x1ad
printf_nostamp() at printf_nostamp+0x30
heartbeat() at heartbeat+0x21c
hardclock() at hardclock+0xbb
Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
--- interrupt ---
mutex_vector_enter() at mutex_vector_enter+0x36f
vmw_fifo_send_fence() at vmw_fifo_send_fence+0xea
vmw_execbuf_fence_commands() at vmw_execbuf_fence_commands+0x3b
vmw_execbuf_process() at vmw_execbuf_process+0x91e
vmw_execbuf_ioctl() at vmw_execbuf_ioctl+0x97
drm_ioctl() at drm_ioctl+0x26d
drm_read() at drm_read+0x15
sys_ioctl() at sys_ioctl+0x59d
syscall() at syscall+0x196
--- syscall (number 54) ---
syscall+0x196:

vmw_fifo_send_fence+0xea was at vmwgfx_fifo.c:622:

spin_lock(&dev_priv->fence_lock);

But let's not worry about this too much, as it doesn't occur in the 
latest code (read below).



(That said: yay, the heartbeat monitor is catching what would
otherwise be a silent nonresponsive hang!)


Yup, it has helped me so many times.


1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can
safely do dma_fence_put under the lock.


I can't see how to do it without interfering with dma_fence_free(), so


2. Create a list on the stack of signalled fences to put in
__vmw_fences_update and put them after the loop, outside the lock:


I tried this. Your code didn't work as-is because it could release the
same fence twice, but in the end I managed to get it to work! Yay, now I
can withdraw my changes to linux_dma_fence.c:


static void __vmw_fences_update(struct vmw_fence_manager *fman)
{
	...
	struct list_head to_put = LIST_HEAD_INIT(to_put);
	...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			if (dma_fence_get_rcu(&fence->base) != NULL) {
				list_add(&fence->head, &to_put);
				dma_fence_signal_locked(&fence->base);
			}

			INIT_LIST_HEAD(&action_list);
			list_splice_init(&fence->seq_passed_actions,
					 &action_list);
			vmw_fences_perform_actions(fman, &action_list);
		} else
			break;
	}
	...
	if (!list_empty(&to_put)) {
		spin_unlock(&fman->lock);
		list_for_each_entry_safe(fence, next_fence, &to_put, head) {
			list_del_init(&fence->head);
			dma_fence_put(&fence->base);
		}
		spin_lock(&fman->lock);
	}
}
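
(For context, the reason the dma_fence_put() calls have to happen with
fman->lock dropped is that the fence release callback re-takes that same
lock; the fence's f->lock is &fman->lock. Roughly, from the upstream
driver version the port is based on (details may differ in the NetBSD
tree):

	static void vmw_fence_obj_destroy(struct dma_fence *f)
	{
		struct vmw_fence_obj *fence =
			container_of(f, struct vmw_fence_obj, base);
		struct vmw_fence_manager *fman = fman_from_fence(fence);

		spin_lock(f->lock);		/* this is &fman->lock */
		list_del_init(&fence->head);
		--fman->num_fence_objects;
		spin_unlock(f->lock);
		fence->destroy(fence);
	}

so dropping the final reference while still holding fman->lock would
deadlock on that spin lock.)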


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-23 Thread Luke Mewburn
On 23-07-22 04:34, matthew green wrote:
  | > (gdb) info locals
  | > No symbol table info available.
  | > (gdb)
  | >
  | >   I don't understand why gdb complains about debugging symbols.
  | 
  | i think it's because our build has a bug.  i was recently
  | trying to debug something in a shared library and had the
  | same problem and i traced it down to this code:
  | 
  |share/mk/bsd.README:MKSTRIPSYM	If "yes", strip all local symbols from shared libraries;
  | 
  |share/mk/bsd.lib.mk:.if ${MKSTRIPSYM:Uyes} == "yes"
  |share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-x
  |share/mk/bsd.lib.mk:.else
  |share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-X
  | 
  | and putting "MKSTRIPSYM=no" in my mk.conf fixed it.
  | 
  | i believe this is a bug and should default to "no", if
  | MKDEBUG has also been enabled.

I've changed bsd.own.mk so that if MKDEBUG!=no (MKDEBUG=yes),
it forces MKSTRIPSYM=no


Luke.


Re: DRM/KMS: vmwgfx driver is now available

2023-07-23 Thread Taylor R Campbell
> Date: Sun, 23 Jul 2023 22:57:14 +0900
> From: PHO 
> 
> On 7/23/23 21:24, Taylor R Campbell wrote:
> > Why can't the loop just use dma_fence_get_rcu and dma_fence_put?
> > 
> > Might need to release and reacquire the lock around dma_fence_put.
> 
> I tried but it didn't work, apparently because someone was holding a
> pointer to a dying fence and waiting for it to be signaled. Using
> dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
> hung and eventually got killed by heartbeat().

What exactly did you try and what was the heartbeat panic stack trace?
Nothing here should wait for signalling in a way that trips the kernel
heartbeat monitor -- that should only happen if you hold onto a spin
lock for too long.

(That said: yay, the heartbeat monitor is catching what would
otherwise be a silent nonresponsive hang!)

Thinking about it some more, I see there is a snag with the loop and
dma_fence_put, since

(a) we can only use list_next(&fence->head) if we read it and then use
it under the lock, but

(b) we can only dma_fence_put outside the lock because the release
callback, vmw_fence_obj_destroy, takes the lock,

so there's no way to do the iteration safely as is with an extra
reference count acquire/release cycle.  Two options come to mind:

1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can
   safely do dma_fence_put under the lock.

2. Create a list on the stack of signalled fences to put in
   __vmw_fences_update and put them after the loop, outside the lock:

	struct list_head to_put = LIST_HEAD_INIT(to_put);
	...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (dma_fence_get_rcu(fence) != 0) {	/* (*) */
			list_del_init(&fence->head);
			continue;
		}
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			list_add(&fence->head, &to_put);	/* (*) */
			dma_fence_signal_locked(&fence->base);
			...
		}
		break;
	}
	...
	if (fence != NULL) {
		spin_unlock(&fman->lock);
		dma_fence_put(fence);
		spin_lock(&fman->lock);
	}
	while ((fence = list_first(&to_put)) != NULL) {
		list_del_init(&fence->head);
		spin_unlock(&fman->lock);
		dma_fence_put(fence);
		spin_lock(&fman->lock);
	}

> > If that doesn't work for some reason and we do have to add a special
> > case, I'd strongly prefer to add a new front end like
> > dma_fence_signal_vmwgfxspecialhack or something (other examples in the
> > tasklet API) that has the relaxed assertion instead of relaxing the
> > assertions for every other driver which isn't bonkers.
> 
> Alright. I'll do that.

This should be a last resort (but not as bad as relaxing the
assertions) if we can't figure out a better way to do it.

I'm not sure the patch above is going to be best (if it works), but
I'd like to see if there's a way to do this without abusing the API
semantics too much before giving in to the abuse.


Re: DRM/KMS: vmwgfx driver is now available

2023-07-23 Thread PHO

On 7/23/23 21:24, Taylor R Campbell wrote:

Date: Sun, 23 Jul 2023 12:36:10 +0900
From: PHO 

This:
https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L45

and this:
https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L527

vmw_fence_manager::fence_list holds pointers to all the fence objects in
use but the list doesn't retain their refcount. I tried to change how it
works by simply incrementing the refcount in vmw_fence_obj_init() and
decrementing it in __vmw_fences_update(). But it didn't work well because
fences could also be released by dma_resv_add_excl_fence() and
dma_resv_add_shared_fence(). I cannot recall all the details but sometimes
refcounts went down to zero even though they were retained by fence_list,
and that's when I gave up fighting against it. I admit I don't fully
understand the code.


Why can't the loop just use dma_fence_get_rcu and dma_fence_put?

Might need to release and reacquire the lock around dma_fence_put.


I tried but it didn't work, apparently because someone was holding a
pointer to a dying fence and waiting for it to be signaled. Using
dma_fence_get_rcu() meant we didn't signal such fences, and the kernel
hung and eventually got killed by heartbeat().
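
For reference, dma_fence_get_rcu() only takes a reference if the count has
not already dropped to zero (the NetBSD shim is expected to mirror this),
which is why a fence whose last reference was already gone, but which a
waiter was still blocked on, would be skipped and never signalled. The
stock Linux definition is roughly:

	static inline struct dma_fence *
	dma_fence_get_rcu(struct dma_fence *fence)
	{
		/* Take a reference only if the count is still nonzero. */
		if (kref_get_unless_zero(&fence->refcount))
			return fence;
		else
			return NULL;
	}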



If that doesn't work for some reason and we do have to add a special
case, I'd strongly prefer to add a new front end like
dma_fence_signal_vmwgfxspecialhack or something (other examples in the
tasklet API) that has the relaxed assertion instead of relaxing the
assertions for every other driver which isn't bonkers.


Alright. I'll do that.


Re: DRM/KMS: vmwgfx driver is now available

2023-07-23 Thread Taylor R Campbell
> Date: Sun, 23 Jul 2023 12:36:10 +0900
> From: PHO 
> 
> This:
> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L45
> 
> and this:
> https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L527
> 
> vmw_fence_manager::fence_list holds pointers to all the fence objects in
> use but the list doesn't retain their refcount. I tried to change how it
> works by simply incrementing the refcount in vmw_fence_obj_init() and
> decrementing it in __vmw_fences_update(). But it didn't work well because
> fences could also be released by dma_resv_add_excl_fence() and
> dma_resv_add_shared_fence(). I cannot recall all the details but sometimes
> refcounts went down to zero even though they were retained by fence_list,
> and that's when I gave up fighting against it. I admit I don't fully
> understand the code.

Why can't the loop just use dma_fence_get_rcu and dma_fence_put?

Might need to release and reacquire the lock around dma_fence_put.

If that doesn't work for some reason and we do have to add a special
case, I'd strongly prefer to add a new front end like
dma_fence_signal_vmwgfxspecialhack or something (other examples in the
tasklet API) that has the relaxed assertion instead of relaxing the
assertions for every other driver which isn't bonkers.

It may add some maintenance burden, but the diagnostic burden for all
the drivers is enormous too, so I'm extremely disinclined to just
relax assertions here.


Re: [PATCH] Driver for Elantech I2C touchpad

2023-07-23 Thread Vladimir 'phcoder' Serbinenko
On Tue, Jul 18, 2023, 14:35, Martin Husemann wrote:

> On Sat, Jul 15, 2023 at 03:48:54AM +0200, Vladimir 'phcoder' Serbinenko
> wrote:
> > I've submitted a similar patch for OpenBSD recently and it got merged. It
> > adds support for the Elantech I2C touchpads used on many laptops. Tested on
> > my Chromebook Elemi.
>
> Do you know where the compat list comes from? My notebook has an ELAN0501,
> worth trying to add that to the compat array?
>
The list comes from FreeBSD. Does your touchpad have a CID for PNP0C50 or
ACPI0C50? If it does, then the I2C HID driver might be a better match.
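
If adding the ID turns out to be the right approach, it would just be a
matter of extending the driver's ACPI match table. A hypothetical sketch
using NetBSD's device_compatible_entry (the table contents and names here
are illustrative, not the actual ones in the patch):

	static const struct device_compatible_entry compat_data[] = {
		{ .compat = "ELAN0000" },
		{ .compat = "ELAN0501" },	/* e.g. Martin's notebook */
		DEVICE_COMPAT_EOL
	};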

>
> Martin
>


Re: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)

2023-07-23 Thread PHO

On 7/22/23 22:41, Taylor R Campbell wrote:

Date: Sat, 22 Jul 2023 21:52:40 +0900
From: PHO 

Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161]
vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc


Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
gdb?

And, if you can get to the frame in gdb, what does gdb say &cb is
in the vmw_fence_wait frame, and what cv is in the cv_destroy frame?

Let's confirm it is the cv you think it is -- I suspect it might be a
different one.


I just encountered the crash and could obtain a crash dump. It is indeed
the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait(), but the contents
of cb do not make sense to me:


-
(gdb) fr 6
#6  vmw_fence_wait (f=0xaa67d3170608, intr=,
timeout=)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c:289

289			DRM_DESTROY_WAITQUEUE(&cb.wq);
(gdb) p cb
$11 = {base = {func = 0x814fbb80 , fcb_entry = {
      tqe_next = 0x80df6a0b ,
      tqe_prev = 0xaa675e629238}, fcb_onqueue = 90},
  wq = {cv_opaque = {0x1, 0x80fa0189 }}}
(gdb) fr 4
#4  0x80dcdd20 in cv_destroy (cv=cv@entry=0xbf8138b44630)
at /home/pho/sandbox/_netbsd/src/sys/kern/kern_condvar.c:553
553 return !LIST_EMPTY(CV_SLEEPQ(cv));
(gdb) p *cv
$12 = {cv_opaque = {0x1, 0x80fa0189 }}
-

CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string? 
Here's the stack trace:


-
#0  0x80239c85 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at /home/pho/sandbox/_netbsd/src/sys/arch/amd64/amd64/machdep.c:717
#1  0x80e03cee in kern_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at /home/pho/sandbox/_netbsd/src/sys/kern/kern_reboot.c:73
#2  0x80e4c698 in vpanic (
    fmt=0x815a5cf8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ",
    ap=ap@entry=0xbf8138b44578)
    at /home/pho/sandbox/_netbsd/src/sys/kern/subr_prf.c:292
#3  0x81012b8f in kern_assert (
    fmt=fmt@entry=0x815a5cf8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
    at /home/pho/sandbox/_netbsd/src/sys/lib/libkern/kern_assert.c:51
#4  0x80dcdd20 in cv_destroy (cv=cv@entry=0xbf8138b44630)
at /home/pho/sandbox/_netbsd/src/sys/kern/kern_condvar.c:553
#5  0x80a72824 in DRM_DESTROY_WAITQUEUE (q=0xbf8138b44630)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h:59

#6  vmw_fence_wait (f=0xaa67d3170608, intr=,
timeout=)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c:289

#7  0x80fa2184 in linux_dma_fence_wait_timeout (
fence=0xaa67d3170608, intr=intr@entry=false,
timeout=timeout@entry=1500)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/linux/linux_dma_fence.c:1005

#8  0x80fa4c96 in linux_dma_resv_wait_timeout_rcu (
robj=0xaa675e629238, shared=shared@entry=true, 
intr=intr@entry=false,

timeout=)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/linux/linux_dma_resv.c:1213

#9  0x80fb5671 in ttm_bo_wait (bo=bo@entry=0xaa675e629170,
interruptible=interruptible@entry=false, no_wait=)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1896

#10 0x80a815d0 in vmw_resource_unbind_list (
vbo=vbo@entry=0xaa675e629170)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_resource.c:860
#11 0x80a6712c in vmw_bo_move_notify 
(bo=bo@entry=0xaa675e629170,

mem=mem@entry=0xbf8138b44928)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_bo.c:1205

#12 0x80a8a6c7 in vmw_move_notify (bo=0xaa675e629170,
evict=, mem=0xbf8138b44928)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_ttm_buffer.c:933

#13 0x80fb3505 in ttm_bo_handle_move_mem (
bo=bo@entry=0xaa675e629170, mem=mem@entry=0xbf8138b44928,
evict=evict@entry=true, ctx=ctx@entry=0xbf8138b44b00)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:380

#14 0x80fb4382 in ttm_bo_evict (ctx=0xbf8138b44b00,
bo=0xaa675e629170)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:757

#15 ttm_mem_evict_first (bdev=bdev@entry=0xaa687f031008,
mem_type=,
place=place@entry=0x814fea78 ,
ctx=ctx@entry=0xbf8138b44b00, 
ticket=ticket@entry=0xbf8138b44c98)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:913

#16 0x80fb4850 in ttm_bo_mem_force_space (ctx=0xbf8138b44b00,
mem=0xbf8138b44a58, place=0x814fea78 ,
bo=0xaa6752dd4b30)
at 
/home/pho/sandbox/_netbsd/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:985

#17 ttm_bo_mem_space (bo=bo@entry=0xaa6752dd4b30,