Re: pcm(4) related panic

2003-11-26 Thread Mathew Kanner
On Nov 25, Don Lewis wrote:
> On 25 Nov, Don Lewis wrote:
> > On 25 Nov, Artur Poplawski wrote:
> >> Artur Poplawski [EMAIL PROTECTED] wrote:
> >>
> >>> Hello,
> >>>
> >>> On 5.1-RELEASE and 5.2-BETA machines I have been able to cause a panic
> >>> like this:
> >
> >>> Sleeping on swread with the following non-sleepable locks held:
> >>> exclusive sleep mutex pcm0:play:0 (pcm channel) r = 0 (0xc1c3d740) locked @ \
> >>> /usr/src/sys/dev/sound/pcm/dsp.c:146
> >
> > This enables the panic.
> >
> >>> panic: sleeping thread (pid 583) owns a non-sleepable lock
> >
> > Then the panic happens when another thread tries to grab the mutex.
> >
> >
> > The problem is that the pcm code attempts to hold a mutex across a call
> > to uiomove(), which can sleep if the userland buffer that it is trying
> > to access is paged out.  Either the buffer has to be pre-wired before
> > calling getchns(), or the mutex has to be dropped around the call to
> > uiomove().  The amount of memory to be wired should be limited to
> > 'sz' as calculated by chn_read() and chn_write(), which complicates the
> > logic.  Dropping the mutex probably has other issues.
>
> Following up to myself ...
>
> It might be safe to drop the mutex for the uiomove() call if the code
> set flags to enforce a limit of one reader and one writer at a time to
> keep the code from being re-entered.  The buffer pointer manipulations
> in sndbuf_dispose() and sndbuf_acquire() would probably still have to be
> protected by the mutex.  If this can be made to work, it would probably
> be preferable to wiring the buffer.  It would have a lot less CPU
> overhead, and would work better with large buffers, which could still be
> allowed to page normally.

Don,
I never would have suspected that uiomove() might sleep and panic;
thanks for the clue.

Artur,
Could you try the attached patch?

Thanks,
--Mat

-- 
Any idiot can face a crisis; it is this day-to-day living
that wears you out.
- Chekhov
--- channel.c	Sun Nov  9 04:17:22 2003
+++ /sys/dev/sound/pcm/channel.c	Wed Nov 26 02:21:14 2003
@@ -250,6 +250,8 @@
 {
 	int ret, timeout, newsize, count, sz;
 	struct snd_dbuf *bs = c->bufsoft;
+	void *off;
+	int t, x,togo,p;
 
 	CHN_LOCKASSERT(c);
 	/*
@@ -291,7 +293,22 @@
 			sz = MIN(sz, buf->uio_resid);
 			KASSERT(sz > 0, ("confusion in chn_write"));
 			/* printf("sz: %d\n", sz); */
+#if 0
 			ret = sndbuf_uiomove(bs, buf, sz);
+#else
+			togo = sz;
+			while (ret == 0 && togo > 0) {
+				p = sndbuf_getfreeptr(bs);
+				t = MIN(togo, sndbuf_getsize(bs) - p);
+				off = sndbuf_getbufofs(bs, p);
+				CHN_UNLOCK(c);
+				ret = uiomove(off, t, buf);
+				CHN_LOCK(c);
+				togo -= t;
+				x = sndbuf_acquire(bs, NULL, t);
+			}
+			ret = 0;
+#endif
 			if (ret == 0 && !(c->flags & CHN_F_TRIGGERED))
 				chn_start(c, 0);
 		}
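
For readers following the loop in the patch above, here is a minimal
userland sketch of the same idea. It is not the driver code: struct ring,
ring_write(), the pthread mutex and memcpy() are assumptions standing in
for the soft buffer, the channel mutex and uiomove(). Each chunk stops at
the end of the storage, the lock is dropped around the copy, and the
buffer pointers are only updated with the lock held. The sketch assumes a
single writer and checks that with an assert.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the soft buffer; not the snd_dbuf layout. */
struct ring {
	pthread_mutex_t lock;	/* plays the role of the channel mutex */
	unsigned char *data;	/* backing storage */
	size_t size;		/* capacity in bytes */
	size_t free_pos;	/* index of the first free byte */
	size_t used;		/* bytes currently queued */
};

/*
 * Queue up to len bytes from src. Called with r->lock held and returns
 * with it held. Returns the number of bytes actually queued.
 */
static size_t
ring_write(struct ring *r, const unsigned char *src, size_t len)
{
	size_t done = 0;

	while (done < len && r->used < r->size) {
		size_t pos = r->free_pos;
		size_t chunk = len - done;

		/* Do not run past the end of the storage or the free space. */
		if (chunk > r->size - pos)
			chunk = r->size - pos;
		if (chunk > r->size - r->used)
			chunk = r->size - r->used;

		/* The copy is the step that may block, so drop the lock. */
		pthread_mutex_unlock(&r->lock);
		memcpy(r->data + pos, src + done, chunk);
		pthread_mutex_lock(&r->lock);

		/* Assumption: single writer, so nobody moved free_pos. */
		assert(r->free_pos == pos);
		r->free_pos = (pos + chunk) % r->size;
		r->used += chunk;
		done += chunk;
	}
	return done;
}
```

With an 8-byte ring whose free pointer sits at offset 6, a 5-byte write is
split into a 2-byte chunk at the end and a 3-byte chunk at the start.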
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: pcm(4) related panic

2003-11-26 Thread Artur Poplawski
Mathew Kanner [EMAIL PROTECTED] wrote:

> On Nov 25, Don Lewis wrote:
> > On 25 Nov, Don Lewis wrote:
> > > On 25 Nov, Artur Poplawski wrote:
> > >> Artur Poplawski [EMAIL PROTECTED] wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> On 5.1-RELEASE and 5.2-BETA machines I have been able to cause a panic
> > >>> like this:
> > >
> > >>> Sleeping on swread with the following non-sleepable locks held:
> > >>> exclusive sleep mutex pcm0:play:0 (pcm channel) r = 0 (0xc1c3d740) locked @ \
> > >>> /usr/src/sys/dev/sound/pcm/dsp.c:146
> > >
> > > This enables the panic.
> > >
> > >>> panic: sleeping thread (pid 583) owns a non-sleepable lock
> > >
> > > Then the panic happens when another thread tries to grab the mutex.
> > >
> > >
> > > The problem is that the pcm code attempts to hold a mutex across a call
> > > to uiomove(), which can sleep if the userland buffer that it is trying
> > > to access is paged out.  Either the buffer has to be pre-wired before
> > > calling getchns(), or the mutex has to be dropped around the call to
> > > uiomove().  The amount of memory to be wired should be limited to
> > > 'sz' as calculated by chn_read() and chn_write(), which complicates the
> > > logic.  Dropping the mutex probably has other issues.
> >
> > Following up to myself ...
> >
> > It might be safe to drop the mutex for the uiomove() call if the code
> > set flags to enforce a limit of one reader and one writer at a time to
> > keep the code from being re-entered.  The buffer pointer manipulations
> > in sndbuf_dispose() and sndbuf_acquire() would probably still have to be
> > protected by the mutex.  If this can be made to work, it would probably
> > be preferable to wiring the buffer.  It would have a lot less CPU
> > overhead, and would work better with large buffers, which could still be
> > allowed to page normally.
>
> Don,
> I never would have suspected that uiomove() might sleep and panic;
> thanks for the clue.
>
> Artur,
> Could you try the attached patch?

I've tried the patch -- and it works great! :-) I was unable to trigger
the panic with the patch applied, although I tried really hard -- so I 
guess the problem is solved. 

Mat and Don, I'm really very thankful for your help.

Best regards, Artur




pcm(4) related panic

2003-11-25 Thread Artur Poplawski
Hello,  

On 5.1-RELEASE and 5.2-BETA machines I have been able to cause a panic
like this:
 
(watch out for folded lines; the stack backtrace below is rewritten by
hand from ddb)

lock order reversal
 1st 0xc22a45ac vm object (vm object) @ /usr/src/sys/vm/swap_pager.c:1323
 2nd 0xc06c0420 swap_pager swhash (swap_pager swhash) @ \
/usr/src/sys/vm/swap_pager.c:1838
 3rd 0xc0c358c4 vm object (vm object) @ /usr/src/sys/vm/uma_core.c:876
Stack backtrace:
  backtrace
  witness_lock
  _mtx_lock_flags
  obj_alloc
  slab_zalloc
  uma_zone_slab
  uma_zalloc_internal
  uma_zalloc_arg
  swp_pager_meta_build
  swap_pager_putpages
  default_pager_putpages
  vm_pageout_flush
  vm_pageout_clean
  vm_pageout_scan
  vm_pageout
  fork_exit
  fork_trampoline

Sleeping on swread with the following non-sleepable locks held:
exclusive sleep mutex pcm0:play:0 (pcm channel) r = 0 (0xc1c3d740) locked @ \   
/usr/src/sys/dev/sound/pcm/dsp.c:146
panic: sleeping thread (pid 583) owns a non-sleepable lock
syncing disks, buffers remaining... 1410 1410 panic: mi_switch: \ 
switch in a critical section
Uptime: 1m45s
panic: msleep
Uptime: 1m45s
panic: msleep
Uptime: 1m45s
panic: msleep
Uptime: 1m45s
panic: msleep
[... repeated a few more times]
Fatal double fault:
eip = 0xc05e3916
esp = 0xc8db8ff4
ebp = 0xc8db9004
panic: double fault
Uptime: 1m45s
panic: msleep
Uptime: 1m45s 
panic: msleep
Uptime: 1m45s
panic: msleep
Uptime: 1m45s
[...]
And the machine suddenly reboots, so there is no coredump.
 
eip address points close to:
c05e3910 T sc_vtb_putc
 
To reproduce this panic just start some audio player app (like xmms), 
and launch countless memory-eating applications (like mozilla ;).
The machine starts swapping, and it panics. 

% uname -a 
FreeBSD kaszanka.domek 5.2-BETA FreeBSD 5.2-BETA #0: Sun Nov 23 01:23:10\ 
 CET 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/KASZANKA i386 

dmesg fragments:
CPU: AMD Athlon(tm) XP 2000+ (1666.73-MHz 686-class CPU)
pcm0: AudioPCI ES1373-B port 0xec00-0xec3f irq 10 at device 8.0 on pci0 
pcm0: Cirrus Logic CS4297A AC97 Codec
rl0: RealTek 8139 10/100BaseTX port 0xe800-0xe8ff mem \
 0xdf00-0xdfff irq 10 at device 10.0 on pci0
miibus0: MII bus on rl0
rlphy0: RealTek internal media interface on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: RealTek 8139 10/100BaseTX port 0xe400-0xe4ff mem \
 0xde00-0xdeff irq 10 at device 11.0 on pci0
rlphy1: RealTek internal media interface on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Regards, Artur


Re: pcm(4) related panic

2003-11-25 Thread Artur Poplawski
In the meantime I've managed to get a coredump, by directly calling
doadump() from ddb. Results:


[EMAIL PROTECTED]:/usr/obj/usr/src/sys/KASZANKA# gdb -k kernel.debug /var/crash/vmcore.0
GNU gdb 5.2.1 (FreeBSD)  
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: sleeping thread (pid 568) owns a non-sleepable lock
panic messages:
---
panic: sleeping thread (pid 568) owns a non-sleepable lock

syncing disks, buffers remaining... panic: msleep
Dumping 128 MB
 16 32 48 64 80 96 112
---
Reading symbols from /usr/obj/usr/src/sys/KASZANKA/modules/usr/src/sys/modules/linprocfs/linprocfs.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/KASZANKA/modules/usr/src/sys/modules/linprocfs/linprocfs.ko.debug
Reading symbols from /usr/obj/usr/src/sys/KASZANKA/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/KASZANKA/modules/usr/src/sys/modules/linux/linux.ko.debug
Reading symbols from /boot/kernel/netgraph.ko...done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_ether.ko...done.
Loaded symbols for /boot/kernel/ng_ether.ko
Reading symbols from /boot/kernel/ng_pppoe.ko...done.
Loaded symbols for /boot/kernel/ng_pppoe.ko
Reading symbols from /boot/kernel/ng_socket.ko...done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/mga.ko...done.
Loaded symbols for /boot/kernel/mga.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
240 dumping++;
(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc04292cd in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0xc8dba7bc "à×hÀ")
    at /usr/src/sys/ddb/db_command.c:548
#2  0xc042906a in db_command 

Re: pcm(4) related panic

2003-11-25 Thread Don Lewis
On 25 Nov, Artur Poplawski wrote:
> Artur Poplawski [EMAIL PROTECTED] wrote:
>
>> Hello,
>>
>> On 5.1-RELEASE and 5.2-BETA machines I have been able to cause a panic
>> like this:

>> Sleeping on swread with the following non-sleepable locks held:
>> exclusive sleep mutex pcm0:play:0 (pcm channel) r = 0 (0xc1c3d740) locked @ \
>> /usr/src/sys/dev/sound/pcm/dsp.c:146

This enables the panic.

>> panic: sleeping thread (pid 583) owns a non-sleepable lock

Then the panic happens when another thread tries to grab the mutex.


The problem is that the pcm code attempts to hold a mutex across a call
to uiomove(), which can sleep if the userland buffer that it is trying
to access is paged out.  Either the buffer has to be pre-wired before
calling getchns(), or the mutex has to be dropped around the call to
uiomove().  The amount of memory to be wired should be limited to
'sz' as calculated by chn_read() and chn_write(), which complicates the
logic.  Dropping the mutex probably has other issues.
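
As a rough illustration of the check behind the "Sleeping on ... with the
following non-sleepable locks held" message, here is a toy model in plain
C. It is not WITNESS and uses no kernel API: toy_thread, toy_mtx_lock(),
toy_mtx_unlock() and toy_may_sleep() are hypothetical names invented for
this sketch. It only counts the mutexes a thread owns and refuses a sleep
while that count is non-zero.

```c
#include <assert.h>
#include <stdio.h>

/* Toy per-thread bookkeeping; the real checker tracks far more. */
struct toy_thread {
	int nonsleepable_held;	/* number of mutexes currently owned */
};

static void
toy_mtx_lock(struct toy_thread *td)
{
	td->nonsleepable_held++;
}

static void
toy_mtx_unlock(struct toy_thread *td)
{
	assert(td->nonsleepable_held > 0);
	td->nonsleepable_held--;
}

/* Returns 0 if sleeping is allowed, -1 if a mutex is still owned. */
static int
toy_may_sleep(const struct toy_thread *td, const char *wmesg)
{
	if (td->nonsleepable_held > 0) {
		fprintf(stderr,
		    "toy: sleeping on \"%s\" with %d non-sleepable lock(s) held\n",
		    wmesg, td->nonsleepable_held);
		return -1;
	}
	return 0;
}
```

Calling toy_may_sleep() while the toy mutex is held returns -1. Releasing
the mutex first returns 0, which corresponds to the "drop the mutex
around uiomove()" option described above.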




Re: pcm(4) related panic

2003-11-25 Thread Don Lewis
On 25 Nov, Don Lewis wrote:
> On 25 Nov, Artur Poplawski wrote:
>> Artur Poplawski [EMAIL PROTECTED] wrote:
>>
>>> Hello,
>>>
>>> On 5.1-RELEASE and 5.2-BETA machines I have been able to cause a panic
>>> like this:
>
>>> Sleeping on swread with the following non-sleepable locks held:
>>> exclusive sleep mutex pcm0:play:0 (pcm channel) r = 0 (0xc1c3d740) locked @ \
>>> /usr/src/sys/dev/sound/pcm/dsp.c:146
>
> This enables the panic.
>
>>> panic: sleeping thread (pid 583) owns a non-sleepable lock
>
> Then the panic happens when another thread tries to grab the mutex.
>
>
> The problem is that the pcm code attempts to hold a mutex across a call
> to uiomove(), which can sleep if the userland buffer that it is trying
> to access is paged out.  Either the buffer has to be pre-wired before
> calling getchns(), or the mutex has to be dropped around the call to
> uiomove().  The amount of memory to be wired should be limited to
> 'sz' as calculated by chn_read() and chn_write(), which complicates the
> logic.  Dropping the mutex probably has other issues.

Following up to myself ...

It might be safe to drop the mutex for the uiomove() call if the code
set flags to enforce a limit of one reader and one writer at a time to
keep the code from being re-entered.  The buffer pointer manipulations
in sndbuf_dispose() and sndbuf_acquire() would probably still have to be
protected by the mutex.  If this can be made to work, it would probably
be preferable to wiring the buffer.  It would have a lot less CPU
overhead, and would work better with large buffers, which could still be
allowed to page normally.
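
The "one writer at a time" flag idea can be sketched in userland as
follows. This is only an illustration under stated assumptions: struct
chan, chan_init(), chan_write() and the writer_busy flag are hypothetical
names, a pthread mutex and condition variable stand in for the channel
mutex and its sleep/wakeup, and memcpy() stands in for uiomove(). The
flag is set and cleared with the mutex held, while the copy itself runs
with the mutex dropped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct chan {
	pthread_mutex_t lock;	/* stand-in for the channel mutex */
	pthread_cond_t idle;	/* signalled when the writer slot frees up */
	bool writer_busy;	/* hypothetical "a write is in progress" flag */
	unsigned char buf[64];
	size_t len;
};

static void
chan_init(struct chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->idle, NULL);
	c->writer_busy = false;
	c->len = 0;
}

/* Returns 0 on success, EINVAL if n is too large, EBUSY if nowait is set. */
static int
chan_write(struct chan *c, const void *src, size_t n, bool nowait)
{
	assert(c != NULL && src != NULL);
	if (n > sizeof(c->buf))
		return EINVAL;

	pthread_mutex_lock(&c->lock);
	while (c->writer_busy) {
		if (nowait) {
			pthread_mutex_unlock(&c->lock);
			return EBUSY;
		}
		pthread_cond_wait(&c->idle, &c->lock);
	}
	c->writer_busy = true;		/* claim the single writer slot */
	pthread_mutex_unlock(&c->lock);

	memcpy(c->buf, src, n);		/* may block; the lock is not held */

	pthread_mutex_lock(&c->lock);
	c->len = n;			/* pointer update under the lock */
	c->writer_busy = false;
	pthread_cond_signal(&c->idle);
	pthread_mutex_unlock(&c->lock);
	return 0;
}
```

A second writer that arrives while writer_busy is set either waits on the
condition variable or, with nowait, gets EBUSY, so the copy is never
re-entered even though the mutex is released during it.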