Re: Problems with interrupts on -current.

2001-09-21 Thread Josef Karthauser

[This is the continuation of a thread that started on -committers]

On Sun, Sep 16, 2001 at 02:48:48PM +0100, Josef Karthauser wrote:
 On Sun, Sep 16, 2001 at 01:35:20AM +0100, Josef Karthauser wrote:
  On Sat, Sep 15, 2001 at 03:51:07PM +0200, Dag-Erling Smorgrav wrote:
   Josef Karthauser [EMAIL PROTECTED] writes:
    Is there a possibility that this commit is causing me to lose key
    presses?  I'm finding it hard to imagine that I'm mistyping, as
    I've never noticed it before.  (About one key press in every 30 or 40
    doesn't register, and I have to press it again.)
   
   Educated guess: your interrupt latency just went to hell (where mine's
   been for three months now, I'm still waiting to hear if Matt could
   make any sense out of my crash dump) and you're losing interrupts.  If
   you have a serial mouse, try moving it around a lot and see if it
   seems to hang (you should see mentions of interrupt-level buffer
   overflows in your /var/log/messages).  Also, just for kicks, check how
   much CPU time your syncer process is using, and try running sync(8)
   and see if your keyboard wedges for a couple of seconds when you do
   that.
  
  My mouse is /dev/psm0. From time to time the ata device's
  interrupts/second go through the roof for no apparent reason (i.e.
  several hundred interrupts/sec).  Sync never wedges anything.
 
 There's almost definitely an interrupt problem.  I regularly have
 the machine wedge almost solid when rsyncing a lot of data to and
 fro.  The machine begins to behave erratically, which I now think
 happens mainly because all the timers stop working (maybe the
 interrupts stop working?).  'systat -vmstat' doesn't produce any
 numbers because the initial time delay never passes.  :(  Also, I
 don't appear to be able to enter the kernel debugger when this
 happens!  :(  Can someone in the know give me a hand debugging this?
 It really ought to be fixed, but my knowledge isn't sufficient to
 find this on my own.
 
 Thanks,
 Joe

This also happens from time to time:


6 usersLoad  1.39  1.23  1.14  Sep 21 13:32 

Mem:KBREALVIRTUAL VN PAGER  SWAP PAGER  
Tot   Share  TotShareFree in  out in  out   
Act   626968932   11176414728   15052 count 
All  249864   12164  280693225860 pages 
 Interrupts 
Proc:r  p  d  s  wCsw  Trp  Sys  Int  Sof  Flt  1 cow1743 total 
   6 32 12398   13  866 182326  45516 wirestray irq0
90820 act stray irq6
 8.3%Sys   5.1%Intr  0.2%User  0.0%Nice 86.4%Idl   102140 inact   stray irq7
||||||||||  11388 cache 1 acpi0 irq9
+++  3664 free   1505 ata0 irq14
  daefr   uhci0 irq5
Namei Name-cacheDir-cache   5 prcfr 2 pcm0 irq5 
Calls hits% hits% react 7 atkbd0 irq
  688  687  100   pdwak   psm0 irq12
4 zfodpdpgs   100 clk irq0  
Disks   ad0   fd0 ofodintrn   128 rtc irq8  
KB/t   6.00  0.00   9 %slo-z35712 buf   
tps1507 0   7 tfree10 dirtybuf  
MB/s   8.83  0.00   17913 desiredvnodes 
% busy   98 0   14595 numvnodes 
 4798 freevnodes


Look at the number of interrupts that the ata device is generating.
This is in no way normal!  It happens randomly and causes the machine
to basically grind to a halt.

For comparison, here's the output of systat -vmstat on the same
machine after I rebooted it, while it was running a background
fsck:


4 usersLoad  1.01  0.42  0.16  Sep 21 13:50 

Mem:KBREALVIRTUAL VN PAGER  SWAP PAGER  
Tot   Share  TotShareFree in  out in  out   
Act   40328384871980 4408   53308 count 
All  2002486884  108513210232 pages 
 Interrupts 
Proc:r  p  d  s  wCsw  Trp  

Re: Problems with interrupts on -current.

2001-09-21 Thread Terry Lambert

John Baldwin wrote:
 The problem is that during a fast interrupt handler, we don't acknowledge the
 interrupt until we return from the interrupt handler, so if we preempt it may
 be a while before we get back to the interrupted process so it can finish the
 interrupt handler and ack the interrupt in the PIC.

I think that you're also going to find some overhead problems
related to interrupt threads when it comes to NETISR running
in a separate thread.  If nothing else, you are going
to be paying an additional context switch overhead to switch
to the NETISR thread that you weren't paying before.

I don't really buy Julian's IDE stack depth worst case
argument: the fix for that is to fix the IDE drivers to not
suck up huge chunks of stack to do their work.

-- Terry
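
The deferred-acknowledge problem John describes can be illustrated with a
toy model.  This is only a sketch under stated assumptions, not the PIC's
or the kernel's actual logic: it assumes one interrupt line that can latch
a single pending request while the previous interrupt is still
unacknowledged, that any further requests arriving in that window are
lost, and that every handler run is preempted for the same fixed time.
The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, for illustration only).  arrival[] holds
 * non-decreasing arrival times in ticks.  Each interrupt is acknowledged
 * handler_ticks + preempt_ticks after its handler starts.  While an
 * interrupt is unacknowledged, one more request can be latched as
 * pending; anything beyond that is counted as lost.
 */
static size_t count_lost_interrupts(const unsigned *arrival, size_t n,
                                    unsigned handler_ticks,
                                    unsigned preempt_ticks)
{
    unsigned ack_time = 0;  /* when the in-service interrupt gets acked */
    int in_service = 0;     /* an interrupt is still unacknowledged */
    int pending = 0;        /* one request is latched behind it */
    size_t lost = 0;

    assert(arrival != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        unsigned t = arrival[i];

        /* Retire every interrupt whose ack has happened by time t. */
        while (in_service && ack_time <= t) {
            if (pending) {
                /* The latched request starts as soon as the ack lands. */
                pending = 0;
                ack_time += handler_ticks + preempt_ticks;
            } else {
                in_service = 0;
            }
        }

        if (!in_service) {
            in_service = 1;
            ack_time = t + handler_ticks + preempt_ticks;
        } else if (!pending) {
            pending = 1;
        } else {
            lost++;
        }
    }
    return lost;
}
```

With one request per tick for eight ticks and a one-tick handler, the
model loses nothing when preempt_ticks is 0 and loses five of the eight
requests when preempt_ticks is 5, which is the kind of behaviour that
would show up as dropped key presses.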

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Problems with interrupts on -current.

2001-09-21 Thread Terry Lambert

John Baldwin wrote:
 Bah, we leave interrupts disabled during fast interrupt handlers, so this
 should be fine in -current since the softclock swi_sched() uses SWI_NOSWITCH
 (there is no NOSWITCH flag in a preemptive kernel, it's automatic and that is
 what bit the preemption kernel).

I think you will see this problem elsewhere, as time goes on.

It is reasonable to leave ethernet card interrupts disabled,
in order to load-shed prior to having to handle an interrupt,
as a means of receiver livelock avoidance.

Were you to implement a fair share scheduler to ensure that
your web server or firewall got a chance to process incoming
packets in a user space process to completion, rather than
spending all its time fielding hardware interrupts to the
exclusion of all else, part of the implementation would be
to disable interrupts on the card when the kernel-to-user
queue hit a certain depth, and not reenable them until it
hit a corresponding low water mark, at some later time.  A
reasonable approximation of this would establish high and
low watermarks on mbuf usage, and not reenable interrupts if
there were less than some low watermark of mbufs free.  Then
on the freeing of mbufs, the reenables would be delayed until
more than a high watermark of mbufs had been freed back to
the system.  Jeff Mogul, of DEC Western Research Laboratories,
suggested this approach in 1994.

-- Terry
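
The high/low watermark scheme described above can be sketched as a small
state machine.  This is a minimal illustration, not driver code: struct
rx_gate and its functions are hypothetical names, the queue is reduced to
a counter, and "interrupts disabled" is modelled as a flag that makes the
receive path refuse new packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a card's receive path feeding a user-space queue. */
struct rx_gate {
    size_t depth;        /* packets queued for the user-space consumer */
    size_t low_water;    /* reenable interrupts at or below this depth */
    size_t high_water;   /* disable interrupts at or above this depth */
    bool   intr_enabled; /* stands in for the card's interrupt enable bit */
};

static void rx_gate_init(struct rx_gate *g, size_t low, size_t high)
{
    assert(low < high);  /* the gap between the marks is the hysteresis */
    g->depth = 0;
    g->low_water = low;
    g->high_water = high;
    g->intr_enabled = true;
}

/*
 * Interrupt side: returns false when the packet is shed because the
 * card's interrupts are currently disabled.
 */
static bool rx_gate_enqueue(struct rx_gate *g)
{
    if (!g->intr_enabled)
        return false;
    g->depth++;
    if (g->depth >= g->high_water)
        g->intr_enabled = false;  /* stop taking interrupts, shed load */
    return true;
}

/* Consumer side: the user-space process has finished with one packet. */
static void rx_gate_dequeue(struct rx_gate *g)
{
    if (g->depth == 0)
        return;
    g->depth--;
    if (!g->intr_enabled && g->depth <= g->low_water)
        g->intr_enabled = true;   /* drained far enough, resume */
}
```

Keeping the two marks apart stops the gate from flapping on every packet:
once the queue fills to high_water, the consumer gets to drain it down to
low_water without being interrupted.  The mbuf variant in the message
would have the same shape, with the counter tracking free mbufs instead
of queue depth.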
