I would be interested in seeing this code without the quota code in it.
Right now I don't run the quota patch.
I also had a crash with 2.4.20-ac2-ctx16.
The only difference is that the bug now triggers at a very different line, but it
is still in sched.c and still seems to have the same cause. However, I got 4 days
of uptime out of it instead of only 2 as before.
Considering the discussion about semaphores and softirqs, and the little
assembler I know and understand (it has been over 9 years, and I was no good
at it), the explanation seems to have some merit and sounds plausible
to me.
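To make the argument concrete, here is a minimal userspace sketch in C of why a sleeping lock taken from softirq context ends in the BUG. It is a toy model, not kernel code: every name in it (model_sem, model_schedule, model_down_write, model_down_write_trylock, run_in_softirq) is made up for illustration, and the "lock already held by process context" setup is an assumption loosely based on the trace in this mail, where sys_newuname and uts_sem both show up.

```c
#include <stdbool.h>

/* Toy userspace model of the suspected failure; none of this is kernel code. */

static int softirq_depth;   /* > 0 while we pretend to run a softirq handler */
static bool bug_hit;        /* set where the real kernel would hit BUG()     */

struct model_sem {
    bool held;              /* true while some other context owns the lock   */
};

static bool model_in_interrupt(void)
{
    return softirq_depth > 0;
}

/* Stand-in for schedule(): sleeping in interrupt context is illegal. */
static void model_schedule(void)
{
    if (model_in_interrupt())
        bug_hit = true;     /* corresponds to "kernel BUG at sched.c" */
}

/* Sleeping acquire: on contention the caller has to go to sleep. */
static void model_down_write(struct model_sem *sem)
{
    if (sem->held) {
        model_schedule();   /* would block until the owner releases */
        return;
    }
    sem->held = true;
}

/* Non-sleeping acquire: on contention it just reports failure. */
static bool model_down_write_trylock(struct model_sem *sem)
{
    if (sem->held)
        return false;
    sem->held = true;
    return true;
}

/*
 * Simulate a softirq handler that wants a lock which process context
 * already holds (assumption). Returns true if the BUG condition was hit.
 */
bool run_in_softirq(bool use_trylock)
{
    struct model_sem sem = { .held = true };

    bug_hit = false;
    softirq_depth++;
    if (use_trylock)
        (void)model_down_write_trylock(&sem);
    else
        model_down_write(&sem);
    softirq_depth--;
    return bug_hit;
}
```

In this model run_in_softirq(false) reports the BUG condition and run_in_softirq(true) does not. That is the case for either replacing the sleeping acquire with something that cannot block (a spinlock or a trylock) or moving the work out of softirq context.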
Bug trace follows:
ksymoops 2.4.5 on i686 2.4.20-ac2-ctx16. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-ac2-ctx16/ (default)
-m /boot/System.map-2.4.20-ac2-ctx16 (default)
kernel BUG at sched.c:990!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0115f61>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000002 ebx: c02b8220 ecx: c02b8224 edx: ffffffff
esi: dce8c000 edi: dce8d9b4 ebp: dce8d98c esp: dce8d974
ds: 0018 es: 0018 ss: 0018
Process httpd (pid: 1894, stackpage=dce0d000)
Stack: 00000000 00000000 00000000 c02b8220 dce8c000 dce8d9b4 eee70bc0 c026af4c
00000212 c02b8220 e8d7b840 e8a3aa40 c026ae29 c02b8220 dce8d9b4 ffffffff
c02b8224 c02b8224 dce8c000 00000002 e38cfb40 c0125e8a f090f000 c0244488
Call Trace: [<c026af4c>] [<c026ae29>] [<c0125e8a>] {<c0244488>] [<c0138fef>]
[<c0241687>] [<c0244a4f>] [<c0138fef>] [<c01288c0>] [<c0128a19>] [<c01c231e>]
[<c01f2a65>] [<c01f34c8>] [<c02119b8>] [<c024831d>] [<c024876e>] [<c024181c>]
[<c0241ba3>] [<c024205a>] [<c0229bb0>] [<c0229cfb>] [<c021ce43>] [<c0229bb0>]
[<c022994d>] [<c0229bb0>] [<c0229f00>] [<c0216631>] [<c021675d>] [<c021686a>]
[<c011da94>] [<c010a47e>] [<c010c9d8>] [<c0269226>] [<c02121f1>] [<c0124fd4>]
[<c0145445>] [<c0146568>] [<c013a523>] [<c013a91a>] [<c0108db7>]
Code: 0f 0b de 03 10 92 27 c0 e9 29 fd ff ff 89 f6 55 89 e5 57 89
>>EIP; c0115f61 <schedule+261/270> <=====
>>ebx; c02b8220 <uts_sem+0/20>
>>ecx; c02b8224 <uts_sem+4/20>
>>edx; ffffffff <END_OF_CODE+f704694/????>
>>esi; dce8c000 <_end+1cb4d03c/304d90bc>
>>edi; dce8d9b4 <_end+1cb4e9f0/304d90bc>
>>ebp; dce8d98c <_end+1cb4e9c8/304d90bc>
>>esp; dce8d974 <_end+1cb4e9b0/304d90bc>
Trace; c026af4c <rwsem_down_failed_common+4c/344d>
Trace; c026ae29 <rwsem_down_write_failed+29/40>
Trace; c0125e8a <.text.lock.sys+aa/160>
Trace; c0241687 <tcp_v4_syn_recv_sock+47/180>
Trace; c0244a4f <tcp_check_req+1cf/420>
Trace; c0138fef <page_add_rmap+2f/80>
Trace; c01288c0 <do_no_page+e0/1b0>
Trace; c0128a19 <handle_mm_fault+89/150>
Trace; c01c231e <intr_handler+be/f0>
Trace; c01f2a65 <ide_build_sglist+95/180>
Trace; c01f34c8 <__ide_dma_begin+38/50>
Trace; c02119b8 <sock_def_readable+58/60>
Trace; c024831d <udp_queue_rcv_skb+15d/180>
Trace; c024876e <udp_rcv+1de/330>
Trace; c024181c <tcp_v4_hnd_req+5c/170>
Trace; c0241ba3 <tcp_v4_do_rcv+133/190>
Trace; c024205a <tcp_v4_rcv+45a/510>
Trace; c0229bb0 <ip_local_deliver_finish+0/170>
Trace; c0229cfb <ip_local_deliver_finish+14b/170>
Trace; c021ce43 <nf_hook_slow+b3/180>
Trace; c0229bb0 <ip_local_deliver_finish+0/170>
Trace; c022994d <ip_local_deliver+4d/70>
Trace; c0229bb0 <ip_local_deliver_finish+0/170>
Trace; c0229f00 <ip_rcv_finish+1e0/260>
Trace; c0216631 <netif_receive_skb+111/1d0>
Trace; c021675d <process_backlog+6d/110>
Trace; c021686a <net_rx_action+6a/100>
Trace; c011da94 <do_softirq+94/a0>
Trace; c010a47e <do_IRQ+9e/a0>
Trace; c010c9d8 <call_do_IRQ+5/d>
Trace; c0269226 <memcpy+26/60>
Trace; c02121f1 <__kfree_skb+101/160>
Trace; c0124fd4 <sys_newuname+c4/100>
Trace; c0145445 <path_release+15/40>
Trace; c0146568 <open_namei+238/5c0>
Trace; c013a523 <filp_open+43/70>
Trace; c013a91a <sys_open+8a/a0>
Trace; c0108db7 <system_call+33/38>
Code; c0115f61 <schedule+261/270>
00000000 <_EIP>:
Code; c0115f61 <schedule+261/270> <=====
0: 0f 0b ud2a <=====
Code; c0115f63 <schedule+263/270>
2: de 03 fiadd (%ebx)
Code; c0115f65 <schedule+265/270>
4: 10 92 27 c0 e9 29 adc %dl,0x29e9c027(%edx)
Code; c0115f6b <schedule+26b/270>
a: fd std
Code; c0115f6c <schedule+26c/270>
b: ff (bad)
Code; c0115f6d <schedule+26d/270>
c: ff 89 f6 55 89 e5 decl 0xe58955f6(%ecx)
Code; c0115f73 <__wake_up+3/70>
12: 57 push %edi
Code; c0115f74 <__wake_up+4/70>
13: 89 00 mov %eax,(%eax)
<0>Kernel Panic: Aiee, killing interrupt handler!
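A side note on the Code: bytes, since ksymoops disassembles them as if they were all instructions. As far as I know, the i386 BUG() macro in 2.4 emits ud2 followed by a 16-bit line number and a 32-bit pointer to the file name string, so the fiadd/adc lines after ud2a would be data, not code. A small C sketch that decodes the bytes under that assumption (struct bug_record and decode_bug are my own names):

```c
#include <stdint.h>

/* Decoded view of the bytes following a BUG() trap (assumed 2.4 i386 layout). */
struct bug_record {
    int      is_ud2;    /* 1 if the first two bytes are the ud2 opcode 0f 0b */
    unsigned line;      /* 16-bit little-endian __LINE__ value               */
    uint32_t file_ptr;  /* 32-bit little-endian address of __FILE__ string   */
};

/* 'code' must point at 8 or more bytes taken from the oops "Code:" line. */
struct bug_record decode_bug(const uint8_t *code)
{
    struct bug_record r;

    r.is_ud2   = (code[0] == 0x0f && code[1] == 0x0b);
    r.line     = (unsigned)code[2] | ((unsigned)code[3] << 8);
    r.file_ptr = (uint32_t)code[4]
               | ((uint32_t)code[5] << 8)
               | ((uint32_t)code[6] << 16)
               | ((uint32_t)code[7] << 24);
    return r;
}
```

Feeding it the first eight bytes from the Code: line above (0f 0b de 03 10 92 27 c0) gives line 990, which matches "kernel BUG at sched.c:990", and a file name pointer of 0xc0279210.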
Tuesday, March 18, 2003, 10:35:39 AM, you wrote:
JS> At 14:49 on Tue 18/03/03, [EMAIL PROTECTED] masquerading as 'Herbert Poetzl' wrote:
>> okay, I should rephrase:
>>
>> - it is obvious that (as the BUG message states)
>> schedule() is called from an (soft)irq, which
>> in turn bails out (correct behaviour ;)
>>
>> - I can follow your deduction that (if the crash
>> is only ctx related, what seems very likely)
>> the schedule() got triggered by some semaphore
>> acquiring, which failed (for whatever reason)
>>
>> - what I was unable to trace so far is the path,
>> on which sys_assign_ip_info (for example)
>> gets called from a softirq ...
>>
>> tcp_create_openreq_child -> sys_assign_ip_info
>> tcp_v4/6_syn_recv_sock -> tcp_create_openreq_child
>> tcp_check_req -> syn_recv_sock()
>> tcp_v4/6_hnd_req -> tcp_check_req
>> tcp_v4/6_do_rcv -> tcp_v4/6_hnd_req
>> tcp_v4/6_rcv -> tcp_v4/6_do_rcv
>>
>> ip_mr_input -> ip_local_deliver (multicast)
>>
>> process_backlog -> netif_receive_skb
JS> The argument in a nutshell is:
JS> *) do_softirq() occurs in the backtrace
JS> *) semaphores, which are not allowed at IRQ level, are spotted near
JS> the oops
JS> *) assuming that the backtrace is continuous
JS> =) this indicates that semaphores are being used illegally
JS> =>) replace semaphores with spinlocks or redesign
JS> I want to say 'dammit, it works and it's "obvious"', but you're quite
JS> correct, this is not rigorous enough!
JS> Another decoded oops (from the email I sent to the list and cc'd you on
JS> last Thursday) replete with tortuous ascii code path:
JS> //----------------------------------------------------------------------
JS> X rwsem_down_write_failed
JS> ? .text.lock.sys
JS> ^ tcp_create_openreq_child <== bingo
JS> ^ tcp_v4_syn_recv_sock
JS> \ tcp_check_req -<-<-<-<-_ <== only called by tcp_v4_hnd_req
JS> \
JS> __alloc_pages |
JS> do_no_page ^
JS> __kfree_skb |
JS> ipt_do_table ^
JS> ipt_do_table |
JS> ip_local_deliver_finish ^
JS> ipt_do_table |
JS> netif_rx ^
JS> |
JS> / tcp_v4_hnd_req -->->->-/
JS> ^ tcp_v4_do_rcv
JS> \ tcp_v4_rcv -<-<-<-<-<-_ <== Setup as static struct inet_protocol
JS> \ tcp_protocol:handler in protocol.c
JS> \
JS> ip_local_deliver_finish \
JS> ip_local_deliver_finish |
JS> nf_hook_slow |
JS> |
JS> / ip_local_deliver_finish -/ <== Calls struct inet_protocol*ipprot->
JS> ^ handler()
JS> \ ip_local_deliver -<-_ <== route.c in various places sets
JS> \ struct rtable * rth->u.dst.input =
JS> \ ip_local_deliver;
JS> \
JS> ip_local_deliver_finish |
JS> ip_rcv_finish |
JS> nf_iterate /
JS> nf_hook_slow /
JS> /
JS> / ip_rcv_finish ->->->-/ <= calls ip_route_input() which links rth
JS> ^ above to skb, then calls skb->dst->
JS> | input(skb);
JS> ^
JS> \ ip_rcv -<-<-<-<-<-<-_ <= ip_rcv() set as struct packet_type
JS> \ ip_packet_type->func in ip_output.c
JS> ^
JS> ip_rcv_finish |
JS> ^
JS> |
JS> / netif_receive_skb --/ <= calls struct packet_type * pt_prev->func()
JS> ^ process_backlog
JS> ^ net_rx_action
JS> \ do_softirq <== dev.c's net_dev_init() calls:
JS> open_softirq(NET_RX_SOFTIRQ,
JS> net_rx_action, NULL);
JS> .text.lock.af_inet
JS> inet_stream_connect
JS> sys_connect
JS> sock_map_fd
JS> sys_socket
JS> sys_socketcall
JS> sys_write
JS> do_page_fault
JS> //----------------------------------------------------------------------
JS> Enough?
JS> The answer had better be 'yes' 'cos that's your lot for now :)
JS> Regards,
JS> Jonathan
Best regards,
Eje Gustafsson mailto:[EMAIL PROTECTED]
---
The Family Entertainment Network http://www.fament.com
Phone : 620-231-7777 Fax : 620-231-4066
eBay UserID : macahan
- Your Full Time Professionals -
---