Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-27 Thread The Traveller
] [c0147fb8] [f89e7737] [f89e7737]
  Jun 24 18:23:25 mspgate03 kernel:   [c01f0998] [c01f0fac]
  [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
  Jun 24 18:23:25 mspgate03 kernel:   [c0144a64] [c01246db]
  [c0109023]
  Jun 24 18:23:25 mspgate03 kernel:
  Jun 24 18:23:25 mspgate03 kernel: CPU 2:  
      
  Jun 24 18:23:25 mspgate03 kernel:  
      
  Jun 24 18:23:25 mspgate03 kernel:  
      
  Jun 24 18:23:25 mspgate03 kernel: Call Trace:
  Jun 24 18:23:25 mspgate03 kernel:
  Jun 24 18:23:25 mspgate03 kernel: CPU 3:0070 cce30002 0cd8
  08fa 6953 656c706d 6c616e41 73697379
  Jun 24 18:23:25 mspgate03 kernel:0009a700 46534c00 65746e69
  6c6f7072 32657461 6e655f61 0a810063 6953
  Jun 24 18:23:25 mspgate03 kernel:656c706d 65746e49 6c6f7072
  4c657461 39004653 530b 6c706d69 66736c65
  Jun 24 18:23:25 mspgate03 kernel: Call Trace:
  Jun 24 18:23:25 mspgate03 kernel:
  Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 0001
  0001  0001 c010a7c2 c025c8ab
  Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00
  c0191104 0500 1805 00bf 8a01
  Jun 24 18:23:25 mspgate03 kernel:7f1c0300 01000415 1a131100
  170f1200  e14d4000  
  Jun 24 18:23:25 mspgate03 kernel: Call Trace:[c010a7c2]
  [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
  Jun 24 18:23:25 mspgate03 kernel:   [c0109023]
  Jun 24 18:23:25 mspgate03 kernel:
  
  Thank you.
  Alex Zarubin
  
  -----Original Message-----
  From: The Traveller [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, June 17, 2003 3:10 PM
  To: [EMAIL PROTECTED]
  Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues
  
  
  On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:
   
   BTW: As I reported in my previous mail to the list, I've now installed kernel
   2.4.21-rc2 with ACPI-patch on the box with the E100P.  I've been trying
   very hard to reproduce a freeze with this kernel, but haven't succeeded yet.
  [...]
  
  Ok, it crashed again, so that wasn't it either.  What I did to trigger
  it was using the auto-dialer to loop as many calls to app_datetime out
  and then back over the same E-1 as it would take, queueing the calls
  to /var/spool/asterisk/outgoing/ 14 at a time.  It froze at the first
  attempt.  The good news is that it produced a visible kernel-panic.
  This time.  My guess is that you only miss it if the console
  screensaver has already kicked in when it happens.
  
  It read something like "Unable to handle kernel paging request" and
  happened in the swapper-task.  As usual, it dumped a lot of numbers on the
  screen, which I didn't want to write down.
  
  Mark: If you want my help in debugging this, I'll hook it up to a
  serial console, trigger the crash and provide you with the exact
  panic, together with the ksyms and modules-info to trace it.
  
  
  
  Grtz,
  
 Oliver
  ___
  Asterisk-Users mailing list
  [EMAIL PROTECTED]
  http://lists.digium.com/mailman/listinfo/asterisk-users
 
 -- 
 Matthias Granberry
 [EMAIL PROTECTED]
 (469) 371-0596


Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-26 Thread The Traveller
Hi Alex,

The problem is most likely to occur with high volumes of call-setups and
disconnects.  This could be reproduced by putting 2 of your T-1 ports
back to back and then using the auto-dialer to generate a large number of
very short calls between the ports.
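
For anyone wanting to script that kind of load, here is a minimal sketch of queueing call files into an Asterisk outgoing spool directory (/var/spool/asterisk/outgoing/ is the path mentioned elsewhere in this thread). The channel name `Zap/g1/100`, the `DateTime` application name, the staging directory and both helper names are illustrative assumptions, not taken from the auto-dialer actually used here. Each file is written in a staging directory first and then renamed into the spool, so a half-written file is never picked up.

```python
import os
import tempfile


def make_call_file(channel, application="DateTime", max_retries=0, wait_time=30):
    """Return the text of a call file (application name is an assumption)."""
    return (
        f"Channel: {channel}\n"
        f"MaxRetries: {max_retries}\n"
        f"WaitTime: {wait_time}\n"
        f"Application: {application}\n"
    )


def queue_calls(spool_dir, staging_dir, channel, count=14):
    """Write `count` call files in staging_dir, then rename each into spool_dir.

    staging_dir must be on the same filesystem as spool_dir so that the
    rename is atomic.
    """
    queued = []
    for i in range(count):
        fd, tmp_path = tempfile.mkstemp(suffix=".call", dir=staging_dir)
        with os.fdopen(fd, "w") as handle:
            handle.write(make_call_file(channel))
        final_path = os.path.join(spool_dir, f"loadtest-{os.getpid()}-{i}.call")
        os.rename(tmp_path, final_path)
        queued.append(final_path)
    return queued
```

Calling `queue_calls()` repeatedly with a short pause between batches would approximate a burst of short call setups and disconnects; the batch size of 14 is only a default.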

I'm currently attempting to figure out what's causing the problem,
by trying different kernels with different options.  Trying a different
version of GCC is a good idea.  I hadn't thought of that yet.

So far, I've had limited success.  The panics popped up in all the kernels
I tested with, although some things, like certain other hardware / drivers, seem
to make them more likely to appear.  See the other thread I started about
this problem.



Grtz,

  Oliver

On Tue, Jun 24, 2003 at 19:10:08 -0500, Alex Zarubin wrote:

 Mark & Oliver,
 
 It is too early to say, but the picture is different now. Our dual CPU,
 dual T400P box has been up for 4 days, under a load of 10 - 100 simultaneous
 PRI - SIP calls. We installed 2.4.21 #2 SMP (it was still freezing after
 that) and, what I think made the difference, recompiled
 zaptel-libpri-asterisk with gcc 3.3.
 
 One problem along the way was that asterisk wouldn't start after that. It was
 crashing while loading the mp3 and lpc10 codecs. We put 'noload' for these two
 into modules.conf as a temporary solution, of course.
 
 There are still problems with multiple connections at the same time.
 Windows to the box freeze for a second, and there are D-channel error
 messages. The following messages are dumped into /var/log/messages. What do
 you think?
 
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
 Jun 24 18:23:25 mspgate03 kernel: irq:  1 [ 0 0 1 0 ]
 Jun 24 18:23:25 mspgate03 kernel: bh:   0 [ 0 0 0 0 ]
 Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
 Jun 24 18:23:25 mspgate03 kernel: CPU 0:0200 036f 00e14603
 1802 0310 6647 008e0200 4803
 Jun 24 18:23:25 mspgate03 kernel:0078 001ffa02 5b490300
 0600 01c7 074e0308 1afe 01c74d03
 Jun 24 18:23:25 mspgate03 kernel:2302 d708 e101
 0900 01d7 f5030001 0423 09300207
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:[f89bd281]
 [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
 Jun 24 18:23:25 mspgate03 kernel:   [f89bb132] [f89bd281]
 [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [f89aa80a] [f89aa80a]
 [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
 Jun 24 18:23:25 mspgate03 kernel:   [c020d122] [c010abe3]
 [c020d122] [c020d550] [c010a98e] [c020d550]
 Jun 24 18:23:25 mspgate03 kernel:   [c010abfe] [c01f0919]
 [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
 Jun 24 18:23:25 mspgate03 kernel:   [f89bd281] [f89bd281]
 [f89bd281] [f89bb132] [f89bd510] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [c022a5f5] [c01f0ffd]
 [c01f112e] [c01f53c2] [c012005b] [c010abfe]
 Jun 24 18:23:25 mspgate03 kernel:   [c015147a] [c01509dc]
 [c0147460] [c0147fb8] [f89e7737] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [c01f0998] [c01f0fac]
 [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
 Jun 24 18:23:25 mspgate03 kernel:   [c0144a64] [c01246db]
 [c0109023]
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 2:  
     
 Jun 24 18:23:25 mspgate03 kernel:  
     
 Jun 24 18:23:25 mspgate03 kernel:  
     
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 3:0070 cce30002 0cd8
 08fa 6953 656c706d 6c616e41 73697379
 Jun 24 18:23:25 mspgate03 kernel:0009a700 46534c00 65746e69
 6c6f7072 32657461 6e655f61 0a810063 6953
 Jun 24 18:23:25 mspgate03 kernel:656c706d 65746e49 6c6f7072
 4c657461 39004653 530b 6c706d69 66736c65
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 0001
 0001  0001 c010a7c2 c025c8ab
 Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00
 c0191104 0500 1805 00bf 8a01
 Jun 24 18:23:25 mspgate03 kernel:7f1c0300 01000415 1a131100
 170f1200  e14d4000  
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:[c010a7c2]
 [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
 Jun 24 18:23:25 mspgate03 kernel:   [c0109023]
 Jun 24 18:23:25 mspgate03 kernel:
 
 Thank you.
 Alex Zarubin
 
 -----Original Message-----
 From: The Traveller [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, June 17, 2003 3:10 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues
 
 
 On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:
  
  BTW: As I reported in my previous mail to the list, I've now installed
 kernel
  2.4.21-rc2 with ACPI-patch on the box

RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-26 Thread Alex Zarubin
Here is info on the kernel panic with the high volume (110+) of calls.
Same configuration as before. Comments would be appreciated.


ksymoops 2.4.4 on i686 2.4.21. Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.21 (specified)
 -m /boot/System.map-2.4.21 (default)
 -i


eax: 0100 ebx:  ecx:  edx: f71b5a14 esi: 0002 edi: f71b4000 ebp: f71b4000 esp: f71b59ec ds: 0018 es: 0018 ss: 0018

Process irqbalance (pid: 713, stackpage=f71b5000)
Stack: 6e6d6c6b 7271706f 76757473 7a797877 0001 c0115ef4 f71b4000 c02578fd f71b5a14 0001  0003 c0115ef4 f71b4000 f71b4000  f71b0018 c0110018 ffef c0114546 0010 0286 c0114470 

Call Trace: [c0115ef4] [c0115ef4] [c0110018] [c0114546] [c0114470] [c011bc88] [c01144c0] [c0114470] [c011b2d5] [c011eae2] [c011badb] [c011bc88] [c0116ff0] [c010960a] [c0115ef4] [c01173a8] [f89e7737] [f89fb1e0] [f89fb1e0] [c0117000] [c0109114] [c0115ef4] [c010abe3] [f897a8c0] [f897a8c0] [c0110018] [c0124345] [c012042b] [c01202d1] [c012005b] [c010abfe] [c015e751] [c0147513] [c01479f1] [f89e7737] [f89fb1e0] [c010e1b6] [c0123fc0] [c01482ab] [c01487c4] [c012042b] [c01202d1] [c012005b] [c010abfe] [c013c606] [c01471ae] [c013c953] [c0109023]

Code: 89 1d b0 e0 ff ff ff 80 04 48 33 c0 eb 02 f3 90 a1 88 f3 30 
Using defaults from ksymoops -t elf32-i386 -a i386


Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c0110018 pci_conf2_write+88/f0
Trace; c0114546 .text.lock.smp+19/23
Trace; c0114470 stop_this_cpu+0/40
Trace; c011bc88 printk+128/140
Trace; c01144c0 smp_send_stop+10/30
Trace; c0114470 stop_this_cpu+0/40
Trace; c011b2d5 panic+85/180
Trace; c011eae2 do_exit+32/2d0
Trace; c011badb call_console_drivers+eb/100
Trace; c011bc88 printk+128/140
Trace; c0116ff0 bust_spinlocks+50/60
Trace; c010960a die+5a/80
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c01173a8 do_page_fault+3a8/4db
Trace; f89e7737 END_OF_CODE+309d4/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; c0117000 do_page_fault+0/4db
Trace; c0109114 error_code+34/3c
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c010abe3 do_IRQ+e3/110
Trace; f897a8c0 [usb-ohci]rh_int_timer_do+0/70
Trace; f897a8c0 [usb-ohci]rh_int_timer_do+0/70
Trace; c0110018 pci_conf2_write+88/f0
Trace; c0124345 timer_bh+2b5/3f0
Trace; c012042b bh_action+4b/80
Trace; c01202d1 tasklet_hi_action+61/a0
Trace; c012005b do_softirq+6b/d0
Trace; c010abfe do_IRQ+fe/110
Trace; c015e751 proc_lookup+51/c0
Trace; c0147513 real_lookup+73/100
Trace; c01479f1 link_path_walk+331/a10
Trace; f89e7737 END_OF_CODE+309d4/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; c010e1b6 timer_interrupt+e6/170
Trace; c0123fc0 update_process_times+20/a0
Trace; c01482ab path_lookup+1b/30
Trace; c01487c4 open_namei+94/650
Trace; c012042b bh_action+4b/80
Trace; c01202d1 tasklet_hi_action+61/a0
Trace; c012005b do_softirq+6b/d0
Trace; c010abfe do_IRQ+fe/110
Trace; c013c606 filp_open+36/60
Trace; c01471ae getname+5e/a0
Trace; c013c953 sys_open+33/a0
Trace; c0109023 system_call+33/38
Code;  Before first symbol
 _EIP:
Code;  Before first symbol
 0: 89 1d b0 e0 ff ff mov %ebx,0xe0b0
Code; 0006 Before first symbol
 6: ff 80 04 48 33 c0 incl 0xc0334804(%eax)
Code; 000c Before first symbol
 c: eb 02 jmp 10 _EIP+0x10 0010 Before first symbol
Code; 000e Before first symbol
 e: f3 90 repz nop 
Code; 0010 Before first symbol
 10: a1 88 f3 30 00 mov 0x30f388,%eax


Thank you.
Alex Zarubin
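
One thing in the dump above can be checked mechanically: the first four words on the "Stack:" line all fall in the printable ASCII range. The sketch below (the helper name is hypothetical) decodes 32-bit words as text, assuming they were printed in i386 little-endian order; words that are not a full 8 hex digits, as several appear in this archive, are shown as `????`. This says nothing definite about the cause. It only shows that the stack page held string-like data at the time of the panic.

```python
def words_to_ascii(words):
    """Decode 8-digit hex words (little-endian) into printable ASCII."""
    out = []
    for word in words:
        if len(word) != 8:
            out.append("????")  # word is missing digits, cannot decode
            continue
        # On i386 the lowest-addressed byte is the last one printed.
        for byte in reversed(bytes.fromhex(word)):
            out.append(chr(byte) if 0x20 <= byte < 0x7F else ".")
    return "".join(out)


# First four words of the "Stack:" line in the panic above.
stack = "6e6d6c6b 7271706f 76757473 7a797877".split()
print(words_to_ascii(stack))  # prints "klmnopqrstuvwxyz"
```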






RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-25 Thread Mark Spencer
Oooh, how neat!  I wonder if there is some sort of race and that the
kernel is detecting and defeating it somehow.  Will ksymoops on your
machine handle that output?  Maybe we can track it down!

Again, does the problem occur with only one board?  i.e. is the problem
tied to having multiple boards in the machine?

Mark

On Tue, 24 Jun 2003, Alex Zarubin wrote:

 Mark & Oliver,

 It is too early to say, but the picture is different now. Our dual CPU,
 dual T400P box has been up for 4 days, under a load of 10 - 100 simultaneous
 PRI - SIP calls. We installed 2.4.21 #2 SMP (it was still freezing after
 that) and, what I think made the difference, recompiled
 zaptel-libpri-asterisk with gcc 3.3.

 One problem along the way was that asterisk wouldn't start after that. It was
 crashing while loading the mp3 and lpc10 codecs. We put 'noload' for these two
 into modules.conf as a temporary solution, of course.

 There are still problems with multiple connections at the same time.
 Windows to the box freeze for a second, and there are D-channel error
 messages. The following messages are dumped into /var/log/messages. What do
 you think?

 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
 Jun 24 18:23:25 mspgate03 kernel: irq:  1 [ 0 0 1 0 ]
 Jun 24 18:23:25 mspgate03 kernel: bh:   0 [ 0 0 0 0 ]
 Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
 Jun 24 18:23:25 mspgate03 kernel: CPU 0:0200 036f 00e14603
 1802 0310 6647 008e0200 4803
 Jun 24 18:23:25 mspgate03 kernel:0078 001ffa02 5b490300
 0600 01c7 074e0308 1afe 01c74d03
 Jun 24 18:23:25 mspgate03 kernel:2302 d708 e101
 0900 01d7 f5030001 0423 09300207
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:[f89bd281]
 [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
 Jun 24 18:23:25 mspgate03 kernel:   [f89bb132] [f89bd281]
 [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [f89aa80a] [f89aa80a]
 [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
 Jun 24 18:23:25 mspgate03 kernel:   [c020d122] [c010abe3]
 [c020d122] [c020d550] [c010a98e] [c020d550]
 Jun 24 18:23:25 mspgate03 kernel:   [c010abfe] [c01f0919]
 [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
 Jun 24 18:23:25 mspgate03 kernel:   [f89bd281] [f89bd281]
 [f89bd281] [f89bb132] [f89bd510] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [c022a5f5] [c01f0ffd]
 [c01f112e] [c01f53c2] [c012005b] [c010abfe]
 Jun 24 18:23:25 mspgate03 kernel:   [c015147a] [c01509dc]
 [c0147460] [c0147fb8] [f89e7737] [f89e7737]
 Jun 24 18:23:25 mspgate03 kernel:   [c01f0998] [c01f0fac]
 [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
 Jun 24 18:23:25 mspgate03 kernel:   [c0144a64] [c01246db]
 [c0109023]
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 2:  
     
 Jun 24 18:23:25 mspgate03 kernel:  
     
 Jun 24 18:23:25 mspgate03 kernel:  
     
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 3:0070 cce30002 0cd8
 08fa 6953 656c706d 6c616e41 73697379
 Jun 24 18:23:25 mspgate03 kernel:0009a700 46534c00 65746e69
 6c6f7072 32657461 6e655f61 0a810063 6953
 Jun 24 18:23:25 mspgate03 kernel:656c706d 65746e49 6c6f7072
 4c657461 39004653 530b 6c706d69 66736c65
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:
 Jun 24 18:23:25 mspgate03 kernel:
 Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 0001
 0001  0001 c010a7c2 c025c8ab
 Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00
 c0191104 0500 1805 00bf 8a01
 Jun 24 18:23:25 mspgate03 kernel:7f1c0300 01000415 1a131100
 170f1200  e14d4000  
 Jun 24 18:23:25 mspgate03 kernel: Call Trace:[c010a7c2]
 [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
 Jun 24 18:23:25 mspgate03 kernel:   [c0109023]
 Jun 24 18:23:25 mspgate03 kernel:

 Thank you.
 Alex Zarubin

 -----Original Message-----
 From: The Traveller [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, June 17, 2003 3:10 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues


 On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:
 
  BTW: As I reported in my previous mail to the list, I've now installed kernel
  2.4.21-rc2 with ACPI-patch on the box with the E100P.  I've been trying
  very hard to reproduce a freeze with this kernel, but haven't succeeded yet.
 [...]

 Ok, it crashed again, so that wasn't it either.  What I did to trigger
 it was using the auto-dialer to loop as many calls to app_datetime out
 and then back over the same E-1 as it would take, queueing the calls
 to /var/spool/asterisk/outgoing/ 14 at a time.  It froze at the first
 attempt.  The good news

RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-25 Thread Alex Zarubin
Mark, here is the info you requested. As for the question about multiple
T400P boards, I believe that is the most probable cause of this behavior
(we haven't seen it on single-board machines). But in order to
prove it we need 4-5 days of load testing. Hopefully we'll be able
to do it next week.


ksymoops 2.4.4 on i686 2.4.21. Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.21 (specified)
 -m /boot/System.map-2.4.21 (default)
 -i


Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
Jun 24 18:23:25 mspgate03 kernel: irq: 1 [ 0 0 1 0 ]
Jun 24 18:23:25 mspgate03 kernel: bh: 0 [ 0 0 0 0 ]
Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
Jun 24 18:23:25 mspgate03 kernel: CPU 0:0200 036f 00e14603 1802 0310 6647 008e0200 4803
Jun 24 18:23:25 mspgate03 kernel: 0078 001ffa02 5b490300 0600 01c7 074e0308 1afe 01c74d03
Jun 24 18:23:25 mspgate03 kernel: 2302 d708 e101 0900 01d7 f5030001 0423 09300207
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [f89bd281] [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
Jun 24 18:23:25 mspgate03 kernel: [f89bb132] [f89bd281] [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [f89aa80a] [f89aa80a] [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
Jun 24 18:23:25 mspgate03 kernel: [c020d122] [c010abe3] [c020d122] [c020d550] [c010a98e] [c020d550]
Jun 24 18:23:25 mspgate03 kernel: [c010abfe] [c01f0919] [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
Jun 24 18:23:25 mspgate03 kernel: [f89bd281] [f89bd281] [f89bd281] [f89bb132] [f89bd510] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c022a5f5] [c01f0ffd] [c01f112e] [c01f53c2] [c012005b] [c010abfe]
Jun 24 18:23:25 mspgate03 kernel: [c015147a] [c01509dc] [c0147460] [c0147fb8] [f89e7737] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c01f0998] [c01f0fac] [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
Jun 24 18:23:25 mspgate03 kernel: [c0144a64] [c01246db] [c0109023]
Jun 24 18:23:25 mspgate03 kernel: CPU 2:       
Jun 24 18:23:25 mspgate03 kernel:        
Jun 24 18:23:25 mspgate03 kernel:        
Jun 24 18:23:25 mspgate03 kernel: CPU 3:0070 cce30002 0cd8 08fa 6953 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 6953
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 530b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 0001 0001  0001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel:  f2d92124 e14d5f00 c0191104 0500 1805 00bf 8a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200  e14d4000  
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [c010a7c2] [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
Jun 24 18:23:25 mspgate03 kernel: [c0109023]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bbb47 [zaptel]zt_getbuf_chunk+1f7/4b0
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bbb47 [zaptel]zt_getbuf_chunk+1f7/4b0
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; f89aa80a [eepro100]speedo_start_xmit+17a/210
Trace; f89aa80a [eepro100]speedo_start_xmit+17a/210
Trace; c01feee4 qdisc_restart+14/170
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; c01f4eae dev_queue_xmit+14e/320
Trace; c010a98e handle_IRQ_event+5e/90
Trace; c020d122 ip_output+102/170
Trace; c010abe3 do_IRQ+e3/110
Trace; c020d122 ip_output+102/170
Trace; c020d550 ip_queue_xmit+3c0/520
Trace; c010a98e handle_IRQ_event+5e/90
Trace; c020d550 ip_queue_xmit+3c0/520
Trace; c010abfe do_IRQ+fe/110
Trace; c01f0919 sock_def_readable+39/70
Trace; c01f0919 sock_def_readable+39/70
Trace; c022a1ef udp_queue_rcv_skb+18f/200
Trace; c022a1ef udp_queue_rcv_skb+18f/200
Trace; c022a5f5 udp_rcv+165/340
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bd510 [zaptel]zt_putbuf_chunk+c0/730
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; c022a5f5 udp_rcv+165/340
Trace; c01f0ffd kfree_skbmem+5d/70
Trace; c01f112e __kfree_skb+11e/130
Trace; c01f53c2 net_tx_action+62/140
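
To see at a glance which modules dominate a decoded trace like the one above, a minimal sketch that tallies "Trace;" lines per module may help. It assumes the ksymoops line layout shown in this message (`Trace; <address> [module]symbol+offset/size`); frames without a `[module]` prefix are counted as "kernel". The regular expression and function name are illustrative and have not been checked against other ksymoops versions.

```python
import re
from collections import Counter

# Matches e.g. "Trace; f89e7737 [tor2]tor2_intr+847/cb0"
TRACE_RE = re.compile(
    r"^Trace;\s+(?P<addr>[0-9a-f]{8})\s+"
    r"(?:\[(?P<module>[^\]]+)\])?(?P<symbol>[^+\s]+)\+"
)


def frames_per_module(ksymoops_output):
    """Count decoded trace frames per module ('kernel' when no module tag)."""
    counts = Counter()
    for line in ksymoops_output.splitlines():
        match = TRACE_RE.match(line.strip())
        if match:
            counts[match.group("module") or "kernel"] += 1
    return counts


sample = """\
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bbb47 [zaptel]zt_getbuf_chunk+1f7/4b0
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; f89aa80a [eepro100]speedo_start_xmit+17a/210
Trace; c01feee4 qdisc_restart+14/170
Trace; c010abe3 do_IRQ+e3/110
"""
for module, frames in frames_per_module(sample).most_common():
    print(module, frames)
```

A high count for one module only shows where stack addresses resolve. On 2.4 the call trace is built by scanning the stack, so stale return addresses can inflate the numbers.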

RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-17 Thread Mark Spencer
 As far as SMP and single T400P - we'll try and report the results
 but the idea was to go with as high density as possible ...

Right, I'm just trying to narrow down the problem.  I'm theorizing that
the problem is some sort of spinlock deadlock.  Does it only occur if
there is activity or even if the lines are up but no calls taking place?

 What do you think of using hyperthreading - should we enable or disable it
 for the box running asterisk?

We use hyperthreading but have not run tests longer than a few hours on
those machines.

 What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?
Likely will make no difference in this situation.

Mark



RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-17 Thread Alex Zarubin
I believe this is related to the load, there are always calls in our test.
Attached is a part of /var/log/messages file with SysRq memory info - in
case you can see something in it. The box was rebooted 06-16 17:08 and
the problem occurred 06-17 11:36.


Thank you.
Alex Zarubin




-----Original Message-----
From: Mark Spencer [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 17, 2003 6:58 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues



 As far as SMP and single T400P - we'll try and report the results
 but the idea was to go with as high density as possible ...


Right, I'm just trying to narrow down the problem. I'm theorizing that
the problem is some sort of spinlock deadlock. Does it only occur if
there is activity or even if the lines are up but no calls taking place?


 What do you think of using hyperthreading - should we enable or disable it
 for the box running asterisk?


We use hyperthreading but have not run tests longer than a few hours on
those machines.


 What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?
Likely will make no difference in this situation.


Mark









mes_ast.gz
Description: Binary data


RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-16 Thread Alex Zarubin
Mark,


As far as pings - we have cases when we could ping the box on both
interfaces and there are cases when we could not (we tried 3-4 sets of
NICs and drivers). All telnets, X, ssh etc. are definitely dead.
No coredumps (asterisk was started with -g option), no kernel panics.
Black console, Alt-SysRq combinations don't work.
Pretty much no options but rebooting the box.


As far as SMP and single T400P - we'll try and report the results
but the idea was to go with as high density as possible ...


What do you think of using hyperthreading - should we enable or disable it
for the box running asterisk?


What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?


Thank you.
Alex Zarubin


-----Original Message-----
From: Mark Spencer [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 14, 2003 10:23 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues



When you say "stops responding" do you mean no more pings, telnet dead,
etc? Or do you mean asterisk stops responding? Is there a segfault or
kernel panic, or any other failure diagnostic?


Mark


On Thu, 12 Jun 2003, Alex Zarubin wrote:


 Zaptel was compiled with -D__SMP__

 We've installed irqbalance and the picture improved a lot
 (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

 The big issue for us now is that after 24+ hours of the test load PRI-SIP
 our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP
 stops responding to anything.

 So the questions are:
  - are there known issues with PE2650 and ways to fix them?
  - can someone recommend the 'stable' 2.4 SMP kernel for this
   kind of load?
  - any expertise in this area will be appreciated

            CPU0       CPU1       CPU2       CPU3
   0:     230710      30030      50050          0   IO-APIC-edge  timer
   1:          5          0          0        233   IO-APIC-edge  keyboard
   2:          0          0          0          0         XT-PIC  cascade
   5:          0          0          0          0  IO-APIC-level  usb-ohci
   8:          1          0          0          0   IO-APIC-edge  rtc
  14:         27          0          2          0   IO-APIC-edge  ide0
  20:    2085442     400221          0     230232  IO-APIC-level  tor2
  24:     293848    1841658      10010     570568  IO-APIC-level  tor2
  28:          5      25643          0          0  IO-APIC-level  eth0
  29:          5          0    5165040          0  IO-APIC-level  eth1
  30:      43720      35467       1291       3296  IO-APIC-level  aacraid
 NMI:          0          0          0          0
 LOC:     310618     310616     310616     310616
 ERR:          0
 MIS:          0

 Thank you.
 Alex Zarubin

 -----Original Message-----
 From: Martin Pycko [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, June 10, 2003 9:48 AM
 To: '[EMAIL PROTECTED]'
 Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues


 Are you sure that you compiled zaptel for __SMP__ ?
 Edit your zaptel/Makefile.

   0:   75283844   75241320   75286285   75247088   IO-APIC-edge  timer
   1:          1          0          1          1   IO-APIC-edge  keyboard
   2:          0          0          0          0         XT-PIC  cascade
   3:          0          0          0          0  IO-APIC-level  usb-ohci
   8:          1          0          0          0   IO-APIC-edge  rtc
  15:          1          0          0          1   IO-APIC-edge  ide1
  16:   22134870   22120997   22135905   22122829  IO-APIC-level  eth0
  25:       4670       4548       4614       4518  IO-APIC-level  tor2

 All the four CPU's should have IRQ's like in the example above.

 Martin

 On Mon, 9 Jun 2003, Alex Zarubin wrote:

  Hi,
 
  We are trying to validate Asterisk as a media gateway PRI - SIP with two
  T400P (8 T1s) per box. The first experience with BOX1 (Compaq, 2.53 GHz,
  1 Gb RAM) and just one T400P was encouraging - on the load test with
  3 T1s worth of calls we had on average 75% idle CPU.
  
  Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
  (Dell, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, asterisk/zaptel is built with
  SMP support).
  
  On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of
  the time. Just 3 T1s out of 8.
  
  On the load test with just 2 T1s, BOX3 was very close to 0% idle on CPU0,
  while CPU1 was at 95% idle. The process ksoftirqd_CPU0 was close to the top
  of 'top', with /proc/interrupts showing tor2-related numbers growing very
  fast. We had 2 T1s plugged into the first T400P board, with nothing going
  into the second, but the number of interrupts for both boards was growing
  at the same pace. Here are the interrupts (after the box reboot, so they
  are not as big as they were) - do they look OK?
 
 
             CPU0       CPU1       CPU2       CPU3
    0:     122556          0          0          0   IO-APIC-edge  timer
    1:          4          0          0          0   IO-APIC-edge  keyboard
    2:          0          0          0          0         XT-PIC  cascade
    5:          0          0          0          0  IO-APIC-level  usb-ohci
    8:          1          0          0          0   IO-APIC-edge  rtc
   12:         20          0          0          0   IO-APIC-edge  PS/2 Mouse
   14:         23          0          2          0   IO-APIC-edge  ide0
   20:     516930          0          0          0  IO-APIC-level  tor2
   24:     516524          0          0          0  IO-APIC-level  tor2
   28:      10600          0          0          0  IO-APIC-level  eth0
   29:       4837          0          0          0  IO-APIC-level  eth1
   30:      24831          0          0          0  IO-APIC-level  aacraid
  NMI:          0          0          0          0
  LOC:     122430     122429     122429     122428
  ERR:          0
  MIS:          0
 
  Not sure what went wrong. Any suggestions on how to work with 2 T400P in a
  box (without hurting performance)
  and how to get advantage of SMP for Asterisk would be appreciated.
 
  Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?
 
  Thank you.
 
  Alex Zarubin
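
The /proc/interrupts listings quoted in this message can also be compared mechanically rather than by eye. Below is a minimal sketch, assuming the 2.4-style layout shown above (a header row of CPU names, then one row per IRQ with a count per CPU, a controller type and a device name). The function names and the `min_total` threshold are illustrative choices. It flags IRQs whose interrupts were all handled by a single CPU, which is the pattern in the BOX3 listing.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {(irq, name): [count per CPU]}."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:ncpu]
        # Skip rows such as ERR/MIS that carry one total, not per-CPU counts.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        # fields[ncpu] is the controller type; anything after it is the name.
        name = " ".join(fields[ncpu + 1:]) or label.strip()
        table[(label.strip(), name)] = [int(c) for c in counts]
    return table


def single_cpu_irqs(table, min_total=1000):
    """Return busy IRQs whose interrupts all landed on one CPU."""
    flagged = []
    for key, counts in table.items():
        total = sum(counts)
        if total >= min_total and max(counts) == total:
            flagged.append(key)
    return flagged


SAMPLE = """\
 CPU0 CPU1 CPU2 CPU3
 20: 516930 0 0 0 IO-APIC-level tor2
 24: 516524 0 0 0 IO-APIC-level tor2
 LOC: 122430 122429 122429 122428
 ERR: 0
"""
print(single_cpu_irqs(parse_interrupts(SAMPLE)))
```

On a live box the same functions could be fed the contents of /proc/interrupts read twice, a few seconds apart, to compare growth rates per CPU instead of totals since boot.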

Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-16 Thread The Traveller
Yo,

I've seen very similar Zaptel-related freezes on a wide variety of
mainboards (SMP as well as non-SMP), with X100P's as well as with an E100P.
At some point, almost always at the moment a call through one of those cards
connects or disconnects, the machine completely stops responding and needs
a reset to come back to life.  A very nice way to trigger it with the E100P
seems to be to put around 10-20 of its channels into a MeetMe conference and
then issue the "stop now" command on the Asterisk console.  A high volume
of connects / disconnects seems to trigger the freezes.  I'm still
investigating the issue and am going to try different kernels and
some custom kernel-patches.

One of my boxes (dual PIII-750, Intel L440GX+-board) with an X100P and
a TDM40P in it hasn't frozen since I installed kernel 2.4.21-rc2 with
the ACPI-patch (http://sourceforge.net/projects/acpi/).  I'll probably try
that on the box with the E100P first.  Be sure to enable "Power Management
support" in your kernel config, disable APM, enable ACPI and check all
ACPI options, except for "CPU Enumeration Only".  Note that this ACPI patch
also handles IRQ routing and might help in cases where the BIOS assigns
the same IRQ to some devices (or, as was the case for me, none at all).



Grtz,

  Oliver

On Mon, Jun 16, 2003 at 13:03:20 -0500, Alex Zarubin wrote:

 Mark,
 
 As far as pings - we have cases when we could ping the box on both
 interfaces and there are cases when we could not (we tried 3-4 sets of
 NICs and drivers). All telnets, X, ssh etc. are definitely dead.
 No coredumps (asterisk was started with -g option), no kernel panics.
 Black console, Alt-SysRq combinations don't work.
 Pretty much no options but rebooting the box.
 
 As far as SMP and single T400P - we'll try and report the results
 but the idea was to go with as high density as possible ...
 
 What do you think of using hyperthreading - should we enable or disable it
 for the box running asterisk?
 
 What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?
 
 Thank you.
 Alex Zarubin
 
 -----Original Message-----
 From: Mark Spencer [mailto:[EMAIL PROTECTED]]
 Sent: Saturday, June 14, 2003 10:23 AM
 To: '[EMAIL PROTECTED]'
 Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues
 
 
 When you say "stops responding" do you mean no more pings, telnet dead,
 etc?  Or do you mean asterisk stops responding?  Is there a segfault or
 kernel panic, or any other failure diagnostic?
 
 Mark
 
 On Thu, 12 Jun 2003, Alex Zarubin wrote:
 
  Zaptel was compiled with -D__SMP__
 
  We've installed irqbalance and the picture improved a lot
  (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?
 
  The big issue for us now is that after 24+ hours of the test load PRI-SIP
  our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP
  stops responding to anything.
 
  So the questions are:
  - are there known issues with PE2650 and ways to fix them?
  - can someone recommend the 'stable' 2.4 SMP kernel for this
kind of load?
  - any expertise in this area will be appreciated
 
             CPU0       CPU1       CPU2       CPU3
    0:     230710      30030      50050          0   IO-APIC-edge  timer
    1:          5          0          0        233   IO-APIC-edge  keyboard
    2:          0          0          0          0         XT-PIC  cascade
    5:          0          0          0          0  IO-APIC-level  usb-ohci
    8:          1          0          0          0   IO-APIC-edge  rtc
   14:         27          0          2          0   IO-APIC-edge  ide0
   20:    2085442     400221          0     230232  IO-APIC-level  tor2
   24:     293848    1841658      10010     570568  IO-APIC-level  tor2
   28:          5      25643          0          0  IO-APIC-level  eth0
   29:          5          0    5165040          0  IO-APIC-level  eth1
   30:      43720      35467       1291       3296  IO-APIC-level  aacraid
  NMI:          0          0          0          0
  LOC:     310618     310616     310616     310616
  ERR:          0
  MIS:          0
 
  Thank you.
  Alex Zarubin
 
  -Original Message-
  From: Martin Pycko [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, June 10, 2003 9:48 AM
  To: '[EMAIL PROTECTED]'
  Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues
 
 
  Are you sure that you compiled zaptel for __SMP__ ?
  Edit your zaptel/Makefile.
 
0:   75283844   75241320   75286285   75247088IO-APIC-edge  timer
1:  1  0  1  1IO-APIC-edge  keyboard
2:  0  0  0  0  XT-PIC  cascade
3:  0  0  0  0   IO-APIC-level  usb-ohci
8:  1  0  0  0IO-APIC-edge  rtc
   15:  1  0  0  1IO-APIC-edge  ide1
   16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0

RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-14 Thread Mark Spencer
When you say "stops responding", do you mean no more pings, telnet dead,
etc.?  Or do you mean asterisk stops responding?  Is there a segfault or
kernel panic, or any other failure diagnostic?

Mark

On Thu, 12 Jun 2003, Alex Zarubin wrote:

 Zaptel was compiled with -D__SMP__

 We've installed irqbalance and the picture improved a lot
 (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

 The big issue for us now is that after 24+ hours of the test load PRI-SIP
 our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP
 stops responding to anything.

 So the questions are:
   - are there known issues with PE2650 and ways to fix them?
   - can someone recommend the 'stable' 2.4 SMP kernel for this
 kind of load?
   - any expertise in this area will be appreciated

            CPU0       CPU1       CPU2       CPU3
   0:     230710      30030      50050          0   IO-APIC-edge  timer
   1:          5          0          0        233   IO-APIC-edge  keyboard
   2:          0          0          0          0         XT-PIC  cascade
   5:          0          0          0          0  IO-APIC-level  usb-ohci
   8:          1          0          0          0   IO-APIC-edge  rtc
  14:         27          0          2          0   IO-APIC-edge  ide0
  20:    2085442     400221          0     230232  IO-APIC-level  tor2
  24:     293848    1841658      10010     570568  IO-APIC-level  tor2
  28:          5      25643          0          0  IO-APIC-level  eth0
  29:          5          0    5165040          0  IO-APIC-level  eth1
  30:      43720      35467       1291       3296  IO-APIC-level  aacraid
 NMI:          0          0          0          0
 LOC:     310618     310616     310616     310616
 ERR:          0
 MIS:          0

 Thank you.
 Alex Zarubin

 -Original Message-
 From: Martin Pycko [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 10, 2003 9:48 AM
 To: '[EMAIL PROTECTED]'
 Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues


 Are you sure that you compiled zaptel for __SMP__ ?
 Edit your zaptel/Makefile.

   0:   75283844   75241320   75286285   75247088IO-APIC-edge  timer
   1:  1  0  1  1IO-APIC-edge  keyboard
   2:  0  0  0  0  XT-PIC  cascade
   3:  0  0  0  0   IO-APIC-level  usb-ohci
   8:  1  0  0  0IO-APIC-edge  rtc
  15:  1  0  0  1IO-APIC-edge  ide1
  16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
  25:   4670   4548   4614   4518   IO-APIC-level  tor2

 All the four CPU's should have IRQ's like in the example above.

 Martin

 On Mon, 9 Jun 2003, Alex Zarubin wrote:

  Hi,
 
  We are trying to validate Asterisk as a media gateway PRI - SIP with two
  T400P (8 T1s) per box. The first
  experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was
  encouraging - on the load
  test with 3 T1s worth of calls we had on average 75% idle CPU.
 
  Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
  (Dell, dual 2.6 GHz Xeon,
  2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support).
 
  On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU
 70%
  of the time. Just 3 T1s
  out of 8.
 
  On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0,
  CPU1 was at 95% idle.
  The process ksoftirqd_CPU0 was close to the top of the 'top', with
  /proc/interrupts showing tor2 related
  numbers growing very fast. We had 2 T1s plugged into the first T400P
 board,
  with nothing going into the second,
  but the number of interrupts for the both boards was growing at the same
  pace. Here are the interrupts
  (after the box reboot, so they are not that big as they were) - do they
 look
  OK?
 
 
  CPU0   CPU1   CPU2   CPU3
0: 122556  0  0  0IO-APIC-edge  timer
1:  4  0  0  0IO-APIC-edge  keyboard
2:  0  0  0  0  XT-PIC  cascade
5:  0  0  0  0   IO-APIC-level  usb-ohci
8:  1  0  0  0IO-APIC-edge  rtc
   12: 20  0  0  0IO-APIC-edge  PS/2
 Mouse
   14: 23  0  2  0IO-APIC-edge  ide0
   20: 516930  0  0  0   IO-APIC-level  tor2
   24: 516524  0  0  0   IO-APIC-level  tor2
   28:  10600  0  0  0   IO-APIC-level  eth0
   29:   4837  0  0  0   IO-APIC-level  eth1
   30:  24831  0  0  0   IO-APIC-level  aacraid
  NMI:  0  0  0  0
  LOC: 122430 122429 122429 122428

Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-13 Thread Alberto Bertogli
On Thu, Jun 12, 2003 at 07:23:42PM -0500, Alex Zarubin wrote:
 Zaptel was compiled with -D__SMP__
 
 We've installed irqbalance and the picture improved a lot
 (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

Well, maybe I'm nitpicking, but there's a subtle issue there.

You mentioned that the machine is dual processor, but you see 4 CPUs there
due to the hyperthreading support in your Xeons.

Now, the thing is that IMHO you should try to balance the IRQs only
between the two real CPUs, leaving the HT siblings aside; it might show some
improvement.

Even better, you might also want to try splitting the interrupts among
the processors according to the card, e.g. binding interrupts from card0
to cpu0 and from card1 to cpu1; but I really don't remember if you can do
such things in 2.4.
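
A minimal sketch of how such a per-card binding could be expressed, assuming
the kernel in use exposes a /proc/irq/<n>/smp_affinity file that takes a
hexadecimal CPU bitmask (bit n set = CPU n allowed). IRQs 20 and 24 are the
two tor2 lines from the /proc/interrupts listing quoted in this thread; the
helper names cpu_mask and affinity_commands are made up for illustration, and
which logical CPU numbers belong to which physical package on a hyperthreaded
box is not established here. The script only prints commands; it does not
write to /proc.

```python
def cpu_mask(cpus):
    """Return the hex bitmask string for a list of CPU indices (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_commands(bindings):
    """Build the shell commands that would set each mask; nothing is executed."""
    return [
        "echo {} > /proc/irq/{}/smp_affinity".format(cpu_mask(cpus), irq)
        for irq, cpus in sorted(bindings.items())
    ]


if __name__ == "__main__":
    # Hypothetical binding: first tor2 card (IRQ 20) to CPU0,
    # second tor2 card (IRQ 24) to CPU1.
    for line in affinity_commands({20: [0], 24: [1]}):
        print(line)
```

If irqbalance is running it may rewrite these masks, so a manual binding like
this would presumably need irqbalance stopped first.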


Have you tried using 'vmstat' to monitor the interrupt rate? Maybe it's not
that high, and your problem lies somewhere else; it'd be a very good
thing to take a look at. After all, a Pentium 2 machine can easily handle
a full 100Mbit load, so it should, in theory, cope with your 8 T1s,
even if the interrupts are generated a bit more frequently.
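
vmstat's "in" column gives the system-wide interrupt rate. For a per-IRQ
breakdown, one rough approach is to diff two /proc/interrupts snapshots taken
a known number of seconds apart. The sketch below assumes the usual layout (a
header row of CPU names, then "label: count count ... controller device"); the
function names are made up for illustration and the sample numbers in the demo
are invented, not measurements from this thread.

```python
def parse_interrupts(text):
    """Map each IRQ label to its interrupt count summed over all CPUs."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():       # reached the controller/device columns
                break
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {irq: (after[irq] - before[irq]) / seconds
            for irq in after if irq in before}


if __name__ == "__main__":
    # Made-up numbers, only to show the calculation.
    first = "CPU0 CPU1\n 20: 1000 0 IO-APIC-level tor2\n"
    second = "CPU0 CPU1\n 20: 6000 5000 IO-APIC-level tor2\n"
    print(irq_rates(parse_interrupts(first), parse_interrupts(second), 10))
```

On a live box the two strings would come from reading /proc/interrupts twice
with a sleep in between.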


BTW, does anyone know if the zapata drivers have been tested with preempt?


 The big issue for us now is that after 24+ hours of the test load PRI-SIP
 our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP
 stops responding to anything.

Well, no 'expertise' here but I'd recommend you to try vanilla 2.4.21-rc
(the last one, IIRC it's about 9 but poor Marcelo is being mailbombed with
patches so it's harder to keep track of -rc =); and if you can reproduce
the problem with it, then post to lkml.

Testing 2.5 would be great too, but I highly doubt that the zapata drivers
work with it (at least last time I tried, they didn't even work with the
latest 2.4 if HDLC was enabled, due to internal changes in the kernel HDLC code).


A good idea is to enable SysRq (from the kernel hacking menu) and, when it
locks up (if it does), try to use it to see 'how much of it is dead' by
printing the stack traces, task lists and memory state.

Pinging it might also be useful (yeah, that might sound dumb but it's a
good sign if it responds to pings because it means that interrupts and
some parts of the kernel are still pretty alive).


Thanks,
Alberto


___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users


RE: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-12 Thread Alex Zarubin
Zaptel was compiled with -D__SMP__


We've installed irqbalance and the picture improved a lot
(thanks to Jared Smith). Do you still see problems in our /proc/interrupts?


The big issue for us now is that after 24+ hours of PRI-SIP test load,
our Dell PE2650 (dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, kernel 2.4.20-18.7smp #1 SMP)
stops responding to anything.


So the questions are:
 - are there known issues with PE2650 and ways to fix them?
 - can someone recommend the 'stable' 2.4 SMP kernel for this
  kind of load?
 - any expertise in this area will be appreciated


 CPU0 CPU1 CPU2 CPU3 
 0: 230710 30030 50050 0 IO-APIC-edge timer
 1: 5 0 0 233 IO-APIC-edge keyboard
 2: 0 0 0 0 XT-PIC cascade
 5: 0 0 0 0 IO-APIC-level usb-ohci
 8: 1 0 0 0 IO-APIC-edge rtc
14: 27 0 2 0 IO-APIC-edge ide0
20: 2085442 400221 0 230232 IO-APIC-level tor2
24: 293848 1841658 10010 570568 IO-APIC-level tor2
28: 5 25643 0 0 IO-APIC-level eth0
29: 5 0 5165040 0 IO-APIC-level eth1
30: 43720 35467 1291 3296 IO-APIC-level aacraid
NMI: 0 0 0 0 
LOC: 310618 310616 310616 310616 
ERR: 0
MIS: 0


Thank you.
Alex Zarubin


-Original Message-
From: Martin Pycko [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 10, 2003 9:48 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues



Are you sure that you compiled zaptel for __SMP__ ?
Edit your zaptel/Makefile.


 0: 75283844 75241320 75286285 75247088 IO-APIC-edge timer
 1: 1 0 1 1 IO-APIC-edge keyboard
 2: 0 0 0 0 XT-PIC cascade
 3: 0 0 0 0 IO-APIC-level usb-ohci
 8: 1 0 0 0 IO-APIC-edge rtc
15: 1 0 0 1 IO-APIC-edge ide1
16: 22134870 22120997 22135905 22122829 IO-APIC-level eth0
25: 4670 4548 4614 4518 IO-APIC-level tor2


All the four CPU's should have IRQ's like in the example above.


Martin


On Mon, 9 Jun 2003, Alex Zarubin wrote:


 Hi,

 We are trying to validate Asterisk as a media gateway PRI - SIP with two
 T400P (8 T1s) per box. The first
 experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was
 encouraging - on the load
 test with 3 T1s worth of calls we had on average 75% idle CPU.

 Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
 (Dell, dual 2.6 GHz Xeon,
 2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support).

 On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70%
 of the time. Just 3 T1s
 out of 8.

 On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0,
 CPU1 was at 95% idle.
 The process ksoftirqd_CPU0 was close to the top of the 'top', with
 /proc/interrupts showing tor2 related
 numbers growing very fast. We had 2 T1s plugged into the first T400P board,
 with nothing going into the second,
 but the number of interrupts for the both boards was growing at the same
 pace. Here are the interrupts
 (after the box reboot, so they are not that big as they were) - do they look
 OK?


 CPU0 CPU1 CPU2 CPU3
 0: 122556 0 0 0 IO-APIC-edge timer
 1: 4 0 0 0 IO-APIC-edge keyboard
 2: 0 0 0 0 XT-PIC cascade
 5: 0 0 0 0 IO-APIC-level usb-ohci
 8: 1 0 0 0 IO-APIC-edge rtc
 12: 20 0 0 0 IO-APIC-edge PS/2 Mouse
 14: 23 0 2 0 IO-APIC-edge ide0
 20: 516930 0 0 0 IO-APIC-level tor2
 24: 516524 0 0 0 IO-APIC-level tor2
 28: 10600 0 0 0 IO-APIC-level eth0
 29: 4837 0 0 0 IO-APIC-level eth1
 30: 24831 0 0 0 IO-APIC-level aacraid
 NMI: 0 0 0 0
 LOC: 122430 122429 122429 122428
 ERR: 0
 MIS: 0

 Not sure what went wrong. Any suggestions on how to work with 2 T400P in a
 box (without hurting performance)
 and how to get advantage of SMP for Asterisk would be appreciated.

 Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?

 Thank you.

 Alex Zarubin










Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-10 Thread Martin Pycko
Are you sure that you compiled zaptel for __SMP__ ?
Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088   IO-APIC-edge  timer
  1:          1          0          1          1   IO-APIC-edge  keyboard
  2:          0          0          0          0         XT-PIC  cascade
  3:          0          0          0          0  IO-APIC-level  usb-ohci
  8:          1          0          0          0   IO-APIC-edge  rtc
 15:          1          0          0          1   IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829  IO-APIC-level  eth0
 25:       4670       4548       4614       4518  IO-APIC-level  tor2

All four CPUs should be handling IRQs, as in the example above.

Martin
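
One way to put a number on "balanced" is to look at the share of an IRQ's
interrupts handled by its busiest CPU. This is only a sketch: the 0.5
threshold is an arbitrary assumption, and the function names are made up for
illustration. The two sample rows are the tor2 lines quoted in this thread
(IRQ 25 from the example above, IRQ 20 from the original report).

```python
def busiest_share(counts):
    """Fraction of an IRQ's interrupts handled by its busiest CPU (0.0 if idle)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return max(counts) / total


def is_balanced(counts, threshold=0.5):
    """True if no single CPU handled more than `threshold` of the interrupts."""
    return busiest_share(counts) <= threshold


if __name__ == "__main__":
    samples = {
        "tor2 (IRQ 25, example above)": [4670, 4548, 4614, 4518],
        "tor2 (IRQ 20, original report)": [516930, 0, 0, 0],
    }
    for name, counts in samples.items():
        print(name, "balanced" if is_balanced(counts) else "not balanced")
```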

On Mon, 9 Jun 2003, Alex Zarubin wrote:

 Hi,

 We are trying to validate Asterisk as a media gateway PRI - SIP with two
 T400P (8 T1s) per box. The first
 experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was
 encouraging - on the load
 test with 3 T1s worth of calls we had on average 75% idle CPU.

 Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
 (Dell, dual 2.6 GHz Xeon,
 2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support).

 On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70%
 of the time. Just 3 T1s
 out of 8.

 On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0,
 CPU1 was at 95% idle.
 The process ksoftirqd_CPU0 was close to the top of the 'top', with
 /proc/interrupts showing tor2 related
 numbers growing very fast. We had 2 T1s plugged into the first T400P board,
 with nothing going into the second,
 but the number of interrupts for the both boards was growing at the same
 pace. Here are the interrupts
 (after the box reboot, so they are not that big as they were) - do they look
 OK?


 CPU0   CPU1   CPU2   CPU3
   0: 122556  0  0  0IO-APIC-edge  timer
   1:  4  0  0  0IO-APIC-edge  keyboard
   2:  0  0  0  0  XT-PIC  cascade
   5:  0  0  0  0   IO-APIC-level  usb-ohci
   8:  1  0  0  0IO-APIC-edge  rtc
  12: 20  0  0  0IO-APIC-edge  PS/2 Mouse
  14: 23  0  2  0IO-APIC-edge  ide0
  20: 516930  0  0  0   IO-APIC-level  tor2
  24: 516524  0  0  0   IO-APIC-level  tor2
  28:  10600  0  0  0   IO-APIC-level  eth0
  29:   4837  0  0  0   IO-APIC-level  eth1
  30:  24831  0  0  0   IO-APIC-level  aacraid
 NMI:  0  0  0  0
 LOC: 122430 122429 122429 122428
 ERR:  0
 MIS:  0

 Not sure what went wrong. Any suggestions on how to work with 2 T400P in a
 box (without hurting performance)
 and how to get advantage of SMP for Asterisk would be appreciated.

 Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?

 Thank you.

 Alex Zarubin






Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-10 Thread asterisk
Hmm, I too appear to have an odd mix of interrupts.  It seems that the second CPU
doesn't do much at all on my dual Xeon...

           CPU0       CPU1
  0:   40652580          0   IO-APIC-edge  timer
  1:        926          0   IO-APIC-edge  keyboard
  2:          0          0         XT-PIC  cascade
  6:          0          0  IO-APIC-level  usb-ohci
  8:          1          0   IO-APIC-edge  rtc
 12:        308          0   IO-APIC-edge  PS/2 Mouse
 14:          2          0   IO-APIC-edge  ide0
 20:  406481379          0  IO-APIC-level  tor2
 24:          0          0  IO-APIC-level  tor2
 28:    4516659          0  IO-APIC-level  eth0
 30:     911870          0  IO-APIC-level  aacraid
NMI:          0          0
LOC:   40653025   40653047
ERR:          0
MIS:          0

I haven't enabled the second card yet but will be enabling it soon.  I should probably
recompile * and zaptel for SMP, though I thought I had...

Bill

Martin Pycko wrote:
Are you sure that you compiled zaptel for __SMP__ ?
Edit your zaptel/Makefile.
  0:   75283844   75241320   75286285   75247088IO-APIC-edge  timer
  1:  1  0  1  1IO-APIC-edge  keyboard
  2:  0  0  0  0  XT-PIC  cascade
  3:  0  0  0  0   IO-APIC-level  usb-ohci
  8:  1  0  0  0IO-APIC-edge  rtc
 15:  1  0  0  1IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:   4670   4548   4614   4518   IO-APIC-level  tor2
All the four CPU's should have IRQ's like in the example above.

Martin

On Mon, 9 Jun 2003, Alex Zarubin wrote:


Hi,

We are trying to validate Asterisk as a media gateway PRI - SIP with two
T400P (8 T1s) per box. The first
experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was
encouraging - on the load
test with 3 T1s worth of calls we had on average 75% idle CPU.
Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
(Dell, dual 2.6 GHz Xeon,
2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support).
On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70%
of the time. Just 3 T1s
out of 8.
On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0,
CPU1 was at 95% idle.
The process ksoftirqd_CPU0 was close to the top of the 'top', with
/proc/interrupts showing tor2 related
numbers growing very fast. We had 2 T1s plugged into the first T400P board,
with nothing going into the second,
but the number of interrupts for the both boards was growing at the same
pace. Here are the interrupts
(after the box reboot, so they are not that big as they were) - do they look
OK?
   CPU0   CPU1   CPU2   CPU3
 0: 122556  0  0  0IO-APIC-edge  timer
 1:  4  0  0  0IO-APIC-edge  keyboard
 2:  0  0  0  0  XT-PIC  cascade
 5:  0  0  0  0   IO-APIC-level  usb-ohci
 8:  1  0  0  0IO-APIC-edge  rtc
12: 20  0  0  0IO-APIC-edge  PS/2 Mouse
14: 23  0  2  0IO-APIC-edge  ide0
20: 516930  0  0  0   IO-APIC-level  tor2
24: 516524  0  0  0   IO-APIC-level  tor2
28:  10600  0  0  0   IO-APIC-level  eth0
29:   4837  0  0  0   IO-APIC-level  eth1
30:  24831  0  0  0   IO-APIC-level  aacraid
NMI:  0  0  0  0
LOC: 122430 122429 122429 122428
ERR:  0
MIS:  0
Not sure what went wrong. Any suggestions on how to work with 2 T400P in a
box (without hurting performance)
and how to get advantage of SMP for Asterisk would be appreciated.
Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?

Thank you.

Alex Zarubin







Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-10 Thread Jared Smith
My dual-proc Xeon boxes didn't share IRQs across CPUs until I installed
the kernel-utils RPM and made sure the irqbalance service was
running...  Just a word to the wise!

Jared Smith

On Tue, 2003-06-10 at 09:52, [EMAIL PROTECTED] wrote:
 H, I to appear to have an odd mix of interrupts.  It seems that the second CPU 
 doesn't do much
 at all on my dual Xeon...
 
 CPU0   CPU1
0:   40652580  0IO-APIC-edge  timer
1:926  0IO-APIC-edge  keyboard
2:  0  0  XT-PIC  cascade
6:  0  0   IO-APIC-level  usb-ohci
8:  1  0IO-APIC-edge  rtc
   12:308  0IO-APIC-edge  PS/2 Mouse
   14:  2  0IO-APIC-edge  ide0
   20:  406481379  0   IO-APIC-level  tor2
   24:  0  0   IO-APIC-level  tor2
   28:4516659  0   IO-APIC-level  eth0
   30: 911870  0   IO-APIC-level  aacraid
 NMI:  0  0
 LOC:   40653025   40653047
 ERR:  0
 MIS:  0
 
 I haven't enables the second card yet but will be enabling soon.  I should probably 
 recompile * and
 zaptel for SMP though I thought I had...
 
 Bill
 
 
 Martin Pycko wrote:
  Are you sure that you compiled zaptel for __SMP__ ?
  Edit your zaptel/Makefile.
  
0:   75283844   75241320   75286285   75247088IO-APIC-edge  timer
1:  1  0  1  1IO-APIC-edge  keyboard
2:  0  0  0  0  XT-PIC  cascade
3:  0  0  0  0   IO-APIC-level  usb-ohci
8:  1  0  0  0IO-APIC-edge  rtc
   15:  1  0  0  1IO-APIC-edge  ide1
   16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
   25:   4670   4548   4614   4518   IO-APIC-level  tor2
  
  All the four CPU's should have IRQ's like in the example above.
  
  Martin
  
  On Mon, 9 Jun 2003, Alex Zarubin wrote:
  
  
 Hi,
 
 We are trying to validate Asterisk as a media gateway PRI - SIP with two
 T400P (8 T1s) per box. The first
 experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was
 encouraging - on the load
 test with 3 T1s worth of calls we had on average 75% idle CPU.
 
 Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3
 (Dell, dual 2.6 GHz Xeon,
 2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support).
 
 On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70%
 of the time. Just 3 T1s
 out of 8.
 
 On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0,
 CPU1 was at 95% idle.
 The process ksoftirqd_CPU0 was close to the top of the 'top', with
 /proc/interrupts showing tor2 related
 numbers growing very fast. We had 2 T1s plugged into the first T400P board,
 with nothing going into the second,
 but the number of interrupts for the both boards was growing at the same
 pace. Here are the interrupts
 (after the box reboot, so they are not that big as they were) - do they look
 OK?
 
 
 CPU0   CPU1   CPU2   CPU3
   0: 122556  0  0  0IO-APIC-edge  timer
   1:  4  0  0  0IO-APIC-edge  keyboard
   2:  0  0  0  0  XT-PIC  cascade
   5:  0  0  0  0   IO-APIC-level  usb-ohci
   8:  1  0  0  0IO-APIC-edge  rtc
  12: 20  0  0  0IO-APIC-edge  PS/2 Mouse
  14: 23  0  2  0IO-APIC-edge  ide0
  20: 516930  0  0  0   IO-APIC-level  tor2
  24: 516524  0  0  0   IO-APIC-level  tor2
  28:  10600  0  0  0   IO-APIC-level  eth0
  29:   4837  0  0  0   IO-APIC-level  eth1
  30:  24831  0  0  0   IO-APIC-level  aacraid
 NMI:  0  0  0  0
 LOC: 122430 122429 122429 122428
 ERR:  0
 MIS:  0
 
 Not sure what went wrong. Any suggestions on how to work with 2 T400P in a
 box (without hurting performance)
 and how to get advantage of SMP for Asterisk would be appreciated.
 
 Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?
 
 Thank you.
 
 Alex Zarubin
 
 
 
  
  


Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-10 Thread Alberto Bertogli
On Tue, Jun 10, 2003 at 10:14:09AM -0600, Jared Smith wrote:
 My dual-proc Xeon boxes didn't share IRQs across CPUs until I installed
 the kernel-utils RPM and made sure the irqbalance service was
 running...  Just a word to the wise!

Yes, you need irqbalance and a kinda modern kernel in order to be able to
balance IRQs across different CPUs.

This has nothing to do with *, because it isn't up to * to decide which CPU
handles each interrupt.

You may also want to try out 2.5 and see how it behaves; it has improved a
lot in those areas.


BTW, some NAPI-like stuff would help here; has anyone thought about or tried
anything like it?

Thanks,
Alberto




Re: [Asterisk-Users] Dual T400P, SMP, performance issues

2003-06-10 Thread asterisk
On Tue, 10 Jun 2003 [EMAIL PROTECTED] wrote:
 H, I to appear to have an odd mix of interrupts.  It seems that the second CPU 
 doesn't do much
 at all on my dual Xeon...

You might have 'noapic' on your kernel command line... or your BIOS isn't
configured for MP 1.4...

-Dan
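
The first of those two possibilities can be checked from userspace by looking
at /proc/cmdline; the BIOS MP setting cannot be checked this way. A small
sketch, with made-up helper names:

```python
def has_boot_flag(cmdline, flag):
    """True if `flag` appears as a whole word on the kernel command line."""
    return flag in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system as one string."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    try:
        current = read_cmdline()
    except OSError:
        print("could not read /proc/cmdline on this system")
    else:
        print("noapic present:", has_boot_flag(current, "noapic"))
```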
