Re: [Asterisk-Users] Dual T400P, SMP, performance issues
] [c0147fb8] [f89e7737] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c01f0998] [c01f0fac] [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
Jun 24 18:23:25 mspgate03 kernel: [c0144a64] [c01246db] [c0109023]
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 2:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 3: 0070 cce30002 0cd8 08fa 6953 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 6953
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 530b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 1: e14d5eac c025c896 0001 0001 0001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00 c0191104 0500 1805 00bf 8a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200 e14d4000
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [c010a7c2] [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
Jun 24 18:23:25 mspgate03 kernel: [c0109023]
Jun 24 18:23:25 mspgate03 kernel:

Thank you.
Alex Zarubin

-----Original Message-----
From: The Traveller [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 17, 2003 3:10 PM
To: [EMAIL PROTECTED]
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:

BTW: As I reported in my previous mail to the list, I've now installed kernel 2.4.21-rc2 with ACPI-patch on the box with the E100P. I've been trying very hard to reproduce a freeze with this kernel, but haven't succeeded yet. [...]

Ok, it crashed again, so that wasn't it either. What I did to trigger it was using the auto-dialer to loop as many calls to app_datetime out and then back over the same E-1 as it would take, queueing the calls to /var/spool/asterisk/outgoing/ 14 at a time. It froze at the first attempt.

The good news is that, this time, it produced a visible kernel panic. My guess is that you only miss it if the console screensaver has already come on when it happens. It read something like "Unable to handle kernel paging request" and happened in the swapper task. As usual, it dumped a lot of numbers on the screen, which I didn't want to write down.

Mark: If you want my help in debugging this, I'll hook it up to a serial console, trigger the crash and provide you with the exact panic, together with the ksyms and modules info to trace it.

Grtz,
Oliver

___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users

--
Matthias Granberry
[EMAIL PROTECTED]
(469) 371-0596
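The reproduction recipe quoted above (queueing calls into /var/spool/asterisk/outgoing/ 14 at a time) could be scripted roughly as follows. This is only a sketch under stated assumptions, not the auto-dialer Oliver used: the call-file keys (Channel, MaxRetries, WaitTime, Application, Data) follow the usual Asterisk call-file layout and should be checked against your own Asterisk version, the channel string Zap/g1/5551234 and the file names are hypothetical, and DateTime is assumed to be the application name that app_datetime registers. The demo writes into temporary directories instead of the real spool directory.

```python
import os
import tempfile


def write_call_file(staging_dir, spool_dir, name, channel, application, data=""):
    """Write one call file in staging_dir, then rename it into spool_dir."""
    lines = [
        "Channel: %s" % channel,          # hypothetical channel string
        "MaxRetries: 0",                  # do not retry failed attempts
        "WaitTime: 30",                   # seconds to wait for an answer
        "Application: %s" % application,  # assumed application name
    ]
    if data:
        lines.append("Data: %s" % data)
    staged = os.path.join(staging_dir, name)
    with open(staged, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    final = os.path.join(spool_dir, name)
    # rename() within one filesystem is atomic, so the spooler should never
    # pick up a half-written file.
    os.rename(staged, final)
    return final


def queue_batch(staging_dir, spool_dir, batch_size, channel, application):
    """Queue batch_size call files and return their final paths."""
    return [
        write_call_file(staging_dir, spool_dir, "loadtest-%03d.call" % index,
                        channel, application)
        for index in range(batch_size)
    ]


if __name__ == "__main__":
    # Temporary directories stand in for /var/spool/asterisk/outgoing/.
    root = tempfile.mkdtemp()
    staging = os.path.join(root, "staging")
    spool = os.path.join(root, "outgoing")
    os.mkdir(staging)
    os.mkdir(spool)
    queued = queue_batch(staging, spool, 14, "Zap/g1/5551234", "DateTime")
    print(len(queued), "call files queued in", spool)
```

Writing each file elsewhere and renaming it into the spool directory is a design choice to avoid the spooler reading a partial file; the staging directory has to live on the same filesystem as the spool directory for the rename to be atomic.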
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
Hi Alex,

The problem is most likely to occur with high volumes of call setups and disconnects. It could be reproduced by putting 2 of your T-1 ports back to back and then using the auto-dialer to generate a large number of very short calls between the ports.

I'm currently trying to figure out what's causing the problem by testing different kernels with different options. Trying a different version of GCC is a good idea; I hadn't thought of that yet. So far, I've had limited success. The panics popped up in all the kernels I tested with, although some things, like certain other hardware and drivers, seem to make them more likely to appear. See the other thread I started about this problem.

Grtz,
Oliver

On Tue, Jun 24, 2003 at 19:10:08 -0500, Alex Zarubin wrote:

Mark, Oliver,

It is too early to say, but the picture is different now. Our dual CPU, dual T400P box has been up for 4 days, under the load of 10 - 100 simultaneous PRI - SIP calls. We installed 2.4.21 #2 SMP (it was still freezing after that) and, what I think made the difference, recompiled zaptel-libpri-asterisk with gcc 3.3. The problem, along the way, was that asterisk wouldn't start after that. It was crashing while loading the mp3 and lpc10 codecs. We put 'noload' for these two into modules.conf - a temporary solution, of course.

There are still problems with multiple connections at the same time: windows to the box freeze for a second, and we get D-channel error messages. The following messages are dumped into /var/log/messages. What do you think?
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
Jun 24 18:23:25 mspgate03 kernel: irq: 1 [ 0 0 1 0 ]
Jun 24 18:23:25 mspgate03 kernel: bh: 0 [ 0 0 0 0 ]
Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
Jun 24 18:23:25 mspgate03 kernel: CPU 0: 0200 036f 00e14603 1802 0310 6647 008e0200 4803
Jun 24 18:23:25 mspgate03 kernel: 0078 001ffa02 5b490300 0600 01c7 074e0308 1afe 01c74d03
Jun 24 18:23:25 mspgate03 kernel: 2302 d708 e101 0900 01d7 f5030001 0423 09300207
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [f89bd281] [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
Jun 24 18:23:25 mspgate03 kernel: [f89bb132] [f89bd281] [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [f89aa80a] [f89aa80a] [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
Jun 24 18:23:25 mspgate03 kernel: [c020d122] [c010abe3] [c020d122] [c020d550] [c010a98e] [c020d550]
Jun 24 18:23:25 mspgate03 kernel: [c010abfe] [c01f0919] [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
Jun 24 18:23:25 mspgate03 kernel: [f89bd281] [f89bd281] [f89bd281] [f89bb132] [f89bd510] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c022a5f5] [c01f0ffd] [c01f112e] [c01f53c2] [c012005b] [c010abfe]
Jun 24 18:23:25 mspgate03 kernel: [c015147a] [c01509dc] [c0147460] [c0147fb8] [f89e7737] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c01f0998] [c01f0fac] [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
Jun 24 18:23:25 mspgate03 kernel: [c0144a64] [c01246db] [c0109023]
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 2:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 3: 0070 cce30002 0cd8 08fa 6953 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 6953
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 530b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 1: e14d5eac c025c896 0001 0001 0001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00 c0191104 0500 1805 00bf 8a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200 e14d4000
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [c010a7c2] [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
Jun 24 18:23:25 mspgate03 kernel: [c0109023]
Jun 24 18:23:25 mspgate03 kernel:

Thank you.
Alex Zarubin

-----Original Message-----
From: The Traveller [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 17, 2003 3:10 PM
To: [EMAIL PROTECTED]
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:

BTW: As I reported in my previous mail to the list, I've now installed kernel 2.4.21-rc2 with ACPI-patch on the box
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
Here is info on the kernel panic with the high volume (110+) of calls. Same configuration as before. Comments would be appreciated.

ksymoops 2.4.4 on i686 2.4.21. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21 (specified)
     -m /boot/System.map-2.4.21 (default)
     -i

eax: 0100 ebx: ecx: edx: f71b5a14
esi: 0002 edi: f71b4000 ebp: f71b4000 esp: f71b59ec
ds: 0018 es: 0018 ss: 0018
Process irqbalance (pid: 713, stackpage=f71b5000)
Stack: 6e6d6c6b 7271706f 76757473 7a797877 0001 c0115ef4 f71b4000 c02578fd f71b5a14 0001 0003 c0115ef4 f71b4000 f71b4000 f71b0018 c0110018 ffef c0114546 0010 0286 c0114470
Call Trace: [c0115ef4] [c0115ef4] [c0110018] [c0114546] [c0114470] [c011bc88]
  [c01144c0] [c0114470] [c011b2d5] [c011eae2] [c011badb] [c011bc88]
  [c0116ff0] [c010960a] [c0115ef4] [c01173a8] [f89e7737] [f89fb1e0]
  [f89fb1e0] [c0117000] [c0109114] [c0115ef4] [c010abe3] [f897a8c0]
  [f897a8c0] [c0110018] [c0124345] [c012042b] [c01202d1] [c012005b]
  [c010abfe] [c015e751] [c0147513] [c01479f1] [f89e7737] [f89fb1e0]
  [c010e1b6] [c0123fc0] [c01482ab] [c01487c4] [c012042b] [c01202d1]
  [c012005b] [c010abfe] [c013c606] [c01471ae] [c013c953] [c0109023]
Code: 89 1d b0 e0 ff ff ff 80 04 48 33 c0 eb 02 f3 90 a1 88 f3 30
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c0110018 pci_conf2_write+88/f0
Trace; c0114546 .text.lock.smp+19/23
Trace; c0114470 stop_this_cpu+0/40
Trace; c011bc88 printk+128/140
Trace; c01144c0 smp_send_stop+10/30
Trace; c0114470 stop_this_cpu+0/40
Trace; c011b2d5 panic+85/180
Trace; c011eae2 do_exit+32/2d0
Trace; c011badb call_console_drivers+eb/100
Trace; c011bc88 printk+128/140
Trace; c0116ff0 bust_spinlocks+50/60
Trace; c010960a die+5a/80
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c01173a8 do_page_fault+3a8/4db
Trace; f89e7737 END_OF_CODE+309d4/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; c0117000 do_page_fault+0/4db
Trace; c0109114 error_code+34/3c
Trace; c0115ef4 end_level_ioapic_irq+24/f0
Trace; c010abe3 do_IRQ+e3/110
Trace; f897a8c0 [usb-ohci]rh_int_timer_do+0/70
Trace; f897a8c0 [usb-ohci]rh_int_timer_do+0/70
Trace; c0110018 pci_conf2_write+88/f0
Trace; c0124345 timer_bh+2b5/3f0
Trace; c012042b bh_action+4b/80
Trace; c01202d1 tasklet_hi_action+61/a0
Trace; c012005b do_softirq+6b/d0
Trace; c010abfe do_IRQ+fe/110
Trace; c015e751 proc_lookup+51/c0
Trace; c0147513 real_lookup+73/100
Trace; c01479f1 link_path_walk+331/a10
Trace; f89e7737 END_OF_CODE+309d4/
Trace; f89fb1e0 END_OF_CODE+4447d/
Trace; c010e1b6 timer_interrupt+e6/170
Trace; c0123fc0 update_process_times+20/a0
Trace; c01482ab path_lookup+1b/30
Trace; c01487c4 open_namei+94/650
Trace; c012042b bh_action+4b/80
Trace; c01202d1 tasklet_hi_action+61/a0
Trace; c012005b do_softirq+6b/d0
Trace; c010abfe do_IRQ+fe/110
Trace; c013c606 filp_open+36/60
Trace; c01471ae getname+5e/a0
Trace; c013c953 sys_open+33/a0
Trace; c0109023 system_call+33/38

Code; Before first symbol
_EIP:
Code; Before first symbol
   0: 89 1d b0 e0 ff ff    mov %ebx,0xe0b0
Code; 0006 Before first symbol
   6: ff 80 04 48 33 c0    incl 0xc0334804(%eax)
Code; 000c Before first symbol
   c: eb 02                jmp 10 _EIP+0x10 0010 Before first symbol
Code; 000e Before first symbol
   e: f3 90                repz nop
Code; 0010 Before first symbol
  10: a1 88 f3 30 00       mov 0x30f388,%eax

Thank you.
Alex Zarubin
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
Oooh, how neat! I wonder if there is some sort of race and the kernel is detecting and defeating it somehow. Will ksymoops on your machine handle that output? Maybe we can track it down!

Again, does the problem occur with only one board? I.e., is the problem tied to having multiple boards in the machine?

Mark

On Tue, 24 Jun 2003, Alex Zarubin wrote:

Mark, Oliver,

It is too early to say, but the picture is different now. Our dual CPU, dual T400P box has been up for 4 days, under the load of 10 - 100 simultaneous PRI - SIP calls. We installed 2.4.21 #2 SMP (it was still freezing after that) and, what I think made the difference, recompiled zaptel-libpri-asterisk with gcc 3.3. The problem, along the way, was that asterisk wouldn't start after that. It was crashing while loading the mp3 and lpc10 codecs. We put 'noload' for these two into modules.conf - a temporary solution, of course.

There are still problems with multiple connections at the same time: windows to the box freeze for a second, and we get D-channel error messages. The following messages are dumped into /var/log/messages. What do you think?
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
Jun 24 18:23:25 mspgate03 kernel: irq: 1 [ 0 0 1 0 ]
Jun 24 18:23:25 mspgate03 kernel: bh: 0 [ 0 0 0 0 ]
Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
Jun 24 18:23:25 mspgate03 kernel: CPU 0: 0200 036f 00e14603 1802 0310 6647 008e0200 4803
Jun 24 18:23:25 mspgate03 kernel: 0078 001ffa02 5b490300 0600 01c7 074e0308 1afe 01c74d03
Jun 24 18:23:25 mspgate03 kernel: 2302 d708 e101 0900 01d7 f5030001 0423 09300207
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [f89bd281] [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
Jun 24 18:23:25 mspgate03 kernel: [f89bb132] [f89bd281] [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [f89aa80a] [f89aa80a] [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
Jun 24 18:23:25 mspgate03 kernel: [c020d122] [c010abe3] [c020d122] [c020d550] [c010a98e] [c020d550]
Jun 24 18:23:25 mspgate03 kernel: [c010abfe] [c01f0919] [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
Jun 24 18:23:25 mspgate03 kernel: [f89bd281] [f89bd281] [f89bd281] [f89bb132] [f89bd510] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c022a5f5] [c01f0ffd] [c01f112e] [c01f53c2] [c012005b] [c010abfe]
Jun 24 18:23:25 mspgate03 kernel: [c015147a] [c01509dc] [c0147460] [c0147fb8] [f89e7737] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c01f0998] [c01f0fac] [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
Jun 24 18:23:25 mspgate03 kernel: [c0144a64] [c01246db] [c0109023]
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 2:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 3: 0070 cce30002 0cd8 08fa 6953 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 6953
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 530b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 1: e14d5eac c025c896 0001 0001 0001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00 c0191104 0500 1805 00bf 8a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200 e14d4000
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [c010a7c2] [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
Jun 24 18:23:25 mspgate03 kernel: [c0109023]
Jun 24 18:23:25 mspgate03 kernel:

Thank you.
Alex Zarubin

-----Original Message-----
From: The Traveller [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 17, 2003 3:10 PM
To: [EMAIL PROTECTED]
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:

BTW: As I reported in my previous mail to the list, I've now installed kernel 2.4.21-rc2 with ACPI-patch on the box with the E100P. I've been trying very hard to reproduce a freeze with this kernel, but haven't succeeded yet. [...]

Ok, it crashed again, so that wasn't it either. What I did to trigger it was using the auto-dialer to loop as many calls to app_datetime out and then back over the same E-1 as it would take, queueing the calls to /var/spool/asterisk/outgoing/ 14 at a time. It froze at the first attempt. The good news
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
Mark, here is the info you requested. As for the multiple T400P boards question, I believe this is the most probable reason for this behavior (we haven't seen it on single-board machines). But in order to prove it we need 4-5 days of load testing. Hopefully we'll be able to do it next week.

ksymoops 2.4.4 on i686 2.4.21. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21 (specified)
     -m /boot/System.map-2.4.21 (default)
     -i

Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
Jun 24 18:23:25 mspgate03 kernel: irq: 1 [ 0 0 1 0 ]
Jun 24 18:23:25 mspgate03 kernel: bh: 0 [ 0 0 0 0 ]
Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
Jun 24 18:23:25 mspgate03 kernel: CPU 0: 0200 036f 00e14603 1802 0310 6647 008e0200 4803
Jun 24 18:23:25 mspgate03 kernel: 0078 001ffa02 5b490300 0600 01c7 074e0308 1afe 01c74d03
Jun 24 18:23:25 mspgate03 kernel: 2302 d708 e101 0900 01d7 f5030001 0423 09300207
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [f89bd281] [f89bb132] [f89bbb47] [f89bd281] [f89bd281]
Jun 24 18:23:25 mspgate03 kernel: [f89bb132] [f89bd281] [f89bd281] [f89bb132] [f89bbb47] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [f89aa80a] [f89aa80a] [c01feee4] [f89e7737] [c01f4eae] [c010a98e]
Jun 24 18:23:25 mspgate03 kernel: [c020d122] [c010abe3] [c020d122] [c020d550] [c010a98e] [c020d550]
Jun 24 18:23:25 mspgate03 kernel: [c010abfe] [c01f0919] [c01f0919] [c022a1ef] [c022a1ef] [c022a5f5]
Jun 24 18:23:25 mspgate03 kernel: [f89bd281] [f89bd281] [f89bd281] [f89bb132] [f89bd510] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c022a5f5] [c01f0ffd] [c01f112e] [c01f53c2] [c012005b] [c010abfe]
Jun 24 18:23:25 mspgate03 kernel: [c015147a] [c01509dc] [c0147460] [c0147fb8] [f89e7737] [f89e7737]
Jun 24 18:23:25 mspgate03 kernel: [c01f0998] [c01f0fac] [c01f112e] [c01f53c2] [c0117fce] [c0117ef0]
Jun 24 18:23:25 mspgate03 kernel: [c0144a64] [c01246db] [c0109023]
Jun 24 18:23:25 mspgate03 kernel: CPU 2:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 3: 0070 cce30002 0cd8 08fa 6953 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 6953
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 530b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: CPU 1: e14d5eac c025c896 0001 0001 0001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel: f2d92124 e14d5f00 c0191104 0500 1805 00bf 8a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200 e14d4000
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [c010a7c2] [c0191104] [c01913d4] [c018e1e2] [c014c2c7]
Jun 24 18:23:25 mspgate03 kernel: [c0109023]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bbb47 [zaptel]zt_getbuf_chunk+1f7/4b0
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bbb47 [zaptel]zt_getbuf_chunk+1f7/4b0
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; f89aa80a [eepro100]speedo_start_xmit+17a/210
Trace; f89aa80a [eepro100]speedo_start_xmit+17a/210
Trace; c01feee4 qdisc_restart+14/170
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; c01f4eae dev_queue_xmit+14e/320
Trace; c010a98e handle_IRQ_event+5e/90
Trace; c020d122 ip_output+102/170
Trace; c010abe3 do_IRQ+e3/110
Trace; c020d122 ip_output+102/170
Trace; c020d550 ip_queue_xmit+3c0/520
Trace; c010a98e handle_IRQ_event+5e/90
Trace; c020d550 ip_queue_xmit+3c0/520
Trace; c010abfe do_IRQ+fe/110
Trace; c01f0919 sock_def_readable+39/70
Trace; c01f0919 sock_def_readable+39/70
Trace; c022a1ef udp_queue_rcv_skb+18f/200
Trace; c022a1ef udp_queue_rcv_skb+18f/200
Trace; c022a5f5 udp_rcv+165/340
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70
Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910
Trace; f89bd510 [zaptel]zt_putbuf_chunk+c0/730
Trace; f89e7737 [tor2]tor2_intr+847/cb0
Trace; c022a5f5 udp_rcv+165/340
Trace; c01f0ffd kfree_skbmem+5d/70
Trace; c01f112e __kfree_skb+11e/130
Trace; c01f53c2 net_tx_action+62/140
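A decoded trace like the one above is easier to read once the frames are tallied per module. The following is a minimal sketch, assuming the "Trace; address [module]symbol+offset/size" layout seen in this ksymoops output (frames without a bracketed module are counted as kernel). The function names tally_frames and module_totals are made up for illustration, and the sample is a handful of lines copied from the trace above, not the whole dump.

```python
import re
from collections import Counter

# One decoded frame: "Trace; <address> [module]symbol+offset/size".
# The "[module]" part is absent for symbols in the kernel proper.
TRACE_RE = re.compile(
    r"Trace;\s+([0-9a-f]{8})\s+(?:\[([\w-]+)\])?([\w.]+)\+"
)


def tally_frames(text):
    """Count how often each (module, symbol) pair appears in a trace."""
    counts = Counter()
    for _address, module, symbol in TRACE_RE.findall(text):
        counts[(module or "kernel", symbol)] += 1
    return counts


def module_totals(counts):
    """Collapse per-symbol counts into per-module counts."""
    totals = Counter()
    for (module, _symbol), hits in counts.items():
        totals[module] += hits
    return totals


SAMPLE = (
    "Trace; f89bd281 [zaptel]zt_process_putaudio_chunk+9a1/b70 "
    "Trace; f89bb132 [zaptel]zt_process_getaudio_chunk+f2/910 "
    "Trace; f89e7737 [tor2]tor2_intr+847/cb0 "
    "Trace; c01feee4 qdisc_restart+14/170 "
    "Trace; f89e7737 [tor2]tor2_intr+847/cb0"
)

if __name__ == "__main__":
    frames = tally_frames(SAMPLE)
    for module, hits in module_totals(frames).most_common():
        print(module, hits)
```

Because the pattern does not depend on line breaks, it works on the flattened single-line form as well as on one frame per line. A tally only shows which modules were on the stack; it does not by itself say which one is at fault.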
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
> As far as SMP and single T400P - we'll try and report the results but the
> idea was to go with as high density as possible ...

Right, I'm just trying to narrow down the problem. I'm theorizing that the problem is some sort of spinlock deadlock. Does it only occur if there is activity or even if the lines are up but no calls taking place?

> What do you think of using hyperthreading - should we enable or disable it
> for the box running asterisk?

We use hyperthreading but have not run tests longer than a few hours on those machines.

> What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?

Likely will make no difference in this situation.

Mark
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
I believe this is related to the load; there are always calls in our test. Attached is a part of the /var/log/messages file with SysRq memory info - in case you can see something in it. The box was rebooted 06-16 17:08 and the problem occurred 06-17 11:36.

Thank you.
Alex Zarubin

-----Original Message-----
From: Mark Spencer [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 17, 2003 6:58 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues

> As far as SMP and single T400P - we'll try and report the results but the
> idea was to go with as high density as possible ...

Right, I'm just trying to narrow down the problem. I'm theorizing that the problem is some sort of spinlock deadlock. Does it only occur if there is activity or even if the lines are up but no calls taking place?

> What do you think of using hyperthreading - should we enable or disable it
> for the box running asterisk?

We use hyperthreading but have not run tests longer than a few hours on those machines.

> What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?

Likely will make no difference in this situation.

Mark

mes_ast.gz Description: Binary data
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
Mark,

As far as pings - we have cases when we could ping the box on both interfaces and there are cases when we could not (we tried 3-4 sets of NICs and drivers). All telnets, X, ssh etc. are definitely dead. No coredumps (asterisk was started with -g option), no kernel panics. Black console, Alt-SysRq combinations don't work. Pretty much no options but rebooting the box.

As far as SMP and single T400P - we'll try and report the results but the idea was to go with as high density as possible ...

What do you think of using hyperthreading - should we enable or disable it for the box running asterisk?

What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?

Thank you.
Alex Zarubin

-----Original Message-----
From: Mark Spencer [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 14, 2003 10:23 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues

When you say stops responding do you mean no more pings, telnet dead, etc? Or do you mean asterisk stops responding? Is there a segfault or kernel panic, or any other failure diagnostic?

Mark

On Thu, 12 Jun 2003, Alex Zarubin wrote:

Zaptel was compiled with -D__SMP__

We've installed irqbalance and the picture improved a lot (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

The big issue for us now is that after 24+ hours of the test load PRI-SIP our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP stops responding to anything. So the questions are:

- are there known issues with PE2650 and ways to fix them?
- can someone recommend the 'stable' 2.4 SMP kernel for this kind of load?
- any expertise in this area will be appreciated

           CPU0       CPU1       CPU2       CPU3
  0:     230710      30030      50050          0    IO-APIC-edge  timer
  1:          5          0          0        233    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:         27          0          2          0    IO-APIC-edge  ide0
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
 28:          5      25643          0          0   IO-APIC-level  eth0
 29:          5          0    5165040          0   IO-APIC-level  eth1
 30:      43720      35467       1291       3296   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     310618     310616     310616     310616
ERR:          0
MIS:          0

Thank you.
Alex Zarubin

-----Original Message-----
From: Martin Pycko [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 10, 2003 9:48 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:       4670       4548       4614       4518   IO-APIC-level  tor2

All the four CPU's should have IRQ's like in the example above.

Martin

On Mon, 9 Jun 2003, Alex Zarubin wrote:

Hi,

We are trying to validate Asterisk as a media gateway PRI - SIP with two T400P (8 T1s) per box. The first experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was encouraging - on the load test with 3 T1s worth of calls we had on average 75% idle CPU.

Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support). On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70% of the time. Just 3 T1s out of 8. On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0, CPU1 was at 95% idle.

The process ksoftirqd_CPU0 was close to the top of the 'top', with /proc/interrupts showing tor2 related numbers growing very fast. We had 2 T1s plugged into the first T400P board, with nothing going into the second, but the number of interrupts for the both boards was growing at the same pace. Here are the interrupts (after the box reboot, so they are not that big as they were) - do they look OK?

           CPU0       CPU1       CPU2       CPU3
  0:     122556          0          0          0    IO-APIC-edge  timer
  1:          4          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:         20          0          0          0    IO-APIC-edge  PS/2 Mouse
 14:         23          0          2          0    IO-APIC-edge  ide0
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
 28:      10600          0          0          0   IO-APIC-level  eth0
 29:       4837          0          0          0   IO-APIC-level  eth1
 30:      24831          0          0          0   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     122430     122429     122429     122428
ERR:          0
MIS:          0

Not sure what went wrong. Any suggestions on how to work with 2 T400P in a box (without hurting performance) and how to get advantage of SMP for Asterisk would be appreciated. Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3 )?

Thank you.
Alex Zarubin
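The question running through this message ("do they look OK?", "all the four CPU's should have IRQ's") comes down to how each IRQ's count is split across the CPU columns of /proc/interrupts. A minimal sketch of that check follows, assuming the row layout shown in the dumps above (IRQ number, one count per CPU, controller type, device name). The function names are hypothetical and the two samples are the tor2 rows copied from the dumps in this message; on a live box the text would come from reading /proc/interrupts.

```python
def parse_interrupts(text, cpu_count):
    """Return {irq: (per_cpu_counts, device)} for the numbered IRQ rows."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the CPU header, NMI/LOC/ERR/MIS rows and anything too short.
        if not sep or not label.strip().isdigit() or len(fields) < cpu_count + 2:
            continue
        counts = [int(value) for value in fields[:cpu_count]]
        # fields[cpu_count] is the controller type; the rest is the device
        # name, which may contain spaces ("PS/2 Mouse").
        device = " ".join(fields[cpu_count + 1:])
        rows[int(label)] = (counts, device)
    return rows


def cpu_shares(counts):
    """Fraction of an IRQ's interrupts handled by each CPU."""
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


# tor2 rows before irqbalance (9 Jun dump) and after it (12 Jun dump).
BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
"""
AFTER = """\
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
"""

if __name__ == "__main__":
    for title, sample in (("before", BEFORE), ("after", AFTER)):
        for irq, (counts, device) in sorted(parse_interrupts(sample, 4).items()):
            shares = ["%.2f" % share for share in cpu_shares(counts)]
            print(title, irq, device, shares)
```

The counters are cumulative since boot, so the shares describe the whole uptime rather than the current load; comparing two readings taken some seconds apart gives a better picture of what is happening now.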
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
Yo,

I've seen very similar Zaptel-related freezes on a wide variety of mainboards (SMP as well as non-SMP), with X100P's as well as with an E100P. At some point, almost always at the moment a call through one of those cards connects or disconnects, the machine completely stops responding and needs a reset to come back to life. A very nice way to trigger it with the E100P seems to be to put around 10-20 channels of it into a meetme conference and then issue the "stop now" command on the Asterisk console. A high volume of connects / disconnects seems to trigger the freezes.

I'm still investigating the issue and am going to try different kernels and some custom kernel patches. One of my boxes (dual PIII-750, Intel L440GX+ board) with an X100P and a TDM40P in it hasn't frozen since I installed kernel 2.4.21-rc2 with the ACPI patch (http://sourceforge.net/projects/acpi/). I'll probably try that on the box with the E100P first.

Be sure to enable Power Management support in your kernel config, disable APM, enable ACPI and check all ACPI options, except for CPU Enumeration Only. Note that this ACPI patch also handles IRQ routing and might help in cases where the BIOS assigns the same IRQ to some devices (or, as was the case for me, none at all).

Grtz,
Oliver

On Mon, Jun 16, 2003 at 13:03:20 -0500, Alex Zarubin wrote:

Mark,

As far as pings - we have cases when we could ping the box on both interfaces and there are cases when we could not (we tried 3-4 sets of NICs and drivers). All telnets, X, ssh etc. are definitely dead. No coredumps (asterisk was started with -g option), no kernel panics. Black console, Alt-SysRq combinations don't work. Pretty much no options but rebooting the box.

As far as SMP and single T400P - we'll try and report the results but the idea was to go with as high density as possible ...

What do you think of using hyperthreading - should we enable or disable it for the box running asterisk?

What about -DCONFIG_ZAPTEL_WATCHDOG ? Can it help and how to use it?

Thank you.
Alex Zarubin

-----Original Message-----
From: Mark Spencer [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 14, 2003 10:23 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues

When you say stops responding do you mean no more pings, telnet dead, etc? Or do you mean asterisk stops responding? Is there a segfault or kernel panic, or any other failure diagnostic?

Mark

On Thu, 12 Jun 2003, Alex Zarubin wrote:

Zaptel was compiled with -D__SMP__

We've installed irqbalance and the picture improved a lot (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

The big issue for us now is that after 24+ hours of the test load PRI-SIP our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP stops responding to anything. So the questions are:

- are there known issues with PE2650 and ways to fix them?
- can someone recommend the 'stable' 2.4 SMP kernel for this kind of load?
- any expertise in this area will be appreciated

           CPU0       CPU1       CPU2       CPU3
  0:     230710      30030      50050          0    IO-APIC-edge  timer
  1:          5          0          0        233    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:         27          0          2          0    IO-APIC-edge  ide0
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
 28:          5      25643          0          0   IO-APIC-level  eth0
 29:          5          0    5165040          0   IO-APIC-level  eth1
 30:      43720      35467       1291       3296   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     310618     310616     310616     310616
ERR:          0
MIS:          0

Thank you.
Alex Zarubin

-----Original Message-----
From: Martin Pycko [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 10, 2003 9:48 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
When you say stops responding do you mean no more pings, telnet dead, etc? Or do you mean asterisk stops responding? Is there a segfault or kernel panic, or any other failure diagnostic?

Mark

On Thu, 12 Jun 2003, Alex Zarubin wrote:

Zaptel was compiled with -D__SMP__

We've installed irqbalance and the picture improved a lot (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

The big issue for us now is that after 24+ hours of the test load PRI-SIP our Dell PE2650, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, 2.4.20-18.7smp #1 SMP stops responding to anything. So the questions are:

- are there known issues with PE2650 and ways to fix them?
- can someone recommend the 'stable' 2.4 SMP kernel for this kind of load?
- any expertise in this area will be appreciated

           CPU0       CPU1       CPU2       CPU3
  0:     230710      30030      50050          0    IO-APIC-edge  timer
  1:          5          0          0        233    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:         27          0          2          0    IO-APIC-edge  ide0
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
 28:          5      25643          0          0   IO-APIC-level  eth0
 29:          5          0    5165040          0   IO-APIC-level  eth1
 30:      43720      35467       1291       3296   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     310618     310616     310616     310616
ERR:          0
MIS:          0

Thank you.
Alex Zarubin

-----Original Message-----
From: Martin Pycko [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 10, 2003 9:48 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:       4670       4548       4614       4518   IO-APIC-level  tor2

All the four CPU's should have IRQ's like in the example above.

Martin

On Mon, 9 Jun 2003, Alex Zarubin wrote:

Hi,

We are trying to validate Asterisk as a media gateway PRI - SIP with two T400P (8 T1s) per box. The first experience with BOX1 (Compaq, 2.53 GHz, 1 Gb RAM) and just one T400P was encouraging - on the load test with 3 T1s worth of calls we had on average 75% idle CPU.

Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 Gb RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 Gb RAM, 2 T400P, asterisk/zaptel is built with SMP support). On the similar load test (as with the BOX1) BOX2 was showing 0% idle CPU 70% of the time. Just 3 T1s out of 8. On the load test with just 2 T1s BOX3 was very close to 0% idle on CPU0, CPU1 was at 95% idle.

The process ksoftirqd_CPU0 was close to the top of the 'top', with /proc/interrupts showing tor2 related numbers growing very fast. We had 2 T1s plugged into the first T400P board, with nothing going into the second, but the number of interrupts for the both boards was growing at the same pace. Here are the interrupts (after the box reboot, so they are not that big as they were) - do they look OK?

           CPU0       CPU1       CPU2       CPU3
  0:     122556          0          0          0    IO-APIC-edge  timer
  1:          4          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:         20          0          0          0    IO-APIC-edge  PS/2 Mouse
 14:         23          0          2          0    IO-APIC-edge  ide0
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
 28:      10600          0          0          0   IO-APIC-level  eth0
 29:       4837          0          0          0   IO-APIC-level  eth1
 30:      24831          0          0          0   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     122430     122429     122429     122428
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
On Thu, Jun 12, 2003 at 07:23:42PM -0500, Alex Zarubin wrote:

> Zaptel was compiled with -D__SMP__
> We've installed irqbalance and the picture improved a lot (thanks to Jared Smith).
> Do you still see problems in our /proc/interrupts?

Well, maybe I'm nitpicking, but there's a subtle issue there. You mentioned that the machine is dual-processor, but you see 4 CPUs there because of the Hyper-Threading support in your Xeons. IMHO you should try to balance the IRQs only between the two real CPUs, leaving the HT siblings aside; it might show some improvement.

Even better, you might also want to try splitting the interrupts among the processors according to the card, e.g. binding interrupts from card0 to cpu0 and from card1 to cpu1; but I really don't remember whether you can do such things in 2.4.

Have you tried using 'vmstat' to monitor the interrupt rate? Maybe it's not that high and your problem lies somewhere else; it'd be a very good thing to take a look at. After all, a Pentium 2 machine can easily handle a full 100 Mbit load, so it should, in theory, cope with your 8 channels, even if the interrupts are generated a bit more frequently.

BTW, does anyone know if the zapata drivers have been tested with preempt?

> The big issue for us now is that after 24+ hours of the test load PRI-SIP our Dell PE2650, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, 2.4.20-18.7smp #1 SMP stops responding to anything.

Well, no 'expertise' here, but I'd recommend you try vanilla 2.4.21-rc (the last one; IIRC it's about 9, but poor Marcelo is being mailbombed with patches so it's harder to keep track of -rc =), and if you can reproduce the problem with it, then post to lkml. Testing 2.5 would be great too, but I highly doubt that the zapata drivers work with it (at least the last time I tried, they didn't even work with the latest 2.4 if HDLC is enabled, due to internal changes in the kernel HDLC code).

A good idea is to enable SysRq (from the kernel hacking menu) and, when it locks up (if it does), try to use it to see 'how much of it is dead', printing the stack traces, task lists and memory state. Pinging it might also be useful (yeah, that might sound dumb, but it's a good sign if it responds to pings, because it means that interrupts and some parts of the kernel are still pretty much alive).

Thanks,
Alberto

___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
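Alberto's idea of binding each card's interrupt to its own CPU can be sketched as below. This is only an illustration, assuming the running kernel exposes a per-IRQ /proc/irq/<n>/smp_affinity file that accepts a hexadecimal CPU bitmask. The IRQ numbers 20 and 24 are taken from Alex's listing; the choice of CPU 0 and CPU 1 is hypothetical, since the thread does not say which logical CPUs are physical processors and which are Hyper-Threading siblings. The script only prints the commands; it does not write anything.

```python
def affinity_mask(cpus):
    """Return the hex bitmask (no 0x prefix) selecting the given CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # bit N set means CPU N may service the IRQ
    return format(mask, "x")


def render_commands(plan):
    """Turn {irq: [cpu, ...]} into shell commands, ordered by IRQ number."""
    return [
        "echo %s > /proc/irq/%d/smp_affinity" % (affinity_mask(cpus), irq)
        for irq, cpus in sorted(plan.items())
    ]


# Hypothetical plan: first tor2 card (IRQ 20) on CPU 0, second (IRQ 24) on CPU 1.
PLAN = {20: [0], 24: [1]}

if __name__ == "__main__":
    for command in render_commands(PLAN):
        print(command)
```

If irqbalance is running it may overwrite a manual setting, so a test like this is probably best done with the service stopped.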
RE: [Asterisk-Users] Dual T400P, SMP, performance issues
Zaptel was compiled with -D__SMP__
We've installed irqbalance and the picture improved a lot (thanks to Jared Smith).
Do you still see problems in our /proc/interrupts?

The big issue for us now is that after 24+ hours of the test load PRI-SIP our Dell PE2650, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, 2.4.20-18.7smp #1 SMP stops responding to anything.

So the questions are:
- are there known issues with PE2650 and ways to fix them?
- can someone recommend the 'stable' 2.4 SMP kernel for this kind of load?
- any expertise in this area will be appreciated

           CPU0       CPU1       CPU2       CPU3
  0:     230710      30030      50050          0    IO-APIC-edge  timer
  1:          5          0          0        233    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:         27          0          2          0    IO-APIC-edge  ide0
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
 28:          5      25643          0          0   IO-APIC-level  eth0
 29:          5          0    5165040          0   IO-APIC-level  eth1
 30:      43720      35467       1291       3296   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     310618     310616     310616     310616
ERR:          0
MIS:          0

Thank you.
Alex Zarubin

-----Original Message-----
From: Martin Pycko [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 10, 2003 9:48 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:       4670       4548       4614       4518   IO-APIC-level  tor2

All four CPUs should have IRQs like in the example above.

Martin

On Mon, 9 Jun 2003, Alex Zarubin wrote:

> Hi,
>
> We are trying to validate Asterisk as a media gateway (PRI - SIP) with two T400P cards (8 T1s) per box.
>
> The first experience with BOX1 (Compaq, 2.53 GHz, 1 GB RAM) and just one T400P was encouraging: on the load test with 3 T1s worth of calls we had on average 75% idle CPU.
>
> Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 GB RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, asterisk/zaptel built with SMP support). On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of the time, with just 3 T1s out of 8. On the load test with just 2 T1s, BOX3 was very close to 0% idle on CPU0, while CPU1 was at 95% idle. The process ksoftirqd_CPU0 was close to the top of 'top', with /proc/interrupts showing the tor2-related numbers growing very fast.
>
> We had 2 T1s plugged into the first T400P board, with nothing going into the second, but the number of interrupts for both boards was growing at the same pace.
>
> Here are the interrupts (after the box reboot, so they are not as big as they were). Do they look OK?
>
>            CPU0       CPU1       CPU2       CPU3
>   0:     122556          0          0          0    IO-APIC-edge  timer
>   1:          4          0          0          0    IO-APIC-edge  keyboard
>   2:          0          0          0          0          XT-PIC  cascade
>   5:          0          0          0          0   IO-APIC-level  usb-ohci
>   8:          1          0          0          0    IO-APIC-edge  rtc
>  12:         20          0          0          0    IO-APIC-edge  PS/2 Mouse
>  14:         23          0          2          0    IO-APIC-edge  ide0
>  20:     516930          0          0          0   IO-APIC-level  tor2
>  24:     516524          0          0          0   IO-APIC-level  tor2
>  28:      10600          0          0          0   IO-APIC-level  eth0
>  29:       4837          0          0          0   IO-APIC-level  eth1
>  30:      24831          0          0          0   IO-APIC-level  aacraid
> NMI:          0          0          0          0
> LOC:     122430     122429     122429     122428
> ERR:          0
> MIS:          0
>
> Not sure what went wrong. Any suggestions on how to work with 2 T400P in a box (without hurting performance) and how to take advantage of SMP for Asterisk would be appreciated. Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3)?
>
> Thank you.
> Alex Zarubin
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:       4670       4548       4614       4518   IO-APIC-level  tor2

All four CPUs should have IRQs like in the example above.

Martin

On Mon, 9 Jun 2003, Alex Zarubin wrote:

> Hi,
>
> We are trying to validate Asterisk as a media gateway (PRI - SIP) with two T400P cards (8 T1s) per box.
>
> The first experience with BOX1 (Compaq, 2.53 GHz, 1 GB RAM) and just one T400P was encouraging: on the load test with 3 T1s worth of calls we had on average 75% idle CPU.
>
> Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 GB RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, asterisk/zaptel built with SMP support). On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of the time, with just 3 T1s out of 8. On the load test with just 2 T1s, BOX3 was very close to 0% idle on CPU0, while CPU1 was at 95% idle. The process ksoftirqd_CPU0 was close to the top of 'top', with /proc/interrupts showing the tor2-related numbers growing very fast.
>
> We had 2 T1s plugged into the first T400P board, with nothing going into the second, but the number of interrupts for both boards was growing at the same pace.
>
> Here are the interrupts (after the box reboot, so they are not as big as they were). Do they look OK?
>
>            CPU0       CPU1       CPU2       CPU3
>   0:     122556          0          0          0    IO-APIC-edge  timer
>   1:          4          0          0          0    IO-APIC-edge  keyboard
>   2:          0          0          0          0          XT-PIC  cascade
>   5:          0          0          0          0   IO-APIC-level  usb-ohci
>   8:          1          0          0          0    IO-APIC-edge  rtc
>  12:         20          0          0          0    IO-APIC-edge  PS/2 Mouse
>  14:         23          0          2          0    IO-APIC-edge  ide0
>  20:     516930          0          0          0   IO-APIC-level  tor2
>  24:     516524          0          0          0   IO-APIC-level  tor2
>  28:      10600          0          0          0   IO-APIC-level  eth0
>  29:       4837          0          0          0   IO-APIC-level  eth1
>  30:      24831          0          0          0   IO-APIC-level  aacraid
> NMI:          0          0          0          0
> LOC:     122430     122429     122429     122428
> ERR:          0
> MIS:          0
>
> Not sure what went wrong. Any suggestions on how to work with 2 T400P in a box (without hurting performance) and how to take advantage of SMP for Asterisk would be appreciated. Any known Linux kernel related issues (2.4.20-13.7smp #1 SMP for BOX3)?
>
> Thank you.
> Alex Zarubin
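Martin's rule of thumb (every CPU should be taking a share of each busy IRQ) can be checked mechanically. The following is a rough sketch, not a tool from the thread: it parses text in the /proc/interrupts layout shown in these messages and reports what fraction of each interrupt line was serviced by each CPU. The function names and the trimmed SAMPLE text are made up for the illustration; the counts in SAMPLE are copied from Alex's first listing.

```python
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
 28:      10600          0          0          0   IO-APIC-level  eth0
ERR:          0
"""


def parse_interrupts(text):
    """Map (irq_label, description) -> list of per-CPU counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        # Rows such as ERR:/MIS: carry a single total, not per-CPU counts.
        if len(fields) < ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue
        counts = [int(f) for f in fields[:ncpu]]
        table[(label.strip(), " ".join(fields[ncpu:]))] = counts
    return table


def cpu_share(counts):
    """Fraction of the interrupts handled by each CPU (all zeros if idle)."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    for (irq, name), counts in parse_interrupts(SAMPLE).items():
        shares = ", ".join("%.1f%%" % (100 * s) for s in cpu_share(counts))
        print("%s (%s): %s" % (irq, name, shares))
    # Martin's balanced tor2 row from his example, for comparison.
    print(cpu_share([4670, 4548, 4614, 4518]))
```

On a live system the same function could be fed the contents of /proc/interrupts instead of SAMPLE; counts are cumulative since boot, so two snapshots taken some seconds apart give a better picture than a single one.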
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
Hmm, I too appear to have an odd mix of interrupts. It seems that the second CPU doesn't do much at all on my dual Xeon...

           CPU0       CPU1
  0:   40652580          0    IO-APIC-edge  timer
  1:        926          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  6:          0          0   IO-APIC-level  usb-ohci
  8:          1          0    IO-APIC-edge  rtc
 12:        308          0    IO-APIC-edge  PS/2 Mouse
 14:          2          0    IO-APIC-edge  ide0
 20:  406481379          0   IO-APIC-level  tor2
 24:          0          0   IO-APIC-level  tor2
 28:    4516659          0   IO-APIC-level  eth0
 30:     911870          0   IO-APIC-level  aacraid
NMI:          0          0
LOC:   40653025   40653047
ERR:          0
MIS:          0

I haven't enabled the second card yet but will be enabling it soon. I should probably recompile * and zaptel for SMP, though I thought I had...

Bill

Martin Pycko wrote:

> Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.
>
>   0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
>   1:          1          0          1          1    IO-APIC-edge  keyboard
>   2:          0          0          0          0          XT-PIC  cascade
>   3:          0          0          0          0   IO-APIC-level  usb-ohci
>   8:          1          0          0          0    IO-APIC-edge  rtc
>  15:          1          0          0          1    IO-APIC-edge  ide1
>  16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
>  25:       4670       4548       4614       4518   IO-APIC-level  tor2
>
> All four CPUs should have IRQs like in the example above.
>
> Martin
>
> On Mon, 9 Jun 2003, Alex Zarubin wrote:
>
> > Hi,
> >
> > We are trying to validate Asterisk as a media gateway (PRI - SIP) with two T400P cards (8 T1s) per box.
> >
> > The first experience with BOX1 (Compaq, 2.53 GHz, 1 GB RAM) and just one T400P was encouraging: on the load test with 3 T1s worth of calls we had on average 75% idle CPU.
> >
> > Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 GB RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, asterisk/zaptel built with SMP support). On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of the time, with just 3 T1s out of 8. On the load test with just 2 T1s, BOX3 was very close to 0% idle on CPU0, while CPU1 was at 95% idle. The process ksoftirqd_CPU0 was close to the top of 'top', with /proc/interrupts showing the tor2-related numbers growing very fast.
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
My dual-proc Xeon boxes didn't share IRQs across CPUs until I installed the kernel-utils RPM and made sure the irqbalance service was running... Just a word to the wise!

Jared Smith

On Tue, 2003-06-10 at 09:52, [EMAIL PROTECTED] wrote:

> Hmm, I too appear to have an odd mix of interrupts. It seems that the second CPU doesn't do much at all on my dual Xeon...
>
>            CPU0       CPU1
>   0:   40652580          0    IO-APIC-edge  timer
>   1:        926          0    IO-APIC-edge  keyboard
>   2:          0          0          XT-PIC  cascade
>   6:          0          0   IO-APIC-level  usb-ohci
>   8:          1          0    IO-APIC-edge  rtc
>  12:        308          0    IO-APIC-edge  PS/2 Mouse
>  14:          2          0    IO-APIC-edge  ide0
>  20:  406481379          0   IO-APIC-level  tor2
>  24:          0          0   IO-APIC-level  tor2
>  28:    4516659          0   IO-APIC-level  eth0
>  30:     911870          0   IO-APIC-level  aacraid
> NMI:          0          0
> LOC:   40653025   40653047
> ERR:          0
> MIS:          0
>
> I haven't enabled the second card yet but will be enabling it soon. I should probably recompile * and zaptel for SMP, though I thought I had...
>
> Bill
>
> Martin Pycko wrote:
>
> > Are you sure that you compiled zaptel for __SMP__ ? Edit your zaptel/Makefile.
> >
> >   0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
> >   1:          1          0          1          1    IO-APIC-edge  keyboard
> >   2:          0          0          0          0          XT-PIC  cascade
> >   3:          0          0          0          0   IO-APIC-level  usb-ohci
> >   8:          1          0          0          0    IO-APIC-edge  rtc
> >  15:          1          0          0          1    IO-APIC-edge  ide1
> >  16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
> >  25:       4670       4548       4614       4518   IO-APIC-level  tor2
> >
> > All four CPUs should have IRQs like in the example above.
> >
> > Martin
> >
> > On Mon, 9 Jun 2003, Alex Zarubin wrote:
> >
> > > Hi,
> > >
> > > We are trying to validate Asterisk as a media gateway (PRI - SIP) with two T400P cards (8 T1s) per box.
> > >
> > > The first experience with BOX1 (Compaq, 2.53 GHz, 1 GB RAM) and just one T400P was encouraging: on the load test with 3 T1s worth of calls we had on average 75% idle CPU.
> > >
> > > Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 GB RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, asterisk/zaptel built with SMP support). On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of the time, with just 3 T1s out of 8.
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
On Tue, Jun 10, 2003 at 10:14:09AM -0600, Jared Smith wrote:

> My dual-proc Xeon boxes didn't share IRQs across CPUs until I installed the kernel-utils RPM and made sure the irqbalance service was running... Just a word to the wise!

Yes, you need irqbalance and a reasonably modern kernel in order to balance IRQs across different CPUs. This has nothing to do with *, because it's not up to Asterisk which CPU handles each interrupt.

You may also want to try 2.5 and see how it behaves; it has improved a lot in those areas.

BTW, some NAPI-like stuff would help here; has anyone thought about or tried anything like it?

Thanks,
Alberto
Re: [Asterisk-Users] Dual T400P, SMP, performance issues
On Tue, 10 Jun 2003 [EMAIL PROTECTED] wrote:

> Hmm, I too appear to have an odd mix of interrupts. It seems that the second CPU doesn't do much at all on my dual Xeon...

You might have 'noapic' on your kernel command line... or your BIOS isn't configured for MP 1.4...

-Dan
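Dan's first suggestion is easy to rule out by looking at the boot parameters. A minimal sketch, assuming a Linux system where the kernel command line is readable from /proc/cmdline; the helper names are invented for this example, and the BIOS MP-table setting he mentions cannot be checked this way.

```python
def has_boot_flag(cmdline, flag):
    """True if `flag` appears as a whole word on the kernel command line."""
    return flag in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system."""
    with open(path) as fh:
        return fh.read().strip()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print("could not read kernel command line: %s" % exc)
    else:
        if has_boot_flag(cmdline, "noapic"):
            print("'noapic' is set; the IO-APIC is disabled for this boot")
        else:
            print("'noapic' is not on the command line")
```

Matching whole words avoids false positives from parameters that merely contain the string, such as a hypothetical `foo=noapic-test`.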