Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-17 Thread Jarek Poplawski
On Fri, Apr 06, 2007 at 07:19:25PM +0100, Christian Kujau wrote:
> On Wed, 4 Apr 2007, Christian Kujau wrote:
> >>Maybe it's a real locking problem. Here are some more
> >>suggestions for testing (if you don't find anything better):
> >>- try without SMP, so: 'acpi=off lapic nosmp'
>
> We were

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-06 Thread Christian Kujau
On Fri, 6 Apr 2007, Christian Kujau wrote:
> but yes, this seem to be different problems, for the curious among you
> I've put details here: http://nerdbynature.de/bits/2.6.20.4/db2/
that's http://nerdbynature.de/bits/2.6.20.4/db1/2/ sorry.
--
BOFH excuse #270: Someone has messed up the

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-06 Thread Christian Kujau
On Wed, 4 Apr 2007, Christian Kujau wrote:
>>Maybe it's a real locking problem. Here are some more
>>suggestions for testing (if you don't find anything better):
>>- try without SMP, so: 'acpi=off lapic nosmp'
We were able to have our hosting provider replace the 8139too with an E100, the onboard

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-05 Thread Jarek Poplawski
On Wed, Apr 04, 2007 at 02:20:23PM +0100, Christian Kujau wrote:
> On Wed, 4 Apr 2007, Jarek Poplawski wrote:
> >So, it's a lot sooner than before. (BTW, isn't there anything
> >in debug log?)
>
> No, nothing. I've set up remote-syslogging to the other node (node1
> logging to node2 and vice

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Wed, 4 Apr 2007, Francois Romieu wrote: No serial cable ? No, unfortunately this hosting provider does not have a serial console to access :( 4 - try: http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.21-rc5/r8169-20070402 Are they in -rc5 yet or 'not in -rc5 but should be applied to

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
[...]
> Actually I was thinking about *using* netconsole, since even setting up
> remote (userspace-)syslog left nothing on the syslog-server, when the
> machine crashed. But if it's b0rked in 8139, I will refrain from doing
> so.
Please refrain :o)
No

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Wed, 4 Apr 2007, Denys wrote: IMHO it can be hardware issue also, i had something very similar with faulty hardware combinations. Since it's happening on 2 nodes, I somehow doubt that... -- BOFH excuse #447: According to Microsoft, it's by design - To unsubscribe from this list: send the

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Denys
IMHO it can be hardware issue also, i had something very similar with faulty hardware combinations.
On Wed, 4 Apr 2007 13:21:00 +0200, Jarek Poplawski wrote
> On Tue, Apr 03, 2007 at 04:19:46PM +0100, Christian Kujau wrote:
> > On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> > >Did you try with

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Wed, 4 Apr 2007, Jarek Poplawski wrote: So, it's a lot sooner than before. (BTW, isn't there anything in debug log?) No, nothing. I've set up remote-syslogging to the other node (node1 logging to node2 and vice versa) - nothing :( I see both CPUs did interrupt handling again. Yes, when

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Tue, 3 Apr 2007, Francois Romieu wrote: Christian Kujau <[EMAIL PROTECTED]> : If the apic voodoo makes no difference, you can: 1 - leave it enabled Well, we tried to boot with ACPI compiled in again, but disabled during boot: - acpi=off lapic, crashed after 1h (almost exactly) of service

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Jarek Poplawski
On Tue, Apr 03, 2007 at 04:19:46PM +0100, Christian Kujau wrote:
> On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> >Did you try with 8139cp instead of 8139too?
>
> Tried that, 8139cp could not be loaded :(
Sorry for misleading!
> >(Maybe even try some other card to narrow the problem?)
> >You

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
[...]
> Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both
> hosts and feel free to ask for more details. Although both boxes are in
> production we'll be happy test more bootoptions/patches and the like.
If the apic voodoo makes no

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
> On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> >Did you try with 8139cp instead of 8139too?
>
> Tried that, 8139cp could not be loaded :(
It is a different beast.
--
Ueimor

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Lee Revell
On 4/3/07, Christian Kujau <[EMAIL PROTECTED]> wrote:
On Tue, 3 Apr 2007, Robert Hancock wrote:
> Although it's not as bad with servers, many machines are designed to run only
> Windows (which normally always uses ACPI) and simply aren't tested well or at
> all with ACPI disabled so you can run

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Tue, 3 Apr 2007, Robert Hancock wrote: These days I think it's usually best to have ACPI on with current systems. Whooha, really? While I honor the acpi-folks' work when using a desktop machine I am otherwise always reminded to the comment in arch/i386/kernel/apm.c, which basically says:

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Tue, 3 Apr 2007, Jarek Poplawski wrote: Did you try with 8139cp instead of 8139too? Tried that, 8139cp could not be loaded :( (Maybe even try some other card to narrow the problem?) You could also try to test without ehci, if it's possible. USB has been disabled completely. After

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Where is the info from before you changed to "noapic"? Or were the machines always using XT-PIC for all the interrupts??? We booted with 'acpi=off lapic' (with ACPI options compiled in, to be able to boot with acpi=on later on) and the box locked up

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Robert Hancock
Christian Kujau wrote: Len et al., do you even suggest to use ACPI on a server system at all? I myself always thought of ACPI being evil and to avoid when possible (thus switching it off completely on a serversystem). These days I think it's usually best to have ACPI on with current systems.

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Tue, 3 Apr 2007, Jarek Poplawski wrote: Did you try with 8139cp instead of 8139too? I forgot about that, thanks. (Maybe even try some other card to narrow the problem?) We're trying to convince our hosting provider to replace the NIC with an e1000. You could also try to test without

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Jarek Poplawski
On 02-04-2007 21:41, Christian Kujau wrote:
>
> Hi there,
>
> we have serious problems with 2 of our servers: both shiny new amd64
> dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing).
> Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s
> (eth1, irq11).
Hi,
Did

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Tue, 3 Apr 2007, Len Brown wrote: Which increased stability, disabling ACPI, or disabling the IOAPIC? To be honest, we're not sure. See below. Your box has MPS, so you should be able to use the IOAPIC in either mode. MPS - Multiprocessor Specification? SMP? Yes, it'd be good to use the

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Where is the info from before you changed to "noapic"? Or were the machines always using XT-PIC for all the interrupts??? XT-PIC is only used since we switched to noapic, before there was IO-APIC-fasteoi on both ethernet cards and interrupts were

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Len Brown
On Monday 02 April 2007 15:41, Christian Kujau wrote:
>
> Hi there,
>
> we have serious problems with 2 of our servers: both shiny new amd64
> dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing).
> Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s
> (eth1, irq11).
>

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both hosts and feel free to ask for more details. Although both boxes are in production we'll be happy test more bootoptions/patches and the like. Where is the info from before you changed

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Chuck Ebbert
Christian Kujau wrote:
>
> Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both
> hosts and feel free to ask for more details. Although both boxes are in
> production we'll be happy test more bootoptions/patches and the like.
Where is the info from before you changed to

2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
Hi there, we have serious problems with 2 of our servers: both shiny new amd64 dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing). Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s (eth1, irq11). Both boxes are running fine but after "a while" they lock up and
