Re: APIC error on 32-bit kernel

2007-05-12 Thread Jay Cliburn
Thank you very much for looking at this, Len.


On Fri, 11 May 2007 23:28:58 -0400
Len Brown <[EMAIL PROTECTED]> wrote:

> > > [   94.754852] APIC error on CPU0: 08(40)
> > > [   94.806045] APIC error on CPU0: 40(08)
> 
> /* Here is what the APIC error bits mean:
>0: Send CS error
>1: Receive CS error
>2: Send accept error
>3: Receive accept error
>4: Reserved
>5: Send illegal vector
>6: Received illegal vector
>7: Illegal register address
> */
> 
> So the 40 means the APIC got an illegal vector.
> Certainly this is consistent with the fact that
> the errors start when a specific device is being
> used.  I assume that device is using MSI?

Yes, the device is using MSI.

> Curious that it is different in 32-bit and 64-bit mode.

Agreed, although I had one user back in March report APIC errors on the
Asus M2V board while running Debian x86_64.  I personally have never
encountered the problem under a 64-bit kernel, but I admit that just
might be random luck.


> > > We also do not see this problem on Intel-based motherboards, with
> > > either 32- or 64-bit kernels.
> > 
> > A full raft of documentation -- including acpidump and
> > linux-firmware-kit output, console capture, kernel config, lspci
> > -vvxxx (with apic=debug boot option), dmesg, and /proc/interrupts
> > -- is available at http://www.hogchain.net/m2v/apic-problem/
> 
> 
> [06Dh 109  2]  Boot Architecture Flags : 0003
> 
> for what it is worth, the bit in ACPI that is used to
> disable MSI support is not set -- so as  far as the BIOS
> is concerned, this system should support MSI.
> 
> Is it an add-in card, or lan-on-motherboard?

This is a PCIe LAN-on-motherboard.

My goal is to understand whether this is a problem in the atl1 driver,
or a problem on the motherboard.  If it's the former, obviously I want
to fix it.  If it's the latter, then I want to disable MSI in the driver
when we discover we're running on this motherboard.
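
As a rough userspace illustration of the "detect this motherboard, then avoid MSI" idea, the sketch below matches DMI board strings. It is not driver code: an in-kernel fix would more likely be a PCI quirk or a DMI table in C. The sysfs path and the vendor/board match strings are assumptions for illustration; the thread only identifies the board as "Asus M2V (Via K8T890)".

```python
from pathlib import Path

# Hypothetical match strings (assumption); the thread only says "Asus M2V".
AFFECTED_BOARDS = {("asustek computer inc.", "m2v")}


def read_dmi(field, root="/sys/class/dmi/id"):
    """Return a DMI string from sysfs, or '' if it cannot be read."""
    try:
        return (Path(root) / field).read_text().strip()
    except OSError:
        return ""


def should_disable_msi(vendor, board):
    """True when the (vendor, board) pair is on the affected list."""
    return (vendor.strip().lower(), board.strip().lower()) in AFFECTED_BOARDS


if __name__ == "__main__":
    vendor = read_dmi("board_vendor")
    board = read_dmi("board_name")
    print(f"{vendor!r} / {board!r}: disable MSI = {should_disable_msi(vendor, board)}")
```

Keeping the match in a small table makes it easy to add other boards if more reports come in.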

Thanks again for taking time to look at this.  Any advice or hints you
provide will be greatly appreciated.

Jay
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: APIC error on 32-bit kernel

2007-05-11 Thread Len Brown
> > We're trying to track down the source of a problem that occurs
> > whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4
> 
> and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.
> 
> > We can load the driver just fine, but whenever we activate the
> > network, we see APIC errors (a sample of them are shown here,
> > captured from a serial console):
> > 
> > [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
> > [EMAIL PROTECTED] ~]# [   93.942012] process `sysctl' is using deprecated
> > sysctl (sysc.
> > [   94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> > [   94.498887] APIC error on CPU0: 00(08)
> > [   94.498534] APIC error on CPU1: 00(08)
> > [   94.550079] APIC error on CPU0: 08(08)
> > [   94.549725] APIC error on CPU1: 08(08)
> > [   94.600915] APIC error on CPU1: 08(08)
> > [   94.601276] APIC error on CPU0: 08(08)
> > [   94.652108] APIC error on CPU1: 08(08)
> > [   94.652470] APIC error on CPU0: 08(08)
> > [   94.703659] APIC error on CPU0: 08(08)
> > [   94.703305] APIC error on CPU1: 08(08)
> > [   94.754852] APIC error on CPU0: 08(40)
> > [   94.806045] APIC error on CPU0: 40(08)

/* Here is what the APIC error bits mean:
   0: Send CS error
   1: Receive CS error
   2: Send accept error
   3: Receive accept error
   4: Reserved
   5: Send illegal vector
   6: Received illegal vector
   7: Illegal register address
*/

So the 40 means the APIC got an illegal vector.
Certainly this is consistent with the fact that
the errors start when a specific device is being
used.  I assume that device is using MSI?
Curious that it is different in 32-bit and 64-bit mode.
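
The bit table above can be applied mechanically to each hex value in a line such as `08(40)`. The following is a minimal sketch that uses only the bit names listed above. My reading is that the two values are the error status register as read before and after the handler rewrites it, but treat that as an assumption rather than something this thread confirms.

```python
# Bit names copied from the table above.
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_esr(value):
    """Return the names of all bits set in an 8-bit APIC error value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    for raw in ("08", "40"):
        print(raw, decode_esr(int(raw, 16)))
```

With this table, `08` has only bit 3 set (receive accept error) and `40` has only bit 6 set (received illegal vector).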



> > [   94.805692] APIC error on CPU1: 08(08)
> > [   94.857238] APIC error on CPU0: 08(08)
> > [   94.856884] APIC error on CPU1: 08(08)
> > [   94.908432] APIC error on CPU0: 08(08)
> > [   94.908078] APIC error on CPU1: 08(08)
> > [snip, more of the same]
> > [   98.901156] APIC error on CPU1: 08(08)
> > [   98.952702] APIC error on CPU0: 08(08)
> > [   98.952349] APIC error on CPU1: 08(08)
> > [   99.003895] APIC error on CPU0: 08(08)
> > [   99.003542] APIC error on CPU1: 08(08)
> > 
> > The machine hangs for about 5-10 seconds, then spontaneously reboots
> > without further console output.
> 
> I can prompt an oops by pinging my router while the apic errors are
> scrolling by.
> 
> > 
> > This is an Asus M2V (Via K8T890) motherboard.
> > 
> > The problem does not occur on a 32-bit kernel if we boot with
> > pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> > motherboard.

pci=nomsi, works, okay...


> > We also do not see this problem on Intel-based motherboards, with
> > either 32- or 64-bit kernels.
> 
> A full raft of documentation -- including acpidump and
> linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
> (with apic=debug boot option), dmesg, and /proc/interrupts -- is
> available at http://www.hogchain.net/m2v/apic-problem/


[06Dh 109  2]  Boot Architecture Flags : 0003

for what it is worth, the bit in ACPI that is used to
disable MSI support is not set -- so as  far as the BIOS
is concerned, this system should support MSI.
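
The flags value 0003 can be checked against the FADT boot architecture bit layout. This is a minimal sketch; the bit names follow my understanding of the ACPI specification (bit 3 being "MSI Not Supported") and should be verified against the spec revision in use.

```python
# IA-PC Boot Architecture Flags from the FADT, low bits only.
# Names are taken from the ACPI specification as I understand it (assumption).
BOOT_ARCH_BITS = {
    0: "LEGACY_DEVICES",
    1: "8042",
    2: "VGA Not Present",
    3: "MSI Not Supported",
}

MSI_NOT_SUPPORTED = 1 << 3


def decode_boot_arch(flags):
    """Return the names of the known bits set in the flags word."""
    return [name for bit, name in BOOT_ARCH_BITS.items() if flags & (1 << bit)]


def firmware_forbids_msi(flags):
    """True when the firmware sets the 'MSI Not Supported' bit."""
    return bool(flags & MSI_NOT_SUPPORTED)


if __name__ == "__main__":
    flags = 0x0003  # value reported in the acpidump output above
    print(decode_boot_arch(flags), firmware_forbids_msi(flags))
```

For 0003 only bits 0 and 1 are set, so the firmware is not asking the OS to avoid MSI.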

Is it an add-in card, or lan-on-motherboard?

-Len


Re: APIC error on 32-bit kernel

2007-04-09 Thread Chuck Ebbert
Jay Cliburn wrote:
> [Adding linux-kernel to the cc list, hoping for wider exposure.]
> 
> On Fri, 23 Mar 2007 20:08:17 -0500
> Jay Cliburn <[EMAIL PROTECTED]> wrote:
> 
>> We're trying to track down the source of a problem that occurs
>> whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4
> 
> and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.
> 
>> We can load the driver just fine, but whenever we activate the
>> network, we see APIC errors (a sample of them are shown here,
>> captured from a serial console):
>>
>> [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
>> [EMAIL PROTECTED] ~]# [   93.942012] process `sysctl' is using deprecated
>> sysctl (sysc.
>> [   94.396609] atl1: eth0 link is up 1000 Mbps full duplex
>> [   94.498887] APIC error on CPU0: 00(08)
>> [   94.498534] APIC error on CPU1: 00(08)
>> [   94.550079] APIC error on CPU0: 08(08)
>> [   94.549725] APIC error on CPU1: 08(08)
>> [   94.600915] APIC error on CPU1: 08(08)
>> [   94.601276] APIC error on CPU0: 08(08)
>> [   94.652108] APIC error on CPU1: 08(08)
>> [   94.652470] APIC error on CPU0: 08(08)
>> [   94.703659] APIC error on CPU0: 08(08)
>> [   94.703305] APIC error on CPU1: 08(08)
>> [   94.754852] APIC error on CPU0: 08(40)
>> [   94.806045] APIC error on CPU0: 40(08)
>> [   94.805692] APIC error on CPU1: 08(08)
>> [   94.857238] APIC error on CPU0: 08(08)
>> [   94.856884] APIC error on CPU1: 08(08)
>> [   94.908432] APIC error on CPU0: 08(08)
>> [   94.908078] APIC error on CPU1: 08(08)
>> [snip, more of the same]
>> [   98.901156] APIC error on CPU1: 08(08)
>> [   98.952702] APIC error on CPU0: 08(08)
>> [   98.952349] APIC error on CPU1: 08(08)
>> [   99.003895] APIC error on CPU0: 08(08)
>> [   99.003542] APIC error on CPU1: 08(08)
>>
>> The machine hangs for about 5-10 seconds, then spontaneously reboots
>> without further console output.
> 
> I can prompt an oops by pinging my router while the apic errors are
> scrolling by.

Where is the text of the oops?


> 



Re: APIC error on 32-bit kernel

2007-04-09 Thread Jay Cliburn

Chuck Ebbert wrote:
> Where is the text of the oops?

In one of the files on the website I referenced.  Here's the text...

[  173.584000] APIC error on CPU1: 08(08)
[  173.665000] APIC error on CPU0: 08(08)
[  173.665000] APIC error on CPU1: 08(08)
[  173.746000] APIC error on CPU0: 08(08)
[  173.746000] APIC error on CPU1: 08(08)
[  173.827000] APIC error on CPU0: 08(08)
[  173.827000] APIC error on CPU1: 08(08)
[  173.908000] APIC error on CPU0: 08(08)
[  173.908000] APIC error on CPU1: 08(08)
[  173.989000] APIC error on CPU0: 08(08)
[  173.989000] APIC error on CPU1: 08(08)

pinged my router somewhere along about here...

[  174.069000] BUG: unable to handle kernel NULL pointer dereference<1>BUG: unable to 0

[  174.069000]  printing eip:
[  174.069000] 
[  174.069000] *pde = 1feb8067
[  174.069000] Oops:  [#1]
[  174.069000] SMP
[  174.069000] Modules linked in: nf_conntrack_netbios_ns ipt_REJECT 
nf_conntrack_ipv4d

[  174.069000] CPU:1
[  174.069000] EIP:0060:[]Not tainted VLI
[  174.069000] EFLAGS: 00010006   (2.6.21-rc5-git1 #1)
[  174.069000] EIP is at 0x0
[  174.069000] eax: 00a0   ebx: dfe99f98   ecx: c07bb000   edx: c074de00
[  174.069000] esi: 00a0   edi:    ebp:    esp: c07bbffc
[  174.069000] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[  174.069000] Process beagled-helper (pid: 3393, ti=c07bb000 
task=dfe28270 task.ti=df)

[  174.069000] Stack: c040704b
[  174.069000] Call Trace:
[  174.069000]  [c040704b] do_IRQ+0xac/0xd1
[  174.069000]  [c040580e] common_interrupt+0x2e/0x34
[  174.069000]  ===
[  174.069000] Code:  Bad EIP value.
[  174.069000] EIP: [] 0x0 SS:ESP 0068:c07bbffc
[  174.069000] Kernel panic - not syncing: Fatal exception in interrupt
[  174.069000] BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
[  174.069000]  [c0417b4f] smp_call_function+0x5c/0xc8
[  174.069000]  [c054052e] do_unblank_screen+0x2a/0x120
[  174.069000]  [c0417bd6] smp_send_stop+0x1b/0x2e
[  174.069000]  [c04271ca] panic+0x54/0xf2
[  174.069000]  [c04062c5] die+0x1f8/0x22c
[  174.069000]  [c0623d13] do_page_fault+0x40c/0x4df
[  174.069000]  [c0623907] do_page_fault+0x0/0x4df
[  174.069000]  [c0622574] error_code+0x7c/0x84
[  174.069000]  [c040704b] do_IRQ+0xac/0xd1
[  174.069000]  [c040580e] common_interrupt+0x2e/0x34
[  174.069000]  ===
[  174.069000]  at virtual address 
[  174.069000]  printing eip:
[  174.069000] 
[  174.069000] *pde = 20bd3067
[  174.069000] Oops:  [#2]
[  174.069000] SMP
[  174.069000] Modules linked in: nf_conntrack_netbios_ns ipt_REJECT 
nf_conntrack_ipv4d

[  174.069000] CPU:0
[  174.069000] EIP:0060:[]Not tainted VLI
[  174.069000] EFLAGS: 00010087   (2.6.21-rc5-git1 #1)
[  174.069000] EIP is at 0x0
[  174.069000] eax: 00a0   ebx: c0753f74   ecx: c07ba000   edx: c074de00
[  174.069000] esi: 00a0   edi:    ebp:    esp: c07baffc
[  174.069000] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[  174.069000] Process swapper (pid: 0, ti=c07ba000 task=c07094c0 
task.ti=c0753000)

[  174.069000] Stack: c040704b
[  174.069000] Call Trace:
[  174.069000]  [c040704b] do_IRQ+0xac/0xd1
[  174.069000]  [c040580e] common_interrupt+0x2e/0x34
[  174.069000]  [c0403c74] default_idle+0x3d/0x54
[  174.069000]  [c040339b] cpu_idle+0xa3/0xbc
[  174.069000]  [c0758a37] start_kernel+0x45d/0x465
[  174.069000]  [c07581ae] unknown_bootoption+0x0/0x202
[  174.069000]  ===
[  174.069000] Code:  Bad EIP value.
[  174.069000] EIP: [] 0x0 SS:ESP 0068:c07baffc
[  174.069000] Kernel panic - not syncing: Fatal exception in interrupt

Short hang, then spontaneous reboot.




Re: APIC error on 32-bit kernel

2007-04-08 Thread Jay Cliburn
[Adding linux-kernel to the cc list, hoping for wider exposure.]

On Fri, 23 Mar 2007 20:08:17 -0500
Jay Cliburn <[EMAIL PROTECTED]> wrote:

> We're trying to track down the source of a problem that occurs
> whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4

and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.

> We can load the driver just fine, but whenever we activate the
> network, we see APIC errors (a sample of them are shown here,
> captured from a serial console):
> 
> [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
> [EMAIL PROTECTED] ~]# [   93.942012] process `sysctl' is using deprecated
> sysctl (sysc.
> [   94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> [   94.498887] APIC error on CPU0: 00(08)
> [   94.498534] APIC error on CPU1: 00(08)
> [   94.550079] APIC error on CPU0: 08(08)
> [   94.549725] APIC error on CPU1: 08(08)
> [   94.600915] APIC error on CPU1: 08(08)
> [   94.601276] APIC error on CPU0: 08(08)
> [   94.652108] APIC error on CPU1: 08(08)
> [   94.652470] APIC error on CPU0: 08(08)
> [   94.703659] APIC error on CPU0: 08(08)
> [   94.703305] APIC error on CPU1: 08(08)
> [   94.754852] APIC error on CPU0: 08(40)
> [   94.806045] APIC error on CPU0: 40(08)
> [   94.805692] APIC error on CPU1: 08(08)
> [   94.857238] APIC error on CPU0: 08(08)
> [   94.856884] APIC error on CPU1: 08(08)
> [   94.908432] APIC error on CPU0: 08(08)
> [   94.908078] APIC error on CPU1: 08(08)
> [snip, more of the same]
> [   98.901156] APIC error on CPU1: 08(08)
> [   98.952702] APIC error on CPU0: 08(08)
> [   98.952349] APIC error on CPU1: 08(08)
> [   99.003895] APIC error on CPU0: 08(08)
> [   99.003542] APIC error on CPU1: 08(08)
> 
> The machine hangs for about 5-10 seconds, then spontaneously reboots
> without further console output.

I can prompt an oops by pinging my router while the apic errors are
scrolling by.
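
For anyone comparing captures, a small script can tally the error lines per CPU and value pair. This is a minimal sketch that assumes the exact "APIC error on CPUn: xx(yy)" format seen in the serial console log; the sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches e.g. "[   94.754852] APIC error on CPU0: 08(40)"
LINE_RE = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]{2})\(([0-9a-fA-F]{2})\)"
)


def tally(lines):
    """Count occurrences of each (cpu, first_value, second_value) triple."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            cpu, first, second = match.groups()
            counts[(int(cpu), first, second)] += 1
    return counts


sample = """\
[   94.703659] APIC error on CPU0: 08(08)
[   94.703305] APIC error on CPU1: 08(08)
[   94.754852] APIC error on CPU0: 08(40)
[   94.806045] APIC error on CPU0: 40(08)
"""

if __name__ == "__main__":
    for key, count in sorted(tally(sample.splitlines()).items()):
        print(key, count)
```

Lines that do not match the pattern, such as the link-up message, are ignored.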

> 
> This is an Asus M2V (Via K8T890) motherboard.
> 
> The problem does not occur on a 32-bit kernel if we boot with
> pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> motherboard.
> 
> We also do not see this problem on Intel-based motherboards, with
> either 32- or 64-bit kernels.

A full raft of documentation -- including acpidump and
linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
(with apic=debug boot option), dmesg, and /proc/interrupts -- is
available at http://www.hogchain.net/m2v/apic-problem/

If this is a motherboard problem, that's fine; I'd just like to know
the details so I can tell users something more than "it's a motherboard
problem."

Thanks,
Jay

