Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Joe Jin [mailto:joe@oracle.com]
Sent: Tuesday, July 10, 2012 10:03 PM
To: Dave, Tushar N
Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
ker...@vger.kernel.org
Subject: Re: 82571EB: Detected Hardware Unit Hang

On 07/11/12 12:05, Dave, Tushar N wrote:
 When you said you had this issue with the RHEL5 and RHEL6 drivers, did you
install the RHEL5/6 kernel and reproduce it there? If so, I think I should install
RHEL6 and try to reproduce it locally!

Yes, I reproduced this on both RHEL5 and RHEL6.

So far, using scp to copy a big file (~1GB) hits it at once.

Thanks,
Joe

Joe,
Can you please send the lspci -vvv output for the failing port, taken before the issue occurs?
Thanks.

-Tushar

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired


Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Joe Jin
On 07/11/12 15:11, Dave, Tushar N wrote:
 Joe,
 Can you please send the lspci -vvv output for the failing port, taken before the issue occurs?
 Thanks.
 
# lspci -s 05:00.0 -vvv
05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
    Subsystem: Oracle Corporation x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 256 bytes
    Interrupt: pin B routed to IRQ 80
    Region 0: Memory at fbde (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at fbdc (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at dc00 [size=32]
    Expansion ROM at fbda [disabled] [size=128K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: fee21000  Data: 40cb
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #2, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq+ ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap: First Error Pointer: 12, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-15-17-ff-ff-b9-77-9c
    Kernel driver in use: e1000e
    Kernel modules: e1000e


Thanks,
Joe



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Joe Jin
On 07/11/12 15:37, Dave, Tushar N wrote:
 Was this lspci output taken on a freshly booted system?
 

Yes. Did you find any issue?

Thanks,
Joe





Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Joe Jin [mailto:joe@oracle.com]
Sent: Wednesday, July 11, 2012 12:39 AM
To: Dave, Tushar N
Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
ker...@vger.kernel.org
Subject: Re: 82571EB: Detected Hardware Unit Hang

On 07/11/12 15:37, Dave, Tushar N wrote:
 Was this lspci output taken on a freshly booted system?


Yes. Did you find any issue?

Thanks,
Joe


The Device Status and AER sections show some errors that look a little suspicious to
me, but I'm not sure yet. I will get back to you tomorrow.

-Tushar



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Joe Jin
On 07/11/12 15:50, Dave, Tushar N wrote:
 The Device Status and AER sections show some errors that look a little suspicious
 to me, but I'm not sure yet. I will get back to you tomorrow.
 

Thanks a lot, Tushar!

Joe


-- 
Oracle http://www.oracle.com
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 





[E1000-devel] 82571EB - Detected Hardware Unit Hang

2012-07-11 Thread Andrew Peng
Folks, I've been getting some strange error messages in my home server
/ router that I've been having trouble debugging. I'm decently
proficient in Linux, but I fear I'm in over my head with this one.

The hardware is a HP N40L Microserver - here are the hardware details
- http://n40l.wikia.com/wiki/Base_Hardware

I am running Debian Squeeze 6.0:
pengc99@gaia:/$ sudo uname -a
Linux gaia 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux

I also subscribe to Ksplice's Uptrack system but since I have the
newest kernel installed (as released by Debian) there have been no
hot-patches yet.

This is the message I've been getting in /var/log/kern.log:
Jul 11 08:55:38 gaia kernel: [402056.009687] e1000e :02:00.0: eth1: Detected Hardware Unit Hang:
Jul 11 08:55:38 gaia kernel: [402056.009690]   TDH                  <fc>
Jul 11 08:55:38 gaia kernel: [402056.009692]   TDT                  <fd>
Jul 11 08:55:38 gaia kernel: [402056.009693]   next_to_use          <fd>
Jul 11 08:55:38 gaia kernel: [402056.009694]   next_to_clean        <fc>
Jul 11 08:55:38 gaia kernel: [402056.009695] buffer_info[next_to_clean]:
Jul 11 08:55:38 gaia kernel: [402056.009697]   time_stamp           <105fc92b2>
Jul 11 08:55:38 gaia kernel: [402056.009698]   next_to_watch        <fc>
Jul 11 08:55:38 gaia kernel: [402056.009699]   jiffies              <105fc93da>
Jul 11 08:55:38 gaia kernel: [402056.009700]   next_to_watch.status <0>
Jul 11 08:55:38 gaia kernel: [402056.009701] MAC Status             <80383>
Jul 11 08:55:38 gaia kernel: [402056.009702] PHY Status             <792d>
Jul 11 08:55:38 gaia kernel: [402056.009703] PHY 1000BASE-T Status  <3800>
Jul 11 08:55:38 gaia kernel: [402056.009705] PHY Extended Status    <3000>
Jul 11 08:55:38 gaia kernel: [402056.009706] PCI Status             <10>
Complete output of lspci:
pengc99@gaia:/$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
00:01.0 PCI bridge: Hewlett-Packard Company Device 9602
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)

Output of lspci -vvv (as root, network adapter section):
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
    Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 26
    Region 0: Memory at fe8e (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at fe8c (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at e800 [size=32]
    Expansion ROM at fe8a [disabled] [size=128K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: fee0300c  Data: 4191
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, 

Re: [E1000-devel] Bonding + ixgbe breaks with jumbo frames if the MTU is not set on bond0 before adding slaves

2012-07-11 Thread Ko, Stephen S
Hi Nathan,

Our engineers have root-caused this issue in the driver, and the fix is currently
undergoing validation. Once it passes testing, we'll post it on SourceForge.

Thanks again for the report.

Stephen

 -----Original Message-----
 From: Nathan March [mailto:nat...@gt.net]
 Sent: Tuesday, July 03, 2012 10:53 AM
 To: Ko, Stephen S
 Cc: e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] Bonding + ixgbe breaks with jumbo frames if the
 MTU is not set on bond0 before adding slaves
 
 Here's a slightly cleaner dmesg with some changes removed that I'd hacked
 in to try to fix:
 
 http://pastebin.com/DkjFEdfc
 
 I'm not doing anything special on bootup anymore, this is a standard gentoo
 system booting. Configured using:
 
 config_eth2=null
 config_eth3=null
 rc_need_bond0=net.eth2 net.eth3
 
 config_bond0=(
  10.1.14.23 broadcast 10.1.14.255 netmask 255.255.255.0
 )
 slaves_bond0=eth2 eth3
 modules_bond0=( ifconfig !iproute2 )
 
 mtu_eth2=9000
 mtu_eth3=9000
 mtu_bond0=9000
 
 - Nathan
 
 On 7/3/2012 10:32 AM, Nathan March wrote:
  Hi Stephen,
 
  Sure, dmesg is here: http://pastebin.com/VyH6gA4A
 
  xen13 ~ # modinfo bonding | head
  filename: /lib/modules/3.2.7/kernel/drivers/net/bonding/bonding.ko
  alias:  rtnl-link-bond
  author: Thomas Davis, tada...@lbl.gov and many others
  description:Ethernet Channel Bonding Driver, v3.7.1
  version:3.7.1
  license:GPL
  srcversion: 35B9A516B4FC085FFCBEF61
  depends:
  intree: Y
  vermagic:   3.2.7 SMP mod_unload
 
  Happy to provide any other info if needed, or access to a test machine.
 
  - Nathan
 
  On 6/29/2012 9:01 PM, Ko, Stephen S wrote:
  Hi Nathan,
 
  Thanks for the report. We are trying to reproduce this issue in our lab.
 
  Could you please send us:
 
  - dmesg
  - modinfo bonding | head
 
  Just to narrow down the issue, have you tried this on any other
  interfaces besides ixgbe (e.g. igb, e1000, or e1000e)?
  We will keep you informed of progress on our end.
 
  Thanks,
  Stephen
 
  -----Original Message-----
  From: Nathan March [mailto:nat...@gt.net]
  Sent: Friday, June 29, 2012 3:17 PM
  To: e1000-devel@lists.sourceforge.net
  Subject: [E1000-devel] Bonding + ixgbe breaks with jumbo frames if
  the MTU is not set on bond0 before adding slaves
 
  Hi All,
 
  I think I've found a bug in the ixgbe driver when using bonding +
  jumbo frames. Adding slaves to the bond device and setting mtu 9000
  after enslaving, results in one of the slaves dropping traffic. The
  strange thing is putting bond0 into promiscuous mode (by running
  tcpdump) will solve the problem (until you close tcpdump).
 
  Here's a test script I've put together to reproduce the problem:
 
  #!/bin/bash -x
  rmmod ixgbe
  rmmod bonding
  modprobe bonding miimon=100 mode=4
  modprobe ixgbe
  ifconfig bond0 up mtu 1500
  ifconfig eth2 up
  ifconfig eth3 up
  ifenslave bond0 eth2 eth3
  ifconfig bond0 10.1.14.23 broadcast 10.1.14.255 netmask 255.255.255.0 mtu 9000
 
  Changing line #6 to be 'mtu 9000' no longer triggers the bug and
  networking works perfectly.
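
When chasing a problem like this, a quick way to rule out a plain configuration mismatch is to compare the MTU of the bond with the MTU of each slave. The Python sketch below is only an illustration and was not part of the original report: it reads the standard sysfs files `mtu` and `bonding/slaves` under /sys/class/net, the function name is made up, and the demonstration runs against a fake directory tree with invented values so it can be tried anywhere. It does not detect the driver bug itself, since the reported failure can occur even when every interface shows the same MTU.

```python
import tempfile
from pathlib import Path


def mtu_mismatches(bond, root="/sys/class/net"):
    """Return {slave: mtu} for slaves whose MTU differs from the bond's."""
    net = Path(root)
    bond_mtu = int((net / bond / "mtu").read_text())
    # The bonding driver lists enslaved interfaces, space separated.
    slaves = (net / bond / "bonding" / "slaves").read_text().split()
    result = {}
    for slave in slaves:
        mtu = int((net / slave / "mtu").read_text())
        if mtu != bond_mtu:
            result[slave] = mtu
    return result


# Demonstration on a fake sysfs tree (invented values, not from the report).
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp)
    for name, mtu in {"bond0": 9000, "eth2": 9000, "eth3": 1500}.items():
        (fake / name).mkdir()
        (fake / name / "mtu").write_text(f"{mtu}\n")
    (fake / "bond0" / "bonding").mkdir()
    (fake / "bond0" / "bonding" / "slaves").write_text("eth2 eth3\n")
    demo = mtu_mismatches("bond0", root=fake)

print(demo)
```

On a real system the call would simply be `mtu_mismatches("bond0")`; an empty result means the bond and all of its slaves agree on the MTU.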
 
  This is on an Intel X540-T2 connected to a pair of Arista 1050T
  (mlag) on kernel
  3.2.7. I'm using the bonding module built into the kernel with ixgbe
  3.9.17.
 
  - Nathan
 
  --
  Nathan March nat...@gt.net
  Gossamer Threads Inc. http://www.gossamer-threads.com/
  Tel: (604) 687-5804 Fax: (604) 687-5806
 
 
  
 
 
 
 
 --
 Nathan March nat...@gt.net
 Gossamer Threads Inc. http://www.gossamer-threads.com/
 Tel: (604) 687-5804 Fax: (604) 687-5806




Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Joe Jin [mailto:joe@oracle.com]
Sent: Tuesday, July 10, 2012 10:03 PM
To: Dave, Tushar N
Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
ker...@vger.kernel.org
Subject: Re: 82571EB: Detected Hardware Unit Hang

On 07/11/12 12:05, Dave, Tushar N wrote:
 When you said you had this issue with the RHEL5 and RHEL6 drivers, did you
install the RHEL5/6 kernel and reproduce it there? If so, I think I should install
RHEL6 and try to reproduce it locally!

Yes, I reproduced this on both RHEL5 and RHEL6.

So far, using scp to copy a big file (~1GB) hits it at once.

Thanks,
Joe

Joe,

I see a couple of errors in the lspci output.
The Device Status register shows an uncorrectable PCIe error, which means
something has certainly gone wrong. The only way to recover from
uncorrectable errors is a reset.

DevSta: CorrErr- *UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-

The AER section of the lspci output also shows a PCIe completion timeout.

Capabilities: [100 v1] Advanced Error Reporting
UESta:  DLP- SDES- TLP- FCP- *CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq+ ACSViol-

I suggest you load the AER driver and check the log for any error messages.
Please also check for any errors the system reported in the BIOS log. Are there
any machine check errors?

When did you first notice this issue? Has the 82571 ever worked on this
server?

One more thing: a cache line size of 256 is a little unusual (I have never seen this
value before; it is mostly 64). Have the BIOS settings been changed? Are you using
the default BIOS settings?

Thanks.

-Tushar
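
As an aside for readers following the analysis above, the flag lines in lspci -vvv output can be checked mechanically. The following is a minimal Python sketch, not anything posted in the thread: it lists the flags marked '+' on a line and then shows which uncorrectable errors are latched in UESta but not masked in UEMsk. The helper name is hypothetical, and the three input strings are copied from the lspci output posted earlier in this thread.

```python
import re


def set_flags(line):
    """Return the names of flags marked '+' in an lspci -vvv flag line."""
    # Drop the 'DevSta:' style label, then keep tokens that end in '+'.
    _, _, body = line.partition(":")
    return [m.group(1) for m in re.finditer(r"([A-Za-z0-9]+)\+", body)]


devsta = "DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-"
uesta = ("UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- "
         "RxOF- MalfTLP+ ECRC- UnsupReq+ ACSViol-")
uemsk = ("UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- "
         "RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-")

# Errors that are latched in UESta and not masked in UEMsk.
masked = set_flags(uemsk)
unmasked = [flag for flag in set_flags(uesta) if flag not in masked]

print(set_flags(devsta))
print(unmasked)
```

For this adapter the set DevSta flags are UncorrErr, UnsuppReq and AuxPwr, and the unmasked uncorrectable errors are CmpltTO and MalfTLP.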

  






Re: [E1000-devel] Bonding + ixgbe breaks with jumbo frames if the MTU is not set on bond0 before adding slaves

2012-07-11 Thread Ko, Stephen S
Hi Nathan,

If you'd like to try it, attached is a patch that fixes the issue. The patch was
generated against the 3.9.17 driver.

Thanks,
Stephen

Attachment: ixgbe_fix_mac_flush.patch



[E1000-devel] ixgbe 3.7.21: NULL skb deref in ixgbe_clean_rx_irq_ps()

2012-07-11 Thread akepner

Using the 3.7.21 version of the ixgbe driver we can reliably 
produce a crash with this signature: 

BUG: unable to handle kernel NULL pointer dereference at 006c
IP: [a005afef] ixgbe_poll+0x9df/0x1710 [ixgbe]
PGD 814c7b067 PUD 8074dd067 PMD 0 
Oops:  [#1] SMP 
last sysfs file: /sys/devices/virtual/bypass/8-9/ping_watchdog
CPU 2 
Pid: 18925, comm: sport Tainted: P      2.6.32-perf #1 To Be Filled By O.E.M.
RIP: 0010:[a005afef]  [a005afef] ixgbe_poll+0x9df/0x1710 [ixgbe]
RSP: 0018:88080750b8b0  EFLAGS: 00010246
RAX:  RBX: 88040f816f00 RCX: 
RDX: 0020 RSI: c9000429c000 RDI: 88040f891d80
RBP: 88080750b970 R08: 0100 R09: 
R10: 0100 R11: 88080750bfd8 R12: 
R13: c900041221b8 R14: 8804077580b0 R15: 000b
FS:  7f61ccda9700() GS:88002828() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 006c CR3: 000814436000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sport (pid: 18925, threadinfo 88080750a000, task 880814beeb60)
Stack:
   8804148b0540 000e 88040703d1c0
0 8804148b0598 88080750b918 815671ac 0001000359c2
0 880410744700 0040 88040f891d80 004011087c9c
Call Trace:
 [815671ac] ? ip_finish_output+0x13c/0x310
 [8152b468] net_rx_action+0xb8/0x400
 [81517a84] ? sock_def_readable+0x44/0x80
 [81066a91] __do_softirq+0xc1/0x1d0
 [8100c1ec] call_softirq+0x1c/0x30
 [8100de25] do_softirq+0x65/0xa0
 [8106699a] local_bh_enable+0x9a/0xb0
 [815176fc] lock_sock_nested+0xac/0xc0
 [81641f0b] ? _spin_unlock_bh+0x1b/0x20
 [81517627] ? release_sock+0xd7/0x100
 [81571838] tcp_recvmsg+0x38/0xe80
 [812d4c19] ? cpumask_next_and+0x29/0x50
 [8104b6f4] ? find_busiest_group+0x244/0xb10
 [810544d2] ? default_wake_function+0x12/0x20
 [81516cf9] sock_common_recvmsg+0x39/0x50
 [81516829] sock_aio_read+0x159/0x160
 [8104dbd3] ? perf_event_task_sched_out+0x33/0x80
 [810097ac] ? __switch_to+0x1ac/0x320
 [815166d0] ? sock_aio_read+0x0/0x160
 [811533bb] do_sync_readv_writev+0xfb/0x140
 [810853b0] ? autoremove_wake_function+0x0/0x40
 [811543df] do_readv_writev+0xcf/0x1f0
 [8156dc0d] ? do_tcp_getsockopt+0x3d/0x5f0
 [81012879] ? read_tsc+0x9/0x20
 [8108fc13] ? ktime_get+0x63/0xe0
 [810650c2] ? ns_to_timeval+0x12/0x40
 [810896af] ? hrtimer_get_remaining+0x3f/0x50
 [811546d3] vfs_readv+0x43/0x60
 [811547d1] sys_readv+0x51/0x80
 [8100b132] system_call_fastpath+0x16/0x1b
Code: c1 e5 03 4c 03 6b 20 4d 8b 65 00 49 c7 45 00 00 00 00 00 0f ae e8 48 8b 53 28 31 c0 f6 c2 10 74 0a 41 f7 06 00 00 1e 00 0f 95 c0 41 8b 74 24 6c 49 8b 8c 24 b0 01 00 00 85 f6 0f 18 09 0f 85 c0
RIP  [a005afef] ixgbe_poll+0x9df/0x1710 [ixgbe]
 RSP 88080750b8b0
CR2: 006c
---[ end trace 9db4623b9591cd54 ]---

addr2line says this is happening on line 2028 below - so a NULL skb 
pointer is being passed to skb_is_nonlinear():

 1990 static bool ixgbe_clean_rx_irq_ps(struct ixgbe_q_vector *q_vector,
 1991                                   struct ixgbe_ring *rx_ring,
 1992                                   int budget)
 1993 {
.
 2021                 rmb();
 2022 
 2023                 pkt_is_rsc = ixgbe_get_rsc_state(rx_ring, rx_desc);
 2024 
 2025                 prefetch(skb->data);
 2026 
 2027                 /* pull the header of the skb in if no data is already present */
 2028                 if (!skb_is_nonlinear(skb)) {
 2029                         __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));

Anyone have a guess as to the cause? Or have you seen similar? 

One good clue that we've found is that the problem disappears if we 
turn off irq balancing. 
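
A note on reading the oops above, offered as a sketch rather than a
diagnosis: CR2 is 006c, and the byte sequence "41 8b 74 24 6c" in the Code
line decodes to a 32-bit load from 0x6c(%r12). R12 is blank in the register
dump (presumably zeros stripped by the archive), so the fault looks like a
read of one field through a NULL skb, which fits skb_is_nonlinear() reading
skb->data_len. The struct below is a hypothetical stand-in, not the real
struct sk_buff; the 0x6c offset is taken from CR2, not from kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only the layout matters here.
 * The 0x6c offset is an assumption read off CR2 in the oops. */
struct fake_skb {
    unsigned char pad[0x6c];   /* whatever fields precede data_len */
    unsigned int  data_len;    /* the field skb_is_nonlinear() tests */
};

/* Address the CPU touches when reading data_len through pointer `skb`. */
static uintptr_t data_len_fault_addr(uintptr_t skb)
{
    return skb + offsetof(struct fake_skb, data_len);
}
```

With skb == 0, data_len_fault_addr(0) evaluates to 0x6c, which matches the
faulting address reported in CR2.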

-- 
Arthur




Re: [E1000-devel] 82571EB - Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Andrew Peng [mailto:peng...@gmail.com]
Sent: Wednesday, July 11, 2012 8:50 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] 82571EB - Detected Hardware Unit Hang

Folks, I've been getting some strange error messages in my home server /
router that I've been having trouble debugging. I'm decently proficient in
Linux, but I fear I'm in over my head with this one.

The hardware is a HP N40L Microserver - here are the hardware details
- http://n40l.wikia.com/wiki/Base_Hardware

I am running Debian Squeeze 6.0:
pengc99@gaia:/$ sudo uname -a
Linux gaia 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux

I also subscribe to Ksplice's Uptrack system but since I have the newest
kernel installed (as released by Debian) there have been no hot-patches
yet.

This is the message I've been getting in /var/log/kern.log:
Jul 11 08:55:38 gaia kernel: [402056.009687] e1000e :02:00.0: eth1: Detected Hardware Unit Hang:
Jul 11 08:55:38 gaia kernel: [402056.009690]   TDH                  fc
Jul 11 08:55:38 gaia kernel: [402056.009692]   TDT                  fd
Jul 11 08:55:38 gaia kernel: [402056.009693]   next_to_use          fd
Jul 11 08:55:38 gaia kernel: [402056.009694]   next_to_clean        fc
Jul 11 08:55:38 gaia kernel: [402056.009695] buffer_info[next_to_clean]:
Jul 11 08:55:38 gaia kernel: [402056.009697]   time_stamp           105fc92b2
Jul 11 08:55:38 gaia kernel: [402056.009698]   next_to_watch        fc
Jul 11 08:55:38 gaia kernel: [402056.009699]   jiffies              105fc93da
Jul 11 08:55:38 gaia kernel: [402056.009700]   next_to_watch.status 0
Jul 11 08:55:38 gaia kernel: [402056.009701] MAC Status             80383
Jul 11 08:55:38 gaia kernel: [402056.009702] PHY Status             792d
Jul 11 08:55:38 gaia kernel: [402056.009703] PHY 1000BASE-T Status  3800
Jul 11 08:55:38 gaia kernel: [402056.009705] PHY Extended Status    3000
Jul 11 08:55:38 gaia kernel: [402056.009706] PCI Status             10
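
The ring indices and timestamps in the log above can be read with a little
arithmetic. The sketch below assumes a 256-entry TX ring and HZ=250; neither
value appears in the report, and the function names are made up for
illustration.

```c
#include <stdint.h>

/* Descriptors handed to the NIC (tail) but not yet fetched (head).
 * ring_size is an assumption; the log does not state it. */
static unsigned int tx_pending(unsigned int tdh, unsigned int tdt,
                               unsigned int ring_size)
{
    return (tdt + ring_size - tdh) % ring_size;
}

/* Jiffies elapsed since the stalled buffer was queued. */
static uint64_t jiffies_stalled(uint64_t jiffies, uint64_t time_stamp)
{
    return jiffies - time_stamp;
}

/* Convert jiffies to milliseconds for a given HZ (assumed, not logged). */
static uint64_t jiffies_to_ms(uint64_t j, unsigned int hz)
{
    return j * 1000u / hz;
}
```

With the logged values, tx_pending(0xfc, 0xfd, 256) is 1 and
jiffies_stalled(0x105fc93da, 0x105fc92b2) is 0x128 (296 jiffies), or about
1.18 s at HZ=250. In other words, one descriptor has been waiting at the
head of the ring for over a second without a status write-back
(next_to_watch.status is 0).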

Complete output of lspci:
pengc99@gaia:/$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
00:01.0 PCI bridge: Hewlett-Packard Company Device 9602
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)

Output of lspci -vvv (as root, network adapter section):
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at fe8e (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fe8c (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at e800 [size=32]
        Expansion ROM at fe8a [disabled] [size=128K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable-

Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Joe Jin
On 07/12/12 02:51, Dave, Tushar N wrote:
 
 Joe,
 
 I see a couple of errors in the lspci output.
 The Device Status register shows an uncorrectable PCIe error, which means
 something certainly went wrong. The only way to recover from an
 uncorrectable error is a reset.

   DevSta: CorrErr- *UncorrErr+ FatalErr+ UnsuppReq+ AuxPwr+ TransPend-
 
 The AER section of the lspci output also shows a PCIe completion timeout.
   
   Capabilities: [100 v1] Advanced Error Reporting
   UESta:  DLP- SDES- TLP- FCP- *CmpltTO+ CmpltAbrt- UnxCmplt- 
 RxOF- MalfTLP+ ECRC- UnsupReq+ ACSViol-
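
The +/- flags in these lspci lines are easy to misread by eye. A minimal
sketch of checking them programmatically follows; lspci_flag is a
hypothetical helper, and the '*' it tolerates is the emphasis marker used in
this mail, not something lspci prints.

```c
#include <string.h>

/* Returns 1 if "name+" appears in line, 0 if "name-" appears, -1 if the
 * flag is absent. A match must start a word so that "TLP" does not match
 * inside "MalfTLP". */
static int lspci_flag(const char *line, const char *name)
{
    size_t n = strlen(name);
    const char *p = line;

    if (n == 0)
        return -1;
    while ((p = strstr(p, name)) != NULL) {
        int at_start = (p == line) || p[-1] == ' ' ||
                       p[-1] == '\t' || p[-1] == '*';
        if (at_start && (p[n] == '+' || p[n] == '-'))
            return p[n] == '+';
        p += 1;   /* keep scanning past this partial match */
    }
    return -1;
}
```

Applied to the DevSta line quoted above, lspci_flag(line, "UncorrErr")
returns 1 and lspci_flag(line, "CorrErr") returns 0; on the UESta line,
"CmpltTO" and "MalfTLP" both return 1.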
 
 I suggest loading the AER driver and checking the log for error messages.
 Please also check the BIOS log for any errors reported by the system. Are
 there any machine check errors?
 
 When did you first notice this issue? Has the 82571 ever worked on this
 server?
 
 One more thing: a cache line size of 256 is a little unusual (I have never
 seen this value before; it is mostly 64). Have the BIOS settings been
 changed, or are you using the default BIOS settings?
 

I checked the BIOS log and found a fault reported from the device. I changed
the PCI-E Payload Size from 256 (the default) to 128, and now the device
works.
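
For reference, the BIOS Payload Size setting presumably corresponds to the
Max_Payload_Size field of the PCIe Device Control register (bits 7:5), which
lspci -vvv prints as "MaxPayload N bytes" on the DevCtl line. A small sketch
of that encoding, with a hypothetical helper name:

```c
/* Decode Max_Payload_Size from a PCIe Device Control register value.
 * Bits 7:5 encode 128 << n bytes for n = 0..5; 6 and 7 are reserved. */
static int max_payload_bytes(unsigned int devctl)
{
    unsigned int enc = (devctl >> 5) & 0x7u;

    if (enc > 5)
        return -1;          /* reserved encoding */
    return 128 << enc;
}
```

An encoding of 1 (devctl bits 7:5 = 001b) gives 256 bytes and an encoding of
0 gives 128 bytes, the two values discussed here. Both ends of a link have
to agree on a payload size that each can accept, so a mismatch is one
plausible, but unconfirmed, explanation for the errors seen at 256.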

Comparing the lspci output, I found that the Address and Data of the MSI
capability have changed:

Old:
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee21000  Data: 40cb

New:
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee24000  Data: 405c
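
On x86 the MSI address and data words encode the target CPU and interrupt
vector, so they can differ from one boot or driver load to the next without
indicating a fault. A small sketch of pulling those fields out of the two
values above, assuming the standard x86 layout (destination APIC ID in
address bits 19:12, vector in data bits 7:0); the helper names are
hypothetical.

```c
#include <stdint.h>

/* x86 MSI address: 0xFEE in bits 31:20, destination APIC ID in bits 19:12. */
static unsigned int msi_dest_id(uint32_t addr)
{
    return (addr >> 12) & 0xffu;
}

/* x86 MSI data: interrupt vector in bits 7:0. */
static unsigned int msi_vector(uint16_t data)
{
    return data & 0xffu;
}

/* Sanity check that the address falls in the local APIC MSI window. */
static int msi_addr_is_x86(uint32_t addr)
{
    return (addr >> 20) == 0xfeeu;
}
```

Under that layout the old pair decodes to destination 0x21, vector 0xcb and
the new pair to destination 0x24, vector 0x5c: a different CPU and vector
assignment, which by itself would not point to a BIOS problem.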

Most likely it's a BIOS bug? Please comment.

Thanks,
Joe




Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Joe Jin
On 07/12/12 10:52, Dave, Tushar N wrote:
 What is the exact error message in the BIOS log?

Error message from BIOS event log:
07/12/12 05:54:00
PCI Express Non-Fatal Error

Thanks,
Joe



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Joe Jin [mailto:joe@oracle.com]
Sent: Wednesday, July 11, 2012 7:58 PM
To: Dave, Tushar N
Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
ker...@vger.kernel.org
Subject: Re: 82571EB: Detected Hardware Unit Hang

On 07/12/12 10:52, Dave, Tushar N wrote:
 What is the exact error message in the BIOS log?

Error message from BIOS event log:
07/12/12 05:54:00
PCI Express Non-Fatal Error

Thanks,
Joe

Thanks. I will check with the team tomorrow whether this (the max payload
size change) can be treated as a solution to this issue.
We can learn more about exactly which non-fatal error occurred if we capture
a bus trace.
We should also check the EEPROM on this device to make sure it is up to date.
Send me the full EEPROM dump in a file and I will confirm with the team that
it is up to date.
Thanks for your work.

-Tushar



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-07-11 Thread Dave, Tushar N
-----Original Message-----
From: Joe Jin [mailto:joe@oracle.com]
Sent: Wednesday, July 11, 2012 8:13 PM
To: Dave, Tushar N
Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
ker...@vger.kernel.org
Subject: Re: 82571EB: Detected Hardware Unit Hang

On 07/12/12 11:07, Dave, Tushar N wrote:
 -----Original Message-----
 From: Joe Jin [mailto:joe@oracle.com]
 Sent: Wednesday, July 11, 2012 7:58 PM
 To: Dave, Tushar N
 Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
 ker...@vger.kernel.org
 Subject: Re: 82571EB: Detected Hardware Unit Hang

 On 07/12/12 10:52, Dave, Tushar N wrote:
 What is the exact error message in the BIOS log?

 Error message from BIOS event log:
 07/12/12 05:54:00
PCI Express Non-Fatal Error

 Thanks,
 Joe
Hi Tushar,

Please find the EEPROM dump in the attachment.

Do you have an lspci -vvv dump of the entire system from before and after
the issue occurs? If so, can you send it to me?

