Re: [E1000-devel] 82571EB - Detected Hardware Unit Hang

2012-08-28 Thread Nikolay Popov
29.08.2012 6:29, Dave, Tushar N wrote:
 Thanks for the info.
 For both, 82571 and 80003ES2LAN, I see UnsuppReq+ and UncorrErr+ in lspci
 (DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend+)
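
In lspci output, a trailing '+' means the flag is set and '-' means it is clear. As a reading aid, here is a minimal Python sketch (not part of the original exchange) that splits the quoted DevSta line into set and clear flags. The helper name parse_lspci_flags is made up for illustration.

```python
def parse_lspci_flags(line):
    """Split an lspci flag line into {name: bool}; '+' is set, '-' is clear."""
    # Drop the surrounding parentheses from the quoted text, then the "DevSta:" label.
    _, _, rest = line.strip("() ").partition(":")
    flags = {}
    for token in rest.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


# The line exactly as quoted in the message above.
devsta = "(DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend+)"
flags = parse_lspci_flags(devsta)

# Only the error-reporting flags; AuxPwr and TransPend are not error indicators.
error_flags = [name for name in ("CorrErr", "UncorrErr", "FatalErr", "UnsuppReq")
               if flags.get(name)]
print(error_flags)
```

For the quoted line this lists UncorrErr and UnsuppReq, the two flags the message calls out.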

 Have you tried disabling tso (ethtool -K ethX tso off)?
Yes, but it doesn't help.

 Was this working okay before with old driver or old kernel?

At least on 3.3.6 I don't see these warning messages in syslog.

Regards, Nikolay



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired


Re: [E1000-devel] 82571EB - Detected Hardware Unit Hang

2012-08-28 Thread Nikolay Popov
Hi, Dave!

OK, I have set msglevel as you requested; let's wait for some logs.
Also, about versions: we are using 1.11.3-NAPI on both the 3.3.6 and 3.5.2 hosts.
We were forced to do that because with the default in-kernel driver (at least
2.0.0 on 3.5.2) we see some mysterious drops and delays (~1-2% drops, and delays
up to 2000 ms) that appear once every few minutes. Downgrading the driver to
1.11.3-NAPI solves that issue (which we'll discuss in a separate topic, I
suppose), but with this driver version we run into the TX hang problem we're
now trying to track down.
I can't test whether this problem appears with the 2.x.x driver versions because
the hosts are in production and such delays/losses aren't acceptable at all.

Regards, Nikolay




Re: [E1000-devel] 82571EB - Detected Hardware Unit Hang

2012-08-25 Thread Nikolay Popov
Hi all,

It seems that I'm getting the same problem with the 3.5.2 kernel: the onboard
80003ES2LAN NIC resets from time to time under load.

Aug 25 10:27:53 bras2 kernel: [134612.808590] e1000e :05:00.0: eth2: Detected Hardware Unit Hang:
Aug 25 10:27:53 bras2 kernel: [134612.808590]   TDH cd
Aug 25 10:27:53 bras2 kernel: [134612.808590]   TDT b9
Aug 25 10:27:53 bras2 kernel: [134612.808590]   next_to_use b9
Aug 25 10:27:53 bras2 kernel: [134612.808590]   next_to_clean cc
Aug 25 10:27:53 bras2 kernel: [134612.808590] buffer_info[next_to_clean]:
Aug 25 10:27:53 bras2 kernel: [134612.808590]   time_stamp 1020057ff
Aug 25 10:27:53 bras2 kernel: [134612.808590]   next_to_watch cf
Aug 25 10:27:53 bras2 kernel: [134612.808590]   jiffies 102005cda
Aug 25 10:27:53 bras2 kernel: [134612.808590]   next_to_watch.status 0
Aug 25 10:27:53 bras2 kernel: [134612.808590] MAC Status 2080783
Aug 25 10:27:53 bras2 kernel: [134612.808590] PHY Status 792d
Aug 25 10:27:53 bras2 kernel: [134612.808590] PHY 1000BASE-T Status 7800
Aug 25 10:27:53 bras2 kernel: [134612.808590] PHY Extended Status 3000
Aug 25 10:27:53 bras2 kernel: [134612.808590] PCI Status 10
Aug 25 10:27:55 bras2 kernel: [134614.816086] e1000e :05:00.0: eth2: Reset adapter
Aug 25 10:27:58 bras2 kernel: [134617.654599] e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
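
As a reading aid for the dump above, here is a minimal Python sketch (not part of the original report) that converts the hex fields and works out how long the descriptor at next_to_clean had been waiting, in jiffies. Two things are assumptions that do not appear in the thread: the TX ring size of 256 entries and the candidate HZ values used to convert jiffies to seconds. The function names are made up for illustration.

```python
import re

# Fields copied from the dump above (hex, as printed in the log).
DUMP = """\
TDH cd
TDT b9
next_to_use b9
next_to_clean cc
time_stamp 1020057ff
next_to_watch cf
jiffies 102005cda
"""


def parse_hang_dump(text):
    """Map each '<name> <hex value>' line to an integer."""
    fields = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\w+)\s+([0-9a-fA-F]+)\s*", line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields


def pending_descriptors(next_to_use, next_to_clean, ring_size):
    """Descriptors queued but not yet cleaned, allowing for ring wrap-around."""
    return (next_to_use - next_to_clean) % ring_size


fields = parse_hang_dump(DUMP)

# Age of the buffer at next_to_clean when the hang was reported.
stalled_jiffies = fields["jiffies"] - fields["time_stamp"]

# ASSUMPTION: 256-entry TX ring; check the real value with `ethtool -g`.
ASSUMED_RING_SIZE = 256
pending = pending_descriptors(
    fields["next_to_use"], fields["next_to_clean"], ASSUMED_RING_SIZE
)

print(f"stalled for {stalled_jiffies} jiffies, {pending} descriptors pending")
# ASSUMPTION: HZ is not given in the thread; these are common kernel settings.
for hz in (100, 250, 1000):
    print(f"  HZ={hz}: about {stalled_jiffies / hz:.2f} s")
```

With these inputs the buffer is 1243 jiffies old, and under the 256-entry assumption 237 descriptors are outstanding, so the ring would be nearly full when the hang was detected.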


root@bras2:~# ethtool -i eth2
driver: e1000e
version: 1.11.3-NAPI
firmware-version: 1.0-0
bus-info: :05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

root@bras2:~# lspci | grep 05:00.0
05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)

Mainboard: Intel S5000PAL

I had to fall back to driver version 1.11.3-NAPI because with the in-kernel
2.0.0 driver (and also with 2.0.0.1 from sf.net) there were a lot of random
packet drops and latency spikes, so 1.11.3 is more acceptable for
production.
During a reset, traffic stops flowing and iowait rises to 100%; then the
link flaps and everything returns to normal until the next reset, which
could happen in an hour or in a day. I also noticed that the resets don't
correlate with traffic load. They can happen even when the NIC is almost
idle, transferring ~30-40 Mbps.

Is there anything we can do to fix this issue?

Regards, Nikolay


