Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k
Hi

as this email is not showing up in the SF list archive, I'm resending it
(sorry Jesse for you receiving it again)

On Friday 27 January 2012 13:43:54 Carsten Aulbert wrote:
> Hi
>
> On Friday 27 January 2012 08:31:06 Jesse Brandeburg wrote:
> > Hi Carsten, it sounds to me like this might be related to ASPM, can you
> > try the boot option pcie_aspm=off
> >
> > before you do that please capture the output of lspci -vvv and attach it
> > to the bug (or send it here I suppose)
> >
> > also include ethtool -e ethX output as an attachment, I'm interested to
> > see some settings in your eeprom.
>
> good guess. I've played a bit with git bisection today and the first bad
> commit was:
>
> 6f461f6c7c961f0b1b73c0f27becf472a0ac606b is the first bad commit
> commit 6f461f6c7c961f0b1b73c0f27becf472a0ac606b
> Author: Bruce Allan bruce.w.al...@intel.com
> Date: Tue Apr 27 03:33:04 2010 +
>
>     e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata
>
>     Prompted by a previous patch submitted by Matthew Garret m...@redhat.com,
>     further digging into errata documentation reveals the current enabling
>     or disabling of ASPM L0s and L1 states for certain parts supported by
>     this driver are incorrect. 82571 and 82572 should always disable L1.
>     For standard frames, 82573/82574/82583 can enable L1 but L0s must be
>     disabled, and for jumbo frames 82573/82574 must disable L1. This allows
>     for some parts to enable L1 in certain configurations leading to better
>     power savings.
>
>     Also according to the same errata, Early Receive (ERT) should be
>     disabled on 82573 when using jumbo frames.
>
>     Cc: Matthew Garret m...@redhat.com
>     Signed-off-by: Bruce Allan bruce.w.al...@intel.com
>     Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
>     Signed-off-by: David S. Miller da...@davemloft.net
>
> :04 04 d147144ee9bb8987b603de3f168193f771c6b05b 11799137fec9091ebacbe3532cd5c4029806bfb2 M drivers
>
> (I used the linux-stable tree).
>
> Does this help and can I enable/disable ERT?
>
> Cheers
>
> Carsten

--
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
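The ASPM rules spelled out in the quoted commit message can be written down as a small lookup table. The following is only a sketch of the message text, not the driver's code: the table name, the function names and the (part, jumbo_frames) key are assumptions made for illustration, and combinations the message does not mention are left out rather than guessed.

```python
from typing import FrozenSet, Optional

# Hypothetical table built only from the wording of the quoted commit message.
# Key: (part, jumbo_frames). Value: ASPM states the message says to disable.
# Combinations the message does not mention are deliberately absent.
ASPM_STATES_TO_DISABLE = {
    ("82571", False): frozenset({"L1"}),   # "should always disable L1"
    ("82571", True): frozenset({"L1"}),
    ("82572", False): frozenset({"L1"}),
    ("82572", True): frozenset({"L1"}),
    ("82573", False): frozenset({"L0s"}),  # standard frames: L1 allowed, L0s not
    ("82574", False): frozenset({"L0s"}),
    ("82583", False): frozenset({"L0s"}),
    ("82573", True): frozenset({"L1"}),    # jumbo frames: "must disable L1"
    ("82574", True): frozenset({"L1"}),
}


def states_to_disable(part: str, jumbo_frames: bool) -> Optional[FrozenSet[str]]:
    """Return the ASPM states the commit message says to disable.

    Returns None when the message says nothing about this combination.
    """
    return ASPM_STATES_TO_DISABLE.get((part, jumbo_frames))


def ert_must_be_disabled(part: str, jumbo_frames: bool) -> bool:
    """Early Receive is to be disabled on 82573 when using jumbo frames."""
    return part == "82573" and jumbo_frames


if __name__ == "__main__":
    for part in ("82571", "82573", "82583"):
        for jumbo in (False, True):
            print(part, "jumbo" if jumbo else "standard",
                  states_to_disable(part, jumbo))
```

For the 82573 parts discussed in this thread running standard frames, the table lists only L0s, so under these rules L1 may stay enabled.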
[E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k
Hi all

on our old venerable compute nodes, we have 2 e1000e driven network
connections (82573E/82573L combo - Supermicro PDSML-LN2+)

Currently we are upgrading from Debian Lenny to Debian Squeeze and during
this process also upgrade from 2.6.32.y vanilla to 3.2.x vanilla (but see
the same with Debian's default kernel 2.6.32-z).

The e1000e version thus changes from 1.0.2-k2 to 1.2.20-k2 or 1.5.1-k2.

The standard testcase is running

goodhost: dd if=/dev/zero | nc badhost 5
badhost: nc -l -p 5 > /dev/null

with 1.0.2-k2 and default options (except crcstripping=0) we get close to
120 MB/s and no dropped packets.

rebooting the system to a kernel with a newer driver yields only
150-250kB/s throughput and a packet drop-rate close to 20%.

I'm attaching quite a number of files to this post, but would like to learn
how to find out, what's wrong and how to fix it. This error seemed to be
popping up here and there on this list and elsewhere, but so far I've yet
to find a definite answer ...

Cheers

TIA

Carsten

--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
CaCert Assurer | Get free certificates from http://www.cacert.org/

Pause parameters for eth1:
Autonegotiate: on
RX: off
TX: off

Coalesce parameters for eth1:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

driver: e1000e
version: 1.5.1-k
firmware-version: 0.5-7
bus-info: :0e:00.0

Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
ntuple-filters: off
receive-hashing: off

NIC statistics:
     rx_packets: 56544
     tx_packets: 33023
     rx_bytes: 268744245
     tx_bytes: 3116133
     rx_broadcast: 24362
     tx_broadcast: 57
     rx_multicast: 0
     tx_multicast: 5
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 11784
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 268744245
     rx_csum_offload_good: 32033
     rx_csum_offload_errors: 0
     rx_header_split: 30473
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0

-[:00]-+-00.0 8086:2778
       +-01.0-[01]--
       +-1c.0-[09]--
       +-1c.4-[0d]00.0 8086:108c
       +-1c.5-[0e]00.0 8086:109a
       +-1d.0 8086:27c8
       +-1d.1 8086:27c9
       +-1d.2 8086:27ca
       +-1d.3 8086:27cb
       +-1d.7 8086:27cc
       +-1e.0-[0f]00.0 18ca:0020
       +-1f.0 8086:27b8
       +-1f.2 8086:27c0
       \-1f.3 8086:27da
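The "close to 20%" drop rate can be cross-checked against the attached NIC statistics. Below is a minimal sketch that parses "name: value" counter lines as they appear in the attachment and computes the share of missed frames. The function names are made up for illustration, and the calculation assumes that rx_packets counts only frames that were actually received, so that missed frames have to be added to get the total offered to the NIC.

```python
def parse_stats(text):
    """Parse 'name: value' counter lines into a dict of integers.

    Lines without a numeric value (such as the 'NIC statistics:' header)
    are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def missed_fraction(stats):
    """Fraction of frames counted as missed.

    Assumption: rx_packets excludes missed frames, so the total offered
    to the NIC is rx_packets + rx_missed_errors.
    """
    missed = stats["rx_missed_errors"]
    total = stats["rx_packets"] + missed
    return missed / total if total else 0.0


# Counters copied from the attachment in this post.
SAMPLE = """\
NIC statistics:
     rx_packets: 56544
     rx_no_buffer_count: 0
     rx_missed_errors: 11784
"""

if __name__ == "__main__":
    fraction = missed_fraction(parse_stats(SAMPLE))
    print(f"missed: {fraction:.1%}")  # prints "missed: 17.2%"
```

Under that assumption the counters give roughly 17%, which is in line with the drop rate reported in the post.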
Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k
Thanks for the report. The stats are showing true dropped packets at the
NIC level. You do not have any no buffer counts, so it looks like the
interrupts aren't being serviced fast enough. So some things to check are:

- are the interrupts configured correctly? (error reports in the system log)
- are interrupts being shared between the NIC and another device in the system?
- are there other types of error messages in the system log files?
- if you load and use the 1.0.2 driver (available on our Sourceforge site)
  with the newer Linux OSes, does the problem still happen?

Please let us know.

Cheers,
John

-----Original Message-----
From: Carsten Aulbert [mailto:carsten.aulb...@aei.mpg.de]
Sent: Thursday, January 26, 2012 2:07 AM
To: e1000-de...@lists.sf.net
Cc: Henning Fehrmann
Subject: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

Hi all

on our old venerable compute nodes, we have 2 e1000e driven network
connections (82573E/82573L combo -)

Currently we are upgrading from Debian Lenny to Debian Squeeze and during
this process also upgrade from 2.6.32.y vanilla to 3.2.x vanilla (but see
the same with Debian's default kernel 2.6.32-z).

The e1000e version thus changes from 1.0.2-k2 to 1.2.20-k2 or 1.5.1-k2.

The standard testcase is running

goodhost: dd if=/dev/zero | nc badhost 5
badhost: nc -l -p 5 > /dev/null

with 1.0.2-k2 and default options (except crcstripping=0) we get close to
120 MB/s and no dropped packets.

rebooting the system to a kernel with a newer driver yields only
150-250kB/s throughput and a packet drop-rate close to 20%.

I'm attaching quite a number of files to this post, but would like to learn
how to find out, what's wrong and how to fix it. This error seemed to be
popping up here and there on this list and elsewhere, but so far I've yet
to find a definite answer ...

Cheers

TIA

Carsten

--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
CaCert Assurer | Get free certificates from http://www.cacert.org/
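One item on John's checklist, whether the NIC shares an interrupt with another device, can be checked by reading /proc/interrupts. The sketch below assumes the usual layout of that file (a header row naming the CPUs, then one row per IRQ with per-CPU counters, the interrupt type, and a comma-separated device list at the end). The function name and the sample text are made up for illustration and are not taken from the systems in this thread.

```python
def shared_irqs(interrupts_text, nic):
    """Return {irq: [other devices]} for IRQ lines that `nic` shares.

    Assumes the /proc/interrupts layout: a header row with one column per
    CPU, then rows of 'IRQ: counters... type device[, device...]'.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header looks like 'CPU0 CPU1 ...'
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip NMI/LOC style rows without a numeric IRQ
        tail = " ".join(rest.split()[n_cpus:])  # drop per-CPU counters
        parts = [p.strip() for p in tail.split(",")]
        if not parts[0]:
            continue
        # The first chunk still carries the interrupt type; the device
        # name is its last whitespace-separated token.
        devices = [parts[0].split()[-1]] + parts[1:]
        if nic in devices and len(devices) > 1:
            shared[int(label)] = [d for d in devices if d != nic]
    return shared


# Hypothetical sample, for illustration only.
SAMPLE = """\
           CPU0       CPU1
 16:       1200        340   IO-APIC-fasteoi   uhci_hcd:usb3, eth1
 17:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 44:      90211          0   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE, "eth1"))
    print(shared_irqs(SAMPLE, "eth0"))
```

On a real system the text would come from reading /proc/interrupts; an empty result means the interface has its interrupt lines to itself.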
Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k
On Thu, 2012-01-26 at 02:06 -0800, Carsten Aulbert wrote:
> with 1.0.2-k2 and default options (except crcstripping=0) we get close to
> 120 MB/s and no dropped packets.
>
> rebooting the system to a kernel with a newer driver yields only
> 150-250kB/s throughput and a packet drop-rate close to 20%..

Hi Carsten, it sounds to me like this might be related to ASPM, can you
try the boot option pcie_aspm=off

before you do that please capture the output of lspci -vvv and attach it
to the bug (or send it here I suppose)

also include ethtool -e ethX output as an attachment, I'm interested to
see some settings in your eeprom.

> I'm attaching quite a number of files to this post, but would like to learn
> how to find out, what's wrong and how to fix it. This error seemed to be
> popping up here and there on this list and elsewhere, but so far I've yet
> to find a definite answer ...

as John said, rx_missed with no rx_no_buffer_count means that you're
dropping packets in hardware which typically means that something is
going wrong at the bus level or the PCIe transaction level, that ends up
delaying packets, due to long memory latencies or something like that
(just typical problems, not saying it is exactly your issue)

aspm is one of those causes, there can be others
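The rule of thumb in this reply (rx_missed_errors without rx_no_buffer_count points at the hardware or bus level) can be expressed as a small check on the ethtool counters. This is only a sketch: the function name and the returned strings are made up for illustration, and the branch for a non-zero rx_no_buffer_count is an assumption about what that counter means, since the thread does not discuss that case.

```python
def classify_rx_drops(stats):
    """Rough triage of receive drops from ethtool -S style counters.

    `stats` maps counter names to integers; missing counters count as 0.
    """
    missed = stats.get("rx_missed_errors", 0)
    no_buffer = stats.get("rx_no_buffer_count", 0)
    if missed == 0 and no_buffer == 0:
        return "no receive drops recorded"
    if no_buffer == 0:
        # The case described in the reply above: drops in hardware,
        # typically bus or PCIe level delays (ASPM is one possible cause).
        return "dropped in hardware: look at bus/PCIe latency, e.g. ASPM"
    # Assumption, not discussed in the thread: a non-zero no-buffer count
    # suggests the host was not refilling receive buffers fast enough.
    return "receive buffers ran short: look at ring size and host load"


if __name__ == "__main__":
    # Counter values taken from the statistics posted in this thread.
    print(classify_rx_drops({"rx_missed_errors": 11784,
                             "rx_no_buffer_count": 0}))
```

With the counters from the original post this takes the hardware-drop branch, which is why pcie_aspm=off is suggested as a first test.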