Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

2012-01-30 Thread Carsten Aulbert
Hi

as this email is not showing up in the SF list archive, I'm resending it 
(sorry Jesse for you receiving it again)

On Friday 27 January 2012 13:43:54 Carsten Aulbert wrote:
 Hi
 
 On Friday 27 January 2012 08:31:06 Jesse Brandeburg wrote:
  Hi Carsten, it sounds to me like this might be related to ASPM, can you
  try the boot option pcie_aspm=off
  
  before you do that please capture the output of lspci -vvv and attach it
  to the bug (or send it here I suppose)  also include ethtool -e ethX
  output as an attachment, I'm interested to see some settings in your
  eeprom.
 
 good guess.
 
 I've played a bit with git bisection today and the first bad commit was:
 
 6f461f6c7c961f0b1b73c0f27becf472a0ac606b is the first bad commit
 commit 6f461f6c7c961f0b1b73c0f27becf472a0ac606b
 Author: Bruce Allan bruce.w.al...@intel.com
 Date:   Tue Apr 27 03:33:04 2010 +
 
 e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware
 errata
 
 Prompted by a previous patch submitted by Matthew Garret
 m...@redhat.com, further digging into errata documentation reveals the
 current enabling or disabling of ASPM L0s and L1 states for certain parts
 supported by this driver are incorrect.  82571 and 82572 should always
 disable L1.  For standard frames, 82573/82574/82583 can enable L1 but L0s
 must be disabled, and for jumbo frames 82573/82574 must disable L1.  This
 allows for some parts to enable L1 in certain configurations leading to
 better power savings.
 
 Also according to the same errata, Early Receive (ERT) should be
 disabled on 82573 when using jumbo frames.
 
 Cc: Matthew Garret m...@redhat.com
 Signed-off-by: Bruce Allan bruce.w.al...@intel.com
 Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
 Signed-off-by: David S. Miller da...@davemloft.net
 
 :040000 040000 d147144ee9bb8987b603de3f168193f771c6b05b 11799137fec9091ebacbe3532cd5c4029806bfb2 M  drivers
 
 
 (I used the linux-stable tree).
 
 Does this help and can I enable/disable ERT?
 
 Cheers
 
 Carsten
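
The errata rules in the quoted commit message can be restated as a small lookup. This is only a sketch of the wording above, not the driver's code: both function names are made up for illustration, and keeping L0s disabled for jumbo frames on 82573/82574 is an assumption (the message states the L0s rule only for standard frames).

```python
def aspm_states_to_disable(part, jumbo):
    """Return the ASPM states to disable for a part, per the commit message.

    `part` is the controller name as a string, e.g. "82573".
    `jumbo` is True when jumbo frames are in use.
    """
    if part in {"82571", "82572"}:
        # "82571 and 82572 should always disable L1."
        return {"L1"}
    if part in {"82573", "82574", "82583"}:
        # "For standard frames, 82573/82574/82583 can enable L1
        # but L0s must be disabled."
        # Assumption: L0s stays disabled with jumbo frames as well.
        disabled = {"L0s"}
        if jumbo and part in {"82573", "82574"}:
            # "for jumbo frames 82573/82574 must disable L1."
            disabled.add("L1")
        return disabled
    raise ValueError(f"part {part!r} is not covered by the commit message")


def ert_must_be_disabled(part, jumbo):
    """Early Receive (ERT) is disabled on 82573 when using jumbo frames."""
    return part == "82573" and jumbo


# The nodes in this thread use 82573E/82573L with standard frames.
print(sorted(aspm_states_to_disable("82573", jumbo=False)))
```

Read this way, the commit leaves L1 available on an 82573 with standard frames, which fits the suspicion that ASPM is involved in the regression.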

--
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


[E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

2012-01-26 Thread Carsten Aulbert
Hi all

on our old venerable compute nodes, we have 2 e1000e driven network 
connections (82573E/82573L combo - Supermicro PDSML-LN2+)

Currently we are upgrading from Debian Lenny to Debian Squeeze and during this 
process also upgrade from 2.6.32.y vanilla to 3.2.x vanilla (but see the same 
with Debian's default kernel 2.6.32-z). The e1000e version thus changes from 
1.0.2-k2 to 1.2.20-k2 or 1.5.1-k2.

The standard testcase is running

goodhost:
dd if=/dev/zero | nc badhost 5
badhost:
nc -l -p 5 > /dev/null

with 1.0.2-k2 and default options (except crcstripping=0) we get close to 120 
MB/s and no dropped packets.

rebooting the system to a kernel with a newer driver yields only 150-250 kB/s 
throughput and a packet drop rate close to 20%.

I'm attaching quite a number of files to this post, but would like to learn 
how to find out, what's wrong and how to fix it.

This error seemed to be popping up here and there on this list and elsewhere, 
but so far I've yet to find a definite answer ...

Cheers & TIA

Carsten
-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
CaCert Assurer | Get free certificates from http://www.cacert.org/
Pause parameters for eth1:
Autonegotiate:  on
RX: off
TX: off

Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

Ring parameters for eth1:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        0
RX Jumbo:       0
TX:             256

driver: e1000e
version: 1.5.1-k
firmware-version: 0.5-7
bus-info: 0000:0e:00.0
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
NIC statistics:
 rx_packets: 56544
 tx_packets: 33023
 rx_bytes: 268744245
 tx_bytes: 3116133
 rx_broadcast: 24362
 tx_broadcast: 57
 rx_multicast: 0
 tx_multicast: 5
 rx_errors: 0
 tx_errors: 0
 tx_dropped: 0
 multicast: 0
 collisions: 0
 rx_length_errors: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_errors: 0
 rx_no_buffer_count: 0
 rx_missed_errors: 11784
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0
 tx_window_errors: 0
 tx_abort_late_coll: 0
 tx_deferred_ok: 0
 tx_single_coll_ok: 0
 tx_multi_coll_ok: 0
 tx_timeout_count: 0
 tx_restart_queue: 0
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 rx_align_errors: 0
 tx_tcp_seg_good: 0
 tx_tcp_seg_failed: 0
 rx_flow_control_xon: 0
 rx_flow_control_xoff: 0
 tx_flow_control_xon: 0
 tx_flow_control_xoff: 0
 rx_long_byte_count: 268744245
 rx_csum_offload_good: 32033
 rx_csum_offload_errors: 0
 rx_header_split: 30473
 alloc_rx_buff_failed: 0
 tx_smbus: 0
 rx_smbus: 0
 dropped_smbus: 0
 rx_dma_failed: 0
 tx_dma_failed: 0
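
As a quick cross-check of the "close to 20%" figure, the drop rate can be worked out from the two counters in the statistics above. This is a rough sketch and assumes rx_packets counts only frames that were actually delivered to the host, so that delivered plus missed approximates what arrived on the wire.

```python
# Counters copied from the NIC statistics above.
rx_packets = 56544
rx_missed_errors = 11784

# Assumption: missed frames are not included in rx_packets,
# so frames on the wire = delivered + missed.
frames_on_wire = rx_packets + rx_missed_errors
drop_rate = rx_missed_errors / frames_on_wire

print(f"frames on wire: {frames_on_wire}")
print(f"drop rate: {drop_rate:.1%}")  # about 17%
```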
-[0000:00]-+-00.0  8086:2778
           +-01.0-[01]--
           +-1c.0-[09]--
           +-1c.4-[0d]----00.0  8086:108c
           +-1c.5-[0e]----00.0  8086:109a
           +-1d.0  8086:27c8
           +-1d.1  8086:27c9
           +-1d.2  8086:27ca
           +-1d.3  8086:27cb
           +-1d.7  8086:27cc
           +-1e.0-[0f]----00.0  18ca:0020
           +-1f.0  8086:27b8
           +-1f.2  8086:27c0
           \-1f.3  8086:27da


Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

2012-01-26 Thread Ronciak, John
Thanks for the report.  The stats are showing true dropped packets at the NIC 
level.  You do not have any no buffer counts so it's like the interrupts aren't 
being serviced fast enough.  So some things to check are:
- are the interrupts configured correctly? (error reports in the system log)
- are interrupts being shared between the NIC and another device in the system?
- are there other types of error messages in the system log files?
- if you load and use the 1.0.2 driver (available on our Sourceforge site) with 
the newer Linux OSes, does the problem still happen?

Please let us know.

Cheers,
John


 -----Original Message-----
 From: Carsten Aulbert [mailto:carsten.aulb...@aei.mpg.de]
 Sent: Thursday, January 26, 2012 2:07 AM
 To: e1000-de...@lists.sf.net
 Cc: Henning Fehrmann
 Subject: [E1000-devel] High number of rx_missed_errors when chaning
 from 1.0.2-k2 to 1.2.20-k2/1.5.1-k
 
 Hi all
 
 on our old venerable compute nodes, we have 2 e1000e driven network
 connections (82573E/82573L combo -)
 
 Currently we are upgrading from Debian Lenny to Debian Squeeze and
 during this process also upgrade from 2.6.32.y vanilla to 3.2.x vanilla
 (but see the same with Debian's default kernel 2.6.32-z). The e1000e
 version thus changes from
 1.0.2-k2 to 1.2.20-k2 or 1.5.1-k2.
 
 The standard testcase is running
 
 goodhost:
 dd if=/dev/zero | nc badhost 5
 badhost:
 nc -l -p 5 > /dev/null
 
 with 1.0.2-k2 and default options (except crcstripping=0) we get close
 to 120 MB/s and no dropped packets.
 
 rebooting the system to a kernel with a newer driver yields only 150-
 250kB/s throughput and a packet drop-rate close to 20%..
 
 I'm attaching quite a number of files to this post, but would like to
 learn how to find out, what's wrong and how to fix it.
 
 This error seemed to be popping up here and there on this list and
 elsewhere, but so far I've yet to find a definite answer ...
 
 Cheers & TIA
 
 Carsten
 --
 Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
 Callinstrasse 38, 30167 Hannover, Germany
 Phone/Fax: +49 511 762-17185 / -17193
 http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
 CaCert Assurer | Get free certificates from http://www.cacert.org/


Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

2012-01-26 Thread Jesse Brandeburg
On Thu, 2012-01-26 at 02:06 -0800, Carsten Aulbert wrote:

 with 1.0.2-k2 and default options (except crcstripping=0) we get close to 120 
 MB/s and no dropped packets.
 
 rebooting the system to a kernel with a newer driver yields only 150-250kB/s 
 throughput and a packet drop-rate close to 20%..
 

Hi Carsten, it sounds to me like this might be related to ASPM, can you
try the boot option pcie_aspm=off

before you do that please capture the output of lspci -vvv and attach it
to the bug (or send it here I suppose)  also include ethtool -e ethX
output as an attachment, I'm interested to see some settings in your
eeprom.
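
For anyone following along, here is a rough sketch of how the ASPM state could be pulled out of a saved lspci -vvv capture. The sample text is illustrative only and is not taken from this thread. The function name is made up, and the parser assumes the usual "LnkCtl: ASPM ...;" line under each device header.

```python
import re

# Illustrative sample only; not the output posted in this thread.
SAMPLE_LSPCI = """
0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
0e:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
"""


def aspm_link_control(lspci_text):
    """Map each device address to the ASPM setting on its LnkCtl line."""
    result = {}
    device = None
    for line in lspci_text.splitlines():
        # Device headers start in column 0; detail lines are indented.
        if line and not line[0].isspace():
            device = line.split()[0]
        match = re.search(r"LnkCtl:\s*ASPM ([^;]+);", line)
        if match and device:
            result[device] = match.group(1).strip()
    return result


for addr, state in aspm_link_control(SAMPLE_LSPCI).items():
    print(addr, state)
```

Comparing this before and after booting with pcie_aspm=off shows whether the option actually changed the link settings.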

 I'm attaching quite a number of files to this post, but would like to learn 
 how to find out, what's wrong and how to fix it.
 
 This error seemed to be popping up here and there on this list and elsewhere, 
 but so far I've yet to find a definite answer ...

as John said, rx_missed with no rx_no_buffer_count means that you're
dropping packets in hardware which typically means that something is
going wrong at the bus level or the PCIe transaction level, that ends up
delaying packets, due to long memory latencies or something like that
(just typical problems, not saying it is exactly your issue)

aspm is one of those causes, there can be others
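
The rule of thumb above (rx_missed_errors without rx_no_buffer_count points at the hardware or bus side) can be sketched as a small check on ethtool -S output. This is a simplified illustration: the function names and the returned labels are made up, and a real diagnosis needs more than these two counters.

```python
import re

# Excerpt of the counters posted earlier in this thread.
SAMPLE_STATS = """
NIC statistics:
     rx_packets: 56544
     rx_no_buffer_count: 0
     rx_missed_errors: 11784
"""


def parse_ethtool_stats(text):
    """Turn 'name: value' lines from `ethtool -S` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def classify_rx_drops(stats):
    """Rough classification of receive drops from two counters."""
    missed = stats.get("rx_missed_errors", 0)
    no_buffer = stats.get("rx_no_buffer_count", 0)
    if missed == 0:
        return "none"
    if no_buffer == 0:
        # Frames were dropped although host buffers were available:
        # look at the bus/PCIe level (ASPM is one candidate).
        return "hardware/bus"
    # The NIC ran out of host receive buffers at some point.
    return "host-buffers"


print(classify_rx_drops(parse_ethtool_stats(SAMPLE_STATS)))
```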

