Re: [E1000-devel] Ixgbe and VLAN filtering

2010-01-26 Thread Jesse Brandeburg
I believe your patch is correct to enable VLAN filtering when in promisc
mode.

On Tue, 2010-01-26 at 00:22 -0700, György Szaniszló wrote:
 Hi,
 
 OK, I do not want to push my quick patch into the official ixgbe driver,
 because I know there has been a lot of discussion about this topic
 (promiscuous mode and HW VLAN filtering).
 
 I want to use my patch only in my application.
 In the very short term, all I need is confirmation from the e1000 developers
 that my patch would work fine.
 
 In the future, instead of pushing my quick patch, I would file a feature
 request for a modprobe parameter or a compile-time flag in the driver
 to be able to switch HW VLAN filtering on/off in promiscuous mode.
 
 Thanks,
 Gyorgy Szaniszlo
 
 
 From: Nelson, Shannon [mailto:shannon.nel...@intel.com]
 Sent: Monday, January 25, 2010 6:32 PM
 To: György Szaniszló
 Subject: RE: [E1000-devel] Ixgbe and VLAN filtering
 
 Thanks for looking into this.  Please post as a patch to the e1000-devel list 
 so that the primary ixgbe developers can get a look at it and get it into the 
 proper update path.
 
 sln
 
 
 
 
 On Sun, 2010-01-24 at 07:48 -0800, György Szaniszló wrote:
 
 Hi,
 
 Searching the e1000-devel list archives, I realized that HW VLAN filtering is
 turned off by the ixgbe driver in promiscuous mode.
 Promiscuous mode was set by the bridge used by my application.
 
 I have modified the ixgbe driver to keep HW VLAN filtering enabled in
 promiscuous mode.
 This behaviour is better for my application, as I know exactly which VLANs to
 monitor, and performance is critical.
 
 Please confirm that this is the right way to reach my goal.
 My patch is attached. Do I have to make any other modification, or is this
 enough?
 
 Thanks,
 Gyorgy Szaniszlo
 
 --- ./ixgbe_main.c.orig 2010-01-24 15:38:57.454570244 +0100
 +++ ./ixgbe_main.c 2010-01-24 15:54:41.904571304 +0100
 @@ -2984,7 +2984,7 @@
   if (netdev->flags & IFF_PROMISC) {
           hw->addr_ctrl.user_set_promisc = 1;
           fctrl |= (IXGBE_FCTRL_UPE | IXGBE_FCTRL_MPE);
 -         vlnctrl &= ~IXGBE_VLNCTRL_VFE;
 +         vlnctrl |= IXGBE_VLNCTRL_VFE;
   } else {
           if (netdev->flags & IFF_ALLMULTI) {
                   fctrl |= IXGBE_FCTRL_MPE;
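
The one-line change in the patch above flips a single register bit. The following is a minimal standalone sketch of that logic, not driver code: the function promisc_rx_mode() and its keep_vlan_filter switch are hypothetical, and the bit values are assumptions that should be checked against ixgbe_type.h in your tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values assumed for illustration; check ixgbe_type.h in your tree. */
#define FCTRL_MPE   0x00000100u /* multicast promiscuous enable */
#define FCTRL_UPE   0x00000200u /* unicast promiscuous enable */
#define VLNCTRL_VFE 0x40000000u /* VLAN filter enable */

/*
 * Mirror of the promiscuous branch touched by the patch (hypothetical helper).
 * keep_vlan_filter == 0: stock behaviour, VFE is cleared.
 * keep_vlan_filter != 0: patched behaviour, VFE is set.
 */
void promisc_rx_mode(uint32_t *fctrl, uint32_t *vlnctrl, int keep_vlan_filter)
{
	assert(fctrl != NULL && vlnctrl != NULL);

	*fctrl |= (FCTRL_UPE | FCTRL_MPE);
	if (keep_vlan_filter)
		*vlnctrl |= VLNCTRL_VFE;
	else
		*vlnctrl &= ~VLNCTRL_VFE;
}
```

A runtime switch of this kind is roughly what the modprobe parameter requested in this thread would control; no such parameter is confirmed to exist in the driver.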
 
 
 
 
 From: Nelson, Shannon [mailto:shannon.nel...@intel.com]
 Sent: Friday, January 22, 2010 9:30 PM
 To: György Szaniszló
 Subject: RE: [E1000-devel] Ixgbe and VLAN filtering
 
 
 György, I apologize for not getting back to you very quickly on this.  I was 
 hoping someone else on the e1000-devel list would answer, as I don't do much 
 network config at that level, and I'm currently busy dealing with a couple of 
 other issues.
 
 sln
 
 On Thu, 2010-01-21 at 08:41 -0800, György Szaniszló wrote:
 
 Hi,
 
 I have performance problems with VLAN filtering.
 
 In my test traffic there are packets with 4 VLAN IDs: 11, 12, 21, 22
 
 I can change the test traffic load.
 
 NOT using vconfig and VLAN filtering, and bridging the original 
 eth1-eth4 interfaces I can handle 7 Gbit/sec traffic without drop at NIC 
 level (rx_missed_errors).
 
 (In my bridge there is a netfilter hook function, that drops all 
 packets, so I test the low-level RX performance of the network card + driver 
 + Linux)
 
 Now I want to bridge only the traffic of VLANs 21 and 22. This is 
 about 50% of the total traffic volume.
 
 Using vconfig, and set up VLANs 21 and 22 on the ethernet interfaces, 
 and bridging the eth.vlan interfaces, I can handle only 9 Gbit/sec traffic 
 (total traffic on all vlans 11,12,21,22) without drop.
 
 Assuming that HW vlan filtering uses only limited resources I 
 expected 12 Gbit/sec without drop.
 
 As you can see filtering out the traffic of vlan 11 and 12 requires 
 massive SW resources, so I assume that the traffic of vlan 11 and 12 are not 
 filtered out at HW level, maybe in the driver.
 
 How can I make sure that VLAN filtering is done in HW?
 
 What did I do wrong?
 
 Should I explicitly forbid VLANs 11 and 12 (setting up ethx.11 and 
 ethx.12 and shutting down those virtual interfaces)?
 
 Maybe the bridge opens the interface in promiscuous mode, and HW VLAN 
 filtering does not work in promiscuous mode?
 
 Please help me!
 
 Thanks,
 
 Gyorgy Szaniszlo
 
 
 
 
 
 
 
 
 
 From: Nelson, Shannon [mailto:shannon.nel...@intel.com]
 Sent: Thursday, December 17, 2009 6:40 PM
 To: György Szaniszló
 Cc: e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] Ixgbe and VLAN filtering
 
 
 
 
 
 On Thu, 2009-12-17 at 03:46 -0800, György Szaniszló wrote:
 
 Hi,
 
 Please help me to understand how VLAN filtering works with 
 ixgbe driver.
 
 I have the following setup
 2 x Intel(R) 10 Gigabit XF SR Dual 

Re: [E1000-devel] [PATCH v2 0/3] e1000e, igb, ixgbe: add registers etc. printout code just before resetting adapters

2010-01-26 Thread Taku Izumi
Hi Jesse,

(2010/01/23 6:54), Brandeburg, Jesse wrote:
 Taku, thanks for these, we are talking the patches over and reviewing
 them.  While I agree that the idea of these patches is good, I still don't
 agree with the default being enabled.  Usually if someone is getting tx
 hangs they are repeatable and we can work with them to get the debug
 turned on.  I DO think it is useful to have the feature available by
 default but NOT enabled.
 
 If we wanted to enable something by default it might be useful to print
 something that actually draws some conclusions from known failure modes,
 like if TDH!=TDT after some amount of time.  I think one or two lines
 maximum for default printing.
 
 If you're working in this area I had an idea.  I had wanted to be able to
 print the large amount of ring information (especially in the ixgbe case
 with many rings) to the ftrace buffers in order to not overrun the syslog
 daemon.  Not sure if you're interested in more new features, it certainly
 is separate but related to this patch.
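
The "TDH != TDT after some amount of time" idea in the quoted message could be sketched roughly as follows. This is only an illustration of the heuristic under stated assumptions: the struct and function names are hypothetical, and a real driver would read the head and tail registers itself and pick its own sampling interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one TX ring's head/tail registers. */
struct tx_ring_sample {
	uint32_t tdh; /* descriptor head, advanced by hardware as it transmits */
	uint32_t tdt; /* descriptor tail, advanced by software as it queues work */
};

/*
 * Returns 1 when the ring looks stuck: work was pending at the first sample
 * (TDH != TDT), the head has not moved by the second sample, and work is
 * still pending. Returns 0 otherwise.
 */
int tx_ring_looks_hung(const struct tx_ring_sample *before,
		       const struct tx_ring_sample *after)
{
	int pending, no_progress, still_pending;

	assert(before != NULL && after != NULL);

	pending = before->tdh != before->tdt;
	no_progress = after->tdh == before->tdh;
	still_pending = after->tdh != after->tdt;

	return pending && no_progress && still_pending;
}
```

A check like this yields a single yes/no result per ring, which fits the "one or two lines maximum" budget suggested for default printing.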

I had a similar thought: all of this information should be dumped to
a private ring buffer to avoid filling syslog with driver messages.
But I had no good idea for how to extract the information from ring buffers,
so as a first step I decided to print it with printk().
Is there an easy way to extract it from ring buffers?

Best regards,
Taku Izumi


--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired


[E1000-devel] sf.net bug ID 2934941: Detected Tx Unit Hang on quad port copper 82576

2010-01-26 Thread Покотиленко Костик
Hi,

Can somebody investigate, please? Bug posted 19.01.2010.

I have tried:
- 2.6.29 + igb 2.0.6
- 2.6.30 + igb 2.0.6
- 2.6.30 + igb 2.1.9

all resulting in deep hang or network down or reboot in 1-20 hours
randomly.

I have only 3 more variations to try:
- 2.6.30 + in kernel igb
- 2.6.32 + in kernel igb
- 2.6.32 + igb 2.1.9

And please can somebody tell which one of the drivers is to be
considered more stable, the one in kernel or the one from sf.net?

-- 
Покотиленко Костик cas...@meteor.dp.ua




Re: [E1000-devel] [PATCH 1/2] e1000: Fix DMA mapping error handling on TX

2010-01-26 Thread Roel Kluin

 This patch does not apply to the current e1000 driver in net-2.6; much
 of it has already been corrected (applied) by Roel Kluin's
 recent patch.

 Sorry, I was basing off net-next. I just compared it to my fix, and it looks like
 the patch in net-2.6 has an off-by-one error, doesn't it?
 
 This was discussed during our code review of Roel's patch, and it was
 found that there was not an issue.  But I will review the code again
 to ensure that there is not an off by one error.  Thanks for looking
 at this.

He is right, as also reported by Juha Leppanen:

 Before your patch I suppose the logic, disregarding the signed/unsigned error,
 was:
 1) if count==0, no unmapping/freeing inside while loop
 2) if count>0, do 'count' loops unmapping/freeing
 
 After your patch the logic is:
 1) if count==0, no unmapping/freeing inside while loop
 2) if count==1, no unmapping/freeing inside while loop
 3) if count>1, do 'count-1' loops unmapping/freeing

 Can tx_ring->count be zero? I hope not.

His suggested fix works:

 dma_error:
   dev_err(&pdev->dev, "TX DMA map failed\n");
   buffer_info->dma = 0;
 - if (count)
 -         count--;
 
   while (count--) {
           if (i==0)
 -                 i += tx_ring->count;
 +                 i = tx_ring->count;
           i--;
           buffer_info = &tx_ring->buffer_info[i];
           e1000_unmap_and_free_tx_resource(adapter, buffer_info);
   }
 
   return 0;
 }
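
The off-by-one described above can be reproduced outside the driver. The sketch below models only the loop arithmetic of the error path, not when count and i are updated in the real e1000_tx_map(); the function unwind_unmapped() and its extra_decrement switch are hypothetical names introduced for illustration.

```c
#include <assert.h>

/*
 * Simulate the dma_error unwind loop for a ring of ring_size entries.
 * i is the ring index on entry to the error path and count is the value of
 * the counter at that point. With extra_decrement != 0 the loop is preceded
 * by "if (count) count--;", as in the version under discussion.
 * Returns how many buffers the loop would unmap.
 */
unsigned int unwind_unmapped(unsigned int ring_size, unsigned int i,
			     unsigned int count, int extra_decrement)
{
	unsigned int unmapped = 0;

	assert(ring_size > 0 && i < ring_size && count <= ring_size);

	if (extra_decrement && count)
		count--;

	while (count--) {
		if (i == 0)
			i += ring_size; /* wrap backwards around the ring */
		i--;
		unmapped++; /* stands in for e1000_unmap_and_free_tx_resource() */
	}
	return unmapped;
}
```

With the extra decrement a count of 1 unmaps nothing and a count of 3 unmaps only 2 buffers, which matches the "count-1 loops" case listed above; without it, every counted buffer is unmapped.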

This affects the patches:
[PATCH] e1000: Fix tests of unsigned in e1000_tx_map()
and the other patch in the same thread.

Do you want me to send a delta patch?

Roel



Re: [E1000-devel] sf.net bug ID 2934941: Detected Tx Unit Hang on quad port copper 82576

2010-01-26 Thread Duyck, Alexander H
Покотиленко Костик wrote:
 Hi,
 
 Can somebody investigate, please? Bug posted 19.01.2010.
 
 I have tried:
 - 2.6.29 + igb 2.0.6
 - 2.6.30 + igb 2.0.6
 - 2.6.30 + igb 2.1.9
 
 all resulting in deep hang or network down or reboot in 1-20 hours
 randomly.
 
 I have only 3 more variations to try:
 - 2.6.30 + in kernel igb
 - 2.6.32 + in kernel igb
 - 2.6.32 + igb 2.1.9
 
 And please can somebody tell which one of the drivers is to be
 considered more stable, the one in kernel or the one from sf.net?

I'm curious.  You say the device is causing reboots.  Is this due to a kernel 
panic followed by a reboot or does the system just reboot?  If the entire 
system is rebooting I would suspect a bigger issue such as problems in the 
system memory, power issues, or an issue in the kernel.  

In 2907473 you mentioned also having SATA issues.  This leads me to wonder if 
there is a problem with the Mainboard or components in the system you are 
currently using.  In the bug you mentioned that you had recently upgraded to 
this server.  Would it be possible to try installing the ET Quad port server 
adapter in that system and run the same tests that you are currently running in 
this system?  My main concern is that this issue could be due to something 
outside of our control since the SATA seemed to be experiencing an I/O stall at 
the same time as the network adapter.  If we can test this in a known good 
platform we might be able to verify if the issue is a problem in the server or 
not.

In the bugs that you filed you mentioned that you have been putting additional 
patches on top of the kernel.  In the tests you have recently done have any of 
the kernels you tested not included the patches you mentioned?  If not you may 
want to try running just a plain kernel and see if the same issues occur.

Thanks,

Alex






Re: [E1000-devel] Very small number of packets dropping with 8257x on Xeon 5500

2010-01-26 Thread Kelvin Ku
I've been conducting some more tests on the Xeon E5530 machine and discovered
that cpuspeed was throttling the cores to 1.6 GHz. When I disabled cpuspeed,
the clock speeds went back to the maximum, 2.4 GHz. However, when I did this, I
started seeing high rx_missed_errors counts from the e1000e driver, which
indicates that the NIC FIFO was overflowing.

To remedy this, I passed the option InterruptRateThrottle=0,0 to the driver
to disable interrupt throttling and the rx_missed_errors counts went back to
zero or nearly zero.

Any idea why the NICs (82574L) would react like this to an increase in CPU
clock speed?

- Kelvin

On Fri, Jan 15, 2010 at 08:34:49PM -0500, Kelvin Ku wrote:
 On Tue, Jan 12, 2010 at 05:05:06PM -0700, Ronciak, John wrote:
  -Original Message-
  From: Kelvin Ku [mailto:kel...@telemetry-investments.com] 
  Sent: Tuesday, January 12, 2010 3:45 PM
  To: e1000-devel@lists.sourceforge.net
  Subject: Re: [E1000-devel] Very small number of packets 
  dropping with 82574x on Xeon 5500
  
  On Mon, Jan 11, 2010 at 10:37:56PM -0500, Kelvin Ku wrote:
   On Mon, Jan 11, 2010 at 10:30:55PM -0500, Kelvin Ku wrote:
On Mon, Jan 11, 2010 at 10:10:10PM -0500, Kelvin Ku wrote:
 On Mon, Jan 11, 2010 at 02:10:04AM +0100, Luca Deri wrote:
  
  
  --
  
  Message: 1
  Date: Sun, 10 Jan 2010 18:25:04 +
  From: Peter Grandi pg_...@e1k.for.sabi.co.uk
  Subject: Re: [E1000-devel] Very small number of 
  packets dropping with
 82574x  on Xeon 5500
  To: List e1000 devel e1000-devel@lists.sourceforge.net
  Message-ID: 19274.7040.589562.638...@tree.ty.sabi.co.uk
  Content-Type: text/plain; charset=us-ascii
  
  
    We recently purchased a Xeon 5500-series system, the first in our
    network. It has two on-board 82574L NICs which are supported by
    e1000e. In our UDP network tests at 1 Gbps, it is dropping a tiny
    fraction of packets, whereas all our existing systems do not drop
    any. [ ... ] so it is dropping (145000000-144810050)/1450 == 131
    packets out of 100000, whereas on our existing machines, we get no
    drops (transmitter is as before): [ ... ] The new machine drops a
    small fraction of packets when running our production application,
    so it is currently performing worse than our older machines. [ ... ]
  
  Well, that looks like everything is working fine, as 
  neither UDP nor
  Ethernet are guaranteed lossless, and tiny timing 
  issues or many
  other things can cause occasional packet disappearances.
  
  If your application depends on 0% network drops and 
  100% network
  performance using Ethernet and UDP it seems quite misdesigned.
  
  As a wise person once stated:
  
   
  http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
  
The fallacies are summarized as follows:
 1. The network is reliable.
 2. Latency is zero.
 3. Bandwidth is infinite.
 4. The network is secure.
 5. Topology doesn't change.
 6. There is one administrator.
 7. Transport cost is zero.
 8. The network is homogeneous.
  
  Probably in your case numbers 1, 2 (perhaps 3) and 8 seem to
  mis-apply. That's why TCP exists.
  
  BTW, as to that, what is doing congestion control in 
  your setup? If
  your reply is that no congestion control is necessary 
  because it is
  a 1Gb/s network end-to-end, I can offer you a very 
  good deal on a
  nice used bridge near San Francisco...
  
  Because there may be something quite questionable 
  within your app if
  a packet drop rate of 131/100,000 (0.13%) results in a 
  noticeable
  impact on performance.
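
The 131/100,000 figure quoted above follows from the byte counts in the original report. The sketch below redoes that arithmetic, assuming 100,000 datagrams of 1450 bytes each were sent (145,000,000 bytes in total) and 144,810,050 bytes were received; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Datagrams lost, given bytes sent, bytes received and a fixed payload size. */
uint64_t lost_datagrams(uint64_t bytes_sent, uint64_t bytes_received,
			uint64_t payload_bytes)
{
	assert(payload_bytes > 0 && bytes_received <= bytes_sent);
	return (bytes_sent - bytes_received) / payload_bytes;
}

/* Loss expressed in parts per 100,000 datagrams sent (integer division). */
uint64_t loss_per_100k(uint64_t lost, uint64_t sent)
{
	assert(sent > 0);
	return lost * 100000u / sent;
}
```

For the numbers above, lost_datagrams(145000000, 144810050, 1450) gives 131, and 131 lost out of 100,000 sent is 131 per 100,000, or about 0.13%.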
  
   The data doesn't show any packet drops at the 
  switch, NIC, driver,
   or socket levels, but a small number of packets are 
  clearly not
   reaching the application.
  
  If there really are no packet losses at any step yet 
  the application
  is not receiving all packets then the first point of 
  investigation
  is the application.
  
  Perhaps due to the vagaries of a different sequence of events
  due to a different CPU (the Xeon 5500 is rather different from
  previous Xeons) and a different network chip, a few packets
  disappear because the application is scheduled differently.
  
  Or perhaps all packets arrive but then fail the 
  checksum check (but
  the stats from the card seem to indicate
  
  Or perhaps, like in a case familiar to me, the sender may
  transmit a bit too fast (because of TSO) for the receiving
  chipset.
  
  Or perhaps the very large percentage of broadcasts 
  apparent from the
  switch port and NIC statistics has a CPU or chipset 
  dependent effect.
  
  

Re: [E1000-devel] Very small number of packets dropping with 8257x on Xeon 5500

2010-01-26 Thread Kelvin Ku
On Tue, Jan 26, 2010 at 01:20:12PM -0500, Kelvin Ku wrote:
 I've been conducting some more tests on the Xeon E5530 machine and discovered
 that cpuspeed was throttling the cores to 1.6 GHz. When I disabled cpuspeed,
 the clock speeds went back to the maximum, 2.4 GHz. However, when I did this, 
 I
 started seeing high rx_missed_errors counts from the e1000e driver, which
 indicates that the NIC FIFO was overflowing.
 
 To remedy this, I passed the option InterruptRateThrottle=0,0 to the driver
 to disable interrupt throttling and the rx_missed_errors counts went back to
 zero or nearly zero.
 
 Any idea why the NICs (82574L) would react like this to an increase in CPU
 clock speed?
 
 - Kelvin

Oops, I meant InterruptThrottleRate=0,0 above.

- Kelvin

 

Re: [E1000-devel] sf.net bug ID 2934941: Detected Tx Unit Hang on quad port copper 82576

2010-01-26 Thread Покотиленко Костик
On Tue, 26/01/2010 at 09:35 -0800, Duyck, Alexander H wrote:
 Покотиленко Костик wrote:
  Hi,
  
  Can somebody investigate, please? Bug posted 19.01.2010.
  
  I have tried:
  - 2.6.29 + igb 2.0.6
  - 2.6.30 + igb 2.0.6
  - 2.6.30 + igb 2.1.9
  
  all resulting in deep hang or network down or reboot in 1-20 hours
  randomly.
  
  I have only 3 more variations to try:
  - 2.6.30 + in kernel igb
  - 2.6.32 + in kernel igb
  - 2.6.32 + igb 2.1.9
  

Today I switched to 2.6.30 + in-kernel igb 1.3.16-k2. It has been working fine
for 6+ hours so far. I noticed that by default it uses 4 rx queues and 4
tx queues for each NIC and uses all available cores. 2.0.6 and 2.1.9 used
1 core per NIC by default.

  And please can somebody tell which one of the drivers is to be
  considered more stable, the one in kernel or the one from sf.net?

 I'm curious.  You say the device is causing reboots.  Is this due to a
 kernel panic followed by a reboot or does the system just reboot?

Regarding the latest bug, ID 2934941: the system becomes disconnected from the
network, and at the same time a lot of "Detected Tx Unit Hang" messages are
printed to the console and logs. Sometimes it just stays in this state
(disconnected, errors being printed, but the system still responding);
sometimes, after being in this state for a few minutes, it just reboots.

I haven't had any chance to see a kernel panic message. Most of the time
the system becomes disconnected when nobody is around, so we just
power it down/up remotely through a CLI such as IPMI.

Today I set up a serial console connected to a nearby router with an
independent Internet connection, so I can see what happens when it gets
disconnected, and if it is still alive I can do a clean reboot.

  If the entire system is rebooting I would suspect a bigger issue such
 as problems in the system memory, power issues, or an issue in the
 kernel.

Good guess, but until the "Detected Tx Unit Hang" messages appear there are no
other signs of instability. Everything works perfectly until then.

 In 2907473 you mentioned also having SATA issues.  This leads me to
 wonder if there is a problem with the Mainboard or components in the
 system you are currently using.

In this case everything also worked perfectly until the NIC problems. We would
have noticed otherwise: we have Nagios and Munin. Also, I was working on the
console while a few of those problems occurred.

   In the bug you mentioned that you had recently upgraded to this
 server.  Would it be possible to try installing the ET Quad port
 server adapter in that system and run the same tests that you are
 currently running in this system.

If you mean installing the ET Quad port server adapter in the old system, that
is impossible: it had a PCI-only board.

   My main concern is that this issue could be due to something outside
 of our control since the SATA seemed to be experiencing an I/O stall
 at the same time as the network adapter.

Well, first, the SATA and NIC problems popped up at the same time only in the
2907473 case with the 82574L. Now with the ET Quad port I don't see anything
except NIC problems. Also, this hardware successfully compiles the kernel
with CONCURRENCY_LEVEL=10, which I have done many times.

   If we can test this in a known good platform we might be able to
 verify if the issue is a problem in the server or not.

Agreed, but we don't have any spare server with PCI-e x4 v2.0 :(

 In the bugs that you filed you mentioned that you have been putting
 additional patches on top of the kernel.  In the tests you have
 recently done have any of the kernels you tested not included the
 patches you mentioned?  If not you may want to try running just a
 plain kernel and see if the same issues occur.

I thought about that. But the router is closely integrated with billing
software, and the whole solution requires ipset and imq. So running such a
test means leaving the network down. Also, the problem may not occur
for more than 20 hours. With the ET Quad port the record is ~36 hours.

-- 
Покотиленко Костик cas...@meteor.dp.ua




Re: [E1000-devel] Very small number of packets dropping with 8257x on Xeon 5500

2010-01-26 Thread Duyck, Alexander H
Kelvin Ku wrote:
 On Tue, Jan 26, 2010 at 01:20:12PM -0500, Kelvin Ku wrote:
 I've been conducting some more tests on the Xeon E5530 machine and
 discovered that cpuspeed was throttling the cores to 1.6 GHz. When
 I disabled cpuspeed, the clock speeds went back to the maximum, 2.4
 GHz. However, when I did this, I started seeing high
 rx_missed_errors counts from the e1000e driver, which indicates that
 the NIC FIFO was overflowing. 
 
 To remedy this, I passed the option InterruptRateThrottle=0,0 to
 the driver to disable interrupt throttling and the rx_missed_errors
 counts went back to zero or nearly zero. 
 
 Any idea why the NICs (82574L) would react like this to an increase
 in CPU clock speed? 
 
 - Kelvin
 
 Oops, I meant InterruptThrottleRate=0,0 above.
 
 - Kelvin

I was wondering if you have any CPU Cn states, or ASPM enabled on any of the 
PCIe slots for the system?  What you are describing sounds like it could be an 
issue with the CPU or PCIe links going to sleep and by the time they wake up 
you are already overrunning the NIC FIFO.  If you cannot find any information 
in the BIOS you might try downloading and installing PowerTop 
(http://www.lesswatts.org/projects/powertop/) to check and see if CPU Cn states 
are being used.

Other than that the only other thing I could recommend for the 82574 would be 
to modify the .pba value in the e1000_82574_info structure in the driver to 
be 36 by default instead of 20.  This would increase the amount of RX FIFO 
available and should have no negative impact on TX performance.
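
To get a feel for what raising the RX buffer from 20 to 36 buys, the sketch below estimates how long line-rate traffic can be absorbed before the FIFO overflows. It assumes the .pba value is expressed in kilobytes of RX packet buffer, and it ignores framing overhead and any draining by DMA; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds of line-rate traffic that an RX buffer of fifo_kb kilobytes
 * can absorb before overflowing. link_mbps is the link speed in Mbit/s.
 */
uint64_t fifo_absorb_ns(uint32_t fifo_kb, uint32_t link_mbps)
{
	uint64_t bits;

	assert(link_mbps > 0);
	bits = (uint64_t)fifo_kb * 1024u * 8u;
	/* bits / (Mbit/s) gives microseconds; scale to nanoseconds first. */
	return bits * 1000u / link_mbps;
}
```

Under these assumptions, at 1 Gbit/s a 20 KB buffer holds about 164 microseconds of traffic and a 36 KB buffer about 295 microseconds, so a wake-up latency between those two figures would overflow the smaller buffer but not the larger one.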

Thanks,

Alex




Re: [E1000-devel] [Bugme-new] [Bug 14748] New: e1000e NIC not working after reboot

2010-01-26 Thread Jesse Brandeburg
On Mon, Dec 7, 2009 at 2:01 PM, Brandeburg, Jesse
jesse.brandeb...@intel.com wrote:
 On Mon, 7 Dec 2009, Andrew Morton wrote:
  When I power up my system the NIC is working properly.
  After every reboot the NIC is not working. I mean the eth0 is created, but
  neither dhcpcd gets IP nor static setup helps

 We have a userspace tool called ethregs downloadable from
 http://downloads.sourceforge.net/project/e1000/Register%20Dump%20Tool/1.7.2/ethregs-1.7.2.tar.gz?use_mirror=iweb

 if it is not too much trouble can you build this tool and run it before
 (when the port is working) and after (when the link didn't come up)

 you can attach them to the bug, and reply to this thread would be best.

I've looked at the ethregs dumps. The good news is that the hardware
appears to self-initialize successfully, but for ethregs-fails.txt, did you
load the driver?  It appears you did not, or at least didn't do
# ip link set eth0 up
# ethregs > regs.txt

I also looked at the lspci -vvv information, and in both cases MSI was
enabled, but in the failing case the value in the data field for the MSI
vector is different, which seems a little strange, but I'm not sure
whether it is responsible for the failure.

if the driver was loaded, and failed dhcp, what happens when you run
ethtool -t eth0 offline?

when the driver is loaded, and the dhcp fails, can you assign an
address manually (and bring the interface up) and have it work?

One more thing: can you send the output of cat /proc/interrupts taken
10 seconds apart when the driver is loaded and the port is UP but not
working?  dhcpcd and dhclient both have a tendency to put the port DOWN
after they fail to get an address, so that's why you may need to run the # ip
link command above before gathering /proc/interrupts.
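
Comparing two /proc/interrupts samples comes down to summing the per-CPU counters on the NIC's line and subtracting. The sketch below shows one way to do that for a single line; the function names and the sample line format in the comment are assumptions, and real output varies between kernels.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line, e.g.
 *   " 58:   1200   340   PCI-MSI-edge   eth0"
 * Parsing stops at the first field that is not a plain decimal number.
 */
unsigned long irq_line_total(const char *line)
{
	unsigned long total = 0;
	const char *p;

	assert(line != NULL);
	p = strchr(line, ':');
	if (p == NULL)
		return 0;
	p++;

	for (;;) {
		char *end;
		unsigned long value;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;
		value = strtoul(p, &end, 10);
		/* Reject fields such as "2-edge" that merely start with a digit. */
		if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
			break;
		total += value;
		p = end;
	}
	return total;
}

/* Interrupts that arrived between two samples of the same line. */
unsigned long irq_delta(const char *earlier, const char *later)
{
	return irq_line_total(later) - irq_line_total(earlier);
}
```

A delta of zero across the 10-second window while the port is up would suggest that the adapter's interrupts are not reaching the host, which is presumably what the two samples are meant to reveal.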

is your bios up to date?

Thanks, sorry for the delay; let's see if we can figure out what is up.

Jesse
